On Tue, Oct 17, 2017 at 09:02:18AM -0500, Dan Rue wrote:
> I think we just need to iterate on the framework we have until we're stable for a period of time. We are presently running many tests that, once trusted, will find regressions that nobody else notices in a timely manner. My biggest concern is trust: our results need to be rock solid and stable so that they are trusted, and so that people jump when a regression is reported. Currently, that is not the case.
Well, we also need the reporting quality to be good (I know Milosz and Antonio are working on this) and to be directing the reports outwards so that other people trust them too (once the results are stable). That helps enormously with getting people to pay attention when issues are found, and will hopefully also help motivate people to work more on testsuites.
> What I would like to see, and I don't know if it is even possible, is something that measures test coverage based on code paths in the Linux kernel, so that we have a means to actually measure our effectiveness. If we knew we were testing 43% (number pulled out of thin air) of the Linux kernel's code paths, then we would know which areas to focus on to bring that number up, which subsystems we can have some confidence in, and which are uncovered.
That's come up before. I personally feel that collecting and trying to optimize coverage numbers is premature here and is likely to be a distraction when we inevitably sign ourselves up for metrics-based targets. It's not as though it is a struggle for us to identify areas where we could usefully add coverage, nor is it likely to be for quite a while, but I have seen testing efforts fail to deliver value while showing great metrics (e.g. by adding tests for things that are easy to test but rarely break, and so don't really help people find defects).
Instead I think we should focus on two directions for expanding coverage. One is bringing in existing testsuites. That obviously means less development cost for us, and it has the additional advantage of bringing the testing community closer together: we can learn from other people working in the area, they feel more appreciated, and it all helps push collaboration on best practices. The other direction I see as likely to bring good results is to look at where there is current activity that automated testing could support. That's a combination of looking at areas where people frequently report problems and looking at the things that are most actively developed (from a stable point of view, for example, that would be the areas that get the most stable backports).
Right now it takes only a moment's thought to find areas where we're lacking coverage, so it seems much more interesting to prioritize where we'll get the most value from efforts to improve coverage than to go hunting for gaps. As coverage improves, it will become more and more useful to bring in things like coverage metrics.
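
For what it's worth, if and when we do get to that point, the kernel's gcov support (CONFIG_GCOV_KERNEL) plus an lcov tracefile is the obvious starting point for the kind of per-subsystem numbers Dan describes. Very roughly, something along these lines could turn a tracefile into a per-directory breakdown - the file names, the lcov invocation and the script itself are just a sketch of the idea, not anything we actually have:

#!/usr/bin/env python3
# Rough per-subsystem line coverage summary from an lcov tracefile.
# Assumes data was gathered on a CONFIG_GCOV_KERNEL build and exported
# with something like: lcov -c -d /sys/kernel/debug/gcov -o coverage.info
# (the file names and paths here are illustrative only).

import os
import sys
from collections import defaultdict

def summarise(tracefile, srcroot):
    # subsystem (top-level directory) -> [lines instrumented, lines hit]
    totals = defaultdict(lambda: [0, 0])
    subsystem = None
    with open(tracefile) as f:
        for line in f:
            line = line.strip()
            if line.startswith("SF:"):
                # Source file record: group by top-level directory
                # (drivers/, fs/, mm/, net/, ...) relative to the tree.
                path = os.path.relpath(line[3:], srcroot)
                subsystem = path.split(os.sep)[0]
            elif line.startswith("DA:") and subsystem is not None:
                # Per-line record: DA:<line>,<execution count>[,...]
                count = line[3:].split(",")[1]
                totals[subsystem][0] += 1
                if int(count) > 0:
                    totals[subsystem][1] += 1
            elif line == "end_of_record":
                subsystem = None
    return totals

def main():
    if len(sys.argv) != 3:
        sys.exit("usage: %s <tracefile> <kernel source dir>" % sys.argv[0])
    for subsystem, (found, hit) in sorted(summarise(*sys.argv[1:]).items()):
        pct = 100.0 * hit / found if found else 0.0
        print("%-20s %6d/%-6d lines  %5.1f%%" % (subsystem, hit, found, pct))

if __name__ == "__main__":
    main()

That only tells us about line coverage on a gcov-enabled build, of course, so it would be more of an indicator of where we're thin than a target to chase.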