On Thu, Oct 17, 2019 at 3:25 PM Tim.Bird@sony.com wrote:
-----Original Message-----
From: Theodore Y. Ts'o on October 17, 2019 2:09 AM
On Wed, Oct 16, 2019 at 05:26:29PM -0600, Shuah Khan wrote:
I don't really buy the argument that unit tests should be deterministic. Possibly, but I would opt for having the ability to feed test data.
I strongly believe that unit tests should be deterministic. Non-deterministic tests are essentially fuzz tests. And fuzz tests should be different from unit tests.
I'm not sure I have the entire context here, but I think deterministic might not be the right word, or it might not capture the exact meaning intended.
I think there are multiple issues here:
- Does the test enclose all its data, including working data and expected results?
Or, does the test allow someone to provide working data? This alternative implies that some of the testcases or the results might differ depending on the data that is provided. IMHO the test would be deterministic if it always produced the same results from the same data inputs, and if the input data was itself deterministic. I would call this a data-driven test.
Since the results would be dependent on the data provided, the results from tests using different data would not be comparable. Essentially, changing the input data changes the test so maybe it's best to consider this a different test. Like 'test-with-data-A' and 'test-with-data-B'.
That kind of sounds like parameterized tests[1]; it was a feature I was thinking about adding to KUnit, but I think the general idea of parameterized tests has fallen out of favor; I am not sure why. In any case, I have used parameterized tests before and have found them useful in certain circumstances.
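For what it's worth, you can approximate parameterized tests with today's KUnit API using a table-driven loop; a minimal sketch, where hypothetical_add() is a made-up stand-in for the code under test:

#include <kunit/test.h>
#include <linux/kernel.h>   /* ARRAY_SIZE */

int hypothetical_add(int a, int b);   /* made-up code under test */

struct add_param {
        int a, b, expected;
};

static const struct add_param add_params[] = {
        { .a = 1,  .b = 1, .expected = 2 },   /* "test-with-data-A" */
        { .a = 0,  .b = 5, .expected = 5 },   /* "test-with-data-B" */
        { .a = -2, .b = 2, .expected = 0 },
};

static void add_test(struct kunit *test)
{
        int i;

        /* Run the same test logic once per table entry. */
        for (i = 0; i < ARRAY_SIZE(add_params); i++)
                KUNIT_EXPECT_EQ(test, add_params[i].expected,
                                hypothetical_add(add_params[i].a,
                                                 add_params[i].b));
}

static struct kunit_case add_test_cases[] = {
        KUNIT_CASE(add_test),
        {}
};

static struct kunit_suite add_test_suite = {
        .name = "add-test",
        .test_cases = add_test_cases,
};
kunit_test_suite(add_test_suite);

The data and expected results live in one table, so adding a scenario is one line; the trade-off versus first-class parameterized tests is that a failure reports the loop's assertion rather than a named testcase per datum.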
- Does the test automatically detect some attribute of the system, and adjust its operation based on that (does the test probe?) This is actually quite common; consider tests that require root access to run. Sometimes such tests, when run without root privilege, run as many testcases as possible and skip the ones that require root.
In general, altering the test based on probed data is a form of data-driven test, except the data is not provided by the user. Whether this is deterministic in the sense of the first question above depends on whether the probed data is deterministic. In the case of requiring root, it should not change from run to run (and it should probably be reflected in the characterization of the results).
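To make the root-probing case concrete, a kselftest-style sketch; root_only_testcase() is a made-up stand-in, and the kselftest.h include path depends on where the test lives:

#include <unistd.h>
#include "../kselftest.h"   /* path relative to tools/testing/selftests */

static int root_only_testcase(void)
{
        return 1;   /* real privileged checks would go here */
}

int main(void)
{
        ksft_print_header();
        ksft_set_plan(1);

        /* Probe: the privilege we run with is stable from run to run,
         * so the result set is still deterministic. */
        if (geteuid() != 0)
                ksft_test_result_skip("requires root\n");
        else
                ksft_test_result(root_only_testcase(), "root-only testcase\n");

        return ksft_exit_pass();
}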
Maybe neither of the above cases falls in the category of unit tests, but they are not necessarily fuzzing tests. IMHO a fuzzing test is one which randomizes the data for a data-driven test (hence using non-deterministic data). Once the fuzzer has found a bug, and the data and code for a test are fixed into a reproducer program, then at that point it should be deterministic (modulo what I say about race condition tests below).
That sounds remotely similar to Haskell's QuickCheck[2]; it's sort of a mix of unit testing and fuzz testing. I have used this style of testing for other projects and found it pretty useful. I actually have a little experiment somewhere trying to port the idea to KUnit.
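A rough sketch of what such a property-based test might look like in KUnit, with a fixed seed so every run is deterministic and any failure is trivially reproducible; here the kernel's real sort() stands in as the code under test, and the seed/iteration names are my own:

#include <kunit/test.h>
#include <linux/random.h>
#include <linux/sort.h>

#define PROP_ITERATIONS 1000
#define PROP_SEED       42ULL   /* fixed seed: log it, keep runs reproducible */

static int cmp_u32(const void *a, const void *b)
{
        u32 x = *(const u32 *)a, y = *(const u32 *)b;

        return x < y ? -1 : (x > y ? 1 : 0);
}

static void sort_orders_random_input_test(struct kunit *test)
{
        struct rnd_state rnd;
        u32 buf[16];
        int i, j;

        prandom_seed_state(&rnd, PROP_SEED);

        for (i = 0; i < PROP_ITERATIONS; i++) {
                /* Random inputs, but from a deterministic stream. */
                for (j = 0; j < 16; j++)
                        buf[j] = prandom_u32_state(&rnd);

                sort(buf, 16, sizeof(u32), cmp_u32, NULL);

                /* Property: output is non-decreasing. */
                for (j = 1; j < 16; j++)
                        KUNIT_EXPECT_LE(test, buf[j - 1], buf[j]);
        }
}

static struct kunit_case prop_test_cases[] = {
        KUNIT_CASE(sort_orders_random_input_test),
        {}
};

static struct kunit_suite prop_test_suite = {
        .name = "sort-property-test",
        .test_cases = prop_test_cases,
};
kunit_test_suite(prop_test_suite);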
We want unit tests to run quickly. Fuzz tests need to be run for a large number of passes (perhaps hours) in order to be sure that we've hit any possible bad cases. We want to be able to easily bisect fuzz tests --- preferably, automatically. And any kind of flakey test is hell to bisect.
Agreed.
It's bad enough when a test is flakey because of the underlying code. But when a test is flakey because the test inputs are non-deterministic, it's even worse.
I very much agree on this as well.
I'm not sure how one classes a program that seeks to invoke a race condition. This can take a variable amount of time, so in that sense it is not deterministic. But it should produce the same result if the probabilities required for the race condition to be hit are fulfilled. Probably (see what I did there :-), one needs to take a probabilistic approach to reproducing and bisecting such bugs. The duration or number of iterations required to reproduce the bug (to some confidence level) may need to be included with the reproducer program. I'm not sure if the syzkaller reproducers do this or not, or if they just run forever. One I looked at ran forever. But you would want to limit this in order to produce results with some confidence level (and not waste testing resources).
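A back-of-envelope way to pick that bound: if a single iteration triggers the race with probability p, then N iterations all miss it with probability (1 - p)^N, so to claim confidence c that the bug is absent you need roughly N >= ln(1 - c) / ln(1 - p). For example, with p = 0.1% per iteration, 99% confidence needs N >= ln(0.01) / ln(0.999), about 4600 iterations. Of course p itself has to be estimated from runs where the reproducer does fire, so treat these numbers as rough.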
The reason I want to get clarity on the issue of data-driven tests is that I think data-driven tests and tests that probe are very much desirable. This allows a test to be more general, and allows it to be specialized for more scenarios without re-coding. I'm not sure if this still qualifies as unit testing, but it's very useful as a means to extend the value of a test. We haven't gotten into the mocking parts of KUnit, but I'm hoping that it may be possible for that to be data-driven (depending on what's being mocked), to make it easier to test more things using the same code.
I imagine it wouldn't be that hard to add that on as a feature of a parameterized testing implementation.
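Even without framework support, a data-driven fake can be plain C: the test feeds the fake a table of replies through an ops struct, so the same test logic can run against different scenarios. A minimal sketch, with all names made up for illustration:

#include <linux/errno.h>
#include <linux/types.h>

/* Ops table the code under test calls through. */
struct eeprom_ops {
        int (*read)(void *ctx, unsigned int off, u8 *val);
};

/* Scenario data supplied by the test. */
struct fake_eeprom {
        const u8 *image;
        size_t size;
};

static int fake_eeprom_read(void *ctx, unsigned int off, u8 *val)
{
        struct fake_eeprom *fake = ctx;

        if (off >= fake->size)
                return -EINVAL;
        *val = fake->image[off];
        return 0;
}

static const struct eeprom_ops fake_ops = {
        .read = fake_eeprom_read,
};

Swapping in a different image[] is exactly the "test-with-data-A" / "test-with-data-B" distinction from earlier in the thread.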
Finally, I think the issue of testing speed is orthogonal to whether a test is self-enclosed or data-driven. Fuzzers, which experiment with system interaction in a non-deterministic way, definitely have speed problems.
[1] https://dzone.com/articles/junit-parameterized-test
[2] http://hackage.haskell.org/package/QuickCheck
Cheers!