Currently we don't have an explicit check that when it's been a second since we have seen output produced from the test programs starting up that means all of them are running and we should start both sending signals and timing out. This is not reliable, especially on very heavily loaded systems where the test programs might take longer than a second to run.
We do skip sending signals to children that have not produced output yet so we won't cause them to exit unexpectedly by sending a signal but this can create confusion when interpreting output, for example appearing to show the tests running for less time than expected or appearing to show missed signal deliveries. Avoid issues by explicitly checking that we have seen output from all the child processes before we start sending signals or counting test run time.
This is especially likely on virtual platforms with large numbers of vector lengths supported since the platforms are slow and there will be a lot of tasks per CPU.
Signed-off-by: Mark Brown broonie@kernel.org --- tools/testing/selftests/arm64/fp/fp-stress.c | 23 ++++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c index 4e62a9199f97..35dc07648d52 100644 --- a/tools/testing/selftests/arm64/fp/fp-stress.c +++ b/tools/testing/selftests/arm64/fp/fp-stress.c @@ -403,6 +403,8 @@ int main(int argc, char **argv) int timeout = 10; int cpus, tests, i, j, c; int sve_vl_count, sme_vl_count, fpsimd_per_cpu; + bool all_children_started = false; + int seen_children; int sve_vls[MAX_VLS], sme_vls[MAX_VLS]; struct epoll_event ev; struct sigaction sa; @@ -526,6 +528,27 @@ int main(int argc, char **argv)
/* Otherwise epoll_wait() timed out */
+ /* + * If the child processes have not produced output they + * aren't actually running the tests yet . + */ + if (!all_children_started) { + seen_children = 0; + + for (i = 0; i < num_children; i++) + if (children[i].output_seen || + children[i].exited) + seen_children++; + + if (seen_children != num_children) { + ksft_print_msg("Waiting for %d children\n", + num_children - seen_children); + continue; + } + + all_children_started = true; + } + for (i = 0; i < num_children; i++) child_tickle(&children[i]);