Hi Mark,
On Tue, Oct 29, 2024 at 12:10:39AM +0000, Mark Brown wrote:
Currently we only deliver signals to the processes being tested about once a second, meaning that the signal code paths are subject to relatively little stress. Increase this frequency substantially to 25ms intervals, along with some minor refactoring to make this more readily tuneable and maintain the 1s logging interval. This interval was chosen based on some experimentation with emulated platforms to avoid causing so much extra load that the test starts to run into the 45s limit for selftests or generally completely disconnect the timeout numbers from the
Looks like the end of the sentence got deleted?
On those emulated platforms (FVP?), does this change trigger the failure more often?
I gave this a quick test, and with this change, running fp-stress on a defconfig kernel running on 1 CPU triggers the "Bad SVCR: 0" splat in 35/100 runs. Hacking that down to 5ms brings that to 89/100 runs. So even if we have to keep this high for an emulated platform, it'd probably be worth being able to override that as a parameter?
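If the interval became a parameter as suggested, LOG_INTERVALS and the default timeout could no longer be compile-time constants. Below is a minimal sketch of validating such an override and deriving both values at runtime. `parse_signal_interval_ms()`, `log_intervals()` and `default_timeout_polls()` are hypothetical helpers. The 1..1000ms bound is an assumption, chosen so that the `timeout % LOG_INTERVALS` check can never divide by zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a user-supplied signal interval in
 * milliseconds. Anything outside 1..1000ms is rejected so that the
 * derived log interval (1000 / interval) is always at least one poll.
 */
static bool parse_signal_interval_ms(const char *arg, int *interval_ms)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return false;	/* not a plain decimal number */
	if (val < 1 || val > 1000)
		return false;	/* would make the log interval zero */

	*interval_ms = (int)val;
	return true;
}

/* Runtime equivalent of LOG_INTERVALS: polls per (roughly) one second */
static int log_intervals(int interval_ms)
{
	assert(interval_ms > 0);
	return 1000 / interval_ms;
}

/* Runtime equivalent of the patch's default: ten seconds' worth of polls */
static int default_timeout_polls(int interval_ms)
{
	return 10 * log_intervals(interval_ms);
}
```

With 5ms this gives 200 polls per status line and a 2000-poll default timeout. Intervals that do not divide 1000 evenly (30ms, say) round down, so the status line would come roughly every 990ms rather than exactly once a second.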
Otherwise, maybe it's worth increasing the timeout for the fp-stress test specifically? The docs at:
https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests
... imply that is possible, but don't say exactly how, and it seems legitimate for a stress test.
We could increase this if we moved the signal generation out of the main supervisor thread, though the percentage of time that we spend interacting with the floating point state is also a consideration.
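Moving signal generation out of the supervisor could look roughly like a dedicated ticker thread. The thread sleeps for the interval and then does the per-tick work, which leaves the main thread free to service epoll output. This is only a sketch under assumptions: `struct ticker`, `run_ticker()` and the `count_tick()` callback are hypothetical, and `tick()` stands in for the loop over child_tickle().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical ticker configuration; tick() stands in for child_tickle() */
struct ticker {
	long interval_ms;		/* delay between ticks */
	int ticks;			/* number of ticks before the thread exits */
	void (*tick)(void *arg);	/* per-tick work, e.g. signalling children */
	void *arg;
};

/* Sleep for ms milliseconds, resuming with the remaining time after EINTR */
static void sleep_ms(long ms)
{
	struct timespec req = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};
	struct timespec rem;

	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;
}

static void *ticker_thread(void *data)
{
	struct ticker *t = data;

	for (int i = 0; i < t->ticks; i++) {
		sleep_ms(t->interval_ms);
		t->tick(t->arg);
	}
	return NULL;
}

/* Run the ticker on its own thread and wait for it to finish */
static int run_ticker(struct ticker *t)
{
	pthread_t thread;
	int ret;

	ret = pthread_create(&thread, NULL, ticker_thread, t);
	if (ret != 0)
		return ret;
	/* A real supervisor would keep servicing epoll_wait() here */
	return pthread_join(thread, NULL);
}

/* Example per-tick work for illustration: count how often we were called */
static void count_tick(void *arg)
{
	(*(int *)arg)++;
}
```

One consequence of this design: epoll_wait() would no longer double as the tick timer, so its timeout could stay long. The cost is that the ticker and the supervisor would need some synchronisation if they share per-child state.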
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
 tools/testing/selftests/arm64/fp/fp-stress.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..c986c68fbcacdd295f4db57277075209193cb943 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -28,6 +28,9 @@
 #define MAX_VLS 16
+#define SIGNAL_INTERVAL_MS 25
+#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)
Running this, I see that by default the test logs:
# Will run for 400s
... for a timeout that's actually ~10s, due to the following, which isn't in the diff:
	if (timeout > 0)
		ksft_print_msg("Will run for %ds\n", timeout);
... so that probably wants an update to either convert to seconds or be in terms of signals, and likewise for the "timeout remaining" message below.
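Since timeout now counts poll intervals, one way to address this is to keep counting in polls internally and convert to seconds only when printing. Below is a minimal sketch that reuses the two defines from the patch. `timeout_seconds()`, `should_log()` and the print helpers are hypothetical, and printf() stands in for ksft_print_msg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SIGNAL_INTERVAL_MS 25
#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)

/*
 * Hypothetical helper: convert a timeout counted in poll intervals back
 * to whole seconds for display. Negative values ("run forever") have no
 * seconds equivalent, so callers must filter them out first.
 */
static int timeout_seconds(int timeout)
{
	assert(timeout >= 0);
	return timeout / LOG_INTERVALS;
}

/*
 * True on the polls where the once-a-second status line is due. Unlike
 * the patch, negative timeouts are skipped so they are never converted.
 */
static bool should_log(int timeout)
{
	return timeout >= 0 && (timeout % LOG_INTERVALS) == 0;
}

/* printf() stands in for ksft_print_msg() in this sketch */
static void print_startup(int timeout)
{
	if (timeout > 0)
		printf("Will run for %ds\n", timeout_seconds(timeout));
}

static void print_remaining(int timeout)
{
	if (should_log(timeout))
		printf("Sending signals, timeout remaining: %ds\n",
		       timeout_seconds(timeout));
}
```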
Otherwise, this looks good to me.
Mark.
 struct child_data {
 	char *name, *output;
 	pid_t pid;
@@ -449,7 +452,7 @@ static const struct option options[] = {
 int main(int argc, char **argv)
 {
 	int ret;
-	int timeout = 10;
+	int timeout = 10 * (1000 / SIGNAL_INTERVAL_MS);
 	int cpus, i, j, c;
 	int sve_vl_count, sme_vl_count;
 	bool all_children_started = false;
@@ -578,14 +581,14 @@ int main(int argc, char **argv)
 			break;
 		/*
-		 * Timeout is counted in seconds with no output, the
-		 * tests print during startup then are silent when
-		 * running so this should ensure they all ran enough
-		 * to install the signal handler, this is especially
-		 * useful in emulation where we will both be slow and
-		 * likely to have a large set of VLs.
+		 * Timeout is counted in poll intervals with no
+		 * output, the tests print during startup then are
+		 * silent when running so this should ensure they all
+		 * ran enough to install the signal handler, this is
+		 * especially useful in emulation where we will both
+		 * be slow and likely to have a large set of VLs.
 		 */
-		ret = epoll_wait(epoll_fd, evs, tests, 1000);
+		ret = epoll_wait(epoll_fd, evs, tests, SIGNAL_INTERVAL_MS);
 		if (ret < 0) {
 			if (errno == EINTR)
 				continue;
@@ -625,8 +628,9 @@ int main(int argc, char **argv)
 			all_children_started = true;
 		}
-		ksft_print_msg("Sending signals, timeout remaining: %d\n",
-			       timeout);
+		if ((timeout % LOG_INTERVALS) == 0)
+			ksft_print_msg("Sending signals, timeout remaining: %d\n",
+				       timeout);
 		for (i = 0; i < num_children; i++)
 			child_tickle(&children[i]);
-- 2.39.2
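The effect of the `timeout % LOG_INTERVALS` throttle in the last hunk can be checked with a small simulation of the countdown. The simulation assumes, from context rather than from this diff, that timeout is decremented once per idle poll until it reaches zero. `simulate()` is a hypothetical name.

```c
#include <assert.h>

#define SIGNAL_INTERVAL_MS 25
#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)

/*
 * Simulate the supervisor loop for a positive timeout (in poll
 * intervals): signals go out on every poll, but the status line is only
 * printed when timeout is a multiple of LOG_INTERVALS. Assumes timeout
 * is decremented once per poll until it reaches zero.
 */
static void simulate(int timeout, int *polls, int *log_lines)
{
	assert(timeout > 0);
	*polls = 0;
	*log_lines = 0;

	while (timeout > 0) {
		if ((timeout % LOG_INTERVALS) == 0)
			(*log_lines)++;	/* ksft_print_msg() would run here */
		(*polls)++;		/* child_tickle() would run here */
		timeout--;
	}
}
```

With the default of 10 * LOG_INTERVALS this comes out at 400 rounds of signals but only 10 status lines, which matches the intended once-a-second logging.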