One of the things that fp-stress does to stress the floating point context switching is send signals to the test threads it spawns. Currently we do this once per second but as suggested by Mark Rutland if we increase this we can improve the chances of triggering any issues with context switching the signal handling code. Do a quick change to increase the rate of signal delivery, trying to avoid excessive impact on emulated platforms, and a further change to mitigate the impact that this creates during startup.
Signed-off-by: Mark Brown broonie@kernel.org --- Mark Brown (2): kselftest/arm64: Increase frequency of signal delivery in fp-stress kselftest/arm64: Lower poll interval while waiting for fp-stress children
tools/testing/selftests/arm64/fp/fp-stress.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) --- base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354 change-id: 20241028-arm64-fp-stress-interval-8f5e62c06e12
Best regards,
Currently we only deliver signals to the processes being tested about once a second, meaning that the signal code paths are subject to relatively little stress. Increase this frequency substantially to 25ms intervals, along with some minor refactoring to make this more readily tuneable and maintain the 1s logging interval. This interval was chosen based on some experimentation with emulated platforms to avoid causing so much extra load that the test starts to run into the 45s limit for selftests or generally completely disconnect the timeout numbers from the
We could increase this if we moved the signal generation out of the main supervisor thread, though we should also consider that he percentage of time that we spend interacting with the floating point state is also a consideration.
Suggested-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Mark Brown broonie@kernel.org --- tools/testing/selftests/arm64/fp/fp-stress.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..c986c68fbcacdd295f4db57277075209193cb943 100644 --- a/tools/testing/selftests/arm64/fp/fp-stress.c +++ b/tools/testing/selftests/arm64/fp/fp-stress.c @@ -28,6 +28,9 @@
#define MAX_VLS 16
+#define SIGNAL_INTERVAL_MS 25 +#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS) + struct child_data { char *name, *output; pid_t pid; @@ -449,7 +452,7 @@ static const struct option options[] = { int main(int argc, char **argv) { int ret; - int timeout = 10; + int timeout = 10 * (1000 / SIGNAL_INTERVAL_MS); int cpus, i, j, c; int sve_vl_count, sme_vl_count; bool all_children_started = false; @@ -578,14 +581,14 @@ int main(int argc, char **argv) break;
/* - * Timeout is counted in seconds with no output, the - * tests print during startup then are silent when - * running so this should ensure they all ran enough - * to install the signal handler, this is especially - * useful in emulation where we will both be slow and - * likely to have a large set of VLs. + * Timeout is counted in poll intervals with no + * output, the tests print during startup then are + * silent when running so this should ensure they all + * ran enough to install the signal handler, this is + * especially useful in emulation where we will both + * be slow and likely to have a large set of VLs. */ - ret = epoll_wait(epoll_fd, evs, tests, 1000); + ret = epoll_wait(epoll_fd, evs, tests, SIGNAL_INTERVAL_MS); if (ret < 0) { if (errno == EINTR) continue; @@ -625,8 +628,9 @@ int main(int argc, char **argv) all_children_started = true; }
- ksft_print_msg("Sending signals, timeout remaining: %d\n", - timeout); + if ((timeout % LOG_INTERVALS) == 0) + ksft_print_msg("Sending signals, timeout remaining: %d\n", + timeout);
for (i = 0; i < num_children; i++) child_tickle(&children[i]);
Hi Mark,
On Tue, Oct 29, 2024 at 12:10:39AM +0000, Mark Brown wrote:
Currently we only deliver signals to the processes being tested about once a second, meaning that the signal code paths are subject to relatively little stress. Increase this frequency substantially to 25ms intervals, along with some minor refactoring to make this more readily tuneable and maintain the 1s logging interval. This interval was chosen based on some experimentation with emulated platforms to avoid causing so much extra load that the test starts to run into the 45s limit for selftests or generally completely disconnect the timeout numbers from the
Looks like the end of the sentence got deleted?
On those emulated platforms (FVP?), does this change trigger the faukure more often?
I gave this a quick test, and with this change, running fp-stress on a defconfig kernel running on 1 CPU triggers the "Bad SVCR: 0" splat in 35/100 runs. Hacking that down to 5ms brings that to 89/100 runs. So even if we have to keep this high for an emulated platform, it'd probably be worth being able to override that as a parameter?
Otherwise, maybe it's worth increasing the timeout for the fp-stress test specifically? The docs at:
https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests
... imply that is possible, but don't say exactly how, and it seems legitimate for a stress test.
We could increase this if we moved the signal generation out of the main supervisor thread, though we should also consider that he percentage of time that we spend interacting with the floating point state is also a consideration.
Suggested-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Mark Brown broonie@kernel.org
tools/testing/selftests/arm64/fp/fp-stress.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..c986c68fbcacdd295f4db57277075209193cb943 100644 --- a/tools/testing/selftests/arm64/fp/fp-stress.c +++ b/tools/testing/selftests/arm64/fp/fp-stress.c @@ -28,6 +28,9 @@ #define MAX_VLS 16 +#define SIGNAL_INTERVAL_MS 25 +#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)
Running this, I see that by default test logs:
# Will run for 400s
... for a timeout that's actually ~10s, due to the following, which isn't in the diff:
if (timeout > 0) ksft_print_msg("Will run for %ds\n", timeout);
... so that probably wants an update to either convert to seconds or be in terms of signals, and likewise for the "timeout remaining" message below.
Otherwise, this looks good to me.
Mark.
struct child_data { char *name, *output; pid_t pid; @@ -449,7 +452,7 @@ static const struct option options[] = { int main(int argc, char **argv) { int ret;
- int timeout = 10;
- int timeout = 10 * (1000 / SIGNAL_INTERVAL_MS); int cpus, i, j, c; int sve_vl_count, sme_vl_count; bool all_children_started = false;
@@ -578,14 +581,14 @@ int main(int argc, char **argv) break; /*
* Timeout is counted in seconds with no output, the
* tests print during startup then are silent when
* running so this should ensure they all ran enough
* to install the signal handler, this is especially
* useful in emulation where we will both be slow and
* likely to have a large set of VLs.
* Timeout is counted in poll intervals with no
* output, the tests print during startup then are
* silent when running so this should ensure they all
* ran enough to install the signal handler, this is
* especially useful in emulation where we will both
*/* be slow and likely to have a large set of VLs.
ret = epoll_wait(epoll_fd, evs, tests, 1000);
if (ret < 0) { if (errno == EINTR) continue;ret = epoll_wait(epoll_fd, evs, tests, SIGNAL_INTERVAL_MS);
@@ -625,8 +628,9 @@ int main(int argc, char **argv) all_children_started = true; }
ksft_print_msg("Sending signals, timeout remaining: %d\n",
timeout);
if ((timeout % LOG_INTERVALS) == 0)
ksft_print_msg("Sending signals, timeout remaining: %d\n",
timeout);
for (i = 0; i < num_children; i++) child_tickle(&children[i]);
-- 2.39.2
On Tue, Oct 29, 2024 at 03:43:37PM +0000, Mark Rutland wrote:
On those emulated platforms (FVP?), does this change trigger the faukure more often?
Yes.
I gave this a quick test, and with this change, running fp-stress on a defconfig kernel running on 1 CPU triggers the "Bad SVCR: 0" splat in 35/100 runs. Hacking that down to 5ms brings that to 89/100 runs. So even if we have to keep this high for an emulated platform, it'd probably be worth being able to override that as a parameter?
I was getting better numbers than that with the default multi-CPU setups on my particular machine, most runs were showing something IIRC. I do agree that it'd be a useful command line argument to add incrementally.
Otherwise, maybe it's worth increasing the timeout for the fp-stress test specifically? The docs at:
https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests
... imply that is possible, but don't say exactly how, and it seems legitimate for a stress test.
IIRC it's per suite and there's a bunch of pushback on using it in default configurations since the selftests are expected to run quickly in other cases where I'd have said it was a reasonable change to make. Stress tests are not entirely idiomatic for kselftest, it's supposed to run quickly.
+#define SIGNAL_INTERVAL_MS 25 +#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)
Running this, I see that by default test logs:
# Will run for 400s
... for a timeout that's actually ~10s, due to the following, which isn't in the diff:
if (timeout > 0) ksft_print_msg("Will run for %ds\n", timeout);
... so that probably wants an update to either convert to seconds or be in terms of signals, and likewise for the "timeout remaining" message below.
Otherwise, this looks good to me.
Oh, yeah - we should probably just remove the unit from that one. I never see it due to all the spam from the subprocesses starting.
While fp-stress is waiting for children to start it doesn't send any signals to them so there is no need for it to have as short an epoll() timeout as it does when the children are all running. We do still want to have some timeout so that we can log diagnostics about missing children but this can be relatively large. On emulated platforms the overhead of running the supervisor process is quite high, especially during the process of execing the test binaries.
Implement a longer epoll() timeout during the setup phase, using a 5s timeout while waiting for children and switching to the signal raise interval when all the children are started and we start sending signals.
Signed-off-by: Mark Brown broonie@kernel.org --- tools/testing/selftests/arm64/fp/fp-stress.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c index c986c68fbcacdd295f4db57277075209193cb943..963e2d891ced72fb8d6eff4fdb5c7df0724b14f1 100644 --- a/tools/testing/selftests/arm64/fp/fp-stress.c +++ b/tools/testing/selftests/arm64/fp/fp-stress.c @@ -453,6 +453,7 @@ int main(int argc, char **argv) { int ret; int timeout = 10 * (1000 / SIGNAL_INTERVAL_MS); + int poll_interval = 5000; int cpus, i, j, c; int sve_vl_count, sme_vl_count; bool all_children_started = false; @@ -588,7 +589,7 @@ int main(int argc, char **argv) * especially useful in emulation where we will both * be slow and likely to have a large set of VLs. */ - ret = epoll_wait(epoll_fd, evs, tests, SIGNAL_INTERVAL_MS); + ret = epoll_wait(epoll_fd, evs, tests, poll_interval); if (ret < 0) { if (errno == EINTR) continue; @@ -626,6 +627,7 @@ int main(int argc, char **argv) }
all_children_started = true; + poll_interval = SIGNAL_INTERVAL_MS; }
if ((timeout % LOG_INTERVALS) == 0)
Nit: the title says we lower the poll interval, while we actually raise it. Maybe that'd be clearer as:
kselftest/arm64: Raise poll timeout while waiting for fp-stress children
... or:
kselftest/arm64: Poll less frequently while waiting for fp-stress children
That aside, this looks fine.
Mark.
On Tue, Oct 29, 2024 at 12:10:40AM +0000, Mark Brown wrote:
While fp-stress is waiting for children to start it doesn't send any signals to them so there is no need for it to have as short an epoll() timeout as it does when the children are all running. We do still want to have some timeout so that we can log diagnostics about missing children but this can be relatively large. On emulated platforms the overhead of running the supervisor process is quite high, especially during the process of execing the test binaries.
Implement a longer epoll() timeout during the setup phase, using a 5s timeout while waiting for children and switching to the signal raise interval when all the children are started and we start sending signals.
Signed-off-by: Mark Brown broonie@kernel.org
tools/testing/selftests/arm64/fp/fp-stress.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c index c986c68fbcacdd295f4db57277075209193cb943..963e2d891ced72fb8d6eff4fdb5c7df0724b14f1 100644 --- a/tools/testing/selftests/arm64/fp/fp-stress.c +++ b/tools/testing/selftests/arm64/fp/fp-stress.c @@ -453,6 +453,7 @@ int main(int argc, char **argv) { int ret; int timeout = 10 * (1000 / SIGNAL_INTERVAL_MS);
- int poll_interval = 5000; int cpus, i, j, c; int sve_vl_count, sme_vl_count; bool all_children_started = false;
@@ -588,7 +589,7 @@ int main(int argc, char **argv) * especially useful in emulation where we will both * be slow and likely to have a large set of VLs. */
ret = epoll_wait(epoll_fd, evs, tests, SIGNAL_INTERVAL_MS);
if (ret < 0) { if (errno == EINTR) continue;ret = epoll_wait(epoll_fd, evs, tests, poll_interval);
@@ -626,6 +627,7 @@ int main(int argc, char **argv) } all_children_started = true;
}poll_interval = SIGNAL_INTERVAL_MS;
if ((timeout % LOG_INTERVALS) == 0)
-- 2.39.2
linux-kselftest-mirror@lists.linaro.org