Some small fixes for arch_timer_edge_cases that I stumbled upon while debugging failures for this selftest on ampere-one.
Sebastian Ott (3): KVM: arm64: selftests: fix help text for arch_timer_edge_cases KVM: arm64: selftests: fix thread migration in arch_timer_edge_cases KVM: arm64: selftests: arch_timer_edge_cases - workaround for AC03_CPU_14
.../selftests/kvm/arm64/arch_timer_edge_cases.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
base-commit: 9c69f88849045499e8ad114e5e13dbb3c85f4443
Fix the help text for arch_timer_edge_cases to show the correct option for setting the wait time.
Signed-off-by: Sebastian Ott sebott@redhat.com --- tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c index a36a7e2db434..c4716e0c1438 100644 --- a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c +++ b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c @@ -986,7 +986,7 @@ static void test_print_help(char *name) pr_info("\t-b: Test both physical and virtual timers (default: true)\n"); pr_info("\t-l: Delta (in ms) used for long wait time test (default: %u)\n", LONG_WAIT_TEST_MS); - pr_info("\t-l: Delta (in ms) used for wait times (default: %u)\n", + pr_info("\t-w: Delta (in ms) used for wait times (default: %u)\n", WAIT_TEST_MS); pr_info("\t-p: Test physical timer (default: true)\n"); pr_info("\t-v: Test virtual timer (default: true)\n");
arch_timer_edge_cases tries to migrate itself across host cpus. Before the first test it migrates to cpu 0 by setting up an affinity mask with only bit 0 set. After that it looks for the next possible cpu in the current affinity mask which still has only bit 0 set. So there is no migration at all.
Fix this by reading the default mask at start and use this to find the next cpu in each iteration.
Signed-off-by: Sebastian Ott sebott@redhat.com --- tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c index c4716e0c1438..a813b4c6c817 100644 --- a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c +++ b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c @@ -849,17 +849,17 @@ static void guest_code(enum arch_timer timer) GUEST_DONE(); }
+static cpu_set_t default_cpuset; + static uint32_t next_pcpu(void) { uint32_t max = get_nprocs(); uint32_t cur = sched_getcpu(); uint32_t next = cur; - cpu_set_t cpuset; + cpu_set_t cpuset = default_cpuset;
TEST_ASSERT(max > 1, "Need at least two physical cpus");
- sched_getaffinity(0, sizeof(cpuset), &cpuset); - do { next = (next + 1) % CPU_SETSIZE; } while (!CPU_ISSET(next, &cpuset)); @@ -1046,6 +1046,8 @@ int main(int argc, char *argv[]) if (!parse_args(argc, argv)) exit(KSFT_SKIP);
+ sched_getaffinity(0, sizeof(default_cpuset), &default_cpuset); + if (test_args.test_virtual) { test_vm_create(&vm, &vcpu, VIRTUAL); test_run(vm, vcpu);
arch_timer_edge_cases currently fails on ampere-one machines with the following assertion failure:
==== Test Assertion Failure ==== arm64/arch_timer_edge_cases.c:169: timer_condition == istatus pid=11236 tid=11236 errno=4 - Interrupted system call 1 0x0000000000404ce7: test_run at arch_timer_edge_cases.c:938 2 0x0000000000401ebb: main at arch_timer_edge_cases.c:1053 3 0x0000ffff9fa8625b: ?? ??:0 4 0x0000ffff9fa8633b: ?? ??:0 5 0x0000000000401fef: _start at ??:? 0x1 != 0x0 (timer_condition != istatus)
Meaning that the timer condition was met and an interrupt was presented but the timer status bit in the control register was not set.
This happens due to AC03_CPU_14 "Timer CVAL programming of a delta greater than 2^63 will result in incorrect behavior."
Work around this issue by reducing the value that is used to reset the counter and thus reduce the delta.
Link: https://lore.kernel.org/kvmarm/ac1de1d2-ef2b-d439-dc48-8615e121b07b@redhat.c... Link: https://amperecomputing.com/assets/AmpereOne_Developer_ER_v0_80_20240823_289... Signed-off-by: Sebastian Ott sebott@redhat.com --- tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c index a813b4c6c817..2f0397df0aa6 100644 --- a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c +++ b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c @@ -31,7 +31,7 @@ static const int32_t TVAL_MIN = INT32_MIN; static const uint32_t TIMEOUT_NO_IRQ_US = 50000;
/* A nice counter value to use as the starting one for most tests. */ -static const uint64_t DEF_CNT = (CVAL_MAX / 2); +static const uint64_t DEF_CNT = (CVAL_MAX / 4);
/* Number of runs. */ static const uint32_t NR_TEST_ITERS_DEF = 5;
On Fri, 09 May 2025 15:33:12 +0100, Sebastian Ott sebott@redhat.com wrote:
arch_timer_edge_cases currently fails on ampere-one machines with the following assertion failure:
==== Test Assertion Failure ==== arm64/arch_timer_edge_cases.c:169: timer_condition == istatus pid=11236 tid=11236 errno=4 - Interrupted system call 1 0x0000000000404ce7: test_run at arch_timer_edge_cases.c:938 2 0x0000000000401ebb: main at arch_timer_edge_cases.c:1053 3 0x0000ffff9fa8625b: ?? ??:0 4 0x0000ffff9fa8633b: ?? ??:0 5 0x0000000000401fef: _start at ??:? 0x1 != 0x0 (timer_condition != istatus)
Meaning that the timer condition was met and an interrupt was presented but the timer status bit in the control register was not set.
This happens due to AC03_CPU_14 "Timer CVAL programming of a delta greater than 2^63 will result in incorrect behavior."
Work around this issue by reducing the value that is used to reset the counter and thus reduce the delta.
Link: https://lore.kernel.org/kvmarm/ac1de1d2-ef2b-d439-dc48-8615e121b07b@redhat.c... Link: https://amperecomputing.com/assets/AmpereOne_Developer_ER_v0_80_20240823_289... Signed-off-by: Sebastian Ott sebott@redhat.com
tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c index a813b4c6c817..2f0397df0aa6 100644 --- a/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c +++ b/tools/testing/selftests/kvm/arm64/arch_timer_edge_cases.c @@ -31,7 +31,7 @@ static const int32_t TVAL_MIN = INT32_MIN; static const uint32_t TIMEOUT_NO_IRQ_US = 50000; /* A nice counter value to use as the starting one for most tests. */ -static const uint64_t DEF_CNT = (CVAL_MAX / 2); +static const uint64_t DEF_CNT = (CVAL_MAX / 4);
This is rather arbitrary, and only sidestep the issue: the core problem is that CVAL_MAX is defined as ~0, and that we have no idea what the *effective* counter width is.
So while this happen to sidestep the particular Ampere erratum (and avoid failures on X1E), this is only papering over the problem. Which is why I always had some reservations on this particular test -- it is remarkably broken.
If anything, we should compute the expected width of the counter based on the frequency and the architectural guarantees ("Roll-over time of not less than 40 years."), just like the kernel driver does (see arch_counter_get_width()).
I'm also not keen on hiding a HW bug by altering the test. What of other guests that would fall into the same issue? If we think the problem exposed by this test is serious enough, then we need to fully trap and emulate the timers, X1E style. Performance would definitely suffer, but that would be the correct thing to do.
So my proposal is to fix the test to be compliant with the intent of the architecture instead of making bets and using semi-random values. If that's good enough to make that test pass on A1, great.
Thanks,
M.
linux-kselftest-mirror@lists.linaro.org