Currently, the migration worker delays 1-10 us, assuming that one KVM_RUN iteration only takes a few microseconds. But if the CPU low power wakeup latency is large enough, for example, deep C-state exit latencies of hundreds or even thousands of microseconds on x86 server CPUs, the scheduler may not be able to wake up the target CPU before the migration worker starts to migrate the vCPU thread to the next CPU.
If the system workload is light, most CPUs could be in a low power state, which may result in fewer successful migrations and cause the migration/KVM_RUN ratio sanity check to fail. This should not be treated as a test failure.
This patch adds a command line option to skip the sanity check in this case.
Co-developed-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Signed-off-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
V2:
- removed the busy loop implementation
- add the new "-s" option

V3:
- drop the usleep randomization code
- removed the term C-state for less confusion for non-x86 architectures
- changed patch subject

V4:
- replaced Signed-off-by with Co-developed-by
- changed command line option from "-s" to "-u"
- Adopted the much clearer assertion error messages provided by Sean.

V5:
- Fixed the missing SoB
---
 tools/testing/selftests/kvm/rseq_test.c | 35 +++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 28f97fb52044..ad418a5c59dd 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -186,12 +186,35 @@ static void calc_min_max_cpu(void)
 		    "Only one usable CPU, task migration not possible");
 }

+static void help(const char *name)
+{
+	puts("");
+	printf("usage: %s [-h] [-u]\n", name);
+	printf(" -u: Don't sanity check the number of successful KVM_RUNs\n");
+	puts("");
+	exit(0);
+}
+
 int main(int argc, char *argv[])
 {
 	int r, i, snapshot;
 	struct kvm_vm *vm;
 	struct kvm_vcpu *vcpu;
 	u32 cpu, rseq_cpu;
+	bool skip_sanity_check = false;
+	int opt;
+
+	while ((opt = getopt(argc, argv, "hu")) != -1) {
+		switch (opt) {
+		case 'u':
+			skip_sanity_check = true;
+			break;
+		case 'h':
+		default:
+			help(argv[0]);
+			break;
+		}
+	}

 	r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
 	TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
@@ -254,9 +277,17 @@ int main(int argc, char *argv[])
 	 * getcpu() to stabilize. A 2:1 migration:KVM_RUN ratio is a fairly
 	 * conservative ratio on x86-64, which can do _more_ KVM_RUNs than
 	 * migrations given the 1us+ delay in the migration task.
+	 *
+	 * Another reason why it may have small migration:KVM_RUN ratio is that,
+	 * on systems with large low power mode wakeup latency, it may happen
+	 * quite often that the scheduler is not able to wake up the target CPU
+	 * before the vCPU thread is scheduled to another CPU.
 	 */
-	TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
-		    "Only performed %d KVM_RUNs, task stalled too much?", i);
+	TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
+		    "Only performed %d KVM_RUNs, task stalled too much? \n"
+		    " Try disabling deep sleep states to reduce CPU wakeup latency,\n"
+		    " e.g. via cpuidle.off=1 or setting /dev/cpu_dma_latency to '0',\n"
+		    " or run with -u to disable this sanity check.", i);

 	pthread_join(migration_thread, NULL);

On Thu, 02 May 2024 14:39:36 -0700, Zide Chen wrote:
> Currently, the migration worker delays 1-10 us, assuming that one KVM_RUN iteration only takes a few microseconds. But if the CPU low power wakeup latency is large enough, for example, deep C-state exit latencies of hundreds or even thousands of microseconds on x86 server CPUs, the scheduler may not be able to wake up the target CPU before the migration worker starts to migrate the vCPU thread to the next CPU.
>
> [...]

Applied to kvm-x86 selftests, thanks! I tweaked the changelog slightly to call out the new comment and assert message. I also added an extra newline so that the "help" part of the assert message is isolated from the primary explanation of why the assert fired. E.g. the output looks like:

  ==== Test Assertion Failure ====
    rseq_test.c:290: skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2)
    pid=20283 tid=20283 errno=4 - Interrupted system call
       1	0x000000000040210a: main at rseq_test.c:286
       2	0x00007f07fa821c86: ?? ??:0
       3	0x0000000000402209: _start at ??:?
    Only performed 11162 KVM_RUNs, task stalled too much?

    Try disabling deep sleep states to reduce CPU wakeup latency,
    e.g. via cpuidle.off=1 or setting /dev/cpu_dma_latency to '0',
    or run with -u to disable this sanity check.

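
For anyone who prefers the /dev/cpu_dma_latency route over -u, here is a minimal sketch of how a wrapper could hold that request for the duration of a run. It assumes the usual PM QoS behaviour, where a process writes a 32-bit latency target in microseconds and the request stays in effect only while the file descriptor remains open. The helper name request_cpu_latency() and its path parameter are illustrative only and are not part of rseq_test.c.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Write a 32-bit latency target (in microseconds) to a PM QoS device node
 * and return the open file descriptor.  The request is assumed to hold only
 * while the descriptor stays open, so the caller close()s it to drop the
 * request.  Returns -1 on failure.
 *
 * Hypothetical helper for illustration; the path is a parameter so the
 * function can be exercised against an ordinary file.
 */
static int request_cpu_latency(const char *path, int32_t target_us)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	if (write(fd, &target_us, sizeof(target_us)) != sizeof(target_us)) {
		close(fd);
		return -1;
	}

	return fd;
}
```

A caller would typically do fd = request_cpu_latency("/dev/cpu_dma_latency", 0) before starting the test and close(fd) afterwards. Opening the device node usually requires root, and a target of 0 keeps CPUs out of deep idle states, so power consumption goes up while the descriptor is held.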
[1/1] KVM: selftests: Add a new option to rseq_test
      https://github.com/kvm-x86/linux/commit/20ecf595b513