Re: [PATCH 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

19 Aug 2021

      On Thu, Aug 19, 2021, Mathieu Desnoyers wrote:
...
----- On Aug 17, 2021, at 8:12 PM, Sean Christopherson seanjc@google.com wrote:
...
Add a test to verify an rseq's CPU ID is updated correctly if the task is
migrated while the kernel is handling KVM_RUN.  This is a regression test
for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
without updating rseq, leading to a stale CPU ID and other badness.
Signed-off-by: Sean Christopherson seanjc@google.com
[...]
...

while (!done) {
vcpu_run(vm, VCPU_ID);

TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,

	    "Guest failed?");

cpu = sched_getcpu();

rseq_cpu = READ_ONCE(__rseq.cpu_id);

/*

 * Verify rseq's CPU matches sched's CPU, and that sched's CPU

 * is stable.  This doesn't handle the case where the task is

 * migrated between sched_getcpu() and reading rseq, and again

 * between reading rseq and sched_getcpu(), but in practice no

 * false positives have been observed, while on the other hand

 * blocking migration while this thread reads CPUs messes with

 * the timing and prevents hitting failures on a buggy kernel.

 */

I think you could get a stable cpu id between sched_getcpu and __rseq_abi.cpu_id
if you add a pthread mutex to protect:
sched_getcpu and __rseq_abi.cpu_id  reads
vs
sched_setaffinity calls within the migration thread.
Thoughts ?
I tried that and couldn't reproduce the bug.  That's what I attempted to call out
in the blurb "blocking migration while this thread reads CPUs ... prevents hitting
failures on a buggy kernel".
I considered adding arbitrary delays around the mutex to try and hit the bug, but
I was worried that even if I got it "working" for this bug, the test would be too
tailored to this bug and potentially miss future regression.  Letting the two
threads run wild seemed like it would provide the best coverage, at the cost of
potentially causing to false failures.

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs

Signed-off-by: Sean Christopherson seanjc@google.com