On 7/14/22 10:06, Gavin Shan wrote:
In rseq_test, there are two threads created. Those two threads are 'main' and 'migration_thread' separately. We also have the assumption that non-migration status on 'migration-worker' thread guarantees the same non-migration status on 'main' thread. Unfortunately, the assumption isn't true. The 'main' thread can be migrated from one CPU to another one between the calls to sched_getcpu() and READ_ONCE(__rseq.cpu_id). The following assert is raised eventually because of the mismatched CPU numbers.
The issue can be reproduced on arm64 system occasionally.
Hmm, this does not seem a correct patch - the threads are already synchronizing using seq_cnt, like this:
migration main ---------------------- -------------------------------- seq_cnt = 1 smp_wmb() snapshot = 0 smp_rmb() cpu = sched_getcpu() reads 23 sched_setaffinity() rseq_cpu = __rseq.cpuid reads 35 smp_rmb() snapshot != seq_cnt -> retry smp_wmb() seq_cnt = 2
sched_setaffinity() is guaranteed to block until the task is enqueued on an allowed CPU.
Can you check that smp_rmb() and smp_wmb() generate correct instructions on arm64?
Paolo
host# uname -r 5.19.0-rc6-gavin+ host# # cat /proc/cpuinfo | grep processor | tail -n 1 processor : 223 host# pwd /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm host# for i in `seq 1 100`; \ do echo "--------> $i"; \ ./rseq_test; sleep 3; \ done --------> 1 --------> 2 --------> 3 --------> 4 --------> 5 --------> 6 ==== Test Assertion Failure ==== rseq_test.c:265: rseq_cpu == cpu pid=3925 tid=3925 errno=4 - Interrupted system call 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2) 2 0x0000ffffb044affb: ?? ??:0 3 0x0000ffffb044b0c7: ?? ??:0 4 0x0000000000401a6f: _start at ??:? rseq CPU = 4, sched CPU = 27
This fixes the issue by double-checking on the current CPU after call to READ_ONCE(__rseq.cpu_id) and restarting the test if the two consecutive CPU numbers aren't euqal.
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs") Signed-off-by: Gavin Shan gshan@redhat.com
tools/testing/selftests/kvm/rseq_test.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c index 4158da0da2bb..74709dd9f5b2 100644 --- a/tools/testing/selftests/kvm/rseq_test.c +++ b/tools/testing/selftests/kvm/rseq_test.c @@ -207,7 +207,7 @@ int main(int argc, char *argv[]) { int r, i, snapshot; struct kvm_vm *vm;
- u32 cpu, rseq_cpu;
- u32 cpu, rseq_cpu, last_cpu;
/* Tell stdout not to buffer its content */ setbuf(stdout, NULL); @@ -259,8 +259,9 @@ int main(int argc, char *argv[]) smp_rmb(); cpu = sched_getcpu(); rseq_cpu = READ_ONCE(__rseq.cpu_id);
last_cpu = sched_getcpu(); smp_rmb();
} while (snapshot != atomic_read(&seq_cnt));
} while (snapshot != atomic_read(&seq_cnt) || cpu != last_cpu);
TEST_ASSERT(rseq_cpu == cpu, "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);