Hi Gavin,
Thanks for cleaning this up.
July 16, 2022 7:45 AM, "Gavin Shan" <gshan@redhat.com> wrote:
In rseq_test, there are two threads, which are thread group leader and migration worker. The migration worker relies on sched_setaffinity() to force migration on the thread group leader.
It may be clearer to describe it as a vCPU thread and a migration worker thread. The meat of this test is to catch a regression in KVM.
Unfortunately, we have
s/we have/the test has the/
wrong parameter (0) passed to sched_setaffinity().
wrong PID
It's actually forcing migration on the migration worker instead of the thread group leader.
What's missing is _why_ the migration worker is getting moved around by the call. Perhaps instead it is better to state what a PID of 0 implies, for those of us who haven't read their manpages in a while ;-)
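For reference, here is a rough sketch of the behaviour I have in mind. It is not lifted from rseq_test.c, and the helper names (pin_worker, run_pin_worker, etc.) are made up for illustration. Per sched_setaffinity(2), a pid of 0 means the calling thread, so a worker that passes 0 only ever pins itself; it has to pass the other thread's TID to move that thread:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical argument block for the worker thread. */
struct pin_args {
	pid_t target;		/* PID/TID handed to sched_setaffinity() */
	int worker_cpus;	/* CPUs the worker may use after the call */
	int ret;		/* return value of sched_setaffinity() */
};

/* Number of CPUs thread 'tid' may run on (0 means the calling thread). */
static int nr_allowed_cpus(pid_t tid)
{
	cpu_set_t set;

	if (sched_getaffinity(tid, sizeof(set), &set))
		return -1;
	return CPU_COUNT(&set);
}

/* Lowest-numbered CPU the calling thread may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;
	int cpu;

	if (sched_getaffinity(0, sizeof(set), &set))
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Worker: restrict args->target to a single CPU, then report its own mask. */
static void *pin_worker(void *data)
{
	struct pin_args *args = data;
	int cpu = first_allowed_cpu();
	cpu_set_t one;

	assert(cpu >= 0);
	CPU_ZERO(&one);
	CPU_SET(cpu, &one);
	args->ret = sched_setaffinity(args->target, sizeof(one), &one);
	args->worker_cpus = nr_allowed_cpus(0);
	return NULL;
}

/*
 * Spawn a worker that pins 'target' and wait for it. On return,
 * *caller_cpus and *worker_cpus hold the sizes of the caller's and the
 * worker's affinity masks as seen after the sched_setaffinity() call.
 */
static int run_pin_worker(pid_t target, int *caller_cpus, int *worker_cpus)
{
	struct pin_args args = { .target = target };
	pthread_t thread;

	if (pthread_create(&thread, NULL, pin_worker, &args))
		return -1;
	pthread_join(thread, NULL);
	*caller_cpus = nr_allowed_cpus(0);
	*worker_cpus = args.worker_cpus;
	return args.ret;
}
```

Calling run_pin_worker(0, ...) should leave the caller's mask alone and shrink only the worker's, whereas run_pin_worker(syscall(SYS_gettid), ...) from the main thread should shrink the caller's mask instead. A sentence to that effect in the changelog would cover it.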
It also means migration can happen on the thread group leader at any time, which eventually leads to failure as the following logs show.
  host# uname -r
  5.19.0-rc6-gavin+
  host# # cat /proc/cpuinfo | grep processor | tail -n 1
  processor : 223
  host# pwd
  /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
  host# for i in `seq 1 100`; \
        do echo "--------> $i"; ./rseq_test; done
  --------> 1
  --------> 2
  --------> 3
  --------> 4
  --------> 5
  --------> 6
  ==== Test Assertion Failure ====
    rseq_test.c:265: rseq_cpu == cpu
    pid=3925 tid=3925 errno=4 - Interrupted system call
       1  0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
       2  0x0000ffffb044affb: ?? ??:0
       3  0x0000ffffb044b0c7: ?? ??:0
       4  0x0000000000401a6f: _start at ??:?
    rseq CPU = 4, sched CPU = 27
This fixes the issue by passing the correct parameter, the TID of the thread group leader, to sched_setaffinity().
Kernel commit messages should have an imperative tone:
Fix the issue by ...
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
Signed-off-by: Gavin Shan <gshan@redhat.com>
With the comments on the commit message addressed:
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>