In rseq_test, there are two threads, which are thread group leader and migration worker. The migration worker relies on sched_setaffinity() to force migration on the thread group leader. Unfortunately, we have wrong parameter (0) passed to sched_getaffinity(). It's actually forcing migration on the migration worker instead of the thread group leader. It also means migration can happen on the thread group leader at any time, which eventually leads to failure as the following logs show.
host# uname -r 5.19.0-rc6-gavin+ host# # cat /proc/cpuinfo | grep processor | tail -n 1 processor : 223 host# pwd /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm host# for i in `seq 1 100`; \ do echo "--------> $i"; ./rseq_test; done --------> 1 --------> 2 --------> 3 --------> 4 --------> 5 --------> 6 ==== Test Assertion Failure ==== rseq_test.c:265: rseq_cpu == cpu pid=3925 tid=3925 errno=4 - Interrupted system call 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2) 2 0x0000ffffb044affb: ?? ??:0 3 0x0000ffffb044b0c7: ?? ??:0 4 0x0000000000401a6f: _start at ??:? rseq CPU = 4, sched CPU = 27
This fixes the issue by passing correct parameter, tid of the group thread leader, to sched_setaffinity().
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs") Signed-off-by: Gavin Shan gshan@redhat.com --- tools/testing/selftests/kvm/rseq_test.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c index 4158da0da2bb..c83ac7b467f8 100644 --- a/tools/testing/selftests/kvm/rseq_test.c +++ b/tools/testing/selftests/kvm/rseq_test.c @@ -38,6 +38,7 @@ static __thread volatile struct rseq __rseq = { */ #define NR_TASK_MIGRATIONS 100000
+static pid_t rseq_tid; static pthread_t migration_thread; static cpu_set_t possible_mask; static int min_cpu, max_cpu; @@ -106,7 +107,8 @@ static void *migration_worker(void *ign) * stable, i.e. while changing affinity is in-progress. */ smp_wmb(); - r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask); + r = sched_setaffinity(rseq_tid, sizeof(allowed_mask), + &allowed_mask); TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)", errno, strerror(errno)); smp_wmb(); @@ -231,6 +233,7 @@ int main(int argc, char *argv[]) vm = vm_create_default(VCPU_ID, 0, guest_code); ucall_init(vm, NULL);
+ rseq_tid = gettid(); pthread_create(&migration_thread, NULL, migration_worker, 0);
for (i = 0; !done; i++) {
Hi Gavin,
Thanks for cleaning this up.
July 16, 2022 7:45 AM, "Gavin Shan" gshan@redhat.com wrote:
In rseq_test, there are two threads, which are thread group leader and migration worker. The migration worker relies on sched_setaffinity() to force migration on the thread group leader.
It may be clearer to describe it as a vCPU thread and a migration worker thread. The meat of this test is to catch a regression in KVM.
Unfortunately, we have
s/we have/the test has the/
wrong parameter (0) passed to sched_getaffinity().
wrong PID
It's actually forcing migration on the migration worker instead of the thread group leader.
What's missing is _why_ the migration worker is getting moved around by the call. Perhaps instead it is better to state what a PID of 0 implies, for those of us who haven't read their manpages in a while ;-)
It also means migration can happen on the thread group leader at any time, which eventually leads to failure as the following logs show.
host# uname -r 5.19.0-rc6-gavin+ host# # cat /proc/cpuinfo | grep processor | tail -n 1 processor : 223 host# pwd /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm host# for i in `seq 1 100`; \ do echo "--------> $i"; ./rseq_test; done --------> 1 --------> 2 --------> 3 --------> 4 --------> 5 --------> 6 ==== Test Assertion Failure ==== rseq_test.c:265: rseq_cpu == cpu pid=3925 tid=3925 errno=4 - Interrupted system call 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2) 2 0x0000ffffb044affb: ?? ??:0 3 0x0000ffffb044b0c7: ?? ??:0 4 0x0000000000401a6f: _start at ??:? rseq CPU = 4, sched CPU = 27
This fixes the issue by passing correct parameter, tid of the group thread leader, to sched_setaffinity().
Kernel commit messages should have an imperative tone:
Fix the issue by ...
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs") Signed-off-by: Gavin Shan gshan@redhat.com
With the comments on the commit message addressed:
Reviewed-by: Oliver Upton oliver.upton@linux.dev
Hi Oliver,
On 7/17/22 7:48 AM, oliver.upton@linux.dev wrote:
Thanks for cleaning this up.
Thanks for your review.
July 16, 2022 7:45 AM, "Gavin Shan" gshan@redhat.com wrote:
In rseq_test, there are two threads, which are thread group leader and migration worker. The migration worker relies on sched_setaffinity() to force migration on the thread group leader.
It may be clearer to describe it as a vCPU thread and a migration worker thread. The meat of this test is to catch a regression in KVM.
Unfortunately, we have
s/we have/the test has the/
wrong parameter (0) passed to sched_getaffinity().
wrong PID
Yep, it's much clearer to describe it as vCPU thread and migration worker.
It's actually forcing migration on the migration worker instead of the thread group leader.
What's missing is _why_ the migration worker is getting moved around by the call. Perhaps instead it is better to state what a PID of 0 implies, for those of us who haven't read their manpages in a while ;-)
Yes, it's good idea. I will have something like below in next revision :)
In rseq_test, there are two threads, which are vCPU thread and migration worker separately. Unfortunately, the test has the wrong PID passed to sched_setaffinity() in the migration worker. It forces migration on the migration worker because zeroed PID represents the calling thread, which is the migration worker itself. It means the vCPU thread is never enforced to migration and it can migrate at any time, which eventually leads to failure as the following logs show. : : Fix the issue by passing correct parameter, TID of the vCPU thread, to sched_setaffinity() in the migration worker.
It also means migration can happen on the thread group leader at any time, which eventually leads to failure as the following logs show.
host# uname -r 5.19.0-rc6-gavin+ host# # cat /proc/cpuinfo | grep processor | tail -n 1 processor : 223 host# pwd /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm host# for i in `seq 1 100`; \ do echo "--------> $i"; ./rseq_test; done --------> 1 --------> 2 --------> 3 --------> 4 --------> 5 --------> 6 ==== Test Assertion Failure ==== rseq_test.c:265: rseq_cpu == cpu pid=3925 tid=3925 errno=4 - Interrupted system call 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2) 2 0x0000ffffb044affb: ?? ??:0 3 0x0000ffffb044b0c7: ?? ??:0 4 0x0000000000401a6f: _start at ??:? rseq CPU = 4, sched CPU = 27
This fixes the issue by passing correct parameter, tid of the group thread leader, to sched_setaffinity().
Kernel commit messages should have an imperative tone:
Fix the issue by ...
Ok. I've been having my style for long time. Actually, the style was shared by some one when I worked for IBM long time ago. I will bear it in mind to use imperative expression since now on :)
All your comments will be fixed in next revision, but I would delay the posting a bit to see Sean or Paolo have more comments. In that case, I can fix all of them at once.
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs") Signed-off-by: Gavin Shan gshan@redhat.com
With the comments on the commit message addressed:
Reviewed-by: Oliver Upton oliver.upton@linux.dev
Thanks again for your time on this.
Thanks, Gavin
On 7/17/22 1:11 PM, Gavin Shan wrote:
On 7/17/22 7:48 AM, oliver.upton@linux.dev wrote:
July 16, 2022 7:45 AM, "Gavin Shan" gshan@redhat.com wrote:
In rseq_test, there are two threads, which are thread group leader and migration worker. The migration worker relies on sched_setaffinity() to force migration on the thread group leader.
It may be clearer to describe it as a vCPU thread and a migration worker thread. The meat of this test is to catch a regression in KVM.
Unfortunately, we have
s/we have/the test has the/
wrong parameter (0) passed to sched_getaffinity().
wrong PID
Yep, it's much clearer to describe it as vCPU thread and migration worker.
It's actually forcing migration on the migration worker instead of the thread group leader.
What's missing is _why_ the migration worker is getting moved around by the call. Perhaps instead it is better to state what a PID of 0 implies, for those of us who haven't read their manpages in a while ;-)
Yes, it's good idea. I will have something like below in next revision :)
In rseq_test, there are two threads, which are vCPU thread and migration worker separately. Unfortunately, the test has the wrong PID passed to sched_setaffinity() in the migration worker. It forces migration on the migration worker because zeroed PID represents the calling thread, which is the migration worker itself. It means the vCPU thread is never enforced to migration and it can migrate at any time, which eventually leads to failure as the following logs show. : : Fix the issue by passing correct parameter, TID of the vCPU thread, to sched_setaffinity() in the migration worker.
It also means migration can happen on the thread group leader at any time, which eventually leads to failure as the following logs show.
host# uname -r 5.19.0-rc6-gavin+ host# # cat /proc/cpuinfo | grep processor | tail -n 1 processor : 223 host# pwd /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm host# for i in `seq 1 100`; \ do echo "--------> $i"; ./rseq_test; done --------> 1 --------> 2 --------> 3 --------> 4 --------> 5 --------> 6 ==== Test Assertion Failure ==== rseq_test.c:265: rseq_cpu == cpu pid=3925 tid=3925 errno=4 - Interrupted system call 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2) 2 0x0000ffffb044affb: ?? ??:0 3 0x0000ffffb044b0c7: ?? ??:0 4 0x0000000000401a6f: _start at ??:? rseq CPU = 4, sched CPU = 27
This fixes the issue by passing correct parameter, tid of the group thread leader, to sched_setaffinity().
Kernel commit messages should have an imperative tone:
Fix the issue by ...
Ok. I've been having my style for long time. Actually, the style was shared by some one when I worked for IBM long time ago. I will bear it in mind to use imperative expression since now on :)
All your comments will be fixed in next revision, but I would delay the posting a bit to see Sean or Paolo have more comments. In that case, I can fix all of them at once.
v3 was just posted.
https://lore.kernel.org/kvmarm/20220719013540.3477946-1-gshan@redhat.com/T/#...
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs") Signed-off-by: Gavin Shan gshan@redhat.com
With the comments on the commit message addressed:
Reviewed-by: Oliver Upton oliver.upton@linux.dev
Thanks, Gavin
linux-kselftest-mirror@lists.linaro.org