Hello,
KVM Self tests located at tools/testing/selftests/kvm seams to be failing. I have tried: - dirty_log_test - x86_64/cr4_cpuid_sync_test - x86_64/evmcs_test - x86_64/platform_info_test - x86_64/set_sregs_test - x86_64/state_test - x86_64/sync_regs_test - x86_64/vmx_tsc_adjust_test
All of them triggered the following assert:
==== Test Assertion Failure ==== lib/kvm_util.c:596: ret == 0 pid=24507 tid=24507 - Invalid argument 1 0x000056360774e468: ?? ??:0 2 0x000056360774e609: ?? ??:0 3 0x00005636077537a0: ?? ??:0 4 0x000056360774d35e: ?? ??:0 5 0x00007f3117571ae6: ?? ??:0 6 0x000056360774d4a9: ?? ??:0 madvise failed, addr: 0x7f3117350000 length: 0x200000 src_type: 0
I have tried the tests on both kvm(next) branch commit 22a7cdcae6a4a3c89748 as well as torvald's master 4.20-rc2.
On Wed, Nov 14, 2018 at 05:30:28PM +0200, Ahmed Soliman wrote:
Hello,
KVM Self tests located at tools/testing/selftests/kvm seams to be failing. I have tried:
- dirty_log_test
- x86_64/cr4_cpuid_sync_test
- x86_64/evmcs_test
- x86_64/platform_info_test
- x86_64/set_sregs_test
- x86_64/state_test
- x86_64/sync_regs_test
- x86_64/vmx_tsc_adjust_test
All of them triggered the following assert:
==== Test Assertion Failure ==== lib/kvm_util.c:596: ret == 0 pid=24507 tid=24507 - Invalid argument 1 0x000056360774e468: ?? ??:0 2 0x000056360774e609: ?? ??:0 3 0x00005636077537a0: ?? ??:0 4 0x000056360774d35e: ?? ??:0 5 0x00007f3117571ae6: ?? ??:0 6 0x000056360774d4a9: ?? ??:0 madvise failed, addr: 0x7f3117350000 length: 0x200000 src_type: 0
I have tried the tests on both kvm(next) branch commit 22a7cdcae6a4a3c89748 as well as torvald's master 4.20-rc2.
Hi Ahmed,
I couldn't reproduce this on a Westmere. Are you sure you're testing a clean compilation? Can you bisect the kernel?
Thanks, drew
On Wed, 14 Nov 2018 at 19:08, Andrew Jones drjones@redhat.com wrote:
On Wed, Nov 14, 2018 at 05:30:28PM +0200, Ahmed Soliman wrote:
Hello,
KVM Self tests located at tools/testing/selftests/kvm seams to be failing. I have tried:
- dirty_log_test
- x86_64/cr4_cpuid_sync_test
- x86_64/evmcs_test
- x86_64/platform_info_test
- x86_64/set_sregs_test
- x86_64/state_test
- x86_64/sync_regs_test
- x86_64/vmx_tsc_adjust_test
All of them triggered the following assert:
==== Test Assertion Failure ==== lib/kvm_util.c:596: ret == 0 pid=24507 tid=24507 - Invalid argument 1 0x000056360774e468: ?? ??:0 2 0x000056360774e609: ?? ??:0 3 0x00005636077537a0: ?? ??:0 4 0x000056360774d35e: ?? ??:0 5 0x00007f3117571ae6: ?? ??:0 6 0x000056360774d4a9: ?? ??:0 madvise failed, addr: 0x7f3117350000 length: 0x200000 src_type: 0
I have tried the tests on both kvm(next) branch commit 22a7cdcae6a4a3c89748 as well as torvald's master 4.20-rc2.
Hi Ahmed,
Hi Drew, Ahmed,
I couldn't reproduce this on a Westmere. Are you sure you're testing a clean compilation? Can you bisect the kernel?
What would be the difference in CONFIG_* fragments that you both have enabled?
Cheers, Anders
Hello,
I couldn't reproduce this on a Westmere. Are you sure you're testing a clean compilation? Can you bisect the kernel?
I don't do a make clean normally, but I will do it this time when bisecting, also I only use shallow clones so it will also take some time pulling. Also to note, The arch I am using is Haswell, I am not sure if that should make any difference though.
What would be the difference in CONFIG_* fragments that you both have enabled?
Here is my config file: https://pastebin.com/fCNV2z8c
Thanks,
Hello again, On Wed, Nov 14, 2018 at 9:23 PM Ahmed Soliman ahmedsoliman@mena.vt.edu wrote:
I don't do a make clean normally, but I will do it this time when bisecting, also I only use shallow clones so it will also take some time pulling. Also to note, The arch I am using is Haswell, I am not sure if that should make any difference though.
I started bisecting from commit 783e9e51266ebb7f78c606a53cb0fa41bb7c31a0 which introduces kvm selftests. I started from complete clean kernel build and run, but unfortunately, I got the Assert there too.
Just in case it might help identifying the issue, this is the output of my lscpu: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 69 Model name: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz Stepping: 1 CPU MHz: 897.938 CPU max MHz: 3000.0000 CPU min MHz: 800.0000 BogoMIPS: 4788.69 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 4096K NUMA node0 CPU(s): 0-3 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 s mep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
What would be the difference in CONFIG_* fragments that you both have enabled?
Here is my config file: https://pastebin.com/fCNV2z8c
Kind regard,
On Wed, Nov 14, 2018 at 11:29:59PM +0200, Ahmed Soliman wrote:
Hello again, On Wed, Nov 14, 2018 at 9:23 PM Ahmed Soliman ahmedsoliman@mena.vt.edu wrote:
I don't do a make clean normally, but I will do it this time when bisecting, also I only use shallow clones so it will also take some time pulling. Also to note, The arch I am using is Haswell, I am not sure if that should make any difference though.
I started bisecting from commit 783e9e51266ebb7f78c606a53cb0fa41bb7c31a0 which introduces kvm selftests. I started from complete clean kernel build and run, but unfortunately, I got the Assert there too.
I think you should be able to pre-compile the selftests and save them, and then bisect the kernel further back than their introduction. Actually you may want to simplify the test case to just a mmap/madvise pair and see if that reproduces, since kvm selftests doesn't do much more than that. (Other than all the /dev/kvm ioctls - you have the right perms on /dev/kvm, or are running as root, right?)
Thanks, drew
Hello,
On Thu, Nov 15, 2018 at 11:02 AM Andrew Jones drjones@redhat.com wrote:
I think you should be able to pre-compile the selftests and save them, and then bisect the kernel further back than their introduction. Actually you may want to simplify the test case to just a mmap/madvise pair and see if that reproduces, since kvm selftests doesn't do much more than that. (Other than all the /dev/kvm ioctls - you have the right perms on /dev/kvm, or are running as root, right?)
yes I have /dev/kvm permissions, but it shouldn't matter at the moment.
well I stripped the kvm self tests from most stuff and ended up with simpler code (sorry for spaces, I am using web interface to send this email)
#include <stdio.h> #include <sys/mman.h> #include <string.h> #include <errno.h> static void *align(void *x, size_t size) { size_t mask = size - 1; return (void *) (((size_t) x + mask) & ~mask); }
int main(){
int ret; void *mmap_start; void *host_mem; size_t length = 6291456;
mmap_start = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | 0, -1, 0); host_mem = align(mmap_start, 1); ret = madvise(host_mem, length, MADV_NOHUGEPAGE); printf("madvise returned: %d\nerrno: %d %s\n", ret, errno, strerror(errno)); }
Now I couldn't find a single commit to bisect from, I checked 4.20, 4.19, 4.16, 4.10, 4.4 back to v3.18 tag and None of the builds I attempted did work. All of the kernels returned:
madvise returned: -1 errno: 22 Invalid argument
and the strace logs looks like this:
mmap(NULL, 6291456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ea2a1000 madvise(0x7f46ea2a1000, 6291456, MADV_NOHUGEPAGE) = -1 EINVAL (Invalid argument)
For comprehension, this is done on intel core i7-4500U CPU @ 1.80GHz
As for now I will comment the madvise line and the assert when writing my own kvm self test. I think it wouldn't cause any trouble?, If it is not the case, please let me know.
Thanks,
On Thu, Nov 15, 2018 at 03:36:44PM +0200, Ahmed Soliman wrote:
mmap(NULL, 6291456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ea2a1000 madvise(0x7f46ea2a1000, 6291456, MADV_NOHUGEPAGE) = -1 EINVAL (Invalid argument)
For comprehension, this is done on intel core i7-4500U CPU @ 1.80GHz
Argh. I see what it is. Your config doesn't have CONFIG_TRANSPARENT_HUGEPAGE=y, so madvise_behavior_valid() returns false, which causes madvise() to immediately return EINVAL. We should be more careful in kvm selftests with our madvise behavior use.
As for now I will comment the madvise line and the assert when writing my own kvm self test. I think it wouldn't cause any trouble?, If it is not the case, please let me know.
You may not need madvise() at all for your test, depending on what you're doing. So leaving it out may be fine. Reworking kvm selftests to ensure only valid madvise behaviors are used (and only when necessary), before adding new tests, would be best.
Thanks, drew
Hello, On Thu, Nov 15, 2018 at 4:50 PM Andrew Jones drjones@redhat.com wrote:
On Thu, Nov 15, 2018 at 03:36:44PM +0200, Ahmed Soliman wrote:
mmap(NULL, 6291456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ea2a1000 madvise(0x7f46ea2a1000, 6291456, MADV_NOHUGEPAGE) = -1 EINVAL (Invalid argument)
For comprehension, this is done on intel core i7-4500U CPU @ 1.80GHz
Argh. I see what it is. Your config doesn't have CONFIG_TRANSPARENT_HUGEPAGE=y, so madvise_behavior_valid() returns false, which causes madvise() to immediately return EINVAL. We should be more careful in kvm selftests with our madvise behavior use.
As for now I will comment the madvise line and the assert when writing my own kvm self test. I think it wouldn't cause any trouble?, If it is not the case, please let me know.
You may not need madvise() at all for your test, depending on what you're doing. So leaving it out may be fine. Reworking kvm selftests to ensure only valid madvise behaviors are used (and only when necessary), before adding new tests, would be best.
Confirmed it worked after CONFIG_TRANSPARENT_HUGEPAGE=y I will send a patch soon that should make the self test check if Transparent huge pages are enabled first before issuing the related madvise()
Thanks,
Ahmed Soliman ahmedsoliman@mena.vt.edu writes:
Hello, On Thu, Nov 15, 2018 at 4:50 PM Andrew Jones drjones@redhat.com wrote:
On Thu, Nov 15, 2018 at 03:36:44PM +0200, Ahmed Soliman wrote:
mmap(NULL, 6291456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ea2a1000 madvise(0x7f46ea2a1000, 6291456, MADV_NOHUGEPAGE) = -1 EINVAL (Invalid argument)
For comprehension, this is done on intel core i7-4500U CPU @ 1.80GHz
Argh. I see what it is. Your config doesn't have CONFIG_TRANSPARENT_HUGEPAGE=y, so madvise_behavior_valid() returns false, which causes madvise() to immediately return EINVAL. We should be more careful in kvm selftests with our madvise behavior use.
As for now I will comment the madvise line and the assert when writing my own kvm self test. I think it wouldn't cause any trouble?, If it is not the case, please let me know.
You may not need madvise() at all for your test, depending on what you're doing. So leaving it out may be fine. Reworking kvm selftests to ensure only valid madvise behaviors are used (and only when necessary), before adding new tests, would be best.
Confirmed it worked after CONFIG_TRANSPARENT_HUGEPAGE=y I will send a patch soon that should make the self test check if Transparent huge pages are enabled first before issuing the related madvise()
Just ignore -EINVAL returned from madvise(), QEMU went even further, see:
commit 1d7414396f926651c4d7a673eb3a10aca5246d76 Author: Dr. David Alan Gilbert dgilbert@redhat.com Date: Thu Nov 19 15:27:48 2015 +0000
Assume madvise for (no)hugepage works
Hello once again,
I noticed that there is still more problems with kvmself test (at least on my machine)
Any test that would successfully reach the guest's code immediately VMexit by a shutdown.
$ ./vmx_tsc_adjust_test ==== Test Assertion Failure ==== x86_64/vmx_tsc_adjust_test.c:156: run->exit_reason == KVM_EXIT_IO pid=8499 tid=8499 - Success 1 0x00005604f07bd36c: ?? ??:0 2 0x00007f5c0dc53ae6: ?? ??:0 3 0x00005604f07bd4b9: ?? ??:0 Got exit_reason other than KVM_EXIT_IO: 8 (SHUTDOWN)
Also I have got a question I noticed 2 kvm test suites, kvm selftests in tools/testing/selftests/kvm/ and this kvm unit test at [1] I am curious if I should put my tests in the kvm unit test intead of kvm selftest
-- links -- [1] https://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
Thanks Ahmed,
On 16/11/18 13:00, Ahmed Soliman wrote:
Hello once again,
I noticed that there is still more problems with kvmself test (at least on my machine)
Any test that would successfully reach the guest's code immediately VMexit by a shutdown.
$ ./vmx_tsc_adjust_test ==== Test Assertion Failure ==== x86_64/vmx_tsc_adjust_test.c:156: run->exit_reason == KVM_EXIT_IO pid=8499 tid=8499 - Success 1 0x00005604f07bd36c: ?? ??:0 2 0x00007f5c0dc53ae6: ?? ??:0 3 0x00005604f07bd4b9: ?? ??:0 Got exit_reason other than KVM_EXIT_IO: 8 (SHUTDOWN)
Also I have got a question I noticed 2 kvm test suites, kvm selftests in tools/testing/selftests/kvm/ and this kvm unit test at [1] I am curious if I should put my tests in the kvm unit test intead of kvm selftest
If you are testing the API, or some corner cases and race conditions that are hard to synthesize with a "real-world" guest, put it here. If you are testing the emulated hardware, you probably want to put it in kvm-unit-tests instead.
Paolo
Ahmed Soliman ahmedsoliman@mena.vt.edu writes:
Hello once again,
I noticed that there is still more problems with kvmself test (at least on my machine)
Any test that would successfully reach the guest's code immediately VMexit by a shutdown.
$ ./vmx_tsc_adjust_test ==== Test Assertion Failure ==== x86_64/vmx_tsc_adjust_test.c:156: run->exit_reason == KVM_EXIT_IO pid=8499 tid=8499 - Success 1 0x00005604f07bd36c: ?? ??:0 2 0x00007f5c0dc53ae6: ?? ??:0 3 0x00005604f07bd4b9: ?? ??:0 Got exit_reason other than KVM_EXIT_IO: 8 (SHUTDOWN)
Hm, this is weird. Do other KVM users, e.g. qemu work on this host? Are you sure this happens on the first VCPU_RUN invocation?
Could you try doing
trace-cmd record -e kvm ./x86_64/vmx_tsc_adjust_test && trace-cmd report
to see if there's anything suspicious?
On Fri, Nov 16, 2018 at 3:37 PM Vitaly Kuznetsov vkuznets@redhat.com wrote:
Ahmed Soliman ahmedsoliman@mena.vt.edu writes:
Hello once again,
I noticed that there is still more problems with kvmself test (at least on my machine)
Any test that would successfully reach the guest's code immediately VMexit by a shutdown.
$ ./vmx_tsc_adjust_test ==== Test Assertion Failure ==== x86_64/vmx_tsc_adjust_test.c:156: run->exit_reason == KVM_EXIT_IO pid=8499 tid=8499 - Success 1 0x00005604f07bd36c: ?? ??:0 2 0x00007f5c0dc53ae6: ?? ??:0 3 0x00005604f07bd4b9: ?? ??:0 Got exit_reason other than KVM_EXIT_IO: 8 (SHUTDOWN)
Hm, this is weird. Do other KVM users, e.g. qemu work on this host? Are you sure this happens on the first VCPU_RUN invocation?
Yes and I have just tested: qemu-system-x86_64 -enable-kvm -cpu host,+vmx -m 2G
And Linux as well as android for x86 booted up. Also I kinda have my kvm tests that works but they are not built on top of kvm selftest api.
Could you try doing
trace-cmd record -e kvm ./x86_64/vmx_tsc_adjust_test && trace-cmd report
to see if there's anything suspicious?
Yes, there is a triple fault induced shutdown probably during EPT handling:
vmx_tsc_adjust_-17638 [001] 15628.729863: kvm_update_master_clock: masterclock 0 hostclock tsc offsetmatched 0 vmx_tsc_adjust_-17638 [001] 15628.730820: kvm_write_tsc_offset: vcpu=5 prev=0 next=18446733817569215369 vmx_tsc_adjust_-17638 [001] 15628.730822: kvm_track_tsc: vcpu_id 5 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc vmx_tsc_adjust_-17638 [001] 15628.730911: kvm_fpu: load vmx_tsc_adjust_-17638 [001] 15628.730913: kvm_update_master_clock: masterclock 1 hostclock tsc offsetmatched 1 vmx_tsc_adjust_-17638 [001] 15628.730954: kvm_entry: vcpu 5 vmx_tsc_adjust_-17638 [001] 15628.730974: kvm_exit: reason EPT_VIOLATION rip 0x56545f0b8760 info 83 0 vmx_tsc_adjust_-17638 [001] 15628.730976: kvm_page_fault: address 180560 error_code 83 vmx_tsc_adjust_-17638 [001] 15628.730989: kvm_entry: vcpu 5 vmx_tsc_adjust_-17638 [001] 15628.730992: kvm_exit: reason TRIPLE_FAULT rip 0x56545f0b8760 info 0 0 vmx_tsc_adjust_-17638 [001] 15628.730993: kvm_fpu: unload vmx_tsc_adjust_-17638 [001] 15628.730996: kvm_userspace_exit: reason KVM_EXIT_SHUTDOWN (8) vmx_tsc_adjust_-17638 [002] 15628.739237: kvm_hv_stimer_cleanup: vcpu_id 5 timer 0 vmx_tsc_adjust_-17638 [002] 15628.739244: kvm_hv_stimer_cleanup: vcpu_id 5 timer 1 vmx_tsc_adjust_-17638 [002] 15628.739244: kvm_hv_stimer_cleanup: vcpu_id 5 timer 2 vmx_tsc_adjust_-17638 [002] 15628.739245: kvm_hv_stimer_cleanup: vcpu_id 5 timer 3
I will try to dive into them after my midterms are over(ends tomorrow) but I would need to verify that someone else can reproduce this, I tried bisecting but it seams that this bug was there from kvmselftests commit one (rebuilt kernel, kvm and the kvm selftests each time).
Thanks,
On Wed, Nov 14, 2018 at 09:23:41PM +0200, Ahmed Soliman wrote:
Hello,
I couldn't reproduce this on a Westmere. Are you sure you're testing a clean compilation? Can you bisect the kernel?
I don't do a make clean normally, but I will do it this time when bisecting,
Sorry, I meant a clean build of the kvm selftests, not the kernel. The kvm selftests makefile is a bit lacking and I've noticed that tests can assert in random places if they aren't built from scratch - ensuring all changed dependencies also get recompiled. Fixing up the makefile is on my todo, but it wasn't very high on it, since 'make clean && make' is fast enough.
also I only use shallow clones so it will also take some time pulling. Also to note, The arch I am using is Haswell, I am not sure if that should make any difference though.
I'm not sure either, and unfortunately I don't have the same hardware as you to try and reproduce.
What would be the difference in CONFIG_* fragments that you both have enabled?
Here is my config file: https://pastebin.com/fCNV2z8c
I checked the differences but nothing interesting stood out. I'm using a defconfig config with a couple extra file system modules tacked on.
drew
linux-kselftest-mirror@lists.linaro.org