2018-05-19 3:22 GMT+01:00 Dmitry Safonov dima@arista.com:
On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote:
On May 18, 2018, at 4:10 PM, Dmitry Safonov 0x7f454c46@gmail.com cpu family : 6 model : 142 model name : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz But I usually test kernels in VM. So, I use virt-manager as it's easier to manage multiple VMs. The thing is that I've chosen "Copy host CPU configuration" and for some reason, I don't quite follow virt-manager makes model
"Opteron_G4".
I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1- 2.fc26). So, cpuinfo in VM says: cpu family : 21 model : 1 model name : AMD Opteron 62xx class CPU
What does guest cpuinfo say for vendor_id?
There are multiple potential screwups here.
- (What I *thought* was going on) AMD CPUs have screwy IRET behavior
that’s different from Intel’s, and the test case was definitely wrong. But KVM has no way to influence it. Are you sure you’re using KVM and not QEMU TCG? Anyway, the IRET thing is minor compared to your other problems, so let’s try to fix them first.
- Compat fast syscalls are wildly different on AMD and Intel.
Because of this issue, QEMU with KVM is supposed to always report the real vendor_id no matter -cpu asks for. If we get the wrong vendor_id, then we’re at the mercy of KVM’s emulation and performance will suck. On older kernels, this would cause hideous kernel crashes. On new kernels, I would expect it to merely crash 32-bit user programs or be slow.
Heh, I didn't know those details, so it looks like it's (2), vendor_id : AuthenticAMD in guest.
What's worse than registers changes is that some selftests actually lead
to
Oops's. The same reason for criu-ia32 fails. I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2), so it looks to be not a recent regression. Full Oopses: [ 189.100174] BUG: unable to handle kernel paging request at
00000000417bafe8
[ 189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE
6991f067
[ 189.100174] Oops: 0001 [#3] SMP NOPTI
Whoa there! 0001 means a failed *kernel* access.
[ 189.100174] Modules linked in: [ 189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G
Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?
sysret_ss_attrs_32 survives
D 4.17.0-rc5+ #11 [ 189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [ 189.103187] RIP: 0033:0x40085a
The oops was caused from CPL 3 at what looks like a totally sensible user address. Can you disassemble the offending binary and tell me what the code at 0x40085a is?
Here is the function: 0000000000400842 <call32_from_64>: 400842: 53 push %rbx 400843: 55 push %rbp 400844: 41 54 push %r12 400846: 41 55 push %r13 400848: 41 56 push %r14 40084a: 41 57 push %r15 40084c: 9c pushfq 40084d: 48 89 27 mov %rsp,(%rdi) 400850: 48 89 fc mov %rdi,%rsp 400853: 6a 23 pushq $0x23 400855: 68 5c 08 40 00 pushq $0x40085c 40085a: 48 cb lretq 40085c: ff d6 callq *%rsi 40085e: ea (bad) 40085f: 65 08 40 00 or %al,%gs:0x0(%rax) 400863: 33 00 xor (%rax),%eax 400865: 48 8b 24 24 mov (%rsp),%rsp 400869: 9d popfq 40086a: 41 5f pop %r15 40086c: 41 5e pop %r14 40086e: 41 5d pop %r13 400870: 41 5c pop %r12 400872: 5d pop %rbp 400873: 5b pop %rbx 400874: c3 retq 400875: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 40087c: 00 00 00 40087f: 90 nop
Looks like mov between registers caused it? The hell.
Oh, it's not 400850, I missloked, but 40085a so lretq might case it.
[ 189.103187] RSP: 002b:00000000417bafe8 EFLAGS: 00000206 [ 189.103187] RAX: 0000000000000000 RBX: 00000000000003e8 RCX:
0000000000000000
[ 189.103187] RDX: 0000000000000000 RSI: 0000000000400830 RDI:
00000000417baff8
[ 189.103187] RBP: 00000000417baff8 R08: 0000000000000000 R09:
0000000000000077
[ 189.103187] R10: 0000000000000006 R11: 0000000000000000 R12:
00000000417ba000
[ 189.103187] R13: 00007ffc05207840 R14: 0000000000000000 R15:
0000000000000000
[ 189.103187] FS: 00007f98566ecb40(0000) GS:ffff9740ffc00000(0000) knlGS:0000000000000000 [ 189.103187] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CS here is the value of CS that the *kernel* has, so 0x10 is normal.
[ 189.103187] CR2: 00000000417bafe8 CR3: 0000000069dc4000 CR4:
00000000007406f0
CR2 is in user space.
So the big question is: what happened here? Why did the CPU (or emulated CPU) attempt a privileged access to a user address while running user code?
No idea, but looks like it's not a kernel fault.
-- Thanks, Dmitry