Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

29 Nov 2018


On Thu, Nov 29, 2018 at 01:35:17PM +0000, Juergen Gross wrote:
...
On 29/11/2018 14:26, Kirill A. Shutemov wrote:
...
On Thu, Nov 29, 2018 at 09:41:25AM +0000, Juergen Gross wrote:
...
On 29/11/2018 02:22, Hans van Kranenburg wrote:
...
Hi,
As also seen at:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
Attached there are two serial console output logs. One is starting with
Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
[    2.085543] BUG: unable to handle kernel paging request at ffff888d9fffc000
[    2.085610] PGD 200c067 P4D 200c067 PUD 0
[    2.085674] Oops: 0000 [#1] SMP NOPTI
[    2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
[    2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
[    2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
[...]
The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
The current upstream kernel boots fine under Xen, so in general the
patch should be fine. An upstream kernel built from the above commit
(with the Xen fixup patch 1457d8cf7664f34c4ba534, which was needed at
that point) boots fine, too.
Kirill, are you aware of any prerequisite patch from 4.20 which could be
missing in 4.19.5?
I'm not.
Let me look into this.
What makes me suspicious is that the failure happens just after the
init memory is released. Maybe there is an access to the .init.data
segment or similar? The native kernel booting fine could be explained
by its use of 2M mappings, which are not available in a PV domain.
Sounds like a valid hypothesis.
[ 2.085616] Code: 00 00 00 00 40 00 00 49 83 c5 08 48 01 04 24 4c 3b 6c 24 48 0f 84 83 02 00 00 48 8b 04 24 48 c1 f8 10 48 89 84 24 88 00 00 00 <49> 8b 7d 00 48 f7 c7 9f ff ff ff 0f 85 36 ff ff ff 41 b8 03 00 00
All code
========
   0:   00 00                   add    %al,(%rax)
   2:   00 00                   add    %al,(%rax)
   4:   40 00 00                add    %al,(%rax)
   7:   49 83 c5 08             add    $0x8,%r13
   b:   48 01 04 24             add    %rax,(%rsp)
   f:   4c 3b 6c 24 48          cmp    0x48(%rsp),%r13
  14:   0f 84 83 02 00 00       je     0x29d
  1a:   48 8b 04 24             mov    (%rsp),%rax
  1e:   48 c1 f8 10             sar    $0x10,%rax
  22:   48 89 84 24 88 00 00    mov    %rax,0x88(%rsp)
  29:   00
  2a:*  49 8b 7d 00             mov    0x0(%r13),%rdi           <-- trapping instruction
  2e:   48 f7 c7 9f ff ff ff    test   $0xffffffffffffff9f,%rdi
  35:   0f 85 36 ff ff ff       jne    0xffffffffffffff71
  3b:   41                      rex.B
  3c:   b8                      .byte 0xb8
  3d:   03 00                   add    (%rax),%eax
        ...
Code starting with the faulting instruction
===========================================
   0:   49 8b 7d 00             mov    0x0(%r13),%rdi
   4:   48 f7 c7 9f ff ff ff    test   $0xffffffffffffff9f,%rdi
   b:   0f 85 36 ff ff ff       jne    0xffffffffffffff47
  11:   41                      rex.B
  12:   b8                      .byte 0xb8
  13:   03 00                   add    (%rax),%eax
        ...
Reading from %r13 causes the fault.
I don't have a setup to reproduce the issue myself and have a hard time
correlating the code with the source.
What is ptdump_walk_pgd_level_core+0x1fd/0x490 for you?
-- 
 Kirill A. Shutemov
