On 29/11/2018 02:22, Hans van Kranenburg wrote:
Hi,
As also seen at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
Attached are two serial console output logs: one booted with Xen 4.11 (from Debian unstable) as dom0, the other without Xen.
[    2.085543] BUG: unable to handle kernel paging request at ffff888d9fffc000
[    2.085610] PGD 200c067 P4D 200c067 PUD 0
[    2.085674] Oops: 0000 [#1] SMP NOPTI
[    2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
[    2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
[    2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
[...]
The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
Current upstream kernel is booting fine under Xen, so in general the patch should be fine. Using an upstream kernel built from above commit (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine, too.
Kirill, are you aware of any prerequisite patch from 4.20 which could be missing in 4.19.5?
Juergen
On Thu, Nov 29, 2018 at 09:41:25AM +0000, Juergen Gross wrote:
> Kirill, are you aware of any prerequisite patch from 4.20 which could be missing in 4.19.5?
I'm not.
Let me look into this.
On 29/11/2018 14:26, Kirill A. Shutemov wrote:
> I'm not.
> Let me look into this.
What makes me suspicious is that the failure happens just after the init memory is released. Maybe there is an access to the .init.data segment or similar? That the native kernel boots could be related to 2M mappings not being available in a PV domain.
Juergen
On Thu, Nov 29, 2018 at 01:35:17PM +0000, Juergen Gross wrote:
> What is making me suspicious is the failure happening just after releasing the init memory. Maybe there is an access to .init.data segment or similar? The native kernel booting could be related to the usage of 2M mappings not being available in a PV-domain.
Sounds like a valid hypothesis.
[    2.085616] Code: 00 00 00 00 40 00 00 49 83 c5 08 48 01 04 24 4c 3b 6c 24 48 0f 84 83 02 00 00 48 8b 04 24 48 c1 f8 10 48 89 84 24 88 00 00 00 <49> 8b 7d 00 48 f7 c7 9f ff ff ff 0f 85 36 ff ff ff 41 b8 03 00 00
All code
========
   0:   00 00                   add    %al,(%rax)
   2:   00 00                   add    %al,(%rax)
   4:   40 00 00                add    %al,(%rax)
   7:   49 83 c5 08             add    $0x8,%r13
   b:   48 01 04 24             add    %rax,(%rsp)
   f:   4c 3b 6c 24 48          cmp    0x48(%rsp),%r13
  14:   0f 84 83 02 00 00       je     0x29d
  1a:   48 8b 04 24             mov    (%rsp),%rax
  1e:   48 c1 f8 10             sar    $0x10,%rax
  22:   48 89 84 24 88 00 00    mov    %rax,0x88(%rsp)
  29:   00
  2a:*  49 8b 7d 00             mov    0x0(%r13),%rdi    <-- trapping instruction
  2e:   48 f7 c7 9f ff ff ff    test   $0xffffffffffffff9f,%rdi
  35:   0f 85 36 ff ff ff       jne    0xffffffffffffff71
  3b:   41                      rex.B
  3c:   b8                      .byte 0xb8
  3d:   03 00                   add    (%rax),%eax
        ...

Code starting with the faulting instruction
===========================================
   0:   49 8b 7d 00             mov    0x0(%r13),%rdi
   4:   48 f7 c7 9f ff ff ff    test   $0xffffffffffffff9f,%rdi
   b:   0f 85 36 ff ff ff       jne    0xffffffffffffff47
  11:   41                      rex.B
  12:   b8                      .byte 0xb8
  13:   03 00                   add    (%rax),%eax
        ...
Reading from %r13 causes the fault.
I don't have a setup to reproduce the issue myself and have a hard time correlating the code with the source.
What is ptdump_walk_pgd_level_core+0x1fd/0x490 for you?
On Thu, Nov 29, 2018 at 01:35:17PM +0000, Juergen Gross wrote:
> What is making me suspicious is the failure happening just after releasing the init memory. Maybe there is an access to .init.data segment or similar? The native kernel booting could be related to the usage of 2M mappings not being available in a PV-domain.
Ahh.. Could you test this:
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index a12afff146d1..7dec63ec7aab 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx)
 	 * ffff800000000000 - ffff87ffffffffff is reserved for
 	 * the hypervisor.
 	 */
-	return (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
+	return (idx >= pgd_index(__PAGE_OFFSET) - 17) &&
 		(idx < pgd_index(__PAGE_OFFSET));
 #else
 	return false;
On Thu, Nov 29, 2018 at 02:24:47PM +0000, Kirill A. Shutemov wrote:
> Ahh.. Could you test this:
> [...]
Or, better, this:
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index a12afff146d1..8c04fadc4423 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -496,8 +496,8 @@ static inline bool is_hypervisor_range(int idx)
 	 * ffff800000000000 - ffff87ffffffffff is reserved for
 	 * the hypervisor.
 	 */
-	return (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
-		(idx < pgd_index(__PAGE_OFFSET));
+	return (idx >= pgd_index(LDT_BASE_ADDR) - 16) &&
+		(idx < pgd_index(LDT_BASE_ADDR));
 #else
 	return false;
 #endif
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 2c84c6ad8b50..b078a5b0ac91 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -652,7 +652,7 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd,
 	 * will end up making a zero-sized hole and so is a no-op.
 	 */
 	hole_low = pgd_index(USER_LIMIT);
-	hole_high = pgd_index(PAGE_OFFSET);
+	hole_high = pgd_index(LDT_BASE_ADDR);
 
 	nr = pgd_index(limit) + 1;
 	for (i = 0; i < nr; i++) {
On 29/11/2018 15:32, Kirill A. Shutemov wrote:
> Or, better, this:
> [...]
That makes it boot again!
Any idea why upstream doesn't need it?
Juergen
On Thu, Nov 29, 2018 at 03:00:45PM +0000, Juergen Gross wrote:
> That makes it boot again!
> Any idea why upstream doesn't need it?
Nope.
I'll prepare a proper fix.
On 11/29/18 4:06 PM, Kirill A. Shutemov wrote:
> Nope.
> I'll prepare a proper fix.
Thanks for looking into this.
In the meantime, I applied the "Or, better, this" change, and my dom0 boots again.
FYI, boot log now (paste valid for 90 days): https://paste.debian.net/plainh/48940826
Hans
On Fri, Nov 30, 2018 at 01:11:56PM +0000, Hans van Kranenburg wrote:
> In the meantime, I applied the "Or, better, this" change, and my dom0 boots again.
I forgot to CC you:
https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvlr7b@black.fi.intel.com
Please give it a try.
On 11/30/18 2:26 PM, Kirill A. Shutemov wrote:
> I forgot to CC you:
> https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvlr7b@black.fi.intel.com
> Please give it a try.
Ah, right, thanks. The xen-devel list is also not in Cc.
I'll slam it on top of my 4.19.5 debian package build and test.
Hans
On 11/30/18 2:26 PM, Kirill A. Shutemov wrote:
> I forgot to CC you:
> https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvlr7b@black.fi.intel.com
> Please give it a try.
I'm not in that thread, so my response here...
You pasted a v2-like patch into 'Re: [PATCH 1/2]'. Juergen said s/LDT_PGD_ENTRY/GUARD_HOLE_PGD_ENTRY/, then you said "Ughh.." and changed it to GUARD_HOLE_ENTRY, which does not exist, and then got a Reviewed-by from Juergen.
I guess it has to be GUARD_HOLE_PGD_ENTRY after all...
arch/x86/include/asm/pgtable_64_types.h:116:31: error: 'GUARD_HOLE_ENTRY' undeclared (first use in this function); did you mean 'GUARD_HOLE_PGD_ENTRY'?
I'll test that instead.
Hans
On Fri, Nov 30, 2018 at 02:53:50PM +0000, Hans van Kranenburg wrote:
On 11/30/18 2:26 PM, Kirill A. Shutemov wrote:
On Fri, Nov 30, 2018 at 01:11:56PM +0000, Hans van Kranenburg wrote:
On 11/29/18 4:06 PM, Kirill A. Shutemov wrote:
On Thu, Nov 29, 2018 at 03:00:45PM +0000, Juergen Gross wrote:
On 29/11/2018 15:32, Kirill A. Shutemov wrote:
On Thu, Nov 29, 2018 at 02:24:47PM +0000, Kirill A. Shutemov wrote:
> On Thu, Nov 29, 2018 at 01:35:17PM +0000, Juergen Gross wrote:
>> What is making me suspicious is the failure happening just after
>> releasing the init memory. Maybe there is an access to .init.data
>> segment or similar? The native kernel booting could be related to the
>> usage of 2M mappings not being available in a PV-domain.
>
> Ahh.. Could you test this:
>
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index a12afff146d1..7dec63ec7aab 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx)
>          * ffff800000000000 - ffff87ffffffffff is reserved for
>          * the hypervisor.
>          */
> -       return  (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
> +       return  (idx >= pgd_index(__PAGE_OFFSET) - 17) &&
>                 (idx < pgd_index(__PAGE_OFFSET));
>  #else
>         return false;
Or, better, this:
That makes it boot again!
Any idea why upstream doesn't need it?
Nope.
I'll prepare a proper fix.
Thanks for looking into this.
In the meantime, I applied the "Or, better, this" change, and my dom0 boots again.
FYI, boot log now (paste valid for 90 days): https://paste.debian.net/plainh/48940826
I forgot to CC you:
https://lkml.kernel.org/r/20181130121131.g3xvlvixv7mvlr7b@black.fi.intel.com
Please give it a try.
I'm not in that thread, so my response here...
You pasted a v2-like patch into 'Re: [PATCH 1/2]'. Juergen said: s/LDT_PGD_ENTRY/GUARD_HOLE_PGD_ENTRY/, then you said "Ughh..", changed it to GUARD_HOLE_ENTRY, which does not exist, and then got a Reviewed-by from Juergen.
I guess it has to be GUARD_HOLE_PGD_ENTRY after all...
arch/x86/include/asm/pgtable_64_types.h:116:31: error: 'GUARD_HOLE_ENTRY' undeclared (first use in this function); did you mean 'GUARD_HOLE_PGD_ENTRY'?
I'll test that instead.
Yes, thank you. It was a long week... :/
Let me know if it works. I'll repost the fixed version with your Tested-by.
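The one-line test patch to is_hypervisor_range() quoted earlier in this thread (- 16 changed to - 17) can be checked with a little PGD-index arithmetic. The C sketch below is not kernel code: the helper names are hypothetical, and it assumes 4-level paging (PGDIR_SHIFT of 39, 512 PGD entries) plus two assumed values for __PAGE_OFFSET, 0xffff880000000000 before the stable commit and 0xffff888000000000 after it (the latter is consistent with the faulting address ffff888d9fffc000 in the oops).

```c
/*
 * Sketch: which PGD slots does an is_hypervisor_range()-style check cover?
 * Assumptions (not taken from the thread): 4-level paging, so one PGD
 * entry maps 1 << 39 bytes and there are 512 PGD entries.
 */
#define SKETCH_PGDIR_SHIFT  39
#define SKETCH_PTRS_PER_PGD 512ULL

/* Assumed __PAGE_OFFSET values before and after the LDT remap move. */
#define OLD_PAGE_OFFSET 0xffff880000000000ULL
#define NEW_PAGE_OFFSET 0xffff888000000000ULL
/* Start of the hypervisor range named in the diff's comment. */
#define HV_RANGE_START  0xffff800000000000ULL

/* Hypothetical stand-in for pgd_index(): bits 39..47 of the address. */
static int pgd_idx(unsigned long long addr)
{
	return (int)((addr >> SKETCH_PGDIR_SHIFT) & (SKETCH_PTRS_PER_PGD - 1));
}

/*
 * Same shape as the quoted check: the "guard" slots just below the
 * page-offset slot are treated as hypervisor range.
 */
static int covers(int idx, unsigned long long page_offset, int guard)
{
	return idx >= pgd_idx(page_offset) - guard &&
	       idx < pgd_idx(page_offset);
}
```

Under these assumptions the hypervisor range starts at slot 256, and the page-offset slot moves from 272 to 273. With a guard of 16, slot 256 is covered for the old offset but not for the new one; a guard of 17 covers it again, but it also sweeps in slot 272, which after the move lies between the hole and the direct map (where the relocated LDT remap is assumed to live).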
On 11/30/18 5:21 PM, Kirill A. Shutemov wrote:
Yes, thank you. It was a long week... :/
Let me know if it works. I'll repost the fixed version with your Tested-by.
Ok. It boots fine as Xen dom0. \o/
You can use "Hans van Kranenburg <hans.van.kranenburg@mendix.com>" (lowercase please) for the Reported-by/Tested-by tags in the real v2.
Hans
On Thu, Nov 29, 2018 at 02:35:17PM +0100, Juergen Gross wrote:
On 29/11/2018 14:26, Kirill A. Shutemov wrote:
On Thu, Nov 29, 2018 at 09:41:25AM +0000, Juergen Gross wrote:
On 29/11/2018 02:22, Hans van Kranenburg wrote:
Hi,
As also seen at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
Attached there are two serial console output logs. One is starting with Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
[ 2.085543] BUG: unable to handle kernel paging request at ffff888d9fffc000 [ 2.085610] PGD 200c067 P4D 200c067 PUD 0 [ 2.085674] Oops: 0000 [#1] SMP NOPTI [ 2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1 [ 2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018 [ 2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490 [...]
The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
Current upstream kernel is booting fine under Xen, so in general the patch should be fine. Using an upstream kernel built from above commit (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine, too.
Kirill, are you aware of any prerequisite patch from 4.20 which could be missing in 4.19.5?
I'm not.
Let me look into this.
What is making me suspicious is the failure happening just after releasing the init memory. Maybe there is an access to .init.data segment or similar? The native kernel booting could be related to the usage of 2M mappings not being available in a PV-domain.
Did this ever get fixed anywhere that I can properly backport it to the 4.19.y tree?
thanks,
greg k-h
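The page-table walk line in the quoted oops ("PGD 200c067 P4D 200c067 PUD 0") can be read with the architectural x86-64 paging-entry layout: flag bits 0-7 and 63, physical address in bits 12-51. The macro and function names in this C sketch are made up for illustration and are not the kernel's own helpers. Decoded this way, 200c067 is a present, writable table entry pointing at physical 0x200c000; P4D repeats the PGD value because 4-level paging folds the P4D level into the PGD; and PUD 0 means nothing is mapped at that level for ffff888d9fffc000, which matches the "unable to handle kernel paging request" failure.

```c
#include <stdint.h>
#include <string.h>

/* Architectural x86-64 paging-entry bits (sketch names, not kernel macros). */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_RW        (1ULL << 1)
#define PTE_USER      (1ULL << 2)
#define PTE_PWT       (1ULL << 3)
#define PTE_PCD       (1ULL << 4)
#define PTE_ACCESSED  (1ULL << 5)
#define PTE_DIRTY     (1ULL << 6)
#define PTE_PSE       (1ULL << 7)
#define PTE_NX        (1ULL << 63)
/* Physical-address field: bits 12..51. */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/* Physical address of the next-level table (or page) the entry points to. */
static uint64_t entry_addr(uint64_t e)
{
	return e & PTE_ADDR_MASK;
}

/* Write the set flags as a space-separated string, e.g. "P RW US A D". */
static void entry_flags(uint64_t e, char *buf, size_t len)
{
	static const struct { uint64_t bit; const char *name; } names[] = {
		{ PTE_PRESENT, "P" }, { PTE_RW, "RW" }, { PTE_USER, "US" },
		{ PTE_PWT, "PWT" }, { PTE_PCD, "PCD" }, { PTE_ACCESSED, "A" },
		{ PTE_DIRTY, "D" }, { PTE_PSE, "PS" }, { PTE_NX, "NX" },
	};

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(e & names[i].bit))
			continue;
		/* Separate names with a space, never overrunning buf. */
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
}
```

For example, entry_addr(0x200c067) yields 0x200c000 and entry_flags() renders the same value as "P RW US A D", while an entry of 0 yields an empty flag string (not present).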
On 06/12/2018 12:13, Greg KH wrote:
Did this ever get fixed anywhere that I can properly backport it to the 4.19.y tree?
https://lore.kernel.org/lkml/20181130202328.65359-2-kirill.shutemov@linux.in...
Still pending upstream. Just pinged tglx.
Juergen
On 12/6/18 12:31 PM, Juergen Gross wrote:
https://lore.kernel.org/lkml/20181130202328.65359-2-kirill.shutemov@linux.in...
Still pending upstream. Just pinged tglx.
And FYI, I actually use this patch on top of 4.19.5 now. It just applies and works.
Hans
On Thu, Dec 06, 2018 at 12:31:15PM +0100, Juergen Gross wrote:
https://lore.kernel.org/lkml/20181130202328.65359-2-kirill.shutemov@linux.in...
Still pending upstream. Just pinged tglx.
Thanks, it should have gotten a cc: stable@ tag, but I can watch out for it...
greg k-h
On 06/12/2018 12:46, Greg KH wrote:
Thanks, it should have gotten a cc: stable@ tag, but I can watch out for it...
It's upstream now: commit 16877a5570e0c5f4270d5b17f9bab427bcae9514
Juergen
On 22/12/2018 12:14, Juergen Gross wrote:
It's upstream now: commit 16877a5570e0c5f4270d5b17f9bab427bcae9514
Any reason you didn't include this patch in 4.19.14?
Juergen
On Fri, Jan 11, 2019 at 08:59:52AM +0100, Juergen Gross wrote:
Any reason you didn't include this patch in 4.19.14?
I was catching up on pending patches and got to this yesterday. It should now be queued up already for the next releases, right?
thanks,
greg k-h
On 11/01/2019 09:46, Greg KH wrote:
I was catching up on pending patches and got to this yesterday. It should now be queued up already for the next releases, right?
Okay, thanks for confirmation.
Juergen
linux-stable-mirror@lists.linaro.org