This patch series includes some improvements to the machine check handler for pseries. Patch 1 fixes an issue where the machine check handler crashes the kernel when it accesses a vmalloc-ed buffer in NMI context. Patch 2 fixes an endian bug while restoring r3 in the MCE handler. Patch 4 dumps the SLB contents on SLB MCE errors to improve debuggability. Patch 5 displays the MCE error details on the console.

Change in V3:
- Moved patch 5 to patch 2

Change in V2:
- patch 3: Display additional info (NIP and task info) in MCE error details.
- patch 5: Fix endian bug while restoring r3 in MCE handler.
---
Mahesh Salgaonkar (5):
      powerpc/pseries: convert rtas_log_buf to linear allocation.
      powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
      powerpc/pseries: Define MCE error event section.
      powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
      powerpc/pseries: Display machine check error details.

 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1
 arch/powerpc/include/asm/rtas.h               |  109 ++++++++++++++++++
 arch/powerpc/kernel/rtasd.c                   |    2
 arch/powerpc/mm/slb.c                         |   35 ++++++
 arch/powerpc/platforms/pseries/ras.c          |  155 +++++++++++++++++++++++++
 5 files changed, 299 insertions(+), 3 deletions(-)
From: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
rtas_log_buf is a buffer that holds RTAS event data communicated to the kernel by the hypervisor. This buffer is then used to pass RTAS event data to user space through procfs. It is allocated from the vmalloc (non-linear mapping) area.

On a machine check interrupt, register r3 points to the RTAS extended event log passed by the hypervisor, which contains the MCE event. The pseries machine check handler then logs this error into rtas_log_buf. Because rtas_log_buf is a vmalloc-ed (non-linear) buffer, we end up taking a page fault (vector 0x300) while accessing it. The machine check interrupt handler runs in NMI context, where we cannot afford to take any page fault: page faults are not honored in NMI context and cause a kernel panic. This patch fixes the issue by allocating rtas_log_buf using kmalloc.
Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org
Suggested-by: Aneesh Kumar K.V aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
---
 arch/powerpc/kernel/rtasd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index f915db93cd42..3957d4ae2ba2 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -559,7 +559,7 @@ static int __init rtas_event_scan_init(void)
 	rtas_error_log_max = rtas_get_error_log_max();
 	rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
 
-	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
+	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);
 	if (!rtas_log_buf) {
 		printk(KERN_ERR "rtasd: no memory\n");
 		return -ENOMEM;
On Thu, 07 Jun 2018 22:58:11 +0530 Mahesh J Salgaonkar mahesh@linux.vnet.ibm.com wrote:
> -	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
> +	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);
Does this have to be in the RMA region if it's to be accessed with relocation off in the guest?
A comment about it being accessed with relocation off might be helpful too.
Thanks, Nick
On 06/08/2018 07:01 AM, Nicholas Piggin wrote:
>> -	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
>> +	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);
>
> Does this have to be in the RMA region if it's to be accessed with relocation off in the guest?

Nope not required. It never gets accessed with relocation off.

> A comment about it being accessed with relocation off might be helpful too.
Sure.
Thanks, -Mahesh.
From: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
During a machine check interrupt on the pseries platform, register r3 points to the RTAS extended event log passed by the hypervisor. Since the hypervisor uses r3 to pass the pointer to the RTAS log, it stores the original r3 value at the start of the memory (first 8 bytes) pointed to by r3. Since the hypervisor stores this value, like the RTAS log, in BE format, Linux should make sure to restore the r3 value in the correct endian format.

Without this patch, when the MCE handler returns after recovery to the code that caused the MCE, that code may hit a Data SLB access interrupt for an invalid address, followed by a kernel panic or hang.
[ 62.878965] Severe Machine check interrupt [Recovered]
[ 62.878968]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[ 62.878969]   Initiator: CPU
[ 62.878970]   Error type: SLB [Multihit]
[ 62.878971]     Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
    pc: c0000000009694c0: vsnprintf+0x80/0x480
    lr: c0000000009698e0: vscnprintf+0x20/0x60
    sp: c0000000fc777830
   msr: 8000000002009033
   dar: a803a30c000000d0
  current = 0xc00000000bc9ef00
  paca    = 0xc00000001eca5c00   softe: 3   irq_happened: 0x01
    pid   = 8860, comm = insmod
[c0000000fc7778b0] c0000000009698e0 vscnprintf+0x20/0x60
[c0000000fc7778e0] c00000000016b6c4 vprintk_emit+0xb4/0x4b0
[c0000000fc777960] c00000000016d40c vprintk_func+0x5c/0xd0
[c0000000fc777980] c00000000016cbb4 printk+0x38/0x4c
[c0000000fc7779a0] d00000000ca301c0 init_module+0x1c0/0x338 [bork_kernel]
[c0000000fc777a40] c00000000000d9c4 do_one_initcall+0x54/0x230
[c0000000fc777b00] c0000000001b3b74 do_init_module+0x8c/0x248
[c0000000fc777b90] c0000000001b2478 load_module+0x12b8/0x15b0
[c0000000fc777d30] c0000000001b29e8 sys_finit_module+0xa8/0x110
[c0000000fc777e30] c00000000000b204 system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace
This patch fixes this issue.
Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org
Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 5e1ef9150182..2edc673be137 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -360,7 +360,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 	}
 
 	savep = __va(regs->gpr[3]);
-	regs->gpr[3] = savep[0];	/* restore original r3 */
+	regs->gpr[3] = be64_to_cpu(savep[0]);	/* restore original r3 */
 
 	/* If it isn't an extended log we can use the per cpu 64bit buffer */
 	h = (struct rtas_error_log *)&savep[1];
On Thu, 07 Jun 2018 22:58:33 +0530 Mahesh J Salgaonkar mahesh@linux.vnet.ibm.com wrote:
LGTM
Reviewed-by: Nicholas Piggin npiggin@gmail.com
Mahesh J Salgaonkar mahesh@linux.vnet.ibm.com writes:
> From: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
>
> During a machine check interrupt on the pseries platform, register r3 points to the RTAS extended event log passed by the hypervisor. Since the hypervisor uses r3 to pass the pointer to the RTAS log, it stores the original r3 value at the start of the memory (first 8 bytes) pointed to by r3. Since the hypervisor stores this value, like the RTAS log, in BE format, Linux should make sure to restore the r3 value in the correct endian format.
Can we hit this under KVM? And if so what if the KVM/qemu is running little endian, does it still write the value BE?
cheers
On 06/08/2018 12:20 PM, Michael Ellerman wrote:
> Can we hit this under KVM? And if so what if the KVM/qemu is running little endian, does it still write the value BE?

FWNMI support for qemu is not in yet. Once it is, we can hit this. But whenever FWNMI support gets in, it should always pass the RTAS event data, including the original r3 value, in BE format.
Thanks, -Mahesh.
> cheers
linux-stable-mirror@lists.linaro.org