From: "Jason A. Donenfeld" Jason@zx2c4.com
[ Upstream commit a7c01fa93aeb03ab76cd3cb2107990dd160498e6 ]
I was recently surprised to learn that msleep_interruptible(), wait_for_completion_interruptible_timeout(), and related functions simply hung when I called kthread_stop() on kthreads using them. For the msleep_interruptible() case, the simpler fix was to move to schedule_timeout_interruptible(). Why?
The reason is that msleep_interruptible(), and many functions just like it, has a loop like this:
        while (timeout && !signal_pending(current))
                timeout = schedule_timeout_interruptible(timeout);
The call to kthread_stop() woke up the thread, so schedule_timeout_interruptible() returned early, but because signal_pending() returned false, it went back into another timeout, which was never woken up.
This wait loop pattern is common to various pieces of code, and I suspect that the subtle misuse in a kthread that caused a deadlock in the code I looked at last week is also found elsewhere.
So this commit causes signal_pending() to return true when kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
The same also probably applies to the similar kthread_park() functionality, but that can be addressed later, as its semantics are slightly different.
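To make the failure mode concrete, here is a minimal, hypothetical kthread sketch; the function name and the 10-second period are invented for illustration and are not taken from the affected code:

/* Hypothetical kthread worker illustrating the hang described above. */
#include <linux/delay.h>
#include <linux/kthread.h>

static int poll_thread(void *data)
{
	while (!kthread_should_stop()) {
		/* ... do one unit of work ... */

		/*
		 * msleep_interruptible() loops on
		 * schedule_timeout_interruptible() until the timeout
		 * expires or signal_pending() becomes true.  Before this
		 * patch, kthread_stop() woke the task once, but
		 * signal_pending() stayed false, so the loop re-armed the
		 * remaining timeout and kthread_stop() blocked for the
		 * rest of the 10 seconds.  With TIF_NOTIFY_SIGNAL set by
		 * kthread_stop(), signal_pending() is true, the sleep
		 * returns early, and kthread_should_stop() is re-checked
		 * promptly.
		 */
		msleep_interruptible(10 * 1000);
	}
	return 0;
}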
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
v1: https://lkml.kernel.org/r/20220627120020.608117-1-Jason@zx2c4.com
v2: https://lkml.kernel.org/r/20220627145716.641185-1-Jason@zx2c4.com
v3: https://lkml.kernel.org/r/20220628161441.892925-1-Jason@zx2c4.com
v4: https://lkml.kernel.org/r/20220711202136.64458-1-Jason@zx2c4.com
v5: https://lkml.kernel.org/r/20220711232123.136330-1-Jason@zx2c4.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/kthread.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3c677918d8f2..7243a010f433 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
 	kthread = to_kthread(k);
 	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
 	kthread_unpark(k);
+	set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 	wake_up_process(k);
 	wait_for_completion(&kthread->exited);
 	ret = kthread->result;
From: ye xingchen ye.xingchen@zte.com.cn
[ Upstream commit c814bf958926ff45a9c1e899bd001006ab6cfbae ]
Use timersub() function to simplify the code.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220816105106.82666-1-ye.xingchen@zte.com.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 tools/testing/selftests/powerpc/benchmarks/gettimeofday.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/benchmarks/gettimeofday.c b/tools/testing/selftests/powerpc/benchmarks/gettimeofday.c
index 6b415683357b..580fcac0a09f 100644
--- a/tools/testing/selftests/powerpc/benchmarks/gettimeofday.c
+++ b/tools/testing/selftests/powerpc/benchmarks/gettimeofday.c
@@ -12,7 +12,7 @@ static int test_gettimeofday(void)
 {
 	int i;
-	struct timeval tv_start, tv_end;
+	struct timeval tv_start, tv_end, tv_diff;
 
 	gettimeofday(&tv_start, NULL);
 
@@ -20,7 +20,9 @@ static int test_gettimeofday(void)
 		gettimeofday(&tv_end, NULL);
 	}
 
-	printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
+	timersub(&tv_start, &tv_end, &tv_diff);
+
+	printf("time = %.6f\n", tv_diff.tv_sec + (tv_diff.tv_usec) * 1e-6);
 
 	return 0;
 }
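For reference, timersub() is the standard <sys/time.h> helper macro; a small stand-alone sketch of its semantics, with made-up values:

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
	/* timersub(a, b, res) computes res = a - b, normalizing tv_usec. */
	struct timeval a = { .tv_sec = 5, .tv_usec = 100 };
	struct timeval b = { .tv_sec = 2, .tv_usec = 900000 };
	struct timeval diff;

	timersub(&a, &b, &diff);
	printf("%.6f\n", diff.tv_sec + diff.tv_usec * 1e-6);	/* prints 2.100100 */
	return 0;
}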
From: Junaid Shahid junaids@google.com
[ Upstream commit b24ede22538b4d984cbe20532bbcb303692e7f52 ]
If vm_init() fails [which can happen, for instance, if a memory allocation fails during avic_vm_init()], we need to clean up some state in order to avoid resource leaks.
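The shape of the fix is the usual goto-unwind error-handling idiom; a stand-alone sketch with hypothetical helper names (not the actual KVM functions):

#include <errno.h>
#include <stdio.h>

static int init_mmu(void)          { return 0; }       /* pretend step 1 succeeds */
static void uninit_mmu(void)       { puts("mmu undone"); }
static int init_vendor_state(void) { return -ENOMEM; } /* pretend step 2 fails */

static int example_init_vm(void)
{
	int ret = init_mmu();
	if (ret)
		return ret;

	ret = init_vendor_state();
	if (ret)
		goto out_uninit_mmu;	/* a later failure must undo earlier setup */

	return 0;

out_uninit_mmu:
	uninit_mmu();
	return ret;
}

int main(void)
{
	printf("example_init_vm() = %d\n", example_init_vm());
	return 0;
}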
Signed-off-by: Junaid Shahid <junaids@google.com>
Link: https://lore.kernel.org/r/20220729224329.323378-1-junaids@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/x86.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0c47b41c264..11fbd42100be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12080,6 +12080,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		goto out_page_track;
 
+	ret = static_call(kvm_x86_vm_init)(kvm);
+	if (ret)
+		goto out_uninit_mmu;
+
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);
@@ -12115,8 +12119,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm_hv_init_vm(kvm);
 	kvm_xen_init_vm(kvm);
 
-	return static_call(kvm_x86_vm_init)(kvm);
+	return 0;
 
+out_uninit_mmu:
+	kvm_mmu_uninit_vm(kvm);
 out_page_track:
 	kvm_page_track_cleanup(kvm);
 out:
On 10/14/22 15:51, Sasha Levin wrote:
> From: Junaid Shahid <junaids@google.com>
>
> [ Upstream commit b24ede22538b4d984cbe20532bbcb303692e7f52 ]
>
> If vm_init() fails [which can happen, for instance, if a memory
> allocation fails during avic_vm_init()], we need to clean up some
> state in order to avoid resource leaks.
>
> [...]
Acked-by: Paolo Bonzini pbonzini@redhat.com
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit 7245fc5bb7a966852d5bd7779d1f5855530b461a ]
As reported by Nathan, the module_init() macro was not taken into account because the header was missing. That means spe_mathemu_init() was never called.
This should have been detected by gcc at build time, but due to the '-w' flag it went undetected.
Removing that flag leads to many warnings, hence errors.
Fix those warnings, then remove the -w flag.
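Many of the fixes below are plain format-specifier corrections; a small user-space sketch of that class of warning, with an invented variable:

#include <stdio.h>

int main(void)
{
	int exponent = 34;

	/* printf("%ld\n", exponent);  -- -Wformat: %ld expects long, got int */
	printf("%d\n", exponent);	/* matching specifier, no warning */
	return 0;
}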
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/2663961738a46073713786d4efeb53100ca156e7.166213427...
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/math-emu/Makefile   |  2 --
 arch/powerpc/math-emu/math.c     | 18 +++++-----
 arch/powerpc/math-emu/math_efp.c | 57 +++++++++++++++++---------------
 include/math-emu/op-common.h     |  3 ++
 4 files changed, 42 insertions(+), 38 deletions(-)
diff --git a/arch/powerpc/math-emu/Makefile b/arch/powerpc/math-emu/Makefile index a8794032f15f..26fef2e5672e 100644 --- a/arch/powerpc/math-emu/Makefile +++ b/arch/powerpc/math-emu/Makefile @@ -16,5 +16,3 @@ obj-$(CONFIG_SPE) += math_efp.o
CFLAGS_fabs.o = -fno-builtin-fabs CFLAGS_math.o = -fno-builtin-fabs - -ccflags-y = -w diff --git a/arch/powerpc/math-emu/math.c b/arch/powerpc/math-emu/math.c index 36761bd00f38..936a9a149037 100644 --- a/arch/powerpc/math-emu/math.c +++ b/arch/powerpc/math-emu/math.c @@ -24,9 +24,9 @@ FLOATFUNC(mtfsf); FLOATFUNC(mtfsfi);
#ifdef CONFIG_MATH_EMULATION_HW_UNIMPLEMENTED -#undef FLOATFUNC(x) +#undef FLOATFUNC #define FLOATFUNC(x) static inline int x(void *op1, void *op2, void *op3, \ - void *op4) { } + void *op4) { return 0; } #endif
FLOATFUNC(fadd); @@ -396,28 +396,28 @@ do_mathemu(struct pt_regs *regs)
case XCR: op0 = (void *)®s->ccr; - op1 = (void *)((insn >> 23) & 0x7); + op1 = (void *)(long)((insn >> 23) & 0x7); op2 = (void *)¤t->thread.TS_FPR((insn >> 16) & 0x1f); op3 = (void *)¤t->thread.TS_FPR((insn >> 11) & 0x1f); break;
case XCRL: op0 = (void *)®s->ccr; - op1 = (void *)((insn >> 23) & 0x7); - op2 = (void *)((insn >> 18) & 0x7); + op1 = (void *)(long)((insn >> 23) & 0x7); + op2 = (void *)(long)((insn >> 18) & 0x7); break;
case XCRB: - op0 = (void *)((insn >> 21) & 0x1f); + op0 = (void *)(long)((insn >> 21) & 0x1f); break;
case XCRI: - op0 = (void *)((insn >> 23) & 0x7); - op1 = (void *)((insn >> 12) & 0xf); + op0 = (void *)(long)((insn >> 23) & 0x7); + op1 = (void *)(long)((insn >> 12) & 0xf); break;
case XFLB: - op0 = (void *)((insn >> 17) & 0xff); + op0 = (void *)(long)((insn >> 17) & 0xff); op1 = (void *)¤t->thread.TS_FPR((insn >> 11) & 0x1f); break;
diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c index 39b84e7452e1..47ecd5d66391 100644 --- a/arch/powerpc/math-emu/math_efp.c +++ b/arch/powerpc/math-emu/math_efp.c @@ -218,6 +218,7 @@ int do_spe_mathemu(struct pt_regs *regs) case AB: case XCR: FP_UNPACK_SP(SA, va.wp + 1); + fallthrough; case XB: FP_UNPACK_SP(SB, vb.wp + 1); break; @@ -226,8 +227,8 @@ int do_spe_mathemu(struct pt_regs *regs) break; }
- pr_debug("SA: %ld %08lx %ld (%ld)\n", SA_s, SA_f, SA_e, SA_c); - pr_debug("SB: %ld %08lx %ld (%ld)\n", SB_s, SB_f, SB_e, SB_c); + pr_debug("SA: %d %08x %d (%d)\n", SA_s, SA_f, SA_e, SA_c); + pr_debug("SB: %d %08x %d (%d)\n", SB_s, SB_f, SB_e, SB_c);
switch (func) { case EFSABS: @@ -278,7 +279,7 @@ int do_spe_mathemu(struct pt_regs *regs) } else { SB_e += (func == EFSCTSF ? 31 : 32); FP_TO_INT_ROUND_S(vc.wp[1], SB, 32, - (func == EFSCTSF)); + (func == EFSCTSF) ? 1 : 0); } goto update_regs;
@@ -287,7 +288,7 @@ int do_spe_mathemu(struct pt_regs *regs) FP_CLEAR_EXCEPTIONS; FP_UNPACK_DP(DB, vb.dp);
- pr_debug("DB: %ld %08lx %08lx %ld (%ld)\n", + pr_debug("DB: %d %08x %08x %d (%d)\n", DB_s, DB_f1, DB_f0, DB_e, DB_c);
FP_CONV(S, D, 1, 2, SR, DB); @@ -301,7 +302,7 @@ int do_spe_mathemu(struct pt_regs *regs) FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_ROUND_S(vc.wp[1], SB, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } goto update_regs;
@@ -312,7 +313,7 @@ int do_spe_mathemu(struct pt_regs *regs) FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_S(vc.wp[1], SB, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } goto update_regs;
@@ -322,7 +323,7 @@ int do_spe_mathemu(struct pt_regs *regs) break;
pack_s: - pr_debug("SR: %ld %08lx %ld (%ld)\n", SR_s, SR_f, SR_e, SR_c); + pr_debug("SR: %d %08x %d (%d)\n", SR_s, SR_f, SR_e, SR_c);
FP_PACK_SP(vc.wp + 1, SR); goto update_regs; @@ -346,6 +347,7 @@ int do_spe_mathemu(struct pt_regs *regs) case AB: case XCR: FP_UNPACK_DP(DA, va.dp); + fallthrough; case XB: FP_UNPACK_DP(DB, vb.dp); break; @@ -354,9 +356,9 @@ int do_spe_mathemu(struct pt_regs *regs) break; }
- pr_debug("DA: %ld %08lx %08lx %ld (%ld)\n", + pr_debug("DA: %d %08x %08x %d (%d)\n", DA_s, DA_f1, DA_f0, DA_e, DA_c); - pr_debug("DB: %ld %08lx %08lx %ld (%ld)\n", + pr_debug("DB: %d %08x %08x %d (%d)\n", DB_s, DB_f1, DB_f0, DB_e, DB_c);
switch (func) { @@ -408,7 +410,7 @@ int do_spe_mathemu(struct pt_regs *regs) } else { DB_e += (func == EFDCTSF ? 31 : 32); FP_TO_INT_ROUND_D(vc.wp[1], DB, 32, - (func == EFDCTSF)); + (func == EFDCTSF) ? 1 : 0); } goto update_regs;
@@ -417,7 +419,7 @@ int do_spe_mathemu(struct pt_regs *regs) FP_CLEAR_EXCEPTIONS; FP_UNPACK_SP(SB, vb.wp + 1);
- pr_debug("SB: %ld %08lx %ld (%ld)\n", + pr_debug("SB: %d %08x %d (%d)\n", SB_s, SB_f, SB_e, SB_c);
FP_CONV(D, S, 2, 1, DR, SB); @@ -431,7 +433,7 @@ int do_spe_mathemu(struct pt_regs *regs) FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_D(vc.dp[0], DB, 64, - ((func & 0x1) == 0)); + ((func & 0x1) == 0) ? 1 : 0); } goto update_regs;
@@ -442,7 +444,7 @@ int do_spe_mathemu(struct pt_regs *regs) FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_ROUND_D(vc.wp[1], DB, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } goto update_regs;
@@ -453,7 +455,7 @@ int do_spe_mathemu(struct pt_regs *regs) FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_D(vc.wp[1], DB, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } goto update_regs;
@@ -463,7 +465,7 @@ int do_spe_mathemu(struct pt_regs *regs) break;
pack_d: - pr_debug("DR: %ld %08lx %08lx %ld (%ld)\n", + pr_debug("DR: %d %08x %08x %d (%d)\n", DR_s, DR_f1, DR_f0, DR_e, DR_c);
FP_PACK_DP(vc.dp, DR); @@ -492,6 +494,7 @@ int do_spe_mathemu(struct pt_regs *regs) case XCR: FP_UNPACK_SP(SA0, va.wp); FP_UNPACK_SP(SA1, va.wp + 1); + fallthrough; case XB: FP_UNPACK_SP(SB0, vb.wp); FP_UNPACK_SP(SB1, vb.wp + 1); @@ -502,13 +505,13 @@ int do_spe_mathemu(struct pt_regs *regs) break; }
- pr_debug("SA0: %ld %08lx %ld (%ld)\n", + pr_debug("SA0: %d %08x %d (%d)\n", SA0_s, SA0_f, SA0_e, SA0_c); - pr_debug("SA1: %ld %08lx %ld (%ld)\n", + pr_debug("SA1: %d %08x %d (%d)\n", SA1_s, SA1_f, SA1_e, SA1_c); - pr_debug("SB0: %ld %08lx %ld (%ld)\n", + pr_debug("SB0: %d %08x %d (%d)\n", SB0_s, SB0_f, SB0_e, SB0_c); - pr_debug("SB1: %ld %08lx %ld (%ld)\n", + pr_debug("SB1: %d %08x %d (%d)\n", SB1_s, SB1_f, SB1_e, SB1_c);
switch (func) { @@ -567,7 +570,7 @@ int do_spe_mathemu(struct pt_regs *regs) } else { SB0_e += (func == EVFSCTSF ? 31 : 32); FP_TO_INT_ROUND_S(vc.wp[0], SB0, 32, - (func == EVFSCTSF)); + (func == EVFSCTSF) ? 1 : 0); } if (SB1_c == FP_CLS_NAN) { vc.wp[1] = 0; @@ -575,7 +578,7 @@ int do_spe_mathemu(struct pt_regs *regs) } else { SB1_e += (func == EVFSCTSF ? 31 : 32); FP_TO_INT_ROUND_S(vc.wp[1], SB1, 32, - (func == EVFSCTSF)); + (func == EVFSCTSF) ? 1 : 0); } goto update_regs;
@@ -586,14 +589,14 @@ int do_spe_mathemu(struct pt_regs *regs) FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_ROUND_S(vc.wp[0], SB0, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } if (SB1_c == FP_CLS_NAN) { vc.wp[1] = 0; FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_ROUND_S(vc.wp[1], SB1, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } goto update_regs;
@@ -604,14 +607,14 @@ int do_spe_mathemu(struct pt_regs *regs) FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_S(vc.wp[0], SB0, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } if (SB1_c == FP_CLS_NAN) { vc.wp[1] = 0; FP_SET_EXCEPTION(FP_EX_INVALID); } else { FP_TO_INT_S(vc.wp[1], SB1, 32, - ((func & 0x3) != 0)); + ((func & 0x3) != 0) ? 1 : 0); } goto update_regs;
@@ -621,9 +624,9 @@ int do_spe_mathemu(struct pt_regs *regs) break;
pack_vs: - pr_debug("SR0: %ld %08lx %ld (%ld)\n", + pr_debug("SR0: %d %08x %d (%d)\n", SR0_s, SR0_f, SR0_e, SR0_c); - pr_debug("SR1: %ld %08lx %ld (%ld)\n", + pr_debug("SR1: %d %08x %d (%d)\n", SR1_s, SR1_f, SR1_e, SR1_c);
FP_PACK_SP(vc.wp, SR0); diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h index 4b57bbba588a..8ce066c035cf 100644 --- a/include/math-emu/op-common.h +++ b/include/math-emu/op-common.h @@ -662,12 +662,14 @@ do { \ if (X##_e < 0) \ { \ FP_SET_EXCEPTION(FP_EX_INEXACT); \ + fallthrough; \ case FP_CLS_ZERO: \ r = 0; \ } \ else if (X##_e >= rsize - (rsigned > 0 || X##_s) \ || (!rsigned && X##_s)) \ { /* overflow */ \ + fallthrough; \ case FP_CLS_NAN: \ case FP_CLS_INF: \ if (rsigned == 2) \ @@ -767,6 +769,7 @@ do { \ if (X##_e >= rsize - (rsigned > 0 || X##_s) \ || (!rsigned && X##_s)) \ { /* overflow */ \ + fallthrough; \ case FP_CLS_NAN: \ case FP_CLS_INF: \ if (!rsigned) \
From: "Gustavo A. R. Silva" gustavoars@kernel.org
[ Upstream commit d4d944ff68cb1f896d3f3b1af0bc656949dc626a ]
Fix the following fallthrough warning:
arch/powerpc/platforms/85xx/mpc85xx_cds.c:161:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
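A minimal user-space reproduction of this warning class, with an invented switch body:

/* Build with: cc -Wimplicit-fallthrough -c fallthrough_demo.c  (file name is hypothetical) */
int pick_irq(int slot)
{
	int irq = 0;

	switch (slot) {
	case 1:
		irq = 10;
		/*
		 * Without the following 'break;', control falls into the
		 * 'default' label and the compiler emits
		 * -Wimplicit-fallthrough, as in the report above.
		 */
		break;
	default:
		break;
	}
	return irq;
}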
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/KSPP/linux/issues/198
Link: https://lore.kernel.org/lkml/202209061224.KxORRGVg-lkp@intel.com/
Link: https://lore.kernel.org/r/Yxe8XTY5C9qJLd0Z@work
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_cds.c b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
index 48f3acfece0b..0b8f2101c5fb 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_cds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
@@ -159,6 +159,7 @@ static void __init mpc85xx_cds_pci_irq_fixup(struct pci_dev *dev)
 		else
 			dev->irq = 10;
 		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq);
+		break;
 	default:
 		break;
 	}
From: Sergey Matyukevich sergey.matyukevich@syntacore.com
[ Upstream commit 096b52fd2bb4996fd68d22b3b7ad21a1296db9d3 ]
Call perf_sample_event_took() to report time spent in overflow interrupts. Perf core uses these measurements to throttle perf events properly.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220830155306.301714-4-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/perf/riscv_pmu_sbi.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8de4ca2fef21..88e326c5f63b 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -18,6 +18,7 @@
 #include <linux/of_irq.h>
 #include <linux/of.h>
 #include <linux/cpu_pm.h>
+#include <linux/sched/clock.h>
 
 #include <asm/sbi.h>
 #include <asm/hwcap.h>
@@ -567,6 +568,7 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 	unsigned long overflow;
 	unsigned long overflowed_ctrs = 0;
 	struct cpu_hw_events *cpu_hw_evt = dev;
+	u64 start_clock = sched_clock();
 
 	if (WARN_ON_ONCE(!cpu_hw_evt))
 		return IRQ_NONE;
@@ -635,7 +637,9 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 			perf_event_overflow(event, &data, regs);
 		}
 	}
+
 	pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
+	perf_sample_event_took(sched_clock() - start_clock);
 
 	return IRQ_HANDLED;
 }
From: Rohan McLure rmclure@linux.ibm.com
[ Upstream commit 4df0221f9ded8c39aecfb1a80cef346026671cb7 ]
Syscall handlers should not be invoked internally by their symbol names, as these symbols are defined by the architecture-defined SYSCALL_DEFINE macro. Fortunately, in the case of ppc64_personality, its call to sys_personality() can be replaced with an invocation of the equivalent ksys_personality() inline helper in <linux/syscalls.h>.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-13-rmclure@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/syscalls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index fc999140bc27..690afd77e7fe 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -88,7 +88,7 @@ long ppc64_personality(unsigned long personality)
 	if (personality(current->personality) == PER_LINUX32
 	    && personality(personality) == PER_LINUX)
 		personality = (personality & ~PER_MASK) | PER_LINUX32;
-	ret = sys_personality(personality);
+	ret = ksys_personality(personality);
 	if (personality(ret) == PER_LINUX32)
 		ret = (ret & ~PER_MASK) | PER_LINUX;
 	return ret;
From: Athira Rajeev atrajeev@linux.vnet.ibm.com
[ Upstream commit b9c001276d4a756f98cc7dc4672eff5343949203 ]
For the PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type values, i.e. branch filters, are supported. The branch filters are requested via the event attribute "branch_sample_type". Multiple branch filters can be passed in the event attribute, e.g.:
$ perf record -b -o- -B --branch-filter any,ind_call true
None of the Power PMUs support having multiple branch filters at the same time. Branch filters for branch stack sampling are set via MMCRA IFM bits [32:33]. But currently, when multiple filter types are requested, the "perf record" command does not report any error.
eg:
 $ perf record -b -o- -B --branch-filter any,save_type true
 $ perf record -b -o- -B --branch-filter any,ind_call true
The "bhrb_filter_map" function in PMU driver code does the validity check for supported branch filters. But this check is done for single filter. Hence "perf record" will proceed here without reporting any error.
Fix power_pmu_event_init() to return EOPNOTSUPP when multiple branch filters are requested in the event attr.
After the fix:
 $ perf record --branch-filter any,ind_call -- ls
 Error:
 cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
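The check itself is just a population count over the filter bits with the privilege-level bits masked off; a stand-alone sketch follows (the bit positions mirror the perf uapi but are reproduced from memory, so treat them as illustrative):

#include <stdint.h>
#include <stdio.h>

#define PERF_SAMPLE_BRANCH_USER		(1ULL << 0)
#define PERF_SAMPLE_BRANCH_KERNEL	(1ULL << 1)
#define PERF_SAMPLE_BRANCH_HV		(1ULL << 2)
#define PERF_SAMPLE_BRANCH_ANY		(1ULL << 3)
#define PERF_SAMPLE_BRANCH_IND_CALL	(1ULL << 6)
#define PERF_SAMPLE_BRANCH_PLM_ALL \
	(PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_KERNEL | PERF_SAMPLE_BRANCH_HV)

static int hweight64(uint64_t v)	/* popcount, like the kernel helper */
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

int main(void)
{
	/* What "--branch-filter any,ind_call" plus an implicit privilege bit looks like. */
	uint64_t branch_sample_type = PERF_SAMPLE_BRANCH_ANY |
				      PERF_SAMPLE_BRANCH_IND_CALL |
				      PERF_SAMPLE_BRANCH_KERNEL;
	int nr = hweight64(branch_sample_type & ~PERF_SAMPLE_BRANCH_PLM_ALL);

	printf("filters requested: %d -> %s\n", nr, nr > 1 ? "reject" : "accept");
	return 0;
}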
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Tweak comment and change log wording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921145255.20972-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/perf/core-book3s.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 13919eb96931..03e31ae97741 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2131,6 +2131,23 @@ static int power_pmu_event_init(struct perf_event *event)
 	if (has_branch_stack(event)) {
 		u64 bhrb_filter = -1;
 
+		/*
+		 * Currently no PMU supports having multiple branch filters
+		 * at the same time. Branch filters are set via MMCRA IFM[32:33]
+		 * bits for Power8 and above. Return EOPNOTSUPP when multiple
+		 * branch filters are requested in the event attr.
+		 *
+		 * When opening event via perf_event_open(), branch_sample_type
+		 * gets adjusted in perf_copy_attr(). Kernel will automatically
+		 * adjust the branch_sample_type based on the event modifier
+		 * settings to include PERF_SAMPLE_BRANCH_PLM_ALL. Hence drop
+		 * the check for PERF_SAMPLE_BRANCH_PLM_ALL.
+		 */
+		if (hweight64(event->attr.branch_sample_type & ~PERF_SAMPLE_BRANCH_PLM_ALL) > 1) {
+			local_irq_restore(irq_flags);
+			return -EOPNOTSUPP;
+		}
+
 		if (ppmu->bhrb_filter_map)
 			bhrb_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit b8f3e48834fe8c86b4f21739c6effd160e2c2c19 ]
The error injection facility on pseries VMs allows corruption of arbitrary guest memory, potentially enabling a sufficiently privileged user to disable lockdown or perform other modifications of the running kernel via the rtas syscall.
Block the PAPR error injection facility from being opened or called when locked down.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926131643.146502-3-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/rtas.c | 25 ++++++++++++++++++++++++-
 include/linux/security.h   |  1 +
 security/security.c        |  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 693133972294..c2540d393f1c 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -23,6 +23,7 @@ #include <linux/memblock.h> #include <linux/slab.h> #include <linux/reboot.h> +#include <linux/security.h> #include <linux/syscalls.h> #include <linux/of.h> #include <linux/of_fdt.h> @@ -464,6 +465,9 @@ void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, va_end(list); }
+static int ibm_open_errinjct_token; +static int ibm_errinjct_token; + int rtas_call(int token, int nargs, int nret, int *outputs, ...) { va_list list; @@ -476,6 +480,16 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...) if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE) return -1;
+ if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) { + /* + * It would be nicer to not discard the error value + * from security_locked_down(), but callers expect an + * RTAS status, not an errno. + */ + if (security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION)) + return -1; + } + if ((mfmsr() & (MSR_IR|MSR_DR)) != (MSR_IR|MSR_DR)) { WARN_ON_ONCE(1); return -1; @@ -1227,6 +1241,14 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) if (block_rtas_call(token, nargs, &args)) return -EINVAL;
+ if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) { + int err; + + err = security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION); + if (err) + return err; + } + /* Need to handle ibm,suspend_me call specially */ if (token == rtas_token("ibm,suspend-me")) {
@@ -1325,7 +1347,8 @@ void __init rtas_initialize(void) #ifdef CONFIG_RTAS_ERROR_LOGGING rtas_last_error_token = rtas_token("rtas-last-error"); #endif - + ibm_open_errinjct_token = rtas_token("ibm,open-errinjct"); + ibm_errinjct_token = rtas_token("ibm,errinjct"); rtas_syscall_filter_init(); }
diff --git a/include/linux/security.h b/include/linux/security.h index 7bd0c490703d..0ca55306f1eb 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -122,6 +122,7 @@ enum lockdown_reason { LOCKDOWN_XMON_WR, LOCKDOWN_BPF_WRITE_USER, LOCKDOWN_DBG_WRITE_KERNEL, + LOCKDOWN_RTAS_ERROR_INJECTION, LOCKDOWN_INTEGRITY_MAX, LOCKDOWN_KCORE, LOCKDOWN_KPROBES, diff --git a/security/security.c b/security/security.c index 4b95de24bc8d..11e2c8757275 100644 --- a/security/security.c +++ b/security/security.c @@ -60,6 +60,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = { [LOCKDOWN_XMON_WR] = "xmon write access", [LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM", [LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM", + [LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection", [LOCKDOWN_INTEGRITY_MAX] = "integrity", [LOCKDOWN_KCORE] = "/proc/kcore access", [LOCKDOWN_KPROBES] = "use of kprobes",
From: "Aneesh Kumar K.V" aneesh.kumar@linux.ibm.com
[ Upstream commit 7dd3a7b90bca2c12e2146a47d63cf69a2f5d7e89 ]
Powerpc architecture supports 16GB hugetlb pages with hash translation. For a 4K page size, this is implemented as a hugepage directory entry at the PGD level, and for 64K it is implemented as a huge page pte at the PUD level.
With a 16GB hugetlb size, the offset within a page needs more than 32 bits. Hence switch to using the unsigned long type when using hugepd_shift.
In order to keep things simpler, we make sure we always use the unsigned long type when using hugepd_shift(), even though not all hugetlb page sizes require it.
The hugetlb_free_p*d_range changes are all related to nohash usage, where we can have multiple pgd entries pointing to the same hugepd entries. Hence on book3s64, where we can have a > 4GB hugetlb page size, we will always find more < next even if we compute the value of more correctly.
Hence there is no functional change in this patch except that it fixes the below warning.
 UBSAN: shift-out-of-bounds in arch/powerpc/mm/hugetlbpage.c:499:21
 shift exponent 34 is too large for 32-bit type 'int'
 CPU: 39 PID: 1673 Comm: a.out Not tainted 6.0.0-rc2-00327-gee88a56e8517-dirty #1
 Call Trace:
   dump_stack_lvl+0x98/0xe0 (unreliable)
   ubsan_epilogue+0x18/0x70
   __ubsan_handle_shift_out_of_bounds+0x1bc/0x390
   hugetlb_free_pgd_range+0x5d8/0x600
   free_pgtables+0x114/0x290
   exit_mmap+0x150/0x550
   mmput+0xcc/0x210
   do_exit+0x420/0xdd0
   do_group_exit+0x4c/0xd0
   sys_exit_group+0x24/0x30
   system_call_exception+0x250/0x600
   system_call_common+0xec/0x250
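The underlying arithmetic is easy to reproduce in isolation; a stand-alone sketch, using the same shift value as the report above:

#include <stdio.h>

int main(void)
{
	unsigned int shift = 34;	/* e.g. a 16GB huge page: 1 << 34 bytes */

	/*
	 * A bare '1' is an int, so '1 << 34' shifts past the width of a
	 * 32-bit int and is undefined behaviour, which is what UBSAN
	 * flags.  Promoting the constant to unsigned long (64-bit on
	 * ppc64) keeps the arithmetic in range.
	 */
	unsigned long size = 1UL << shift;

	printf("%lu\n", size);		/* prints 17179869184 */
	return 0;
}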
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Drop generic change to be sent separately, change 1ULL to 1UL]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220908072440.258301-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/mm/hugetlbpage.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc84a594ca62..7db918eb5b4b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -392,7 +392,7 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 		 * single hugepage, but all of them point to
 		 * the same kmem cache that holds the hugepte.
 		 */
-		more = addr + (1 << hugepd_shift(*(hugepd_t *)pmd));
+		more = addr + (1UL << hugepd_shift(*(hugepd_t *)pmd));
 		if (more > next)
 			next = more;
 
@@ -434,7 +434,7 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
 		 * single hugepage, but all of them point to
 		 * the same kmem cache that holds the hugepte.
 		 */
-		more = addr + (1 << hugepd_shift(*(hugepd_t *)pud));
+		more = addr + (1UL << hugepd_shift(*(hugepd_t *)pud));
 		if (more > next)
 			next = more;
 
@@ -496,7 +496,7 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 			 * for a single hugepage, but all of them point to the
 			 * same kmem cache that holds the hugepte.
 			 */
-			more = addr + (1 << hugepd_shift(*(hugepd_t *)pgd));
+			more = addr + (1UL << hugepd_shift(*(hugepd_t *)pgd));
 			if (more > next)
 				next = more;
 
From: Nicholas Piggin npiggin@gmail.com
[ Upstream commit 0fa6831811f62cfc10415d731bcf9fde2647ad81 ]
irq soft-masking means that when Linux irqs are disabled, the MSR[EE] value can change from 1 to 0 asynchronously: if a masked interrupt of the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled, the masked handler will return with MSR[EE]=0.
This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can be called with local irqs disabled, unless a hard_irq_disable has been done.
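An annotated pseudo-sequence of the race being described (not compilable code, just the ordering of events):

/* With Linux irqs soft-disabled on a platform with PACA_IRQ_MUST_HARD_MASK sources: */
msr = mfmsr();		/* observes MSR[EE] = 1				*/
			/* <-- masked interrupt fires here; its handler	*/
			/*     returns with MSR[EE] = 0			*/
mtmsr(msr | MSR_FP);	/* writes the stale EE = 1 back, re-enabling	*/
			/* hard interrupts while irqs are still	*/
			/* soft-disabled				*/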
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/hw_irq.h | 24 ++++++++++++++++++++++++
 arch/powerpc/kernel/process.c     |  4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index 983551859891..9cb1daa1be4f 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -489,6 +489,30 @@ static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned l } #endif /* CONFIG_PPC64 */
+static inline unsigned long mtmsr_isync_irqsafe(unsigned long msr) +{ +#ifdef CONFIG_PPC64 + if (arch_irqs_disabled()) { + /* + * With soft-masking, MSR[EE] can change from 1 to 0 + * asynchronously when irqs are disabled, and we don't want to + * set MSR[EE] back to 1 here if that has happened. A race-free + * way to do this is ensure EE is already 0. Another way it + * could be done is with a RESTART_TABLE handler, but that's + * probably overkill here. + */ + msr &= ~MSR_EE; + mtmsr_isync(msr); + irq_soft_mask_set(IRQS_ALL_DISABLED); + local_paca->irq_happened |= PACA_IRQ_HARD_DIS; + } else +#endif + mtmsr_isync(msr); + + return msr; +} + + #define ARCH_IRQ_INIT_FLAGS IRQ_NOREQUEST
#endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 0fbda89cd1bb..37df0428e4fb 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -127,7 +127,7 @@ unsigned long notrace msr_check_and_set(unsigned long bits) newmsr |= MSR_VSX;
if (oldmsr != newmsr) - mtmsr_isync(newmsr); + newmsr = mtmsr_isync_irqsafe(newmsr);
return newmsr; } @@ -145,7 +145,7 @@ void notrace __msr_check_and_clear(unsigned long bits) newmsr &= ~MSR_VSX;
if (oldmsr != newmsr) - mtmsr_isync(newmsr); + mtmsr_isync_irqsafe(newmsr); } EXPORT_SYMBOL(__msr_check_and_clear);
linux-stable-mirror@lists.linaro.org