SysRq-L and the RCU stall detector call arch_trigger_cpumask_backtrace() to
trigger backtraces on other CPUs, but its behavior is totally broken. The
root cause is that arch_trigger_cpumask_backtrace() uses a call-function IPI
from irq context, which triggers deadlocks in smp_call_function_single() and
smp_call_function_many().
This patch fixes arch_trigger_cpumask_backtrace() by:
1. Using a dedicated IPI (SMP_CPU_BACKTRACE) to trigger backtraces;
2. If the current CPU is in the target cpumask, dumping its backtrace directly and clearing it from the mask;
3. Using a spinlock to avoid interleaved backtrace output;
4. Handling the SMP_CPU_BACKTRACE IPI for Loongson-3.
I have attempted to implement SMP_CPU_BACKTRACE for all MIPS CPUs, but
failed because some of their IPI mechanisms are not extensible. :(
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
arch/mips/include/asm/smp.h | 3 +++
arch/mips/kernel/process.c | 23 ++++++++++++++++++-----
arch/mips/loongson64/loongson-3/smp.c | 6 ++++++
3 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/arch/mips/include/asm/smp.h b/arch/mips/include/asm/smp.h
index 88ebd83..b0521f4 100644
--- a/arch/mips/include/asm/smp.h
+++ b/arch/mips/include/asm/smp.h
@@ -43,6 +43,7 @@ extern int __cpu_logical_map[NR_CPUS];
/* Octeon - Tell another core to flush its icache */
#define SMP_ICACHE_FLUSH 0x4
#define SMP_ASK_C0COUNT 0x8
+#define SMP_CPU_BACKTRACE 0x10
/* Mask of CPUs which are currently definitely operating coherently */
extern cpumask_t cpu_coherent_mask;
@@ -81,6 +82,8 @@ static inline void __cpu_die(unsigned int cpu)
extern void play_dead(void);
#endif
+void arch_dump_stack(void);
+
/*
* This function will set up the necessary IPIs for Linux to communicate
* with the CPUs in mask.
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 57028d4..647e15d 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -655,26 +655,39 @@ unsigned long arch_align_stack(unsigned long sp)
return sp & ALMASK;
}
-static void arch_dump_stack(void *info)
+void arch_dump_stack(void)
{
struct pt_regs *regs;
+ static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
+ arch_spin_lock(&lock);
regs = get_irq_regs();
if (regs)
show_regs(regs);
dump_stack();
+ arch_spin_unlock(&lock);
}
void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
{
long this_cpu = get_cpu();
+ struct cpumask backtrace_mask;
+ extern const struct plat_smp_ops *mp_ops;
+
+ cpumask_copy(&backtrace_mask, mask);
+ if (cpumask_test_cpu(this_cpu, mask)) {
+ if (!exclude_self) {
+ struct pt_regs *regs = get_irq_regs();
+ if (regs)
+ show_regs(regs);
+ dump_stack();
+ }
+ cpumask_clear_cpu(this_cpu, &backtrace_mask);
+ }
- if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
- dump_stack();
-
- smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+ mp_ops->send_ipi_mask(&backtrace_mask, SMP_CPU_BACKTRACE);
put_cpu();
}
diff --git a/arch/mips/loongson64/loongson-3/smp.c b/arch/mips/loongson64/loongson-3/smp.c
index 8501109..0655114 100644
--- a/arch/mips/loongson64/loongson-3/smp.c
+++ b/arch/mips/loongson64/loongson-3/smp.c
@@ -291,6 +291,12 @@ void loongson3_ipi_interrupt(struct pt_regs *regs)
__wbflush(); /* Let others see the result ASAP */
}
+ if (action & SMP_CPU_BACKTRACE) {
+ irq_enter();
+ arch_dump_stack();
+ irq_exit();
+ }
+
if (irqs) {
int irq;
while ((irq = ffs(irqs))) {
--
2.7.0
Changes since v9 [1] and v10 [2]
* Resend the full series with the reworked "mm: introduce
MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS" (Christoph)
* Move generic_dax_pagefree() into the pmem driver (Christoph)
* Cleanup __bdev_dax_supported() (Christoph)
* Cleanup some stale SRCU bits leftover from other iterations (Jan)
* Cleanup xfs_break_layouts() (Jan)
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-April/015457.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015885.html
---
Background:
get_user_pages() in the filesystem pins file-backed memory pages for
access by devices performing dma. However, it only pins the memory pages,
not the page-to-file offset association. If a file is truncated, the
pages are mapped out of the file and dma may continue indefinitely into
a page that is owned by a device driver. This breaks coherency of the
file vs dma, but the assumption is that if userspace wants the
file-space truncated, it does not matter what data is inbound from the
device; it is not relevant anymore. The only expectation is that dma can
safely continue while the filesystem reallocates the block(s).
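As an illustration only (not part of this series), a minimal sketch of the
driver-side pin-for-dma pattern described above, assuming the v4.17-era
get_user_pages_fast() signature (int 'write' flag); the function name and
its parameters are hypothetical:

#include <linux/mm.h>
#include <linux/dma-mapping.h>

static int example_pin_for_dma(struct device *dev, unsigned long uaddr,
			       int npages, struct page **pages,
			       dma_addr_t *dma_addr)
{
	int pinned;

	/* Takes a reference on each file-backed page... */
	pinned = get_user_pages_fast(uaddr, npages, 1, pages);
	if (pinned <= 0)
		return pinned ? pinned : -EFAULT;

	/*
	 * ...but nothing pins the page-to-file-offset association, so a
	 * truncate() can remove these pages from the file while the
	 * device keeps doing dma into them.
	 */
	*dma_addr = dma_map_page(dev, pages[0], 0, PAGE_SIZE,
				 DMA_BIDIRECTIONAL);
	return pinned;
}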
Problem:
This expectation that dma can safely continue while the filesystem
changes the block map is broken by dax. With dax the target dma page
*is* the filesystem block. The model of leaving the page pinned for dma,
but truncating the file block out of the file, means that the filesystem
is free to reallocate a block under active dma to another file, and now
the expected data-incoherency situation has turned into active
data-corruption.
Solution:
Defer all filesystem operations (fallocate(), truncate()) on a dax mode
file while any page/block in the file is under active dma. This solution
assumes that dma is transient. Cases where dma operations are known to
not be transient, like RDMA, have been explicitly disabled via
commits like 5f1d43de5416 "IB/core: disable memory registration of
filesystem-dax vmas".
The dax_layout_busy_page() routine is called by filesystems with a lock
held against mm faults (i_mmap_lock) to find pinned / busy dax pages.
The process of looking up a busy page invalidates all mappings
to trigger any subsequent get_user_pages() to block on i_mmap_lock.
The filesystem continues to call dax_layout_busy_page() until it finally
returns no more active pages. This approach assumes that the page
pinning is transient; if that assumption is violated, the system would
likely have hung from the uncompleted I/O.
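As an illustration of the retry loop described above (a simplified sketch,
not the actual xfs_break_dax_layouts() from this series; the locking and
wait condition are reduced to the bare minimum, and the wake-up comes from
the pmem driver's page-free callback):

#include <linux/fs.h>
#include <linux/dax.h>
#include <linux/wait_bit.h>

/* Assumes the caller already holds the lock that blocks new mm faults. */
static int example_break_dax_layouts(struct inode *inode)
{
	struct page *page;

	for (;;) {
		page = dax_layout_busy_page(inode->i_mapping);
		if (!page)
			return 0;

		/* Wait for the dma pin (extra page reference) to drop. */
		wait_var_event(&page->_refcount,
			       atomic_read(&page->_refcount) == 1);
	}
}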
---
Dan Williams (7):
memremap: split devm_memremap_pages() and memremap() infrastructure
mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
mm: fix __gup_device_huge vs unmap
mm, fs, dax: handle layout changes to pinned dax mappings
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
xfs: prepare xfs_break_layouts() for another layout type
xfs, dax: introduce xfs_break_dax_layouts()
drivers/dax/super.c | 14 ++-
drivers/nvdimm/pfn_devs.c | 2
drivers/nvdimm/pmem.c | 25 +++++
fs/Kconfig | 1
fs/dax.c | 97 +++++++++++++++++++++
fs/xfs/xfs_file.c | 72 ++++++++++++++--
fs/xfs/xfs_inode.h | 16 +++
fs/xfs/xfs_ioctl.c | 8 --
fs/xfs/xfs_iops.c | 16 ++-
fs/xfs/xfs_pnfs.c | 15 ++-
fs/xfs/xfs_pnfs.h | 5 +
include/linux/dax.h | 7 ++
include/linux/memremap.h | 36 ++------
include/linux/mm.h | 71 +++++++++++----
kernel/Makefile | 3 -
kernel/iomem.c | 167 ++++++++++++++++++++++++++++++++++++
kernel/memremap.c | 209 ++++++---------------------------------------
mm/Kconfig | 5 +
mm/gup.c | 36 ++++++--
mm/hmm.c | 13 ---
mm/swap.c | 3 -
21 files changed, 542 insertions(+), 279 deletions(-)
create mode 100644 kernel/iomem.c
Mark notes that gcc optimization passes have the potential to elide
necessary invocations of this instruction sequence, so include an
optimization barrier.
> I think that either way, we have a potential problem if the compiler
> generates a branch dependent on the result of validate_index_nospec().
>
> In that case, we could end up with codegen approximating:
>
> bool safe = false;
>
> if (idx < bound) {
> idx = array_index_nospec(idx, bound);
> safe = true;
> }
>
> // this branch can be mispredicted
> if (safe) {
> foo = array[idx];
> }
>
> ... and thus we lose the nospec protection.
I see GCC do this at -O0, but so far I haven't tricked it into doing
this at -O1 or above.
Regardless, I worry this is fragile -- GCC *can* generate code as per
the above, even if it's unlikely to.
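For reference, a minimal sketch (not from this patch) of the intended usage
pattern: keep the masked index and the array access inside the same
bounds-checked branch, so there is no separate "safe" flag for the compiler
to split into its own mispredictable branch. The array/index names here are
hypothetical:

#include <linux/nospec.h>

static inline unsigned long example_read(const unsigned long *array,
					 unsigned long idx,
					 unsigned long bound)
{
	unsigned long val = 0;

	if (idx < bound)
		val = array[array_index_nospec(idx, bound)];

	return val;
}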
Cc: <stable@vger.kernel.org>
Fixes: babdde2698d4 ("x86: Implement array_index_mask_nospec")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
arch/x86/include/asm/barrier.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 042b5e892ed1..41f7435c84a7 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -38,10 +38,11 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
{
unsigned long mask;
- asm ("cmp %1,%2; sbb %0,%0;"
+ asm volatile ("cmp %1,%2; sbb %0,%0;"
:"=r" (mask)
:"g"(size),"r" (index)
:"cc");
+ barrier();
return mask;
}
X86_FEATURE_SSBD is a synthetic CPU feature - that is, its bit location
has no relevance to the real CPUID 0x7.EDX[31] bit position. For that we
need to use the new CPU feature name (SPEC_CTRL_SSBD) when exposing the
bit to guests.
Fixes: 52817587e706 ("x86/cpufeatures: Disentangle SSBD enumeration")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/x86/kvm/cpuid.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index ced851169730..598461e24be3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -407,8 +407,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
- F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | F(SSBD) |
- F(ARCH_CAPABILITIES);
+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+ F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
--
2.13.4
On some devices the contents of the ctrl register get lost over a
suspend/resume and the PWM comes back up disabled after the resume.
This is seen on some Bay Trail devices with the PWM in ACPI-enumerated
mode, so it shows up as a platform device instead of a PCI device.
If we still think it is enabled and then try to change the duty-cycle
after this, we end up with a "PWM_SW_UPDATE was not cleared" error and
the PWM is stuck in that state from then on.
This commit adds suspend and resume pm callbacks to the pwm-lpss-platform
code, which save/restore the ctrl register over a suspend/resume, fixing
this.
Note that:
1) There is no need to do this over a runtime suspend, since we
only runtime suspend when disabled and then we properly set the enable
bit and reprogram the timings when we re-enable the PWM.
2) This may be happening on more systems than we realize, but has been
covered up so far by a bug in the acpi-lpss.c code, which was saving/restoring
the regular device registers instead of the lpss private registers due to
lpss_device_desc.prv_offset not being set. This is fixed by a later patch
in this series.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
Changes in v2:
-Add Cc: stable@vger.kernel.org to make sure this goes into stable
together with "ACPI / LPSS: Add missing prv_offset setting for byt/cht
PWM devices", which depends on this
---
drivers/pwm/pwm-lpss-platform.c | 5 +++++
drivers/pwm/pwm-lpss.c | 30 ++++++++++++++++++++++++++++++
drivers/pwm/pwm-lpss.h | 2 ++
3 files changed, 37 insertions(+)
diff --git a/drivers/pwm/pwm-lpss-platform.c b/drivers/pwm/pwm-lpss-platform.c
index 5d6ed1507d29..5561b9e190f8 100644
--- a/drivers/pwm/pwm-lpss-platform.c
+++ b/drivers/pwm/pwm-lpss-platform.c
@@ -74,6 +74,10 @@ static int pwm_lpss_remove_platform(struct platform_device *pdev)
return pwm_lpss_remove(lpwm);
}
+static SIMPLE_DEV_PM_OPS(pwm_lpss_platform_pm_ops,
+ pwm_lpss_suspend,
+ pwm_lpss_resume);
+
static const struct acpi_device_id pwm_lpss_acpi_match[] = {
{ "80860F09", (unsigned long)&pwm_lpss_byt_info },
{ "80862288", (unsigned long)&pwm_lpss_bsw_info },
@@ -86,6 +90,7 @@ static struct platform_driver pwm_lpss_driver_platform = {
.driver = {
.name = "pwm-lpss",
.acpi_match_table = pwm_lpss_acpi_match,
+ .pm = &pwm_lpss_platform_pm_ops,
},
.probe = pwm_lpss_probe_platform,
.remove = pwm_lpss_remove_platform,
diff --git a/drivers/pwm/pwm-lpss.c b/drivers/pwm/pwm-lpss.c
index 8db0d40ccacd..4721a264bac2 100644
--- a/drivers/pwm/pwm-lpss.c
+++ b/drivers/pwm/pwm-lpss.c
@@ -32,10 +32,13 @@
/* Size of each PWM register space if multiple */
#define PWM_SIZE 0x400
+#define MAX_PWMS 4
+
struct pwm_lpss_chip {
struct pwm_chip chip;
void __iomem *regs;
const struct pwm_lpss_boardinfo *info;
+ u32 saved_ctrl[MAX_PWMS];
};
static inline struct pwm_lpss_chip *to_lpwm(struct pwm_chip *chip)
@@ -177,6 +180,9 @@ struct pwm_lpss_chip *pwm_lpss_probe(struct device *dev, struct resource *r,
unsigned long c;
int ret;
+ if (WARN_ON(info->npwm > MAX_PWMS))
+ return ERR_PTR(-ENODEV);
+
lpwm = devm_kzalloc(dev, sizeof(*lpwm), GFP_KERNEL);
if (!lpwm)
return ERR_PTR(-ENOMEM);
@@ -212,6 +218,30 @@ int pwm_lpss_remove(struct pwm_lpss_chip *lpwm)
}
EXPORT_SYMBOL_GPL(pwm_lpss_remove);
+int pwm_lpss_suspend(struct device *dev)
+{
+ struct pwm_lpss_chip *lpwm = dev_get_drvdata(dev);
+ int i;
+
+ for (i = 0; i < lpwm->info->npwm; i++)
+ lpwm->saved_ctrl[i] = readl(lpwm->regs + i * PWM_SIZE + PWM);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pwm_lpss_suspend);
+
+int pwm_lpss_resume(struct device *dev)
+{
+ struct pwm_lpss_chip *lpwm = dev_get_drvdata(dev);
+ int i;
+
+ for (i = 0; i < lpwm->info->npwm; i++)
+ writel(lpwm->saved_ctrl[i], lpwm->regs + i * PWM_SIZE + PWM);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pwm_lpss_resume);
+
MODULE_DESCRIPTION("PWM driver for Intel LPSS");
MODULE_AUTHOR("Mika Westerberg <mika.westerberg(a)linux.intel.com>");
MODULE_LICENSE("GPL v2");
diff --git a/drivers/pwm/pwm-lpss.h b/drivers/pwm/pwm-lpss.h
index 98306bb02cfe..7a4238ad1fcb 100644
--- a/drivers/pwm/pwm-lpss.h
+++ b/drivers/pwm/pwm-lpss.h
@@ -28,5 +28,7 @@ struct pwm_lpss_boardinfo {
struct pwm_lpss_chip *pwm_lpss_probe(struct device *dev, struct resource *r,
const struct pwm_lpss_boardinfo *info);
int pwm_lpss_remove(struct pwm_lpss_chip *lpwm);
+int pwm_lpss_suspend(struct device *dev);
+int pwm_lpss_resume(struct device *dev);
#endif /* __PWM_LPSS_H */
--
2.17.0
Decided to add Enric's commit, since it is also a bug fix, instead of
modifying Chris' commit.
Chris Chiu (1):
tpm: self test failure should not cause suspend to fail
Enric Balletbo i Serra (1):
tpm: do not suspend/resume if power stays on
drivers/char/tpm/tpm-chip.c | 12 ++++++++++++
drivers/char/tpm/tpm-interface.c | 7 +++++++
drivers/char/tpm/tpm.h | 1 +
3 files changed, 20 insertions(+)
--
v2: moved the check from tpm_of.c to tpm-chip.c, as in v4.4 the chip is
unreachable otherwise. I have now done a compile test with BuildRoot
for the power arch.
2.17.0
Hi Doug and Jason,
We have two more late breaking fix up patches. The DMA_RTAIL fix is the more
serious of the two. I realize we are at the tail end of 4.17 so I would not be
against holding off till 4.18 for these, but if there is another rdma
pull request we may want to tack these on.
---
Kaike Wan (1):
IB/hfi1: Ensure VL index is within bounds
Mike Marciniszyn (1):
IB/hfi1: Fix user context tail allocation for DMA_RTAIL
drivers/infiniband/hw/hfi1/chip.c | 8 ++++----
drivers/infiniband/hw/hfi1/file_ops.c | 2 +-
drivers/infiniband/hw/hfi1/init.c | 9 ++++-----
drivers/infiniband/hw/hfi1/sdma.c | 12 +++---------
4 files changed, 12 insertions(+), 19 deletions(-)
--
-Denny