This is a note to let you know that I've just added the patch titled
nvmem: core: fix error handling while validating keepout regions
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week).
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From de0534df93474f268486c486ea7e01b44a478026 Mon Sep 17 00:00:00 2001
From: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Date: Fri, 6 Aug 2021 09:59:47 +0100
Subject: nvmem: core: fix error handling while validating keepout regions
The current error path on failure to validate the keepout regions calls
put_device(), even though the device has not been registered at that point.
Fix this by freeing the ida and the nvmem structure directly instead.
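For reference, the intended flow in nvmem_register() is roughly the
following (a sketch only; the exact change is in the hunk below, and the
surrounding code, including the err_put_device label, is assumed):

	rval = nvmem_validate_keepouts(nvmem);
	if (rval) {
		/* Not registered yet, so put_device() must not be used. */
		ida_free(&nvmem_ida, nvmem->id);
		kfree(nvmem);
		return ERR_PTR(rval);
	}

	rval = device_register(&nvmem->dev);
	if (rval)
		goto err_put_device;	/* valid only once the device is registered */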
Fixes: fd3bb8f54a88 ("nvmem: core: Add support for keepout regions")
Cc: <stable@vger.kernel.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20210806085947.22682-5-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index b3bc30a04ed7..3d87fadaa160 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -824,8 +824,11 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
if (nvmem->nkeepout) {
rval = nvmem_validate_keepouts(nvmem);
- if (rval)
- goto err_put_device;
+ if (rval) {
+ ida_free(&nvmem_ida, nvmem->id);
+ kfree(nvmem);
+ return ERR_PTR(rval);
+ }
}
dev_dbg(&nvmem->dev, "Registering nvmem device %s\n", config->name);
--
2.32.0
When switching to an 'mm_struct' for the first time following an ASID
rollover, a new ASID may be allocated and assigned to 'mm->context.id'.
This reassignment can happen concurrently with other operations on the
mm, such as unmapping pages and subsequently issuing TLB invalidation.
Consequently, we need to ensure that (a) accesses to 'mm->context.id'
are atomic and (b) all page-table updates made prior to a TLBI using the
old ASID are guaranteed to be visible to CPUs running with the new ASID.
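Concretely, the pattern the invalidation paths need is roughly the
following (a sketch distilled from the hunks below, not additional code):

	/*
	 * (a) Read mm->context.id atomically, since a rollover can update
	 *     it concurrently.
	 * (b) Read it only after the DSB that publishes the PTE update, so
	 *     a walker using the new ASID cannot see the stale PTE.
	 */
	#define ASID(mm)	(atomic64_read(&(mm)->context.id) & 0xffff)

	static inline void flush_tlb_mm(struct mm_struct *mm)
	{
		unsigned long asid;

		dsb(ishst);				/* (b) order the PTE update   */
		asid = __TLBI_VADDR(0, ASID(mm));	/* (a) atomic read, after DSB */
		__tlbi(aside1is, asid);
		__tlbi_user(aside1is, asid);
		dsb(ish);
	}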
This was found by inspection after reviewing the VMID changes from
Shameer, but it looks like a real (yet hard-to-hit) bug.
Cc: <stable@vger.kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/mmu.h | 29 +++++++++++++++++++++++++----
arch/arm64/include/asm/tlbflush.h | 11 ++++++-----
2 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 75beffe2ee8a..e9c30859f80c 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -27,11 +27,32 @@ typedef struct {
} mm_context_t;
/*
- * This macro is only used by the TLBI and low-level switch_mm() code,
- * neither of which can race with an ASID change. We therefore don't
- * need to reload the counter using atomic64_read().
+ * We use atomic64_read() here because the ASID for an 'mm_struct' can
+ * be reallocated when scheduling one of its threads following a
+ * rollover event (see new_context() and flush_context()). In this case,
+ * a concurrent TLBI (e.g. via try_to_unmap_one() and ptep_clear_flush())
+ * may use a stale ASID. This is fine in principle as the new ASID is
+ * guaranteed to be clean in the TLB, but the TLBI routines have to take
+ * care to handle the following race:
+ *
+ * CPU 0 CPU 1 CPU 2
+ *
+ * // ptep_clear_flush(mm)
+ * xchg_relaxed(pte, 0)
+ * DSB ISHST
+ * old = ASID(mm)
+ * | <rollover>
+ * | new = new_context(mm)
+ * \-----------------> atomic_set(mm->context.id, new)
+ * cpu_switch_mm(mm)
+ * // Hardware walk of pte using new ASID
+ * TLBI(old)
+ *
+ * In this scenario, the barrier on CPU 0 and the dependency on CPU 1
+ * ensure that the page-table walker on CPU 1 *must* see the invalid PTE
+ * written by CPU 0.
*/
-#define ASID(mm) ((mm)->context.id.counter & 0xffff)
+#define ASID(mm) (atomic64_read(&(mm)->context.id) & 0xffff)
static inline bool arm64_kernel_unmapped_at_el0(void)
{
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index cc3f5a33ff9c..36f02892e1df 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -245,9 +245,10 @@ static inline void flush_tlb_all(void)
static inline void flush_tlb_mm(struct mm_struct *mm)
{
- unsigned long asid = __TLBI_VADDR(0, ASID(mm));
+ unsigned long asid;
dsb(ishst);
+ asid = __TLBI_VADDR(0, ASID(mm));
__tlbi(aside1is, asid);
__tlbi_user(aside1is, asid);
dsb(ish);
@@ -256,9 +257,10 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
unsigned long uaddr)
{
- unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
+ unsigned long addr;
dsb(ishst);
+ addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
__tlbi(vale1is, addr);
__tlbi_user(vale1is, addr);
}
@@ -283,9 +285,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
{
int num = 0;
int scale = 0;
- unsigned long asid = ASID(vma->vm_mm);
- unsigned long addr;
- unsigned long pages;
+ unsigned long asid, addr, pages;
start = round_down(start, stride);
end = round_up(end, stride);
@@ -305,6 +305,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
}
dsb(ishst);
+ asid = ASID(vma->vm_mm);
/*
* When the CPU does not support TLB range operations, flush the TLB
--
2.32.0.605.g8dce9f2422-goog
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
runtime. Today, the IPI is not created for such nodes, and hot-plugged
CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand, but that is a bit complex because
this code would be called under bringup_up() while some IRQ locking is
being done. The simplest solution is to create the IPIs for all nodes
at startup.
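The resulting loop in xive_request_ipi() then looks roughly like this (a
sketch only: error handling and naming of the IPI are trimmed, and
ipi_domain/xive_ipis are assumed to be the existing locals of that
function):

	for_each_node(node) {
		struct xive_ipi_desc *xid = &xive_ipis[node];
		struct xive_ipi_alloc_info info = { node };

		/* No more "skip nodes without CPUs" check here. */

		/*
		 * Map one IPI per node, CPU-less or not, so that CPUs
		 * hot-plugged into the node later find a valid IPI.
		 */
		ret = irq_domain_alloc_irqs(ipi_domain, 1, node, &info);
		if (ret < 0)
			return ret;
		xid->irq = ret;
	}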
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
Cc: stable@vger.kernel.org # v5.13
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
are collected from /sys/devices/system/node/ but CPU-less nodes are
not listed there. When interrupts are scanned, the link representing
the node structure is NULL and a segfault occurs.
Version 1.7 seems immune.
---
arch/powerpc/sysdev/xive/common.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index f3b16ed48b05..5d2c58dba57e 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void)
struct xive_ipi_desc *xid = &xive_ipis[node];
struct xive_ipi_alloc_info info = { node };
- /* Skip nodes without CPUs */
- if (cpumask_empty(cpumask_of_node(node)))
- continue;
-
/*
* Map one IPI interrupt per node for all cpus of that node.
* Since the HW interrupt number doesn't have any meaning,
--
2.31.1