On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup.
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Cédric Le Goater clg@kaod.org ---
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs.
Version 1.7 seems immune.
--- arch/powerpc/sysdev/xive/common.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index f3b16ed48b05..5d2c58dba57e 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) struct xive_ipi_desc *xid = &xive_ipis[node]; struct xive_ipi_alloc_info info = { node };
- /* Skip nodes without CPUs */ - if (cpumask_empty(cpumask_of_node(node))) - continue; - /* * Map one IPI interrupt per node for all cpus of that node. * Since the HW interrupt number doesn't have any meaning,
* C?dric Le Goater clg@kaod.org [2021-06-29 15:15:42]:
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup.
Thanks for quickly coming up with the fix.
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Cédric Le Goater clg@kaod.org
Tested-by: Srikar Dronamraju srikar@linux.vnet.ibm.com
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs.
Version 1.7 seems immune.
arch/powerpc/sysdev/xive/common.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index f3b16ed48b05..5d2c58dba57e 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) struct xive_ipi_desc *xid = &xive_ipis[node]; struct xive_ipi_alloc_info info = { node };
/* Skip nodes without CPUs */
if (cpumask_empty(cpumask_of_node(node)))
continue;
- /*
- Map one IPI interrupt per node for all cpus of that node.
- Since the HW interrupt number doesn't have any meaning,
-- 2.31.1
On 29/06/2021 15:15, Cédric Le Goater wrote:
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup.
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Cédric Le Goater clg@kaod.org
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs.
Version 1.7 seems immune.
arch/powerpc/sysdev/xive/common.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index f3b16ed48b05..5d2c58dba57e 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) struct xive_ipi_desc *xid = &xive_ipis[node]; struct xive_ipi_alloc_info info = { node };
/* Skip nodes without CPUs */
if (cpumask_empty(cpumask_of_node(node)))
continue;
- /*
- Map one IPI interrupt per node for all cpus of that node.
- Since the HW interrupt number doesn't have any meaning,
Tested-by: Laurent Vivier lvivier@redhat.com
On 29/06/2021 15:15, Cédric Le Goater wrote:
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup.
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Cédric Le Goater clg@kaod.org
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs.
Version 1.7 seems immune.
arch/powerpc/sysdev/xive/common.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index f3b16ed48b05..5d2c58dba57e 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) struct xive_ipi_desc *xid = &xive_ipis[node]; struct xive_ipi_alloc_info info = { node };
/* Skip nodes without CPUs */
if (cpumask_empty(cpumask_of_node(node)))
continue;
- /*
- Map one IPI interrupt per node for all cpus of that node.
- Since the HW interrupt number doesn't have any meaning,
What happened to this fix? Will it be merged?
Thanks, Laurent
Cédric Le Goater clg@kaod.org writes:
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup.
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Cédric Le Goater clg@kaod.org
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs.
Breaking userspace is usually frowned upon, even if it is irqbalance.
If CPU-less nodes appeared in /sys/devices/system/node would that fix it? Could we do that or is that not possible for other reasons?
Version 1.7 seems immune.
Which was released in August 2020.
Looks like some distros still ship 1.6, I take it you're not sure if that is broken or not.
cheers
On 8/2/21 8:37 AM, Michael Ellerman wrote:
Cédric Le Goater clg@kaod.org writes:
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup.
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Cédric Le Goater clg@kaod.org
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs.
Breaking userspace is usually frowned upon, even if it is irqbalance.
If CPU-less nodes appeared in /sys/devices/system/node would that fix it? Could we do that or is that not possible for other reasons?
Version 1.7 seems immune.
Which was released in August 2020.
Looks like some distros still ship 1.6, I take it you're not sure if that is broken or not.
I did a bisect on irqbalance and the "bad" commit was introduced between version 1.7 and version 1.8 :
commit 31dea01f3a47 ("Also fetch node info for non-PCI devices") https://github.com/Irqbalance/irqbalance/commit/31dea01f3a47aa63745606384868...
which was backported on RHEL 8 in RPM irqbalance-1.4.0-6.el8.
Any distro using irqbalance <= 1.7 without the patch above is fine.
Since irqbalance handled cleanly irqs referencing offline nodes before this patch, I am inclined to think that the irqbalance fix is incomplete. Unfortunately, the commit log lacks some context on the non-PCI devices.
Thanks,
C.
On 6/29/21 3:15 PM, Cédric Le Goater wrote:
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup.
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Cédric Le Goater clg@kaod.org
This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs.
This is an irqbalance regression due to :
https://github.com/Irqbalance/irqbalance/pull/172
I will report through an issue.
Anyhow, there is a better approach which is to allocate IPIs for all nodes at boot time and do the mapping on demand. Removing the mapping on last use seems more complex though.
I will send a v2 after some tests.
Thanks,
C.
Version 1.7 seems immune.
arch/powerpc/sysdev/xive/common.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index f3b16ed48b05..5d2c58dba57e 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) struct xive_ipi_desc *xid = &xive_ipis[node]; struct xive_ipi_alloc_info info = { node };
/* Skip nodes without CPUs */
if (cpumask_empty(cpumask_of_node(node)))
continue;
- /*
- Map one IPI interrupt per node for all cpus of that node.
- Since the HW interrupt number doesn't have any meaning,
linux-stable-mirror@lists.linaro.org