On 2/12/21 5:41 PM, Greg Kurz wrote:
Depending on the number of online CPUs in the original kernel, it is likely for CPU #0 to be offline in a kdump kernel. The associated IRQs in the affinity mappings provided by irq_create_affinity_masks() are thus not started by irq_startup(), as per design for managed IRQs.
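For context, this is roughly the decision the generic IRQ core makes when starting a managed interrupt. It is a simplified paraphrase of the logic in kernel/irq/chip.c, not the literal upstream code; "d" stands for the interrupt's irq_data:

	/*
	 * Simplified sketch of the managed-IRQ startup decision
	 * (paraphrase of __irq_startup_managed()): if no CPU in the
	 * managed affinity mask is online, the interrupt is left in
	 * managed-shutdown state rather than being started, and is only
	 * started later when a CPU from its mask comes online.
	 */
	if (irqd_affinity_is_managed(d) &&
	    !cpumask_intersects(irq_data_get_affinity_mask(d), cpu_online_mask))
		return IRQ_STARTUP_ABORT;	/* stays shut down */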
This can be a problem with multi-queue block devices driven by blk-mq: such a non-started IRQ is very likely paired with the single queue enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This causes the device to remain silent and likely hangs the guest at some point.
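For reference, the single-queue enforcement mentioned above comes from blk_mq_alloc_tag_set(). Roughly, and paraphrasing the kdump check in block/blk-mq.c (the exact limits may differ between kernel versions):

	/*
	 * Paraphrase of the kdump check in blk_mq_alloc_tag_set(): in a
	 * crash dump kernel, memory is scarce, so the tag set is limited
	 * to a single hardware queue and a small queue depth, regardless
	 * of how many queues the driver asked for.
	 */
	if (is_kdump_kernel()) {
		set->nr_hw_queues = 1;
		set->queue_depth = min(64U, set->queue_depth);
	}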
This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()"). Note that this only happens with the XIVE interrupt controller because XICS has a workaround to bypass affinity, which is activated during kdump with the "noirqdistrib" kernel parameter.
The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue block driver, which was used to create the MSI vectors, and the single-queue mode enforced later on by blk-mq because of kdump (i.e. keeping all queues fixes the issue); see the driver-side sketch after this list
- CPU #0 being offline (i.e. kdump always succeeds when CPU #0 is online)
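To illustrate the first factor, the driver side typically looks like the sketch below. The function name and vector counts are hypothetical; only pci_alloc_irq_vectors_affinity(), PCI_IRQ_MSIX and PCI_IRQ_AFFINITY are real kernel interfaces. The vector allocation is sized from the queue count detected at probe time, so when blk-mq later collapses to a single queue under kdump, queue 0 can end up paired with a vector whose managed mask only contains offline CPUs.

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Hypothetical multi-queue driver: one managed MSI-X vector per queue. */
static int toy_alloc_queue_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
	struct irq_affinity affd = {
		.pre_vectors = 1,	/* e.g. one admin/config vector */
	};

	/*
	 * PCI_IRQ_AFFINITY makes these managed interrupts: the core
	 * spreads the queue vectors over the possible CPUs using
	 * irq_create_affinity_masks().  A vector whose mask holds only
	 * offline CPUs (e.g. CPU #0 in a kdump kernel) is not started.
	 */
	return pci_alloc_irq_vectors_affinity(pdev, 1, nr_queues + 1,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
}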
Given that I couldn't reproduce on x86, which seems to always have CPU #0 online even during kdump, I'm not sure where this should be fixed. Hence going for another approach: fine-grained affinity is for performance and we don't really care about that during kdump. Simply revert to the previous working behavior of ignoring affinity masks in this case only.
Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: lvivier@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Thanks for tracking this issue.
This layer needs a rework. Patches adding an MSI domain should be ready in a couple of releases. Hopefully.
C.
 arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index b3ac2455faad..29d04b83288d 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}
 
-		virq = irq_create_mapping_affinity(NULL, hwirq,
-						   entry->affinity);
+		/*
+		 * Depending on the number of online CPUs in the original
+		 * kernel, it is likely for CPU #0 to be offline in a kdump
+		 * kernel. The associated IRQs in the affinity mappings
+		 * provided by irq_create_affinity_masks() are thus not
+		 * started by irq_startup(), as per design for managed IRQs.
+		 * This can be a problem with multi-queue block devices driven
+		 * by blk-mq: such a non-started IRQ is very likely paired
+		 * with the single queue enforced by blk-mq during kdump (see
+		 * blk_mq_alloc_tag_set()). This causes the device to remain
+		 * silent and likely hangs the guest at some point.
+		 *
+		 * We don't really care for fine-grained affinity when doing
+		 * kdump actually: simply ignore the pre-computed affinity
+		 * masks in this case and let the default mask with all CPUs
+		 * be used when creating the IRQ mappings.
+		 */
+		if (is_kdump_kernel())
+			virq = irq_create_mapping(NULL, hwirq);
+		else
+			virq = irq_create_mapping_affinity(NULL, hwirq,
+							   entry->affinity);
 
 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);