Hyper-V driver has advertised support for multi-MSI, but any attempt at using the feature would fallback to a single MSI (non-starter for devices that require multi-MSI). The fallback also covered up other bugs related to multi-MSI functionality rooted in the driver not being able to tell MSIs apart.
These patches fix those bugs by enabling hv multi-MSI through IOMMU remapping, distinguishing multi-MSIs from the initial MSI of the MSI block, preventing retargeting of MSI subsets from invalidating the IRTE block, and aiding hypervisor to preserve the block of requests.
Tested on 5.18.10-stable. Looking to backport to all long-term stable branches (post any possible merge-conflict resolution and testing documented in future patch-sets of course.)
Jeffrey Hugo (4): PCI: hv: Fix multi-MSI to allow more than one MSI vector PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI PCI: hv: Reuse existing IRTE allocation in compose_msi_msg() PCI: hv: Fix interrupt mapping for multi-MSI
drivers/pci/controller/pci-hyperv.c | 99 +++++++++++++++++++++-------- 1 file changed, 73 insertions(+), 26 deletions(-)
From: Jeffrey Hugo quic_jhugo@quicinc.com
If the allocation of multiple MSI vectors for multi-MSI fails in the core PCI framework, the framework will retry the allocation as a single MSI vector, assuming that meets the min_vecs specified by the requesting driver.
Hyper-V advertises that multi-MSI is supported, but reuses the VECTOR domain to implement that for x86. The VECTOR domain does not support multi-MSI, so the alloc will always fail and fallback to a single MSI allocation.
In short, Hyper-V advertises a capability it does not implement.
Hyper-V can support multi-MSI because it coordinates with the hypervisor to map the MSIs in the IOMMU's interrupt remapper, which is something the VECTOR domain does not have. Therefore the fix is simple - copy what the x86 IOMMU drivers (AMD/Intel-IR) do by removing X86_IRQ_ALLOC_CONTIGUOUS_VECTORS after calling the VECTOR domain's pci_msi_prepare().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Dexuan Cui decui@microsoft.com Link: https://lore.kernel.org/r/1649856981-14649-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index d270a204324e..1cbe24b92a38 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -614,7 +614,16 @@ static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, static int hv_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *info) { - return pci_msi_prepare(domain, dev, nvec, info); + int ret = pci_msi_prepare(domain, dev, nvec, info); + + /* + * By using the interrupt remapper in the hypervisor IOMMU, contiguous + * CPU vectors is not needed for multi-MSI + */ + if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI) + info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS; + + return ret; }
/**
On Wed, Jul 13, 2022 at 06:12:51PM +0000, Carl Vanderlip wrote:
From: Jeffrey Hugo quic_jhugo@quicinc.com
If the allocation of multiple MSI vectors for multi-MSI fails in the core PCI framework, the framework will retry the allocation as a single MSI vector, assuming that meets the min_vecs specified by the requesting driver.
Hyper-V advertises that multi-MSI is supported, but reuses the VECTOR domain to implement that for x86. The VECTOR domain does not support multi-MSI, so the alloc will always fail and fallback to a single MSI allocation.
In short, Hyper-V advertises a capability it does not implement.
Hyper-V can support multi-MSI because it coordinates with the hypervisor to map the MSIs in the IOMMU's interrupt remapper, which is something the VECTOR domain does not have. Therefore the fix is simple - copy what the x86 IOMMU drivers (AMD/Intel-IR) do by removing X86_IRQ_ALLOC_CONTIGUOUS_VECTORS after calling the VECTOR domain's pci_msi_prepare().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Dexuan Cui decui@microsoft.com Link: https://lore.kernel.org/r/1649856981-14649-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com
drivers/pci/controller/pci-hyperv.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
You forgot to list the git commit id of this and the other commits here, or anywhere else. Please fix that up and resend the series.
thanks,
greg k-h
From: Jeffrey Hugo quic_jhugo@quicinc.com
In the multi-MSI case, hv_arch_irq_unmask() will only operate on the first MSI of the N allocated. This is because only the first msi_desc is cached and it is shared by all the MSIs of the multi-MSI block. This means that hv_arch_irq_unmask() gets the correct address, but the wrong data (always 0).
This can break MSIs.
Lets assume MSI0 is vector 34 on CPU0, and MSI1 is vector 33 on CPU0.
hv_arch_irq_unmask() is called on MSI0. It uses a hypercall to configure the MSI address and data (0) to vector 34 of CPU0. This is correct. Then hv_arch_irq_unmask is called on MSI1. It uses another hypercall to configure the MSI address and data (0) to vector 33 of CPU0. This is wrong, and results in both MSI0 and MSI1 being routed to vector 33. Linux will observe extra instances of MSI1 and no instances of MSI0 despite the endpoint device behaving correctly.
For the multi-MSI case, we need unique address and data info for each MSI, but the cached msi_desc does not provide that. However, that information can be gotten from the int_desc cached in the chip_data by compose_msi_msg(). Fix the multi-MSI case to use that cached information instead. Since hv_set_msi_entry_from_desc() is no longer applicable, remove it.
Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1651068453-29588-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index 1cbe24b92a38..d30821a39a7b 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -604,13 +604,6 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data) return cfg->vector; }
-static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, - struct msi_desc *msi_desc) -{ - msi_entry->address.as_uint32 = msi_desc->msg.address_lo; - msi_entry->data.as_uint32 = msi_desc->msg.data; -} - static int hv_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *info) { @@ -640,6 +633,7 @@ static void hv_arch_irq_unmask(struct irq_data *data) { struct msi_desc *msi_desc = irq_data_get_msi_desc(data); struct hv_retarget_device_interrupt *params; + struct tran_int_desc *int_desc; struct hv_pcibus_device *hbus; struct cpumask *dest; cpumask_var_t tmp; @@ -654,6 +648,7 @@ static void hv_arch_irq_unmask(struct irq_data *data) pdev = msi_desc_to_pci_dev(msi_desc); pbus = pdev->bus; hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); + int_desc = data->chip_data;
spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
@@ -661,7 +656,8 @@ static void hv_arch_irq_unmask(struct irq_data *data) memset(params, 0, sizeof(*params)); params->partition_id = HV_PARTITION_ID_SELF; params->int_entry.source = HV_INTERRUPT_SOURCE_MSI; - hv_set_msi_entry_from_desc(¶ms->int_entry.msi_entry, msi_desc); + params->int_entry.msi_entry.address.as_uint32 = int_desc->address & 0xffffffff; + params->int_entry.msi_entry.data.as_uint32 = int_desc->data; params->device_id = (hbus->hdev->dev_instance.b[5] << 24) | (hbus->hdev->dev_instance.b[4] << 16) | (hbus->hdev->dev_instance.b[7] << 8) |
From: Jeffrey Hugo quic_jhugo@quicinc.com
Currently if compose_msi_msg() is called multiple times, it will free any previous IRTE allocation, and generate a new allocation. While nothing prevents this from occurring, it is extraneous when Linux could just reuse the existing allocation and avoid a bunch of overhead.
However, when future IRTE allocations operate on blocks of MSIs instead of a single line, freeing the allocation will impact all of the lines. This could cause an issue where an allocation of N MSIs occurs, then some of the lines are retargeted, and finally the allocation is freed/reallocated. The freeing of the allocation removes all of the configuration for the entire block, which requires all the lines to be retargeted, which might not happen since some lines might already be unmasked/active.
Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Dexuan Cui decui@microsoft.com Tested-by: Dexuan Cui decui@microsoft.com Tested-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1652282582-21595-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index d30821a39a7b..ac2b844ce81e 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1700,6 +1700,15 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) u32 size; int ret;
+ /* Reuse the previous allocation */ + if (data->chip_data) { + int_desc = data->chip_data; + msg->address_hi = int_desc->address >> 32; + msg->address_lo = int_desc->address & 0xffffffff; + msg->data = int_desc->data; + return; + } + pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data)); dest = irq_data_get_effective_affinity_mask(data); pbus = pdev->bus; @@ -1709,13 +1718,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (!hpdev) goto return_null_message;
- /* Free any previous message that might have already been composed. */ - if (data->chip_data) { - int_desc = data->chip_data; - data->chip_data = NULL; - hv_int_desc_free(hpdev, int_desc); - } - int_desc = kzalloc(sizeof(*int_desc), GFP_ATOMIC); if (!int_desc) goto drop_reference;
From: Jeffrey Hugo quic_jhugo@quicinc.com
According to Dexuan, the hypervisor folks beleive that multi-msi allocations are not correct. compose_msi_msg() will allocate multi-msi one by one. However, multi-msi is a block of related MSIs, with alignment requirements. In order for the hypervisor to allocate properly aligned and consecutive entries in the IOMMU Interrupt Remapping Table, there should be a single mapping request that requests all of the multi-msi vectors in one shot.
Dexuan suggests detecting the multi-msi case and composing a single request related to the first MSI. Then for the other MSIs in the same block, use the cached information. This appears to be viable, so do it.
Suggested-by: Dexuan Cui decui@microsoft.com Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Dexuan Cui decui@microsoft.com Tested-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1652282599-21643-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 60 ++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index ac2b844ce81e..fb185cb05258 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1518,6 +1518,10 @@ static void hv_int_desc_free(struct hv_pci_dev *hpdev, u8 buffer[sizeof(struct pci_delete_interrupt)]; } ctxt;
+ if (!int_desc->vector_count) { + kfree(int_desc); + return; + } memset(&ctxt, 0, sizeof(ctxt)); int_pkt = (struct pci_delete_interrupt *)&ctxt.pkt.message; int_pkt->message_type.type = @@ -1602,12 +1606,12 @@ static void hv_pci_compose_compl(void *context, struct pci_response *resp,
static u32 hv_compose_msi_req_v1( struct pci_create_interrupt *int_pkt, struct cpumask *affinity, - u32 slot, u8 vector) + u32 slot, u8 vector, u8 vector_count) { int_pkt->message_type.type = PCI_CREATE_INTERRUPT_MESSAGE; int_pkt->wslot.slot = slot; int_pkt->int_desc.vector = vector; - int_pkt->int_desc.vector_count = 1; + int_pkt->int_desc.vector_count = vector_count; int_pkt->int_desc.delivery_mode = DELIVERY_MODE;
/* @@ -1630,14 +1634,14 @@ static int hv_compose_msi_req_get_cpu(struct cpumask *affinity)
static u32 hv_compose_msi_req_v2( struct pci_create_interrupt2 *int_pkt, struct cpumask *affinity, - u32 slot, u8 vector) + u32 slot, u8 vector, u8 vector_count) { int cpu;
int_pkt->message_type.type = PCI_CREATE_INTERRUPT_MESSAGE2; int_pkt->wslot.slot = slot; int_pkt->int_desc.vector = vector; - int_pkt->int_desc.vector_count = 1; + int_pkt->int_desc.vector_count = vector_count; int_pkt->int_desc.delivery_mode = DELIVERY_MODE; cpu = hv_compose_msi_req_get_cpu(affinity); int_pkt->int_desc.processor_array[0] = @@ -1649,7 +1653,7 @@ static u32 hv_compose_msi_req_v2(
static u32 hv_compose_msi_req_v3( struct pci_create_interrupt3 *int_pkt, struct cpumask *affinity, - u32 slot, u32 vector) + u32 slot, u32 vector, u8 vector_count) { int cpu;
@@ -1657,7 +1661,7 @@ static u32 hv_compose_msi_req_v3( int_pkt->wslot.slot = slot; int_pkt->int_desc.vector = vector; int_pkt->int_desc.reserved = 0; - int_pkt->int_desc.vector_count = 1; + int_pkt->int_desc.vector_count = vector_count; int_pkt->int_desc.delivery_mode = DELIVERY_MODE; cpu = hv_compose_msi_req_get_cpu(affinity); int_pkt->int_desc.processor_array[0] = @@ -1688,6 +1692,8 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) struct cpumask *dest; struct compose_comp_ctxt comp; struct tran_int_desc *int_desc; + struct msi_desc *msi_desc; + u8 vector, vector_count; struct { struct pci_packet pci_pkt; union { @@ -1709,7 +1715,8 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) return; }
- pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data)); + msi_desc = irq_data_get_msi_desc(data); + pdev = msi_desc_to_pci_dev(msi_desc); dest = irq_data_get_effective_affinity_mask(data); pbus = pdev->bus; hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); @@ -1722,6 +1729,36 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (!int_desc) goto drop_reference;
+ if (!msi_desc->pci.msi_attrib.is_msix && msi_desc->nvec_used > 1) { + /* + * If this is not the first MSI of Multi MSI, we already have + * a mapping. Can exit early. + */ + if (msi_desc->irq != data->irq) { + data->chip_data = int_desc; + int_desc->address = msi_desc->msg.address_lo | + (u64)msi_desc->msg.address_hi << 32; + int_desc->data = msi_desc->msg.data + + (data->irq - msi_desc->irq); + msg->address_hi = msi_desc->msg.address_hi; + msg->address_lo = msi_desc->msg.address_lo; + msg->data = int_desc->data; + put_pcichild(hpdev); + return; + } + /* + * The vector we select here is a dummy value. The correct + * value gets sent to the hypervisor in unmask(). This needs + * to be aligned with the count, and also not zero. Multi-msi + * is powers of 2 up to 32, so 32 will always work here. + */ + vector = 32; + vector_count = msi_desc->nvec_used; + } else { + vector = hv_msi_get_int_vector(data); + vector_count = 1; + } + memset(&ctxt, 0, sizeof(ctxt)); init_completion(&comp.comp_pkt.host_event); ctxt.pci_pkt.completion_func = hv_pci_compose_compl; @@ -1732,7 +1769,8 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) size = hv_compose_msi_req_v1(&ctxt.int_pkts.v1, dest, hpdev->desc.win_slot.slot, - hv_msi_get_int_vector(data)); + vector, + vector_count); break;
case PCI_PROTOCOL_VERSION_1_2: @@ -1740,14 +1778,16 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) size = hv_compose_msi_req_v2(&ctxt.int_pkts.v2, dest, hpdev->desc.win_slot.slot, - hv_msi_get_int_vector(data)); + vector, + vector_count); break;
case PCI_PROTOCOL_VERSION_1_4: size = hv_compose_msi_req_v3(&ctxt.int_pkts.v3, dest, hpdev->desc.win_slot.slot, - hv_msi_get_int_vector(data)); + vector, + vector_count); break;
default:
Hyper-V driver has advertised support for multi-MSI, but any attempt at using the feature would fallback to a single MSI (non-starter for devices that require multi-MSI). The fallback also covered up other bugs related to multi-MSI functionality rooted in the driver not being able to tell MSIs apart.
These patches fix those bugs by enabling hv multi-MSI through IOMMU remapping, distinguishing multi-MSIs from the initial MSI of the MSI block, preventing retargeting of MSI subsets from invalidating the IRTE block, and aiding hypervisor to preserve the block of requests.
Tested on 5.18.10-stable. Looking to backport to all long-term stable branches (post any possible merge-conflict resolution and testing documented in future patch-sets of course.)
Changes from v1->v2: - Added upstream commit ids
Jeffrey Hugo (4): PCI: hv: Fix multi-MSI to allow more than one MSI vector PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI PCI: hv: Reuse existing IRTE allocation in compose_msi_msg() PCI: hv: Fix interrupt mapping for multi-MSI
drivers/pci/controller/pci-hyperv.c | 99 +++++++++++++++++++++-------- 1 file changed, 73 insertions(+), 26 deletions(-)
From: Jeffrey Hugo quic_jhugo@quicinc.com
[ upstream change 08e61e861a0e47e5e1a3fb78406afd6b0cea6b6d ]
If the allocation of multiple MSI vectors for multi-MSI fails in the core PCI framework, the framework will retry the allocation as a single MSI vector, assuming that meets the min_vecs specified by the requesting driver.
Hyper-V advertises that multi-MSI is supported, but reuses the VECTOR domain to implement that for x86. The VECTOR domain does not support multi-MSI, so the alloc will always fail and fallback to a single MSI allocation.
In short, Hyper-V advertises a capability it does not implement.
Hyper-V can support multi-MSI because it coordinates with the hypervisor to map the MSIs in the IOMMU's interrupt remapper, which is something the VECTOR domain does not have. Therefore the fix is simple - copy what the x86 IOMMU drivers (AMD/Intel-IR) do by removing X86_IRQ_ALLOC_CONTIGUOUS_VECTORS after calling the VECTOR domain's pci_msi_prepare().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Dexuan Cui decui@microsoft.com Link: https://lore.kernel.org/r/1649856981-14649-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index d270a204324e..1cbe24b92a38 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -614,7 +614,16 @@ static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, static int hv_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *info) { - return pci_msi_prepare(domain, dev, nvec, info); + int ret = pci_msi_prepare(domain, dev, nvec, info); + + /* + * By using the interrupt remapper in the hypervisor IOMMU, contiguous + * CPU vectors is not needed for multi-MSI + */ + if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI) + info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS; + + return ret; }
/**
From: Jeffrey Hugo quic_jhugo@quicinc.com
[ upstream change 455880dfe292a2bdd3b4ad6a107299fce610e64b ]
In the multi-MSI case, hv_arch_irq_unmask() will only operate on the first MSI of the N allocated. This is because only the first msi_desc is cached and it is shared by all the MSIs of the multi-MSI block. This means that hv_arch_irq_unmask() gets the correct address, but the wrong data (always 0).
This can break MSIs.
Lets assume MSI0 is vector 34 on CPU0, and MSI1 is vector 33 on CPU0.
hv_arch_irq_unmask() is called on MSI0. It uses a hypercall to configure the MSI address and data (0) to vector 34 of CPU0. This is correct. Then hv_arch_irq_unmask is called on MSI1. It uses another hypercall to configure the MSI address and data (0) to vector 33 of CPU0. This is wrong, and results in both MSI0 and MSI1 being routed to vector 33. Linux will observe extra instances of MSI1 and no instances of MSI0 despite the endpoint device behaving correctly.
For the multi-MSI case, we need unique address and data info for each MSI, but the cached msi_desc does not provide that. However, that information can be gotten from the int_desc cached in the chip_data by compose_msi_msg(). Fix the multi-MSI case to use that cached information instead. Since hv_set_msi_entry_from_desc() is no longer applicable, remove it.
Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1651068453-29588-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index 1cbe24b92a38..d30821a39a7b 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -604,13 +604,6 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data) return cfg->vector; }
-static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, - struct msi_desc *msi_desc) -{ - msi_entry->address.as_uint32 = msi_desc->msg.address_lo; - msi_entry->data.as_uint32 = msi_desc->msg.data; -} - static int hv_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *info) { @@ -640,6 +633,7 @@ static void hv_arch_irq_unmask(struct irq_data *data) { struct msi_desc *msi_desc = irq_data_get_msi_desc(data); struct hv_retarget_device_interrupt *params; + struct tran_int_desc *int_desc; struct hv_pcibus_device *hbus; struct cpumask *dest; cpumask_var_t tmp; @@ -654,6 +648,7 @@ static void hv_arch_irq_unmask(struct irq_data *data) pdev = msi_desc_to_pci_dev(msi_desc); pbus = pdev->bus; hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); + int_desc = data->chip_data;
spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
@@ -661,7 +656,8 @@ static void hv_arch_irq_unmask(struct irq_data *data) memset(params, 0, sizeof(*params)); params->partition_id = HV_PARTITION_ID_SELF; params->int_entry.source = HV_INTERRUPT_SOURCE_MSI; - hv_set_msi_entry_from_desc(¶ms->int_entry.msi_entry, msi_desc); + params->int_entry.msi_entry.address.as_uint32 = int_desc->address & 0xffffffff; + params->int_entry.msi_entry.data.as_uint32 = int_desc->data; params->device_id = (hbus->hdev->dev_instance.b[5] << 24) | (hbus->hdev->dev_instance.b[4] << 16) | (hbus->hdev->dev_instance.b[7] << 8) |
From: Jeffrey Hugo quic_jhugo@quicinc.com
[ upstream change b4b77778ecc5bfbd4e77de1b2fd5c1dd3c655f1f ]
Currently if compose_msi_msg() is called multiple times, it will free any previous IRTE allocation, and generate a new allocation. While nothing prevents this from occurring, it is extraneous when Linux could just reuse the existing allocation and avoid a bunch of overhead.
However, when future IRTE allocations operate on blocks of MSIs instead of a single line, freeing the allocation will impact all of the lines. This could cause an issue where an allocation of N MSIs occurs, then some of the lines are retargeted, and finally the allocation is freed/reallocated. The freeing of the allocation removes all of the configuration for the entire block, which requires all the lines to be retargeted, which might not happen since some lines might already be unmasked/active.
Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Dexuan Cui decui@microsoft.com Tested-by: Dexuan Cui decui@microsoft.com Tested-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1652282582-21595-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index d30821a39a7b..ac2b844ce81e 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1700,6 +1700,15 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) u32 size; int ret;
+ /* Reuse the previous allocation */ + if (data->chip_data) { + int_desc = data->chip_data; + msg->address_hi = int_desc->address >> 32; + msg->address_lo = int_desc->address & 0xffffffff; + msg->data = int_desc->data; + return; + } + pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data)); dest = irq_data_get_effective_affinity_mask(data); pbus = pdev->bus; @@ -1709,13 +1718,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (!hpdev) goto return_null_message;
- /* Free any previous message that might have already been composed. */ - if (data->chip_data) { - int_desc = data->chip_data; - data->chip_data = NULL; - hv_int_desc_free(hpdev, int_desc); - } - int_desc = kzalloc(sizeof(*int_desc), GFP_ATOMIC); if (!int_desc) goto drop_reference;
From: Jeffrey Hugo quic_jhugo@quicinc.com
[ upstream change a2bad844a67b1c7740bda63e87453baf63c3a7f7 ]
According to Dexuan, the hypervisor folks beleive that multi-msi allocations are not correct. compose_msi_msg() will allocate multi-msi one by one. However, multi-msi is a block of related MSIs, with alignment requirements. In order for the hypervisor to allocate properly aligned and consecutive entries in the IOMMU Interrupt Remapping Table, there should be a single mapping request that requests all of the multi-msi vectors in one shot.
Dexuan suggests detecting the multi-msi case and composing a single request related to the first MSI. Then for the other MSIs in the same block, use the cached information. This appears to be viable, so do it.
Suggested-by: Dexuan Cui decui@microsoft.com Signed-off-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Dexuan Cui decui@microsoft.com Tested-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1652282599-21643-1-git-send-email-quic_jhugo@quici... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Carl Vanderlip quic_carlv@quicinc.com --- drivers/pci/controller/pci-hyperv.c | 60 ++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index ac2b844ce81e..fb185cb05258 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1518,6 +1518,10 @@ static void hv_int_desc_free(struct hv_pci_dev *hpdev, u8 buffer[sizeof(struct pci_delete_interrupt)]; } ctxt;
+ if (!int_desc->vector_count) { + kfree(int_desc); + return; + } memset(&ctxt, 0, sizeof(ctxt)); int_pkt = (struct pci_delete_interrupt *)&ctxt.pkt.message; int_pkt->message_type.type = @@ -1602,12 +1606,12 @@ static void hv_pci_compose_compl(void *context, struct pci_response *resp,
static u32 hv_compose_msi_req_v1( struct pci_create_interrupt *int_pkt, struct cpumask *affinity, - u32 slot, u8 vector) + u32 slot, u8 vector, u8 vector_count) { int_pkt->message_type.type = PCI_CREATE_INTERRUPT_MESSAGE; int_pkt->wslot.slot = slot; int_pkt->int_desc.vector = vector; - int_pkt->int_desc.vector_count = 1; + int_pkt->int_desc.vector_count = vector_count; int_pkt->int_desc.delivery_mode = DELIVERY_MODE;
/* @@ -1630,14 +1634,14 @@ static int hv_compose_msi_req_get_cpu(struct cpumask *affinity)
static u32 hv_compose_msi_req_v2( struct pci_create_interrupt2 *int_pkt, struct cpumask *affinity, - u32 slot, u8 vector) + u32 slot, u8 vector, u8 vector_count) { int cpu;
int_pkt->message_type.type = PCI_CREATE_INTERRUPT_MESSAGE2; int_pkt->wslot.slot = slot; int_pkt->int_desc.vector = vector; - int_pkt->int_desc.vector_count = 1; + int_pkt->int_desc.vector_count = vector_count; int_pkt->int_desc.delivery_mode = DELIVERY_MODE; cpu = hv_compose_msi_req_get_cpu(affinity); int_pkt->int_desc.processor_array[0] = @@ -1649,7 +1653,7 @@ static u32 hv_compose_msi_req_v2(
static u32 hv_compose_msi_req_v3( struct pci_create_interrupt3 *int_pkt, struct cpumask *affinity, - u32 slot, u32 vector) + u32 slot, u32 vector, u8 vector_count) { int cpu;
@@ -1657,7 +1661,7 @@ static u32 hv_compose_msi_req_v3( int_pkt->wslot.slot = slot; int_pkt->int_desc.vector = vector; int_pkt->int_desc.reserved = 0; - int_pkt->int_desc.vector_count = 1; + int_pkt->int_desc.vector_count = vector_count; int_pkt->int_desc.delivery_mode = DELIVERY_MODE; cpu = hv_compose_msi_req_get_cpu(affinity); int_pkt->int_desc.processor_array[0] = @@ -1688,6 +1692,8 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) struct cpumask *dest; struct compose_comp_ctxt comp; struct tran_int_desc *int_desc; + struct msi_desc *msi_desc; + u8 vector, vector_count; struct { struct pci_packet pci_pkt; union { @@ -1709,7 +1715,8 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) return; }
- pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data)); + msi_desc = irq_data_get_msi_desc(data); + pdev = msi_desc_to_pci_dev(msi_desc); dest = irq_data_get_effective_affinity_mask(data); pbus = pdev->bus; hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); @@ -1722,6 +1729,36 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (!int_desc) goto drop_reference;
+ if (!msi_desc->pci.msi_attrib.is_msix && msi_desc->nvec_used > 1) { + /* + * If this is not the first MSI of Multi MSI, we already have + * a mapping. Can exit early. + */ + if (msi_desc->irq != data->irq) { + data->chip_data = int_desc; + int_desc->address = msi_desc->msg.address_lo | + (u64)msi_desc->msg.address_hi << 32; + int_desc->data = msi_desc->msg.data + + (data->irq - msi_desc->irq); + msg->address_hi = msi_desc->msg.address_hi; + msg->address_lo = msi_desc->msg.address_lo; + msg->data = int_desc->data; + put_pcichild(hpdev); + return; + } + /* + * The vector we select here is a dummy value. The correct + * value gets sent to the hypervisor in unmask(). This needs + * to be aligned with the count, and also not zero. Multi-msi + * is powers of 2 up to 32, so 32 will always work here. + */ + vector = 32; + vector_count = msi_desc->nvec_used; + } else { + vector = hv_msi_get_int_vector(data); + vector_count = 1; + } + memset(&ctxt, 0, sizeof(ctxt)); init_completion(&comp.comp_pkt.host_event); ctxt.pci_pkt.completion_func = hv_pci_compose_compl; @@ -1732,7 +1769,8 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) size = hv_compose_msi_req_v1(&ctxt.int_pkts.v1, dest, hpdev->desc.win_slot.slot, - hv_msi_get_int_vector(data)); + vector, + vector_count); break;
case PCI_PROTOCOL_VERSION_1_2: @@ -1740,14 +1778,16 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) size = hv_compose_msi_req_v2(&ctxt.int_pkts.v2, dest, hpdev->desc.win_slot.slot, - hv_msi_get_int_vector(data)); + vector, + vector_count); break;
case PCI_PROTOCOL_VERSION_1_4: size = hv_compose_msi_req_v3(&ctxt.int_pkts.v3, dest, hpdev->desc.win_slot.slot, - hv_msi_get_int_vector(data)); + vector, + vector_count); break;
default:
linux-stable-mirror@lists.linaro.org