Since commit 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync"), the failed list is always built and returned to let the caller decide what to do with the failures. The caller may want to retry resource fitting and assignment, and before that can happen the resources must be restored to their original state (a reset effectively clears the struct resource). This requires returning them on the failed list so that the original state remains stored in the associated struct pci_dev_resource.
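A rough userspace model of that save/reset/restore cycle may help. It is a minimal sketch, not the kernel code: struct mock_resource, struct mock_dev_resource and the mock_save()/mock_reset()/mock_restore() helpers are hypothetical stand-ins for struct resource, struct pci_dev_resource, reset_resource() and restore_dev_resource(). They are reduced to start/end/flags, and the sketch assumes a reset simply zeroes those fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource (only the fields used here). */
struct mock_resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

/*
 * Hypothetical stand-in for struct pci_dev_resource: a copy of the
 * original start/end/flags kept next to a pointer to the live resource.
 */
struct mock_dev_resource {
	struct mock_resource *res;
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

/* Record the current state of @res before attempting assignment. */
static void mock_save(struct mock_dev_resource *dev_res,
		      struct mock_resource *res)
{
	dev_res->res = res;
	dev_res->start = res->start;
	dev_res->end = res->end;
	dev_res->flags = res->flags;
}

/* Model of a reset: the live resource loses its address and flags. */
static void mock_reset(struct mock_resource *res)
{
	res->start = 0;
	res->end = 0;
	res->flags = 0;
}

/* Copy the saved state back so fitting and assignment can be retried. */
static void mock_restore(const struct mock_dev_resource *dev_res)
{
	dev_res->res->start = dev_res->start;
	dev_res->res->end = dev_res->end;
	dev_res->res->flags = dev_res->flags;
}
```

Once the live resource has been reset, only the saved copy still records what the resource used to be. That is why the failed-list entry has to be handed back to the caller.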
Resource resizing differs from ordinary resource fitting and assignment in that it only considers a subset of the resources. Failures for other resource types are therefore irrelevant and should be ignored. Because resize does not unassign such unrelated resources, any of them ending up in the failed list implies that its assignment had already failed before the resize. However, the check in pci_reassign_bridge_resources() that decides whether the whole assignment succeeded is based on list emptiness, which causes false negatives when the failed list contains resources of an unrelated type.
If the failed list is not empty, call pci_required_resource_failed(), extending it so it can also filter on a specific resource type (if one is provided).
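The filtering added to pci_required_resource_failed() can be exercised outside the kernel with a sketch like the one below. The flag values are copied from include/linux/ioport.h so the block builds standalone (double-check them against your tree). struct mock_fail and its optional field are hypothetical stand-ins for a failed-list entry and for pci_resource_is_optional().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values copied from include/linux/ioport.h for a standalone build. */
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL
#define IORESOURCE_MEM_64	0x00100000UL

#define PCI_RES_TYPE_MASK \
	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH | \
	 IORESOURCE_MEM_64)

/* Hypothetical failed-list entry: saved flags plus an "optional" marker. */
struct mock_fail {
	unsigned long flags;	/* original flags saved before the reset */
	bool optional;		/* stands in for pci_resource_is_optional() */
};

/*
 * Return true if a required resource of the requested type failed.
 * A @type of 0 disables the filter, like the 0 passed from
 * __assign_resources_sorted() in the patch.
 */
static bool mock_required_failed(const struct mock_fail *fails, size_t n,
				 unsigned long type)
{
	type &= PCI_RES_TYPE_MASK;

	for (size_t i = 0; i < n; i++) {
		/* Skip failures whose type differs from the one being resized. */
		if (type && (fails[i].flags & PCI_RES_TYPE_MASK) != type)
			continue;
		if (!fails[i].optional)
			return true;
	}
	return false;
}
```

Note the exact comparison against the masked type. With a 64-bit prefetchable type, a failed 32-bit non-prefetchable memory window is skipped just like an I/O one. The filter reads the entry's saved flags rather than the live resource, which has already been reset by the time the caller inspects the list.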
Calling pci_required_resource_failed() at this point is slightly problematic because the resource itself is reset when the failed list is constructed in __assign_resources_sorted(). As a result, pci_resource_is_optional() does not have access to the original resource flags. This could be worked around by restoring and re-resetting the resource around the call to pci_resource_is_optional(). However, it shouldn't cause issues, as resource resizing is meant for 64-bit prefetchable resources according to Christian König (see the Link, which unfortunately doesn't point directly to Christian's reply because lore didn't store that email at all).
Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel...
Reported-by: D Scott Phillips scott@os.amperecomputing.com
Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com...
Tested-by: D Scott Phillips scott@os.amperecomputing.com
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Reviewed-by: D Scott Phillips scott@os.amperecomputing.com
Cc: Christian König christian.koenig@amd.com
Cc: stable@vger.kernel.org
---
 drivers/pci/setup-bus.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index df5aec46c29d..def29506700e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -28,6 +28,10 @@
 #include <linux/acpi.h>
 #include "pci.h"

+#define PCI_RES_TYPE_MASK \
+	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
+	 IORESOURCE_MEM_64)
+
 unsigned int pci_flags;
 EXPORT_SYMBOL_GPL(pci_flags);

@@ -384,13 +388,19 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
 }

 /* Return: @true if assignment of a required resource failed. */
-static bool pci_required_resource_failed(struct list_head *fail_head)
+static bool pci_required_resource_failed(struct list_head *fail_head,
+					 unsigned long type)
 {
 	struct pci_dev_resource *fail_res;

+	type &= PCI_RES_TYPE_MASK;
+
 	list_for_each_entry(fail_res, fail_head, list) {
 		int idx = pci_resource_num(fail_res->dev, fail_res->res);

+		if (type && (fail_res->flags & PCI_RES_TYPE_MASK) != type)
+			continue;
+
 		if (!pci_resource_is_optional(fail_res->dev, idx))
 			return true;
 	}
@@ -504,7 +514,7 @@ static void __assign_resources_sorted(struct list_head *head,
 	}

 	/* Without realloc_head and only optional fails, nothing more to do. */
-	if (!pci_required_resource_failed(&local_fail_head) &&
+	if (!pci_required_resource_failed(&local_fail_head, 0) &&
 	    list_empty(realloc_head)) {
 		list_for_each_entry(save_res, &save_head, list) {
 			struct resource *res = save_res->res;
@@ -1708,10 +1718,6 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 	}
 }

-#define PCI_RES_TYPE_MASK \
-	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
-	 IORESOURCE_MEM_64)
-
 static void pci_bridge_release_resources(struct pci_bus *bus,
 					 unsigned long type)
 {
@@ -2450,8 +2456,12 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 	free_list(&added);

 	if (!list_empty(&failed)) {
-		ret = -ENOSPC;
-		goto cleanup;
+		if (pci_required_resource_failed(&failed, type)) {
+			ret = -ENOSPC;
+			goto cleanup;
+		}
+		/* Only resources with unrelated types failed (again) */
+		free_list(&failed);
 	}

 	list_for_each_entry(dev_res, &saved, list) {
On Fri, Aug 22, 2025 at 03:33:59PM +0300, Ilpo Järvinen wrote:
Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync") Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel... Reported-by: D Scott Phillips scott@os.amperecomputing.com Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com...
I'm trying to connect this fix with the Asynchronous SError Interrupt crash that Scott reported here. From the call trace:
  ...
  arm64_serror_panic+0x6c/0x90
  do_serror+0x58/0x60
  el1h_64_error_handler+0x38/0x60
  el1h_64_error+0x84/0x88
  _raw_spin_lock_irqsave+0x34/0xb0 (P)
  amdgpu_ih_process+0xf0/0x150 [amdgpu]
  amdgpu_irq_handler+0x34/0xa0 [amdgpu]
  __handle_irq_event_percpu+0x60/0x248
  handle_irq_event+0x4c/0xc0
  handle_fasteoi_irq+0xa0/0x1c8
  handle_irq_desc+0x3c/0x68
  generic_handle_domain_irq+0x24/0x40
  __gic_handle_irq_from_irqson.isra.0+0x15c/0x260
  gic_handle_irq+0x28/0x80
  call_on_irq_stack+0x24/0x30
  do_interrupt_handler+0x88/0xa0
  el1_interrupt+0x44/0xd0
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x84/0x88
  amdgpu_device_rreg.part.0+0x4c/0x190 [amdgpu] (P)
  amdgpu_device_rreg+0x24/0x40 [amdgpu]
I guess something happened in amdgpu_device_rreg() that caused an interrupt, maybe a bogus virtual address for a register?
And then amdgpu_ih_process() did something that caused the SError? Or since it seems to be asynchronous, maybe the amdgpu_ih_process() connection is coincidental and the SError was a consequence of something else?
I'd like to have a breadcrumb here in the commit log that connects this fix with something a user might see, but I don't know what that would look like.
Adding Alex & Christian as they might be able to shed light on the amdgpu side, but I think the problem still starts from pci_reassign_bridge_resources().
On Mon, 25 Aug 2025, Bjorn Helgaas wrote:
On Fri, Aug 22, 2025 at 03:33:59PM +0300, Ilpo Järvinen wrote:
I guess something happened in amdgpu_device_rreg() that caused an interrupt, maybe a bogus virtual address for a register?
I think the bogosity starts within pci_reassign_bridge_resources(). I've only very recently come to realize that the entire BAR resize operation is quite fragile as is. When the resize fails, it can fail to restore the original BARs, even though it tries to restore them. To fix that, I'll likely need to rework the entire structure of the resize-related functions so that the saved list can hold resources beyond just the bridge windows that were released. I plan to look at it eventually, but the rebar max size thing seems way more urgent than this atm.
It also looks like pci_reassign_bridge_resources() can leave unassigned resources, such as in this case, in a non-reset state (the non-resize side of the fitting algorithm resets the resources it failed to assign). For such resources, IORESOURCE_UNSET also gets overwritten by restore_dev_resource(), which is even worse. My guess is that something in amdgpu assumes that, e.g., a non-zero resource length implies the resource is assigned. Alternatively, this IORESOURCE_UNSET problem makes the amdgpu checks for it not work as intended.
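Here is a simplified model of the IORESOURCE_UNSET concern. If the flags were saved while the resource was still assigned, copying them back wholesale drops IORESOURCE_UNSET even though the resource never got a parent again. This is a hypothetical sketch and not the actual restore_dev_resource() code: struct mock_resource, struct mock_saved and mock_restore_flags() are made up for illustration, and the flag values are assumed from include/linux/ioport.h.

```c
#include <assert.h>
#include <stddef.h>

#define IORESOURCE_MEM		0x00000200UL	/* from include/linux/ioport.h */
#define IORESOURCE_UNSET	0x20000000UL	/* "No address assigned yet" */

/* Hypothetical stand-in for struct resource. */
struct mock_resource {
	unsigned long flags;
	const void *parent;	/* non-NULL once claimed in the resource tree */
};

/* Hypothetical saved copy taken before the resize attempt. */
struct mock_saved {
	unsigned long flags;
};

/* Simplified model of a restore that copies the saved flags back wholesale. */
static void mock_restore_flags(struct mock_resource *res,
			       const struct mock_saved *saved)
{
	res->flags = saved->flags;
}
```

After the restore, the flags no longer say the address is unset, while ->parent still says the resource is not claimed. A flags-based "is it assigned" check would miss exactly this kind of mismatch.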
While I cannot pinpoint what ultimately causes the crash within amdgpu, it seems that some code there calls pci_resource_start()/pci_resource_len() without first checking whether the resource is assigned. Admittedly, that check could be somewhere else in the call chain: I only grepped with -A20 -B20 'resource', which had lots of noise to comb through. Grepping for 'pci_resource' too should find the interesting bits, I think.
I'd actually like to add pci_resource_assigned(), which checks only res->parent, as that seems the most robust way to tell whether the resource has truly been assigned. Endpoint drivers should then check a resource with pci_resource_assigned() before using the other resource getters on it.
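Roughly, the proposed helper and the driver-side check might look like the sketch below. pci_resource_assigned() does not exist yet, so everything here is hypothetical. mock_resource_assigned() models the proposal, mock_map_bar() models an endpoint driver, and struct mock_resource keeps only the fields needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct resource. */
struct mock_resource {
	unsigned long long start;
	unsigned long long end;
	const struct mock_resource *parent;
};

/*
 * Hypothetical helper modelled on the proposal: a resource counts as
 * assigned only if it has been inserted into the resource tree.
 */
static bool mock_resource_assigned(const struct mock_resource *res)
{
	return res->parent != NULL;
}

/* Hypothetical driver-side guard: refuse to use the BAR unless assigned. */
static int mock_map_bar(const struct mock_resource *res,
			unsigned long long *len)
{
	if (!mock_resource_assigned(res))
		return -ENODEV;
	*len = res->end - res->start + 1;
	return 0;
}
```

In the test case, the stale entry still has a non-zero length, yet the ->parent test rejects it. A length-based check would get this case wrong.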
I could say much more about why I think IORESOURCE_UNSET is entirely redundant information that should just be dropped for simplicity's sake. The current flags handling likely has many corner cases to which the ->parent check is entirely immune. But that would add to the length of an already long reply. :-)
And then amdgpu_ih_process() did something that caused the SError? Or since it seems to be asynchronous, maybe the amdgpu_ih_process() connection is coincidental and the SError was a consequence of something else?
I'd like to have a breadcrumb here in the commit log that connects this fix with something a user might see, but I don't know what that would look like.
I'm sorry, I don't know the answer; the amdgpu code is too unfamiliar territory for me. Maybe Alex or Christian has some idea and can point us to what to look at.
On Tue, Aug 26, 2025 at 03:51:25PM +0300, Ilpo Järvinen wrote:
Adding Alex & Christian as they might be able to shed light on the amdgpu side, but I think the problem still starts from pci_reassign_bridge_resources().
On Mon, 25 Aug 2025, Bjorn Helgaas wrote:
On Fri, Aug 22, 2025 at 03:33:59PM +0300, Ilpo Järvinen wrote:
I guess something happened in amdgpu_device_rreg() that caused an interrupt, maybe a bogus virtual address for a register?
...
And then amdgpu_ih_process() did something that caused the SError? Or since it seems to be asynchronous, maybe the amdgpu_ih_process() connection is coincidental and the SError was a consequence of something else?
I'd like to have a breadcrumb here in the commit log that connects this fix with something a user might see, but I don't know what that would look like.
I'm sorry, I don't know the answer; the amdgpu code is too unfamiliar territory for me. Maybe Alex or Christian has some idea and can point us to what to look at.
Do we know what the PCIe controller is here? Is there a public datasheet for it?
I've seen other issues that make me wonder if some controllers work like this:
- PCIe error occurs on read or write transaction
- PCIe write dropped or read completed by the controller synthesizing ~0 data to CPU
- PCIe controller signals Asynchronous SError as a result of the error
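For reference, this is the kind of value a driver would see if the controller completes a failed read with synthesized ~0 data. The sketch below is a userspace model of the idea behind the kernel's PCI_POSSIBLE_ERROR(), specialized to a 32-bit MMIO read. mock_possible_error32() is a hypothetical name. An all-ones value can also be legitimate register content, so it is only a hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the PCI_POSSIBLE_ERROR() idea for a 32-bit read:
 * all-ones may be data synthesized by the controller after a failed
 * transaction, or it may be a genuine register value.
 */
static bool mock_possible_error32(uint32_t val)
{
	return val == UINT32_MAX;
}
```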
But I guess even if the above happens, I can't explain why the PCIe error would occur in the first place. Scott didn't mention anything like an FLR. But maybe if we actually got as far as programming something bogus in a BAR, a read might get no response (or two responses).
I would assume there should be something logged in the AER Capability, but I don't think we've looked at that yet. The AER interrupt is also asynchronous, so not surprising that this panic could happen before handling it.
Bjorn
linux-stable-mirror@lists.linaro.org