pciehp_reset_slot() disables PDCE (Presence Detect Changed Enable) and DLLSCE (Data Link Layer State Changed Enable) for the duration of reset and clears the related status bits PDC and DLLSC from the Slot Status register after the reset to avoid hotplug incorrectly assuming the card was removed.
However, hotplug shares interrupt with PME and BW notifications both of which can make pciehp_isr() to run despite PDCE and DLLSCE bits being off. pciehp_isr() then picks PDC or DLLSC bits from the Slot Status register due to the events that occur during reset and caches them into ->pending_events. Later, the IRQ thread in pciehp_ist() will process the ->pending_events and will assume the Link went Down due to a card change (in pciehp_handle_presence_or_link_change()).
Change pciehp_reset_slot() to also clear HPIE (Hot-Plug Interrupt Enable) as pciehp_isr() will first check HPIE to see if the interrupt is not for it. Then synchronize with the IRQ handling to ensure no events are pending, before invoking the reset.
Similarly, if the poll mode is in use, park the poll thread over the duration of the reset to stop handling events.
In order to not race irq_syncronize()/kthread_{,un}park() with the irq / poll_thread freeing from pciehp_remove(), take reset_lock in pciehp_free_irq() and check the irq / poll_thread variable validity in pciehp_reset_slot().
Fixes: 06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset") Fixes: 720d6a671a6e ("PCI: pciehp: Do not handle events if interrupts are masked") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765 Suggested-by: Lukas Wunner lukas@wunner.de Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org --- drivers/pci/hotplug/pciehp_hpc.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index bb5a8d9f03ad..c487e274b282 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -77,10 +77,15 @@ static inline int pciehp_request_irq(struct controller *ctrl)
static inline void pciehp_free_irq(struct controller *ctrl) { - if (pciehp_poll_mode) + down_read_nested(&ctrl->reset_lock, ctrl->depth); + if (pciehp_poll_mode) { kthread_stop(ctrl->poll_thread); - else + ctrl->poll_thread = NULL; + } else { free_irq(ctrl->pcie->irq, ctrl); + ctrl->pcie->irq = IRQ_NOTCONNECTED; + } + up_read(&ctrl->reset_lock); }
static int pcie_poll_cmd(struct controller *ctrl, int timeout) @@ -766,8 +771,9 @@ static int pciehp_poll(void *data)
while (!kthread_should_stop()) { /* poll for interrupt events or user requests */ - while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD || - atomic_read(&ctrl->pending_events)) + while (!kthread_should_park() && + (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD || + atomic_read(&ctrl->pending_events))) pciehp_ist(IRQ_NOTCONNECTED, ctrl);
if (pciehp_poll_time <= 0 || pciehp_poll_time > 60) @@ -907,6 +913,8 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
down_write_nested(&ctrl->reset_lock, ctrl->depth);
+ if (!pciehp_poll_mode) + ctrl_mask |= PCI_EXP_SLTCTL_HPIE; if (!ATTN_BUTTN(ctrl)) { ctrl_mask |= PCI_EXP_SLTCTL_PDCE; stat_mask |= PCI_EXP_SLTSTA_PDC; @@ -918,9 +926,21 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe) ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
+ /* Make sure HPIE is no longer seen by the interrupt handler. */ + if (pciehp_poll_mode) { + if (ctrl->poll_thread) + kthread_park(ctrl->poll_thread); + } else { + if (ctrl->pcie->irq != IRQ_NOTCONNECTED) + synchronize_irq(ctrl->pcie->irq); + } + rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask); + if (pciehp_poll_mode && ctrl->poll_thread) + kthread_unpark(ctrl->poll_thread); + pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
On Thu, Mar 13, 2025 at 04:23:30PM +0200, Ilpo Järvinen wrote:
pciehp_reset_slot() disables PDCE (Presence Detect Changed Enable) and DLLSCE (Data Link Layer State Changed Enable) for the duration of reset and clears the related status bits PDC and DLLSC from the Slot Status register after the reset to avoid hotplug incorrectly assuming the card was removed.
However, hotplug shares interrupt with PME and BW notifications both of which can make pciehp_isr() to run despite PDCE and DLLSCE bits being off. pciehp_isr() then picks PDC or DLLSC bits from the Slot Status register due to the events that occur during reset and caches them into ->pending_events. Later, the IRQ thread in pciehp_ist() will process the ->pending_events and will assume the Link went Down due to a card change (in pciehp_handle_presence_or_link_change()).
Change pciehp_reset_slot() to also clear HPIE (Hot-Plug Interrupt Enable) as pciehp_isr() will first check HPIE to see if the interrupt is not for it. Then synchronize with the IRQ handling to ensure no events are pending, before invoking the reset.
After dwelling on this for a while, I'm thinking that it may re-introduce the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock"):
Looking at the second and third stack trace in its commit message, down_write(reset_lock) in pciehp_reset_slot() is basically equivalent to synchronize_irq() and we're holding device_lock() at that point, hindering progress of pciehp_ist().
So I think I have guided you in the wrong direction and I apologize for that.
However it seems to me that this should be solvable with the small patch below. Am I missing something?
@Joel Mathew Thomas, could you give the below patch a spin and see if it helps?
Thanks!
-- >8 --
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index bb5a8d9f03ad..99a2ac13a3d1 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -688,6 +688,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) return IRQ_HANDLED; }
+ /* Ignore events masked by pciehp_reset_slot(). */ + events &= ctrl->slot_ctrl; + if (!events) + return IRQ_HANDLED; + /* Save pending events for consumption by IRQ thread. */ atomic_or(events, &ctrl->pending_events); return IRQ_WAKE_THREAD;
On Sat, 15 Mar 2025, Lukas Wunner wrote:
On Thu, Mar 13, 2025 at 04:23:30PM +0200, Ilpo Järvinen wrote:
pciehp_reset_slot() disables PDCE (Presence Detect Changed Enable) and DLLSCE (Data Link Layer State Changed Enable) for the duration of reset and clears the related status bits PDC and DLLSC from the Slot Status register after the reset to avoid hotplug incorrectly assuming the card was removed.
However, hotplug shares interrupt with PME and BW notifications both of which can make pciehp_isr() to run despite PDCE and DLLSCE bits being off. pciehp_isr() then picks PDC or DLLSC bits from the Slot Status register due to the events that occur during reset and caches them into ->pending_events. Later, the IRQ thread in pciehp_ist() will process the ->pending_events and will assume the Link went Down due to a card change (in pciehp_handle_presence_or_link_change()).
Change pciehp_reset_slot() to also clear HPIE (Hot-Plug Interrupt Enable) as pciehp_isr() will first check HPIE to see if the interrupt is not for it. Then synchronize with the IRQ handling to ensure no events are pending, before invoking the reset.
After dwelling on this for a while, I'm thinking that it may re-introduce the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock"):
Looking at the second and third stack trace in its commit message, down_write(reset_lock) in pciehp_reset_slot() is basically equivalent to synchronize_irq() and we're holding device_lock() at that point, hindering progress of pciehp_ist().
This description was somewhat confusing but what I can see, now that you mentioned this, is that if pciehp_reset_slot() calls synchronize_irq(), it can result in trying to acquire device_lock() again while trying to drain the pending events. ->reset_lock seems irrelevant to that problem.
Thus, pciehp_reset_slot() cannot ever rely on completing the processing of all pending events before it invokes the reset as long as any of its callers is holding device_lock().
It's a bit sad, because removing most of the reset_lock complexity would have been nice simplification in locking, effectively it would have reverted f5eff5591b8f too.
So I think I have guided you in the wrong direction and I apologize for that.
However it seems to me that this should be solvable with the small patch below. Am I missing something?
@Joel Mathew Thomas, could you give the below patch a spin and see if it helps?
Thanks!
-- >8 --
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index bb5a8d9f03ad..99a2ac13a3d1 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -688,6 +688,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) return IRQ_HANDLED; }
- /* Ignore events masked by pciehp_reset_slot(). */
- events &= ctrl->slot_ctrl;
- if (!events)
return IRQ_HANDLED;
- /* Save pending events for consumption by IRQ thread. */ atomic_or(events, &ctrl->pending_events); return IRQ_WAKE_THREAD;
Yes, this should work, I think.
I'm not entirely sure though how reading ->slot_ctrl here synchronizes wrt. pciehp_reset_slot() invoking reset. What guarantees pciehp_isr() sees the updated ->slot_ctrl when pciehp_reset_slot() has proceeded to invoke the reset? (I'm in general very hesitant about lockless and barrierless reader being race free, I might be just paranoid about it.)
linux-stable-mirror@lists.linaro.org