On Sat, Mar 15, 2025 at 07:57:55PM +0100, proxy0@tutamail.com wrote:
Mar 15, 2025, 9:28 PM by lukas@wunner.de:
After dwelling on this for a while, I'm thinking that it may re-introduce the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock"):
Looking at the second and third stack traces in its commit message, down_write(reset_lock) in pciehp_reset_slot() is basically equivalent to synchronize_irq(), and we're holding device_lock() at that point, hindering progress of pciehp_ist().
So I think I have guided you in the wrong direction and I apologize for that.
However, it seems to me that this should be solvable with the small patch below. Am I missing something?
@Joel Mathew Thomas, could you give the patch below a spin and see if it helps?
I've tested the patch series along with the additional patch provided.
Kernel: 6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix
Patches applied:
- [PATCH 1/4] PCI/hotplug: Disable HPIE over reset
- [PATCH 2/4] PCI/hotplug: Clearing HPIE for the duration of reset is enough
- [PATCH 3/4] PCI/hotplug: reset_lock is not required synchronizing with irq thread
- [PATCH 4/4] PCI/hotplug: Don't enable HPIE in poll mode
- The latest patch from you:

      /* Ignore events masked by pciehp_reset_slot(). */
      events &= ctrl->slot_ctrl;
      if (!events)
              return IRQ_HANDLED;
Could you test *only* the quoted diff, i.e. without patches [1/4] - [4/4], on top of a recent kernel?
Sorry for not having been clear about this.
I believe that patch [1/4] will re-introduce a deadlock that we fixed two years ago, so the small diff above seeks to replace it with a simpler approach that will hopefully avoid the issue as well.
Thanks,
Lukas