+Bjorn for guidance.
Quick context: previously the intel-iommu driver fixed a lockup on surprise removal by checking pci_dev_is_disconnected(). But Jinhui still observed the lockup in a setup where no interrupt is raised to the PCI core upon surprise removal (so pci_dev_is_disconnected() returns false), hence the suggestion to replace that check with pci_device_is_present() instead.
Bjorn, is it common practice to fix this directly/only in drivers, or should the PCI core be notified, e.g. by simulating a late removal event? From searching the code it looks like the former, but I'd like to confirm with you before picking this fix...
From: Baolu Lu <baolu.lu@linux.intel.com> Sent: Tuesday, December 23, 2025 12:06 PM
On 12/22/25 19:19, Jinhui Guo wrote:
On Thu, Dec 18, 2025 08:04:20AM +0000, Tian, Kevin wrote:
From: Jinhui Guo <guojinhui.liam@bytedance.com> Sent: Thursday, December 11, 2025 12:00 PM
Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") relies on pci_dev_is_disconnected() to skip ATS invalidation for safely-removed devices, but it does not cover link-down caused by faults, which can still hard-lock the system.
According to the commit msg it actually tries to fix the hard lockup upon surprise removal. For safe removal the device is not removed before invalidation is done:
" For safe removal, device wouldn't be removed until the whole software handling process is done, it wouldn't trigger the hard lock up issue caused by too long ATS Invalidation timeout wait. "
Can you help articulate the problem, especially the part about 'link-down caused by faults'? What are those faults? How do they differ from the surprise removal described in the commit msg, such that pci_dev_is_disconnected() is not set?
Hi Kevin, sorry for the delayed reply.
A normal or surprise removal of a PCIe device on a hot-plug port typically triggers an interrupt from the PCIe switch.
We have, however, observed cases where no interrupt is generated when the device suddenly loses its link; the behaviour is identical to setting the Link Disable bit in the switch's Link Control register (offset 10h). Exactly what goes wrong in the LTSSM between the PCIe switch and the endpoint remains unknown.
In this scenario the hardware has effectively vanished, yet the device driver remains bound and the IOMMU resources haven't been released. I'm just curious: could this stale state trigger issues in other places before the kernel fully realizes the device is gone? I'm not objecting to the fix; I'm just interested in whether this 'zombie' state creates risks elsewhere.