From: Jinhui Guo <guojinhui.liam@bytedance.com>
Sent: Thursday, December 11, 2025 12:00 PM
Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") relies on pci_dev_is_disconnected() to skip ATS invalidation for safely-removed devices, but it does not cover link-down caused by faults, which can still hard-lock the system.
According to the commit message, it actually tries to fix the hard lockup caused by surprise removal. For safe removal, the device is not removed until invalidation is done:
"For safe removal, device wouldn't be removed until the whole software handling process is done, it wouldn't trigger the hard lock up issue caused by too long ATS Invalidation timeout wait."
Can you help articulate the problem, especially the part about "link-down caused by faults"? What are those faults? How do they differ from the surprise removal described in the commit message, such that pci_dev_is_disconnected() does not return true?
For example, if a VM fails to connect to the PCIe device,
'failed' for what reason?
"virsh destroy" is executed to release resources and isolate the fault, but a hard-lockup occurs while releasing the group fd.
Call Trace:
 qi_submit_sync
 qi_flush_dev_iotlb
 intel_pasid_tear_down_entry
 device_block_translation
 blocking_domain_attach_dev
 __iommu_attach_device
 __iommu_device_set_domain
 __iommu_group_set_domain_internal
 iommu_detach_group
 vfio_iommu_type1_detach_group
 vfio_group_detach_container
 vfio_group_fops_release
 __fput
Although pci_device_is_present() is slower than pci_dev_is_disconnected(), it still takes only ~70 µs on a ConnectX-5 (8 GT/s, x2) and becomes even faster as PCIe speed and width increase.
Besides, devtlb_invalidation_with_pasid() is called only in the paths below, which are far less frequent than memory map/unmap.
- mm-struct release
- {attach,release}_dev
- set/remove PASID
- dirty-tracking setup
Surprise removal can happen at any time, e.g. right after the pci_device_is_present() check. In the end we need the logic in qi_check_fault() to check for presence when an ITE timeout error is received, in order to break the infinite loop. So in your case, even with that logic in place, you still observe the lockup (probably because the hardware ITE timeout is longer than the lockup detection threshold on the CPU)?
In any case, this change cannot fully fix the lockup. It only reduces the likelihood, which should be made clear.
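
To make the race concrete, here is a minimal user-space sketch in C. It is not kernel code, and every name in it (fake_dev, dev_present, ite_timeout, submit_inval_sync) is a hypothetical stand-in rather than an existing kernel interface. The simulated device disappears after a chosen number of polls. The sketch assumes that a timeout is reported only once the device is gone, and it shows why a presence check before submission only narrows the window, while a re-check on timeout inside the wait loop is what lets the loop terminate.

```c
#include <stdbool.h>

/* Hypothetical model of a device that may vanish at any time. */
struct fake_dev {
    int polls_until_gone;   /* device disappears after this many polls; < 0 means never */
    int polls_until_done;   /* invalidation completes after this many polls */
    int polls;              /* polls performed so far */
};

enum inval_result { INVAL_DONE, INVAL_SKIPPED, INVAL_ABORTED };

/* Stand-in for a presence probe; assumption: models a config-space read. */
static bool dev_present(const struct fake_dev *d)
{
    return d->polls_until_gone < 0 || d->polls < d->polls_until_gone;
}

/* Stand-in for the hardware reporting an invalidation timeout.
 * Assumption: in this model a timeout occurs only once the device is gone. */
static bool ite_timeout(const struct fake_dev *d)
{
    return !dev_present(d);
}

static enum inval_result submit_inval_sync(struct fake_dev *d)
{
    /* Pre-check: narrows the window, but removal can still follow it. */
    if (!dev_present(d))
        return INVAL_SKIPPED;

    for (;;) {
        d->polls++;
        if (ite_timeout(d)) {
            /* On timeout, re-check presence instead of retrying forever. */
            if (!dev_present(d))
                return INVAL_ABORTED;
            /* Timeout with the device still present: retry
             * (not reachable in this simple model). */
            continue;
        }
        if (d->polls >= d->polls_until_done)
            return INVAL_DONE;
    }
}
```

With polls_until_gone = 0 the pre-check skips the request. With polls_until_gone = 3 and polls_until_done = 10 the device vanishes after the pre-check has passed, and only the in-loop re-check ends the wait.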