On Wed, Nov 22, 2023 at 04:58:24AM +0000, Tian, Kevin wrote:
> As Yi/Baolu discussed there is an issue in the intel-iommu driver which incorrectly skips devtlb invalidation in the guest on the assumption that the host combines iotlb/devtlb invalidation together. This is incorrect and should be fixed.
Yes, this seems quite problematic - you guys will have to think of something and decide what kind of backward compat you want :(
Maybe the viommu driver can observe the guest and, if it sees an ATC invalidation, assume it is non-buggy; until one is seen it can do a combined flush.
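Roughly something like this on the VMM side, purely as an illustrative sketch - the state flag and the forward_*() helpers are made-up names, not existing QEMU or iommufd interfaces:

/*
 * Hypothetical compat heuristic: until the guest is seen emitting a
 * device-TLB invalidation, assume it may be a buggy guest that skips
 * devtlb invalidation and request a combined iotlb+devtlb flush for
 * every iotlb invalidation it issues.
 */
#include <stdbool.h>
#include <stdint.h>

/* Placeholder helpers standing in for whatever the real uAPI calls become. */
void forward_iotlb_only(uint64_t addr, uint64_t npages);
void forward_iotlb_and_devtlb(uint64_t addr, uint64_t npages);
void forward_devtlb_only(uint64_t addr, uint64_t npages);

struct viommu_state {
        bool guest_does_devtlb;         /* set once an ATC invalidation is seen */
};

static void viommu_handle_iotlb_inv(struct viommu_state *s,
                                    uint64_t addr, uint64_t npages)
{
        if (s->guest_does_devtlb)
                forward_iotlb_only(addr, npages);
        else
                forward_iotlb_and_devtlb(addr, npages); /* combined flush */
}

static void viommu_handle_devtlb_inv(struct viommu_state *s,
                                     uint64_t addr, uint64_t npages)
{
        s->guest_does_devtlb = true;    /* non-buggy guest detected */
        forward_devtlb_only(addr, npages);
}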
> But what I was talking about earlier is the uAPI between the viommu and the iommu driver. I don't see a need for separate invalidation cmds for each, as I'm not sure what the user can expect in the window when the iotlb and devtlb are out of sync.
If the guest is always issuing the device invalidation then I don't see too much point in suppressing it in the kernel. Just forward it naturally.
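i.e. the vIOMMU emulation maps each guest queued-invalidation descriptor 1:1 onto its own invalidation request instead of turning the devtlb one into a nop. A minimal sketch, with made-up descriptor and helper names (the real calls would go through the proposed IOMMU_HWPT_INVALIDATE ioctl, in whatever encoding the series settles on):

#include <stdint.h>

/* Placeholder forwarding helpers for the sketch. */
void forward_iotlb_invalidate(uint64_t addr, uint64_t npages);
void forward_devtlb_invalidate(uint64_t addr, uint64_t npages);

enum vtd_inv_desc_type {
        VTD_INV_DESC_IOTLB,             /* guest IOTLB invalidation descriptor */
        VTD_INV_DESC_DEV_IOTLB,         /* guest device-TLB (ATC) descriptor */
};

static void viommu_emulate_inv_desc(enum vtd_inv_desc_type type,
                                    uint64_t addr, uint64_t npages)
{
        switch (type) {
        case VTD_INV_DESC_IOTLB:
                forward_iotlb_invalidate(addr, npages);  /* IOMMU TLB only */
                break;
        case VTD_INV_DESC_DEV_IOTLB:
                forward_devtlb_invalidate(addr, npages); /* ATC only, not a nop */
                break;
        }
}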
> then we just define that hwpt 'cache' invalidation in vtd always refers to both the iotlb and the devtlb. Then the viommu just needs to call the invalidation uAPI once when emulating the virtual iotlb invalidation descriptor, while emulating the following devtlb invalidation descriptor as a nop.
In principle ATC and IOMMU TLB invalidations should not always be linked.
Any scenario that allows devices to share an IOTLB cache tag requires fewer IOMMU TLB invalidations than ATC invalidations.
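E.g. with several devices attached to the same domain the IOMMU TLB entries are tagged by the domain (the VT-d domain-id), so an unmap takes one IOTLB invalidation but one ATC invalidation per ATS-enabled device. A made-up illustration, none of these types or helpers are real driver code:

#include <stdint.h>

struct example_dev;

/* Placeholder flush primitives for the illustration. */
void flush_iotlb(unsigned int domain_id, uint64_t iova, uint64_t npages);
void flush_atc(struct example_dev *dev, uint64_t iova, uint64_t npages);

struct example_domain {
        unsigned int id;                /* shared IOTLB cache tag (domain-id) */
        struct example_dev **devs;      /* ATS-enabled devices in the domain */
        unsigned int ndevs;
};

static void example_unmap_flush(struct example_domain *d,
                                uint64_t iova, uint64_t npages)
{
        unsigned int i;

        flush_iotlb(d->id, iova, npages);               /* once per domain */

        for (i = 0; i < d->ndevs; i++)
                flush_atc(d->devs[i], iova, npages);    /* once per device */
}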
I like the view of this invalidation interface as reflecting the actual HW and not trying to be smarter than real HW.
I'm fully expecting that Intel will adopt a direct-DMA flush queue like SMMU and AMD have already done as a performance optimization. In this world it makes no sense that the behavior of the direct DMA queue and the driver-mediated queue would be different.
Jason