On Thu, Dec 14, 2023 at 07:26:39PM +0800, Yi Liu wrote:
> Per the prior discussion[1], we agreed to move the error reporting into the driver specific part. On Intel side, we want to report two devTLB invalidation errors: ICE (invalid completion error) and ITE (invalidation timeout error). Such errors carry additional SID information to tell which device failed the devTLB invalidation. I've got the below structure.
IMHO all of this complexity is a consequence of the decision to hide the devTLB invalidation from the VM.
On the other hand, I guess you want to do this because of the SIOV troubles, where the vPCI function in the VM is entirely virtual and can't be trivially mapped to a real PCI function for ATC invalidation the way ARM and AMD can do (but they also can't support SIOV because of this). :(
However, it also makes it very confusing how the VM would perceive an error. E.g. if it invalidates a single PASID of an SIOV device and that devTLB invalidation fails, then the error should be connected back to the vPCI function for that SIOV PASID, and not back to the physical PCI function of the SIOV owner.
As the iommu driver itself has no idea about the vPCI functions, this seems like it is going to get really confusing. The API I suggested in the other email is not entirely going to work, as the vPCI function for SIOV cases will have to be identified by a (struct device, PASID) pair. While it would be easy enough for the iommu driver to provide the PASID, I'm not sure how the iommufd core will relate the PASID back to the iommu_device to understand SIOV without actually being aware of SIOV to some degree :\
(Given SIOVr1 seems on track to be replaced by SIOVr2, this is all a one-off, so I was hoping to minimize such awareness)
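To make the keying problem concrete, here is a rough sketch of what a (struct device, PASID) to vPCI function lookup might look like. This is plain userspace C, not kernel code, and every name in it (struct vdev_entry, vdev_lookup, the vsid field, the stand-in struct device) is made up for illustration. It only shows that the physical device alone is not a sufficient key once several SIOV vPCI functions share one physical function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct device. */
struct device {
	int id;
};

/*
 * Hypothetical table entry: one virtual PCI function as seen by the VM.
 * For SIOV, several entries share the same physical dev and differ only
 * by PASID.
 */
struct vdev_entry {
	const struct device *dev;	/* physical function owning the PASID */
	uint32_t pasid;			/* PASID backing this vPCI function */
	uint32_t vsid;			/* source ID the VM knows it by (assumed) */
};

/*
 * Map a failed devTLB invalidation, reported as (dev, pasid), back to the
 * vPCI function it belongs to. Returns NULL if no entry matches.
 */
static const struct vdev_entry *
vdev_lookup(const struct vdev_entry *tbl, size_t n,
	    const struct device *dev, uint32_t pasid)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dev == dev && tbl[i].pasid == pasid)
			return &tbl[i];
	}
	return NULL;
}
```

Whoever owns a table like this has to know that one struct device can back many vPCI functions, which is exactly the SIOV awareness I'd rather not push into the core.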
Jason