RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation

5 Jan 2024


      ...
From: Tian, Kevin
Sent: Friday, January 5, 2024 10:16 AM
...
From: Jason Gunthorpe jgg@nvidia.com
Sent: Thursday, January 4, 2024 10:37 PM
On Thu, Dec 14, 2023 at 07:26:39PM +0800, Yi Liu wrote:
...
Per the prior discussion[1], we agreed to move the error reporting into
the
...
...
driver specific part. On Intel side, we want to report two devTLB
invalidation errors: ICE (invalid completion error) and ITE (invalidation
timeout error). Such errors have an additional SID information to tell
which device failed the devTLB invalidation. I've got the below structure.
IMHO all of this complexity is a consequence of the decision to hide
the devtlb invalidation from the VM..
On the other hand I guess you want to do this because of the SIOV
troubles where the vPCI function in the VM is entirely virtual and
can't be trivially mapped to a real PCI function for ATC invalidation
like ARM and AMD can do (but they also can't support SIOV because of
this). :(
However it also makes it very confusing about how the VM would
perceive an error - eg if it invalidates an SIOV device single PASID
and that devtlb fails then the error should be connected back to the
vPCI function for the SIOV's specific PASID and not back to the
physical PCI function for the SIOV owner.
As the iommu driver itself has no idea about the vPCI functions this
seems like it is going to get really confusing. The API I suggested in
the other email is not entirely going to work as the vPCI function for
SIOV cases will have to be identified by the (struct device, PASID) -
while it would be easy enough for the iommu driver to provide the
PASID, I'm not sure how the iommufd core will relate the PASID back to
the iommu_device to understand SIOV without actually being aware of
SIOV to some degree :\
we plan to add such awareness with a new binding helper:
https://lore.kernel.org/lkml/20231009085123.463179-4-yi.l.liu@intel.com/
and with metadata to track association between PASID's and iommufd vdev.
but in reality the relation could be identified in an easy way due to a SIOV
restriction which we discussed before - shared PASID space of PF disallows
assigning sibling vdev's to a same VM (otherwise no way to identify which
sibling vdev triggering an iopf when a pasid is used on both vdev's). That
restriction implies that within an iommufd context every iommufd_device
object should contain a unique struct device pointer. So PASID can be
instead ignored in the lookup then just always do iommufd_get_dev_id()
using struct device.
A bit more background.
Previously we thought this restriction only applies to SIOV+vSVA, as
a guest process may bind to both sibling vdev's, leading to the same
pasid situation.
In concept w/o vSVA it's still possible to assign sibling vdev's to
a same VM as each vdev is allocated with a unique pasid to mark vRID
so can be differentiated from each other in the fault/error path.
But when looking at this err code issue with Yi closely, we found
there is another gap in the VT-d spec. Upon devtlb invalidation
timeout the hw doesn't report pasid in the error info register. this
makes it impossible to identify the source vdev if a hwpt invalidation
request involves sibling vdev's from a same PF.
with that I'm inclined to always imposing this restriction for SIOV. 
One may argue that SIOV w/o vSVA w/o devtlb is conceptually immune
but I'm with you that given SIOVr1 is one-off I prefer to limiting its
usability other than complexing the kernel.

2025

2024

2023

2022

2021

2020

2019

2018

2017

RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation