On 2024/1/8 12:07, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, January 5, 2024 10:45 PM
On Fri, Jan 05, 2024 at 02:52:50AM +0000, Tian, Kevin wrote:
but in reality the relation could be identified in an easy way due to a SIOV restriction which we discussed before - shared PASID space of PF
disallows
assigning sibling vdev's to a same VM (otherwise no way to identify which sibling vdev triggering an iopf when a pasid is used on both vdev's). That restriction implies that within an iommufd context every iommufd_device object should contain a unique struct device pointer. So PASID can be instead ignored in the lookup then just always do iommufd_get_dev_id() using struct device.
A bit more background.
Previously we thought this restriction only applies to SIOV+vSVA, as a guest process may bind to both sibling vdev's, leading to the same pasid situation.
In concept w/o vSVA it's still possible to assign sibling vdev's to a same VM as each vdev is allocated with a unique pasid to mark vRID so can be differentiated from each other in the fault/error path.
I thought the SIOV plan was that each "vdev" ie vpci function would get a slice of the pRID's PASID space statically selected at creation?
So SVA/etc doesn't matter, you reliably get a disjoint set of pRID & pPASID into each VM.
From that view you can't identify the iommufd dev_id without knowing both the pRID and pPASID which will disambiguate the different SIOV iommufd dev_id instances sharing a rid.
true when assigning those instances to different VMs.
Here I was talking about assigning them to a same VM being a problem. with rid sharing plus same ENQCMD pPASID potentially used on both instances there'd be ambiguity in vSVA e.g. iopf to identify dev_id.
we agreed before on preventing sibling vdev's in one VM for above reason, but only as far as vSVA is concerned.
then given the new finding in err reporting I wondered whether this restriction should be applied to all SIOV scenarios (but irrelevant now with below explanation after another thinking)
But when looking at this err code issue with Yi closely, we found there is another gap in the VT-d spec. Upon devtlb invalidation timeout the hw doesn't report pasid in the error info register. this makes it impossible to identify the source vdev if a hwpt invalidation request involves sibling vdev's from a same PF.
Don't you know which command timed out?
unfortunately no.
for errors related to descriptor fetch the driver can tell the command by looking at the head pointer of the invalidation queue.
command completion is indirectly detected by inserting a wait descriptor as fence. completion timeout error is reported in an error register. but this register doesn't record pasid, nor does the command location. if there are multiple pending devtlb invalidation commands upon timeout error the spec suggests the driver to treat all of them timeout as the register can only record one rid.
this is kind of moot. If the driver submits only one command (plus wait) at a time it doesn't need hw's help to identify the timeout command. If the driver batches invalidation commands it must treat all timeout if an timeout error is reported.
from this angle whether to record pasid doesn't really matter.
intel-iommu driver doesn't batch commands. so it's possible for the driver to figure out the timeout device itself and identify rid plus pasid to find dev_id from iommufd.
based on this, even RID is unnecessary. Software should know which device it has sent a devTLB invalidation.