On Mon, Dec 11, 2023 at 10:34:09PM +0700, Suthikulpanit, Suravee wrote:
Currently, the AMD IOMMU driver allocates a DomainId per IOMMU group. One issue with this shows up with nested translation, where we could end up with multiple devices (RIDs) sharing the same PASID and the same hDomainID.
Which means you also create multiple GCR3 tables since those are (soon) per-device and we end up with the situation I described for a functional legitimate reason :( It is just wasting memory by duplicating GCR3 tables.
For example:
  - Host view
    Device1 (RID 1) w/ hDomainId 1
    Device2 (RID 2) w/ hDomainId 1
So.. Groups are another ugly mess that we may have to do something more robust about.
The group infrastructure assumes that all devices in the group have the same translation. This is not how the VM communicates: each member of the group gets to have its own DTE, and there are legitimate cases where the DTEs will be different (even if just temporarily).
How to mesh this is not yet solved (most likely we need to allow group members to have temporarily different translation). But in the long run the group should definitely not be providing the cache tag; the driver has to be smarter than this.
I think we talked about this before.. For the AMD driver the v1 page table should store the domainid in the iommu_domain and that value should be used everywhere.
For modes with a GCR3 table the best you can do is to de-duplicate the GCR3 tables and assign identical GCR3 tables to identical domain ids. I.e. all devices in a group will eventually share GCR3 tables so they can converge on the same domain id.
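To make the de-duplication idea a bit more concrete, here is a rough userspace sketch, not driver code. Everything in it (struct toy_gcr3, toy_gcr3_get(), the 4-entry table size, the slot array) is made up for illustration and does not reflect the real GCR3 layout or any existing kernel API. It only shows the shape of it: a lookup by table contents, so identical tables end up sharing one copy and one domain id.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GCR3_ENTRIES 4	/* toy size, not the hardware layout */
#define MAX_TABLES   8

/* Hypothetical stand-in for a GCR3 table plus the cache tag it uses. */
struct toy_gcr3 {
	uint64_t entry[GCR3_ENTRIES];
	uint16_t domid;
	unsigned int refs;	/* 0 means the slot is free */
};

static struct toy_gcr3 tables[MAX_TABLES];
static uint16_t next_domid = 1;	/* toy allocator, never reuses ids */

/*
 * Return a table whose contents match 'want' and take a reference.
 * An existing identical table is reused, so its domid is shared;
 * otherwise a free slot is filled and given a fresh domid.
 * Returns NULL when no slot is left.
 */
static struct toy_gcr3 *toy_gcr3_get(const uint64_t want[GCR3_ENTRIES])
{
	struct toy_gcr3 *free_slot = NULL;
	size_t i;

	for (i = 0; i < MAX_TABLES; i++) {
		struct toy_gcr3 *t = &tables[i];

		if (!t->refs) {
			if (!free_slot)
				free_slot = t;
			continue;
		}
		if (!memcmp(t->entry, want, sizeof(t->entry))) {
			t->refs++;
			return t;
		}
	}
	if (!free_slot)
		return NULL;

	memcpy(free_slot->entry, want, sizeof(free_slot->entry));
	free_slot->domid = next_domid++;
	free_slot->refs = 1;
	return free_slot;
}
```

With this, two devices asking for the same table contents get the same pointer back and therefore the same domid, while a device with different contents gets its own table and tag.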
  - Guest view
    Pass-through Device1 (vRID 3) w/ vDomainID A + PASID 0
    Pass-through Device2 (vRID 4) w/ vDomainID B + PASID 0
We should be able to work around this by changing the way we assign hDomainId to be per-device for VFIO pass-through devices, although they share the same v1 (stage-2) page table. This would look like:
As I said, this doesn't quite work since the VM could do other things. The kernel must be aware of the vDomainID and must select an appropriate hDomainID with that knowledge in mind, otherwise multi-device-groups in guests are fully broken.
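As a sketch of what "select an appropriate hDomainID with that knowledge in mind" could mean, assuming a simple per-VM lookup table: the names here (struct toy_vm, toy_hdomid_for()) are hypothetical, and the allocator is a toy with no freeing or wrap handling. The only point is that the hDomainID is derived from the vDomainID, not from the group or the device.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16

/* Hypothetical per-VM record tying a guest domain id to a host one. */
struct toy_domid_map {
	uint16_t vdomid;
	uint16_t hdomid;
	int used;
};

struct toy_vm {
	struct toy_domid_map map[MAX_MAPPINGS];
	uint16_t next_hdomid;	/* toy allocator, no freeing or wrap handling */
};

/*
 * Pick the host domain id for a guest domain id. The same vDomainID
 * always yields the same hDomainID; a new vDomainID gets a new one.
 * Returns 0 if the table is full (0 is treated as invalid here).
 */
static uint16_t toy_hdomid_for(struct toy_vm *vm, uint16_t vdomid)
{
	size_t i;

	/* Reuse the existing mapping if the guest id was seen before. */
	for (i = 0; i < MAX_MAPPINGS; i++)
		if (vm->map[i].used && vm->map[i].vdomid == vdomid)
			return vm->map[i].hdomid;

	/* Otherwise claim the first free slot and allocate a host id. */
	for (i = 0; i < MAX_MAPPINGS; i++) {
		if (vm->map[i].used)
			continue;
		vm->map[i].used = 1;
		vm->map[i].vdomid = vdomid;
		vm->map[i].hdomid = ++vm->next_hdomid;
		return vm->map[i].hdomid;
	}
	return 0;
}
```

So two vRIDs that the guest puts in vDomainID A would land on one hDomainID, and a vRID in vDomainID B would get a different one, regardless of how the host groups the devices.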
  - Guest view
    Pass-through Device1 (vRID 3) w/ vDomainID A + PASID 0
    Pass-through Device2 (vRID 4) w/ vDomainID B + PASID 0
This should avoid the IOMMU TLB conflict. However, the invalidation would need to be done for both DomainId 1 and 2 when updating the v1 (stage-2) page table.
Which is the key problem: if the VM thinks it has only one vDomainID, the VMM can't split that into two hDomainIDs and expect the viommu acceleration to work - so we shouldn't try to make it work in SW either, IMHO.
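For reference, the double invalidation mentioned above for the split scheme would look roughly like the following. This is purely illustrative: struct toy_v1_table, toy_flush_domid() and toy_v1_invalidate() are invented names, and the flush is a counter instead of a real IOMMU command.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DOMIDS 8

/* Hypothetical: one v1 (stage-2) page table tagged by several domain ids. */
struct toy_v1_table {
	uint16_t domid[MAX_DOMIDS];
	size_t nr_domids;
};

static unsigned int toy_flush_count;	/* number of flushes issued so far */

/* Stand-in for queueing one IOMMU TLB invalidation for a domain id. */
static void toy_flush_domid(uint16_t domid, uint64_t iova, uint64_t size)
{
	(void)domid;
	(void)iova;
	(void)size;
	toy_flush_count++;
}

/* Every domain id that tags the table needs its own invalidation. */
static void toy_v1_invalidate(const struct toy_v1_table *t,
			      uint64_t iova, uint64_t size)
{
	size_t i;

	for (i = 0; i < t->nr_domids; i++)
		toy_flush_domid(t->domid[i], iova, size);
}
```

One update to the shared v1 table turns into one flush per hDomainID, which is the extra cost of tagging a single page table with more than one id.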
Jason