On Fri, Sep 27, 2024 at 08:59:25AM -0300, Jason Gunthorpe wrote:
On Thu, Sep 26, 2024 at 11:02:37PM -0700, Nicolin Chen wrote:
On Fri, Sep 27, 2024 at 01:38:08PM +0800, Yi Liu wrote:
Does it mean each vIOMMU of a VM can only have one s2 HWPT?
Giving some examples here:
- If a VM has 1 vIOMMU, there will be 1 vIOMMU object in the kernel holding one S2 HWPT.
- If a VM has 2 vIOMMUs, there will be 2 vIOMMU objects in the kernel that can hold two different S2 HWPTs, or share one S2 HWPT (saving memory).
So if you have two devices assigned to a VM, then you may have two vIOMMUs or one vIOMMU exposed to the guest. This depends on whether the two devices are behind the same physical IOMMU. If it's two vIOMMUs, the two can share the s2 hwpt if their physical IOMMUs are compatible. Is that right?
Yes.
To achieve the above, you need to know the physical IOMMUs of the assigned devices, and hence be able to tell whether those physical IOMMUs are the same and whether they are compatible. How would userspace get that information?
My draft implementation with QEMU does something like this:
- List all viommu-matched iommu nodes under /sys/class/iommu: LINKs
- Get PCI device's /sys/bus/pci/devices/0000:00:00.0/iommu: LINK0
- Compare the LINK0 against the LINKs
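The three steps above could be sketched in userspace C roughly as follows. This is only an illustration, not the draft QEMU code: the helper names (link_basename, read_dev_iommu_link, match_iommu_node) are made up here, and it assumes the device's "iommu" entry is a symlink whose last path component is the node name listed under /sys/class/iommu.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the last path component of a symlink target. */
const char *link_basename(const char *target)
{
    const char *slash = strrchr(target, '/');

    return slash ? slash + 1 : target;
}

/*
 * Read /sys/bus/pci/devices/<bdf>/iommu (LINK0) into buf.
 * Returns 0 on success, -1 if the link cannot be read.
 */
int read_dev_iommu_link(const char *bdf, char *buf, size_t len)
{
    char path[512];
    ssize_t n;

    if (len == 0)
        return -1;
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/iommu", bdf);
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';   /* readlink() does not NUL-terminate */
    return 0;
}

/*
 * Compare LINK0 against the candidate node names (LINKs).
 * Returns the index of the matching node, or -1 if none matches.
 */
int match_iommu_node(const char *link0, const char *const *nodes, size_t n)
{
    const char *name = link_basename(link0);

    for (size_t i = 0; i < n; i++)
        if (strcmp(name, nodes[i]) == 0)
            return (int)i;
    return -1;
}
```

Two devices whose LINK0 resolves to the same index would be treated as sitting behind the same physical IOMMU. Node names such as "dmar0" are platform dependent and only used as examples.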
So far we don't have an ID for a physical IOMMU instance; returning one via the hw_info call could be an alternative.
We could return the sys/class/iommu string from some get_info or something
I had a patch doing an ida alloc for each iommu_dev and returning the ID via hw_info. It wasn't useful at that time, as we went for fail-n-retry for S2 HWPT allocations on multi-pIOMMU platforms.
Perhaps that could be cleaner than returning a string?
As for the compatibility check needed to share a stage-2 HWPT, basically we would attach the device to one of the stage-2 HWPTs from the list that the VMM should keep. This attach runs all the compatibility tests, down to the IOMMU driver. If it fails, just allocate a new stage-2 HWPT.
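A minimal sketch of that attach-or-allocate flow on the VMM side, under stated assumptions: s2_pool, pick_s2_hwpt and the two callback types are hypothetical names, and the toy_* callbacks only simulate the kernel (they treat the upper bits of an ID as "which pIOMMU"). In a real VMM the callbacks would wrap the actual iommufd attach and HWPT allocation calls.

```c
#include <stddef.h>
#include <stdint.h>

#define S2_POOL_MAX 8   /* arbitrary cap for this sketch */

/* Hypothetical VMM-side list of stage-2 HWPT IDs allocated so far. */
struct s2_pool {
    uint32_t hwpt_id[S2_POOL_MAX];
    size_t count;
};

/* Hypothetical callbacks: return 0 on success, negative on failure. */
typedef int (*try_attach_fn)(uint32_t dev_id, uint32_t hwpt_id);
typedef int (*alloc_s2_fn)(uint32_t dev_id, uint32_t *out_hwpt_id);

/*
 * Try each known stage-2 HWPT first; the attach itself is the
 * compatibility test. If none accepts the device, allocate a new
 * HWPT, attach to it, and remember it for later devices.
 */
int pick_s2_hwpt(struct s2_pool *pool, uint32_t dev_id,
                 try_attach_fn try_attach, alloc_s2_fn alloc_s2,
                 uint32_t *out)
{
    uint32_t new_id;
    int rc;

    for (size_t i = 0; i < pool->count; i++) {
        if (try_attach(dev_id, pool->hwpt_id[i]) == 0) {
            *out = pool->hwpt_id[i];
            return 0;
        }
    }

    if (pool->count == S2_POOL_MAX)
        return -1;
    rc = alloc_s2(dev_id, &new_id);
    if (rc)
        return rc;
    rc = try_attach(dev_id, new_id);
    if (rc)
        return rc;   /* a real VMM would also free new_id here */

    pool->hwpt_id[pool->count++] = new_id;
    *out = new_id;
    return 0;
}

/* Toy backend: IDs are compatible when their upper bits match. */
static uint32_t toy_next_id;

int toy_attach(uint32_t dev_id, uint32_t hwpt_id)
{
    return (dev_id >> 8) == (hwpt_id >> 8) ? 0 : -1;
}

int toy_alloc(uint32_t dev_id, uint32_t *out_hwpt_id)
{
    *out_hwpt_id = (dev_id & ~0xffu) | (++toy_next_id & 0xffu);
    return 0;
}
```

With the toy backend, devices 0x101 and 0x102 end up sharing one stage-2 HWPT, while device 0x201 fails the attach and gets a second one.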
Ideally, just creating the viommu should validate that the passed-in hwpt is compatible, without attaching.
I think I should add a validation between hwpt->domain->owner and dev_iommu_ops(idev->dev) then!
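As a rough illustration of the check mentioned above, here is a self-contained mock. The structs are simplified stand-ins that keep only the fields involved, mock_dev_iommu_ops() stands in for the kernel's dev_iommu_ops(), and domain_matches_dev() is a hypothetical name; none of this is the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types, for illustration only. */
struct iommu_ops {
    const char *name;
};

struct iommu_domain {
    const struct iommu_ops *owner;   /* driver ops that allocated the domain */
};

struct device {
    const struct iommu_ops *ops;     /* driver ops serving this device */
};

/* Stand-in for dev_iommu_ops(): the IOMMU driver ops of a device. */
const struct iommu_ops *mock_dev_iommu_ops(const struct device *dev)
{
    return dev->ops;
}

/* True only if the domain was allocated by the device's IOMMU driver. */
bool domain_matches_dev(const struct iommu_domain *domain,
                        const struct device *dev)
{
    return domain->owner == mock_dev_iommu_ops(dev);
}
```

Note that a pointer comparison like this only shows that the same IOMMU driver is behind both the domain and the device. It does not by itself prove that two instances handled by that driver are compatible.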
Thanks
Nicolin