From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 14, 2023 7:27 PM
On 2023/11/17 21:18, Yi Liu wrote:
> This adds the data structure for flushing iotlb for the nested domain
+struct iommu_hwpt_vtd_s1_invalidate {
+	__aligned_u64 addr;
+	__aligned_u64 npages;
+	__u32 flags;
+	__u32 __reserved;
+	__u32 error;
+	__u32 dev_id;
+};
dev_id is used to report the failed device; userspace should be able to map it to a vRID and inject it into the VM as part of the ITE/ICE error.
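To make the intended userspace flow concrete, here is a minimal sketch of the dev_id to vRID translation. It is only an illustration under assumptions: `struct vdev_map`, `dev_id_to_vrid()` and the table layout are hypothetical names, not part of the iommufd uAPI or of Qemu.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMM-side record: one entry per device bound to iommufd. */
struct vdev_map {
	uint32_t dev_id;  /* device ID handed back by iommufd at bind time */
	uint16_t vrid;    /* requester ID the guest sees for this device */
};

/*
 * Translate the dev_id reported by a failed invalidation into the vRID
 * to use when injecting the ITE/ICE error into the guest.
 * Returns false if the dev_id is unknown to the VMM.
 */
static bool dev_id_to_vrid(const struct vdev_map *map, size_t n,
			   uint32_t dev_id, uint16_t *vrid)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].dev_id == dev_id) {
			*vrid = map[i].vrid;
			return true;
		}
	}
	return false;
}
```

A linear scan is enough for a sketch; a real VMM would more likely hang the vRID off its existing per-device state.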
However, I hit a problem when trying to get the dev_id in the cache invalidation path, since this is filled in by the intel iommu driver. There seems to be no good way to do it. I have the alternatives below to move forward; please have a look.
- Reporting pSID instead of dev_id. This may not work if userspace, for example Qemu, can get a vfio device cdev fd from the management stack. Maybe you have a different opinion; do let me know.
yes, there is no guarantee that pRID is always visible to the user.
- Let iommufd convert a SID info or device pointer to a dev_id, and then report it back to userspace. This seems easiest, but breaks layering and also requires vt-d specific logic. :(
yes, the current philosophy of iommufd is to keep driver specific knowledge out of iommufd.
- Reuse Nicolin's vRID->pRID mapping. If the vRID->pRID mapping is maintained, then intel iommu can report a vRID back to user. But the intel iommu driver does not have a viommu context, so there is no place to hold the vRID->pRID mapping. TBH, it may need reasons beyond the error reporting need to justify introducing it. Anyhow, this requires more thinking and also has dependencies even if it is doable on the intel side.
this sounds like a cleaner way to inject knowledge which iommu driver requires to find out the user tag. but yes it's a bit weird to introduce viommu awareness in intel iommu driver when there is no such thing in real hardware.
and for this error reporting case what we actually require is the reverse map i.e. pRID->vRID. Not sure whether we can leverage the same RID mapping uAPI as for ARM/AMD but ignore viommu_id and then store vRID under device_domain_info. a bit tricky on life cycle management and also incompatible with SIOV...
let's see whether Jason has a better idea here.
- Only report error code, but no device info at first. May add the device info (dev_id or vRID) in a future series. In reality, the existing Qemu vIOMMU does not report ICE or ITE, nor the SID, to the VM.
and IOAS_UNMAP doesn't provide such ATS error either.
Also, the VT-d spec defined the ICE/ITE errors first in the 2007 spec 1.1, and added SID info later in the 2019 spec 3.1. We may do it in stages as well.
and it's not tied to a specific iommu version. the spec is stated in a way that software treats zero value in SID as no hw support so theoretically even a modern hw may not report SID for certain reason.
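A small sketch of what "zero SID means no hw support" would look like to software. The struct and helper names (`struct inv_error_report`, `make_report()`) are hypothetical and do not reflect any VT-d register layout; only the zero-means-absent rule comes from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an invalidation error report. */
struct inv_error_report {
	bool     has_sid;  /* false when hardware reported no source ID */
	uint16_t sid;      /* meaningful only when has_sid is true */
};

/*
 * Build a report from the raw SID value read back from hardware.
 * A zero SID is treated as "hardware did not report the device".
 */
static struct inv_error_report make_report(uint16_t raw_sid)
{
	struct inv_error_report r = {
		.has_sid = raw_sid != 0,
		.sid = raw_sid,
	};
	return r;
}
```

With this shape the error code can be consumed on its own, and device info stays optional, which matches reporting only the error code first.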
What about your opinion?
[1] https://lore.kernel.org/linux-iommu/a9699f71-805a-4a5a-9282-3ec52e5bc81a@intel.com/
I'm fine with this staged approach given the spec allows this behavior and no vIOMMU properly emulates ITE/ICE today.
let's work out a new version w/o dev info and make sure it's in a good state first. Then let Jason decide next week whether he wants to take it for this cycle or not.