This is the second part of adding Intel VT-d nested translation based on the IOMMUFD nesting infrastructure. With the iommufd nesting infrastructure series [1], the iommu core supports new ops to invalidate caches after modifications to the stage-1 page table. The cache invalidation data is vendor specific, so the data_type (IOMMU_HWPT_DATA_VTD_S1) defined for the vendor specific HWPT allocation is reused in the cache invalidation path. Userspace should provide a data_type that matches the type used at HWPT allocation.
The IOMMU_HWPT_INVALIDATE ioctl returns an error in @out_driver_error_code. However, Intel VT-d does not define error codes so far, so it is not easy to pre-define them in iommufd either. As a result, this field should simply be ignored on VT-d platforms.
Complete code can be found in [2]; the corresponding QEMU code can be found in [3].
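For illustration only, a rough userspace sketch of this path is below. Only struct iommu_hwpt_vtd_s1_invalidate is defined by this series; the wrapper field names of struct iommu_hwpt_invalidate (hwpt_id, data_uptr, entry_len, entry_num) are assumptions here and are defined by [1], so they may differ:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/*
 * Sketch only: hwpt_id, data_uptr, entry_len and entry_num are assumed
 * field names from the nesting infrastructure series [1]; data_type and
 * out_driver_error_code are as described in this cover letter.
 */
static int vtd_s1_flush_one(int iommufd, __u32 hwpt_id,
			    __u64 addr, __u64 npages)
{
	struct iommu_hwpt_vtd_s1_invalidate inv = {
		.addr = addr,		/* must be 4KB aligned */
		.npages = npages,
	};
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = hwpt_id,
		.data_type = IOMMU_HWPT_DATA_VTD_S1,	/* must match HWPT alloc */
		.data_uptr = (__u64)(uintptr_t)&inv,
		.entry_len = sizeof(inv),
		.entry_num = 1,
	};

	/* out_driver_error_code carries nothing useful on VT-d; ignore it */
	return ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd);
}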
[1] https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.co...
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v6:
 - Address comments from Kevin
 - Split the VT-d nesting series into two parts (Jason)
v5: https://lore.kernel.org/linux-iommu/20230921075431.125239-1-yi.l.liu@intel.c...
 - Add Kevin's r-b for patch 2, 3, 5, 8, 10
 - Drop enforce_cache_coherency callback from the nested type domain ops (Kevin)
 - Remove duplicate agaw check in patch 04 (Kevin)
 - Remove duplicate domain_update_iommu_cap() in patch 06 (Kevin)
 - Check parent's force_snooping to set pgsnp in the pasid entry (Kevin)
 - uapi data structure check (Kevin)
 - Simplify the errata handling as user can allocate nested parent domain
v4: https://lore.kernel.org/linux-iommu/20230724111335.107427-1-yi.l.liu@intel.c...
 - Remove ascii art tables (Jason)
 - Drop EMT (Tina, Jason)
 - Drop MTS and related definitions (Kevin)
 - Rename macro IOMMU_VTD_PGTBL_ to IOMMU_VTD_S1_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd_ to iommu_hwpt_vtd_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd to iommu_hwpt_vtd_s1 (Kevin)
 - Put the vendor specific hwpt alloc data structure before enum iommu_hwpt_type (Kevin)
 - Do not trim the higher page levels of the S2 domain in nested domain attachment,
   as the S2 domain may have been used independently (Kevin)
 - Remove the first-stage pgd check against the maximum address of the s2_domain,
   as hw can check it anyhow. It would make sense to check every pfn used in the
   stage-1 page table, but that is not feasible, so just leave it to hw (Kevin)
 - Split the iotlb flush part into an order of uapi, helper and callback
   implementation (Kevin)
 - Change the policy of the VT-d nesting errata: disallow RO mappings once a domain
   is used as the parent domain of a nested domain. This removes the nested_users
   counting (Kevin)
 - Minor fix for "make htmldocs"
v3: https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.co...
 - Further split the patches into an order of adding helpers for nested domain,
   iotlb flush, nested domain attachment and nested domain allocation callback,
   then report the hw_info to userspace
 - Add batch support in cache invalidation from userspace
 - Disallow nested translation usage if RO mappings exist in the stage-2 domain
   due to an errata on readonly mappings on the Sapphire Rapids platform
v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.c...
 - The iommufd infrastructure is split out to be a separate series
v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.co...
Regards, Yi Liu
Yi Liu (3):
  iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  iommu/vt-d: Make iotlb flush helpers to be extern
  iommu/vt-d: Add iotlb flush for nested domain
 drivers/iommu/intel/iommu.c  | 10 +++++-----
 drivers/iommu/intel/iommu.h  |  6 ++++++
 drivers/iommu/intel/nested.c | 54 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/iommufd.h | 36 ++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 5 deletions(-)
This adds the data structure for flushing the iotlb of a nested domain allocated with the IOMMU_HWPT_DATA_VTD_S1 type.
This only supports invalidating the IOTLB, but not the device-TLB, as device-TLB invalidation will be covered automatically by the IOTLB invalidation if the underlying IOMMU driver has enabled ATS for the affected device.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/uapi/linux/iommufd.h | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c8f523a7bc06..a910f67e344b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -519,6 +519,42 @@ struct iommu_hw_info {
 };
 #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
+/**
+ * enum iommu_hwpt_vtd_s1_invalidate_flags - Flags for Intel VT-d
+ *                                           stage-1 cache invalidation
+ * @IOMMU_VTD_INV_FLAGS_LEAF: The LEAF flag indicates whether only the
+ *                            leaf PTE caching needs to be invalidated
+ *                            and other paging structure caches can be
+ *                            preserved.
+ */
+enum iommu_hwpt_vtd_s1_invalidate_flags {
+	IOMMU_VTD_INV_FLAGS_LEAF = 1 << 0,
+};
+
+/**
+ * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
+ *                                       (IOMMU_HWPT_DATA_VTD_S1)
+ * @addr: The start address of the addresses to be invalidated. It needs
+ *        to be 4KB aligned.
+ * @npages: Number of contiguous 4K pages to be invalidated.
+ * @flags: Combination of enum iommu_hwpt_vtd_s1_invalidate_flags
+ * @__reserved: Must be 0
+ *
+ * The Intel VT-d specific invalidation data for user-managed stage-1 cache
+ * invalidation in nested translation. Userspace uses this structure to
+ * tell the impacted cache scope after modifying the stage-1 page table.
+ *
+ * Invalidating all the caches related to the page table by setting @addr
+ * to be 0 and @npages to be __aligned_u64(-1). This includes the
+ * corresponding device-TLB if ATS is enabled on the attached devices.
+ */
+struct iommu_hwpt_vtd_s1_invalidate {
+	__aligned_u64 addr;
+	__aligned_u64 npages;
+	__u32 flags;
+	__u32 __reserved;
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
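As a usage note (illustration only, not part of the patch), the two granularities described in the kernel-doc above would be encoded by userspace roughly like this:

#include <linux/iommufd.h>

/* Illustrative encodings of struct iommu_hwpt_vtd_s1_invalidate. */
static void example_vtd_s1_inv_encodings(void)
{
	/* Range invalidation: 16 pages at 0x200000, leaf PTE caches only */
	struct iommu_hwpt_vtd_s1_invalidate range = {
		.addr = 0x200000,		/* 4KB aligned */
		.npages = 16,
		.flags = IOMMU_VTD_INV_FLAGS_LEAF,
	};

	/*
	 * Full invalidation of all caches for this page table, including
	 * the device-TLB when ATS is enabled on the attached devices.
	 */
	struct iommu_hwpt_vtd_s1_invalidate all = {
		.addr = 0,
		.npages = ~0ULL,	/* i.e. __aligned_u64(-1) per the kernel-doc */
	};

	(void)range;
	(void)all;
}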
This makes the helpers visible to nested.c.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 10 +++++-----
 drivers/iommu/intel/iommu.h |  6 ++++++
 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a0341a069fbf..8f81a5c9fcc0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1511,10 +1511,10 @@ static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
-static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
-				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
-				  int ih, int map)
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map)
 {
 	unsigned int aligned_pages = __roundup_pow_of_two(pages);
 	unsigned int mask = ilog2(aligned_pages);
@@ -1587,7 +1587,7 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu,
 	iommu_flush_write_buffer(iommu);
 }
-static void intel_flush_iotlb_all(struct iommu_domain *domain)
+void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct iommu_domain_info *info;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 0539a0f47557..b5f33a7c1973 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -867,6 +867,12 @@ void device_block_translation(struct device *dev);
 int prepare_domain_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
 void domain_update_iommu_cap(struct dmar_domain *domain);
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map);
+void intel_flush_iotlb_all(struct iommu_domain *domain);
+
 int dmar_ir_support(void);
On 10/20/23 5:37 PM, Yi Liu wrote:
This makes the helpers visible to nested.c.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
 drivers/iommu/intel/iommu.c | 10 +++++-----
 drivers/iommu/intel/iommu.h |  6 ++++++
 2 files changed, 11 insertions(+), 5 deletions(-)
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Best regards, baolu
This implements the .cache_invalidate_user() callback to support iotlb flush for nested domain.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/nested.c | 54 ++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 19538fb616db..ef8fe53c33ef 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -73,9 +73,63 @@ static void intel_nested_domain_free(struct iommu_domain *domain)
 	kfree(to_dmar_domain(domain));
 }
+static void domain_flush_iotlb_psi(struct dmar_domain *domain,
+				   u64 addr, unsigned long npages)
+{
+	struct iommu_domain_info *info;
+	unsigned long i;
+
+	xa_for_each(&domain->iommu_array, i, info)
+		iommu_flush_iotlb_psi(info->iommu, domain,
+				      addr >> VTD_PAGE_SHIFT, npages, 1, 0);
+}
+
+static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
+					      struct iommu_user_data_array *array,
+					      u32 *cerror_idx)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct iommu_hwpt_vtd_s1_invalidate inv_info;
+	u32 index;
+	int ret;
+
+	/* REVISIT:
+	 * VT-d has defined ITE, ICE, IQE for invalidation failure per hardware,
+	 * but no error code yet, so just set the error code to be 0.
+	 */
+	*cerror_idx = 0;
+
+	for (index = 0; index < array->entry_num; index++) {
+		ret = iommu_copy_struct_from_user_array(&inv_info, array,
+							IOMMU_HWPT_DATA_VTD_S1,
+							index, __reserved);
+		if (ret) {
+			pr_err_ratelimited("Failed to fetch invalidation request\n");
+			break;
+		}
+
+		if (inv_info.__reserved ||
+		    (inv_info.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) ||
+		    !IS_ALIGNED(inv_info.addr, VTD_PAGE_SIZE)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (inv_info.addr == 0 && inv_info.npages == -1)
+			intel_flush_iotlb_all(domain);
+		else
+			domain_flush_iotlb_psi(dmar_domain,
+					       inv_info.addr, inv_info.npages);
+	}
+
+	array->entry_num = index;
+
+	return ret;
+}
+
 static const struct iommu_domain_ops intel_nested_domain_ops = {
 	.attach_dev		= intel_nested_attach_dev,
 	.free			= intel_nested_domain_free,
+	.cache_invalidate_user	= intel_nested_cache_invalidate_user,
 };
struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
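One note on the batching behaviour above: the callback stops at the first bad entry and writes the number of successfully handled entries back into array->entry_num, so a partially failed batch does not need to be replayed from the beginning. A rough userspace sketch of acting on that follows; it assumes the iommufd core reports the updated count back through struct iommu_hwpt_invalidate as described in [1], and the field name entry_num is an assumption, not defined in this patch:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/*
 * Sketch only: entry_num as an in/out field of struct iommu_hwpt_invalidate
 * is an assumption based on the nesting infrastructure series [1].
 */
static int vtd_s1_flush_batch(int iommufd, struct iommu_hwpt_invalidate *cmd)
{
	__u32 submitted = cmd->entry_num;
	int rc = ioctl(iommufd, IOMMU_HWPT_INVALIDATE, cmd);

	if (rc)
		/*
		 * Entries [0, cmd->entry_num) were handled; the failing
		 * entry and everything after it can be fixed up and
		 * resubmitted.  out_driver_error_code is ignored on VT-d.
		 */
		fprintf(stderr, "invalidation stopped after %u of %u entries\n",
			cmd->entry_num, submitted);

	return rc;
}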