Nested translation is a hardware feature that is supported by many modern IOMMU hardwares. It has two stages (stage-1, stage-2) address translation to get access to the physical address. stage-1 translation table is owned by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes to stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example, the stage-1 translation table is I/O page table. As the below diagram shows, guest I/O page table pointer in GPA (guest physical address) is passed to host and be used to perform the stage-1 address translation. Along with it, modifications to present mappings in the guest I/O page table should be followed with an IOTLB invalidation.
.-------------. .---------------------------. | vIOMMU | | Guest I/O page table | | | '---------------------------' .----------------/ | PASID Entry |--- PASID cache flush --+ '-------------' | | | V | | I/O page table pointer in GPA '-------------' Guest ------| Shadow |---------------------------|-------- v v v Host .-------------. .------------------------. | pIOMMU | | FS for GIOVA->GPA | | | '------------------------' .----------------/ | | PASID Entry | V (Nested xlate) '----------------.----------------------------------. | | | SS for GPA->HPA, unmanaged domain| | | '----------------------------------' '-------------' Where: - FS = First stage page tables - SS = Second stage page tables <Intel VT-d Nested translation>
This series is based on the first part which was merged [1], this series is to add the cache invalidation interface or the userspace to invalidate cache after modifying the stage-1 page table. This includes both the iommufd changes and the VT-d driver changes.
Complete code can be found in [2], QEMU could can be found in [3].
At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks them for the help. ^_^. Look forward to your feedbacks.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.co... - merged [2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting [3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v7: - Remove domain->ops->cache_invalidate_user check in hwpt alloc path due to failure in bisect (Baolu) - Remove out_driver_error_code from struct iommu_hwpt_invalidate after discussion in v6. Should expect per-entry error code. - Rework the selftest cache invalidation part to report a per-entry error - Allow user to pass in an empty array to have a try-and-fail mechanism for user to check if a given req_type is supported by the kernel (Jason) - Define a separate enum type for cache invalidation data (Jason) - Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before returning (Nicolin) - Merge the VT-d nesting part 2/2 https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.co... into this series to avoid defining empty enum in the middle of the series. The major difference is adding the VT-d related invalidation uapi structures together with the generic data structures in patch 02 of this series. - VT-d driver was refined to report ICE/ITE error from the bottom cache invalidation submit helpers, hence the cache_invalidate_user op could report such errors via the per-entry error field to user. VT-d driver will not stop the invalidation array walking due to the ICE/ITE errors as such errors are defined by VT-d spec, userspace should be able to handle it and let the real user (say Virtual Machine) know about it. But for other errors like invalid uapi data structure configuration, memory copy failure, such errors should stop the array walking as it may have more issues if go on. - Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.co... - No much change, just rebase on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.co... - Split the iommufd nesting series into two parts of alloc_user and invalidation (Jason) - Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and do the same with the structures/alloc()/abort()/destroy(). Reworked the selftest accordingly too. (Jason) - Move hwpt/data_type into struct iommu_user_data from standalone op arguments. (Jason) - Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA, _TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin) - Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin) - Add macro to the iommu_copy_struct_from_user() to calculate min_size (Jason) - Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.c... - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs and kernel-managed HWPTs - Rework invalidate uAPI to be a multi-request array-based design - Add a struct iommu_user_data_array and a helper for driver to sanitize and copy the entry data from user space invalidation array - Add a patch fixing TEST_LENGTH() in selftest program - Drop IOMMU_RESV_IOVA_RANGES patches - Update kdoc and inline comments - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation, this does not change the rule that resv regions should only be added to the kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.c... - Add new uAPI things in alphabetical order - Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for sanity, replacing the previous op->domain_alloc_user_data_len solution - Return ERR_PTR from domain_alloc_user instead of NULL - Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin) - Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O page table). (Kevin) - Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl - Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.co... - Add union iommu_domain_user_data to include all user data structures to avoid passing void * in kernel APIs. - Add iommu op to return user data length for user domain allocation - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type - Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len - Convert cache_invalidate_user op to be int instead of void - Remove @data_type in struct iommu_hwpt_invalidate - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.c...
Thanks, Yi Liu
Lu Baolu (4): iommu: Add cache_invalidate_user op iommu/vt-d: Allow qi_submit_sync() to return the QI faults iommu/vt-d: Convert pasid based cache invalidation to return QI fault iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4): iommu: Add iommu_copy_struct_from_user_array helper iommufd/selftest: Add mock_domain_cache_invalidate_user support iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (1): iommufd: Add IOMMU_HWPT_INVALIDATE
drivers/iommu/intel/dmar.c | 36 ++-- drivers/iommu/intel/iommu.c | 12 +- drivers/iommu/intel/iommu.h | 8 +- drivers/iommu/intel/irq_remapping.c | 2 +- drivers/iommu/intel/nested.c | 116 ++++++++++++ drivers/iommu/intel/pasid.c | 14 +- drivers/iommu/intel/svm.c | 14 +- drivers/iommu/iommufd/hw_pagetable.c | 36 ++++ drivers/iommu/iommufd/iommufd_private.h | 10 ++ drivers/iommu/iommufd/iommufd_test.h | 39 ++++ drivers/iommu/iommufd/main.c | 3 + drivers/iommu/iommufd/selftest.c | 93 ++++++++++ include/linux/iommu.h | 101 +++++++++++ include/uapi/linux/iommufd.h | 100 +++++++++++ tools/testing/selftests/iommu/iommufd.c | 170 ++++++++++++++++++ tools/testing/selftests/iommu/iommufd_utils.h | 57 ++++++ 16 files changed, 773 insertions(+), 38 deletions(-)
From: Lu Baolu baolu.lu@linux.intel.com
The updates of the PTEs in the nested page table will be propagated to the hardware caches on both IOMMU (IOTLB) and devices (DevTLB/ATC).
Add a new domain op cache_invalidate_user for the userspace to flush the hardware caches for a nested domain through iommufd. No wrapper for it, as it's only supposed to be used by iommufd. Then, pass in invalidation requests in form of a user data array conatining a number of invalidation data entries.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6291aa7b079b..5c4a17f13761 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -284,6 +284,24 @@ struct iommu_user_data { size_t len; };
+/** + * struct iommu_user_data_array - iommu driver specific user space data array + * @type: The data type of all the entries in the user buffer array + * @uptr: Pointer to the user buffer array for copy_from_user() + * @entry_len: The fixed-width length of a entry in the array, in bytes + * @entry_num: The number of total entries in the array + * + * A array having a @entry_num number of @entry_len sized entries, each entry is + * user space data, an uAPI defined in include/uapi/linux/iommufd.h where @type + * is also defined as enum iommu_xyz_data_type. + */ +struct iommu_user_data_array { + unsigned int type; + void __user *uptr; + size_t entry_len; + u32 entry_num; +}; + /** * __iommu_copy_struct_from_user - Copy iommu driver specific user space data * @dst_data: Pointer to an iommu driver specific user data that is defined in @@ -440,6 +458,13 @@ struct iommu_ops { * @iotlb_sync_map: Sync mappings created recently using @map to the hardware * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush * queue + * @cache_invalidate_user: Flush hardware cache for user space IO page table. + * The @domain must be IOMMU_DOMAIN_NESTED. The @array + * passes in the cache invalidation requests, in form + * of a driver data structure. The driver must update + * array->entry_num to report the number of handled + * invalidation requests. The driver data structure + * must be defined in include/uapi/linux/iommufd.h * @iova_to_phys: translate iova to physical address * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, * including no-snoop TLPs on PCIe or other platform @@ -465,6 +490,8 @@ struct iommu_domain_ops { size_t size); void (*iotlb_sync)(struct iommu_domain *domain, struct iommu_iotlb_gather *iotlb_gather); + int (*cache_invalidate_user)(struct iommu_domain *domain, + struct iommu_user_data_array *array);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Lu Baolu baolu.lu@linux.intel.com
The updates of the PTEs in the nested page table will be propagated to the hardware caches on both IOMMU (IOTLB) and devices (DevTLB/ATC).
this is incorrect. the scope of this cmd is driver specific.
Add a new domain op cache_invalidate_user for the userspace to flush the hardware caches for a nested domain through iommufd. No wrapper for it, as it's only supposed to be used by iommufd. Then, pass in invalidation requests in form of a user data array conatining a number of invalidation data entries.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
include/linux/iommu.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6291aa7b079b..5c4a17f13761 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -284,6 +284,24 @@ struct iommu_user_data { size_t len; };
+/**
- struct iommu_user_data_array - iommu driver specific user space data
array
- @type: The data type of all the entries in the user buffer array
- @uptr: Pointer to the user buffer array for copy_from_user()
remove 'for copy_from_user();
- @entry_len: The fixed-width length of a entry in the array, in bytes
s/a/an/
- @entry_num: The number of total entries in the array
- A array having a @entry_num number of @entry_len sized entries, each
the first sentence is redundant.
entry is
- user space data, an uAPI defined in include/uapi/linux/iommufd.h where
@type
- is also defined as enum iommu_xyz_data_type.
I'd just say:
"The user buffer includes an array of requests with format defined in include/uapi/linux/iommufd.h"
On 2023/12/22 10:30, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Lu Baolu baolu.lu@linux.intel.com
The updates of the PTEs in the nested page table will be propagated to the hardware caches on both IOMMU (IOTLB) and devices (DevTLB/ATC).
this is incorrect. the scope of this cmd is driver specific.
yes. May just say the hardware caches.
Add a new domain op cache_invalidate_user for the userspace to flush the hardware caches for a nested domain through iommufd. No wrapper for it, as it's only supposed to be used by iommufd. Then, pass in invalidation requests in form of a user data array conatining a number of invalidation data entries.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
include/linux/iommu.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6291aa7b079b..5c4a17f13761 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -284,6 +284,24 @@ struct iommu_user_data { size_t len; };
+/**
- struct iommu_user_data_array - iommu driver specific user space data
array
- @type: The data type of all the entries in the user buffer array
- @uptr: Pointer to the user buffer array for copy_from_user()
remove 'for copy_from_user();
- @entry_len: The fixed-width length of a entry in the array, in bytes
s/a/an/
- @entry_num: The number of total entries in the array
- A array having a @entry_num number of @entry_len sized entries, each
the first sentence is redundant.
entry is
- user space data, an uAPI defined in include/uapi/linux/iommufd.h where
@type
- is also defined as enum iommu_xyz_data_type.
I'd just say:
"The user buffer includes an array of requests with format defined in include/uapi/linux/iommufd.h"
sure.
In nested translation, the stage-1 page table is user-managed but cached by the IOMMU hardware, so an update on present page table entries in the stage-1 page table should be followed with a cache invalidation.
Add an IOMMU_HWPT_INVALIDATE ioctl to support such a cache invalidation. It takes hwpt_id to specify the iommu_domain, and a multi-entry array to support multiple invalidation requests in one ioctl.
This also adds the cache invalidation data structure for Intel VT-d to have a complete story. Intel iommu driver support will be added in later patch.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 36 +++++++++ drivers/iommu/iommufd/iommufd_private.h | 10 +++ drivers/iommu/iommufd/main.c | 3 + include/uapi/linux/iommufd.h | 100 ++++++++++++++++++++++++ 4 files changed, 149 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index cbb5df0a6c32..6389a9809c7b 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -371,3 +371,39 @@ int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &hwpt_paging->common.obj); return rc; } + +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{ + struct iommu_hwpt_invalidate *cmd = ucmd->cmd; + struct iommu_user_data_array data_array = { + .type = cmd->req_type, + .uptr = u64_to_user_ptr(cmd->reqs_uptr), + .entry_len = cmd->req_len, + .entry_num = cmd->req_num, + }; + struct iommufd_hw_pagetable *hwpt; + u32 done_num = 0; + int rc; + + if (cmd->req_num && (!cmd->reqs_uptr || !cmd->req_len)) { + rc = -EINVAL; + goto out; + } + + hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id); + if (IS_ERR(hwpt)) { + rc = PTR_ERR(hwpt); + goto out; + } + + rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, + &data_array); + done_num = data_array.entry_num; + + iommufd_put_object(ucmd->ictx, &hwpt->obj); +out: + cmd->req_num = done_num; + if (iommufd_ucmd_respond(ucmd, sizeof(*cmd))) + return -EFAULT; + return rc; +} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index abae041e256f..991f864d1f9b 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -328,6 +328,15 @@ iommufd_get_hwpt_paging(struct iommufd_ucmd *ucmd, u32 id) IOMMUFD_OBJ_HWPT_PAGING), struct iommufd_hwpt_paging, common.obj); } + +static inline struct iommufd_hw_pagetable * +iommufd_get_hwpt_nested(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_HWPT_NESTED), + struct iommufd_hw_pagetable, obj); +} + int iommufd_hwpt_set_dirty_tracking(struct iommufd_ucmd *ucmd); int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd);
@@ -345,6 +354,7 @@ void iommufd_hwpt_paging_abort(struct iommufd_object *obj); void iommufd_hwpt_nested_destroy(struct iommufd_object *obj); void iommufd_hwpt_nested_abort(struct iommufd_object *obj); int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd); +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index c9091e46d208..e4f6dfc3474f 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -323,6 +323,7 @@ union ucmd_buffer { struct iommu_hwpt_alloc hwpt; struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap; struct iommu_hwpt_set_dirty_tracking set_dirty_tracking; + struct iommu_hwpt_invalidate cache; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; @@ -362,6 +363,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { struct iommu_hwpt_get_dirty_bitmap, data), IOCTL_OP(IOMMU_HWPT_SET_DIRTY_TRACKING, iommufd_hwpt_set_dirty_tracking, struct iommu_hwpt_set_dirty_tracking, __reserved), + IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate, + struct iommu_hwpt_invalidate, __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 0b2bc6252e2c..e501ef4f7ec1 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -49,6 +49,7 @@ enum { IOMMUFD_CMD_GET_HW_INFO, IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING, IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP, + IOMMUFD_CMD_HWPT_INVALIDATE, };
/** @@ -613,4 +614,103 @@ struct iommu_hwpt_get_dirty_bitmap { #define IOMMU_HWPT_GET_DIRTY_BITMAP _IO(IOMMUFD_TYPE, \ IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP)
+/** + * enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation + * Data Type + * @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1 + */ +enum iommu_hwpt_invalidate_data_type { + IOMMU_HWPT_INVALIDATE_DATA_VTD_S1, +}; + +/** + * enum iommu_hwpt_vtd_s1_invalidate_flags - Flags for Intel VT-d + * stage-1 cache invalidation + * @IOMMU_VTD_INV_FLAGS_LEAF: The LEAF flag indicates whether only the + * leaf PTE caching needs to be invalidated + * and other paging structure caches can be + * preserved. + */ +enum iommu_hwpt_vtd_s1_invalidate_flags { + IOMMU_VTD_INV_FLAGS_LEAF = 1 << 0, +}; + +/** + * enum iommu_hwpt_vtd_s1_invalidate_error - Result of invalidation + * @IOMMU_HWPT_INVALIDATE_VTD_S1_ICE: Invalidation Completion Error, details + * refer to 11.4.7.1 Fault Status Register + * of VT-d specification. + * @IOMMU_HWPT_INVALIDATE_VTD_S1_ITE: Invalidation Time-out Error, details + * refer to 11.4.7.1 Fault Status Register + * of VT-d specification. + */ +enum iommu_hwpt_vtd_s1_invalidate_error { + IOMMU_HWPT_INVALIDATE_VTD_S1_ICE = 1 << 0, + IOMMU_HWPT_INVALIDATE_VTD_S1_ITE = 1 << 1, +}; + +/** + * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation + * (IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) + * @addr: The start address of the addresses to be invalidated. It needs + * to be 4KB aligned. + * @npages: Number of contiguous 4K pages to be invalidated. + * @flags: Combination of enum iommu_hwpt_vtd_s1_invalidate_flags + * @inv_error: One of enum iommu_hwpt_vtd_s1_invalidate_error + * + * The Intel VT-d specific invalidation data for user-managed stage-1 cache + * invalidation in nested translation. Userspace uses this structure to + * tell the impacted cache scope after modifying the stage-1 page table. + * + * Invalidating all the caches related to the page table by setting @addr + * to be 0 and @npages to be U64_MAX. + * + * @inv_error is meaningful only if the request is handled by kernel. This + * can be known by checking struct iommu_hwpt_invalidate::req_num output. + * @inv_error only covers the errors detected by hardware after submitting the + * invalidation. The software detected errors would go through the normal + * ioctl errno. + */ +struct iommu_hwpt_vtd_s1_invalidate { + __aligned_u64 addr; + __aligned_u64 npages; + __u32 flags; + __u32 inv_error; +}; + +/** + * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) + * @size: sizeof(struct iommu_hwpt_invalidate) + * @hwpt_id: HWPT ID of a nested HWPT for cache invalidation + * @reqs_uptr: User pointer to an array having @req_num of cache invalidation + * requests. The request entries in the array are of fixed width + * @req_len, and contain a user data structure for invalidation + * request specific to the given hardware page table. + * @req_type: One of enum iommu_hwpt_invalidate_data_type, defining the data + * type of all the entries in the invalidation request array. It + * should be a type supported by the hwpt pointed by @hwpt_id. + * @req_len: Length (in bytes) of a request entry in the request array + * @req_num: Input the number of cache invalidation requests in the array. + * Output the number of requests successfully handled by kernel. + * @__reserved: Must be 0. + * + * Invalidate the iommu cache for user-managed page table. Modifications on a + * user-managed page table should be followed by this operation to sync cache. + * Each ioctl can support one or more cache invalidation requests in the array + * that has a total size of @req_len * @req_num. + * + * An empty invalidation request array by setting @req_num==0 is allowed, and + * @req_len and @reqs_uptr would be ignored in this case. This can be used to + * check if the given @req_type is supported or not by kernel. + */ +struct iommu_hwpt_invalidate { + __u32 size; + __u32 hwpt_id; + __aligned_u64 reqs_uptr; + __u32 req_type; + __u32 req_len; + __u32 req_num; + __u32 __reserved; +}; +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE) #endif
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommu_user_data_array data_array = {
.type = cmd->req_type,
.uptr = u64_to_user_ptr(cmd->reqs_uptr),
.entry_len = cmd->req_len,
.entry_num = cmd->req_num,
- };
- struct iommufd_hw_pagetable *hwpt;
- u32 done_num = 0;
- int rc;
- if (cmd->req_num && (!cmd->reqs_uptr || !cmd->req_len)) {
rc = -EINVAL;
goto out;
- }
miss a check on the __reserved field.
@@ -323,6 +323,7 @@ union ucmd_buffer { struct iommu_hwpt_alloc hwpt; struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap; struct iommu_hwpt_set_dirty_tracking set_dirty_tracking;
- struct iommu_hwpt_invalidate cache;
this should be in alphabetic order. I gave this comment in v6 too:
https://lore.kernel.org/linux-iommu/BN9PR11MB5276D8406BF08B853329288C8CB4A@B...
+/**
- enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache
Invalidation
Data Type
- @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for
VTD_S1
- */
+enum iommu_hwpt_invalidate_data_type {
- IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
+};
Defining DATA_VTD_S1 at this point is fine, if there is no usage on DATA_NONE. But following vtd specific definitions should be moved to the later vtd specific patches. they are not used by the common code anyway.
+/**
- enum iommu_hwpt_vtd_s1_invalidate_flags - Flags for Intel VT-d
stage-1 cache invalidation
- @IOMMU_VTD_INV_FLAGS_LEAF: The LEAF flag indicates whether only
the
leaf PTE caching needs to be invalidated
and other paging structure caches can be
preserved.
"indicates whether the invalidation applies to all-levels page structure cache or just the leaf PTE cache"
- */
+enum iommu_hwpt_vtd_s1_invalidate_flags {
- IOMMU_VTD_INV_FLAGS_LEAF = 1 << 0,
+};
+/**
- enum iommu_hwpt_vtd_s1_invalidate_error - Result of invalidation
"hardware error of invalidation"
- @IOMMU_HWPT_INVALIDATE_VTD_S1_ICE: Invalidation Completion
Error, details
refer to 11.4.7.1 Fault Status Register
of VT-d specification.
- @IOMMU_HWPT_INVALIDATE_VTD_S1_ITE: Invalidation Time-out Error,
details
refer to 11.4.7.1 Fault Status Register
of VT-d specification.
- */
+enum iommu_hwpt_vtd_s1_invalidate_error {
- IOMMU_HWPT_INVALIDATE_VTD_S1_ICE = 1 << 0,
- IOMMU_HWPT_INVALIDATE_VTD_S1_ITE = 1 << 1,
+};
+/**
- struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
(IOMMU_HWPT_INVALIDATE_DATA_VTD_S1)
- @addr: The start address of the addresses to be invalidated. It needs
to be 4KB aligned.
'of the range'
- @npages: Number of contiguous 4K pages to be invalidated.
- @flags: Combination of enum iommu_hwpt_vtd_s1_invalidate_flags
- @inv_error: One of enum iommu_hwpt_vtd_s1_invalidate_error
'@hw_error'
- The Intel VT-d specific invalidation data for user-managed stage-1 cache
- invalidation in nested translation. Userspace uses this structure to
- tell the impacted cache scope after modifying the stage-1 page table.
- Invalidating all the caches related to the page table by setting @addr
- to be 0 and @npages to be U64_MAX.
here should clarify that the invalidation applies to device TLB automatically for VT-d.
- @inv_error is meaningful only if the request is handled by kernel. This
- can be known by checking struct iommu_hwpt_invalidate::req_num
output.
- @inv_error only covers the errors detected by hardware after submitting
the
- invalidation. The software detected errors would go through the normal
- ioctl errno.
- */
+struct iommu_hwpt_vtd_s1_invalidate {
- __aligned_u64 addr;
- __aligned_u64 npages;
- __u32 flags;
- __u32 inv_error;
+};
+/**
- struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
- @size: sizeof(struct iommu_hwpt_invalidate)
- @hwpt_id: HWPT ID of a nested HWPT for cache invalidation
- @reqs_uptr: User pointer to an array having @req_num of cache
invalidation
requests. The request entries in the array are of fixed width
@req_len, and contain a user data structure for invalidation
request specific to the given hardware page table.
Just:
'User pointer to an array of driver-specific cache invalidation requests'
- @req_type: One of enum iommu_hwpt_invalidate_data_type, defining
the data
type of all the entries in the invalidation request array. It
should be a type supported by the hwpt pointed by @hwpt_id.
- @req_len: Length (in bytes) of a request entry in the request array
- @req_num: Input the number of cache invalidation requests in the array.
Output the number of requests successfully handled by kernel.
- @__reserved: Must be 0.
- Invalidate the iommu cache for user-managed page table. Modifications
on a
- user-managed page table should be followed by this operation to sync
cache.
- Each ioctl can support one or more cache invalidation requests in the
array
- that has a total size of @req_len * @req_num.
- An empty invalidation request array by setting @req_num==0 is allowed,
and
- @req_len and @reqs_uptr would be ignored in this case. This can be
used to
- check if the given @req_type is supported or not by kernel.
- */
+struct iommu_hwpt_invalidate {
- __u32 size;
- __u32 hwpt_id;
- __aligned_u64 reqs_uptr;
- __u32 req_type;
- __u32 req_len;
- __u32 req_num;
- __u32 __reserved;
+}; +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
#endif
2.34.1
On 2023/12/22 11:19, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommu_user_data_array data_array = {
.type = cmd->req_type,
.uptr = u64_to_user_ptr(cmd->reqs_uptr),
.entry_len = cmd->req_len,
.entry_num = cmd->req_num,
- };
- struct iommufd_hw_pagetable *hwpt;
- u32 done_num = 0;
- int rc;
- if (cmd->req_num && (!cmd->reqs_uptr || !cmd->req_len)) {
rc = -EINVAL;
goto out;
- }
miss a check on the __reserved field.
done.
@@ -323,6 +323,7 @@ union ucmd_buffer { struct iommu_hwpt_alloc hwpt; struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap; struct iommu_hwpt_set_dirty_tracking set_dirty_tracking;
- struct iommu_hwpt_invalidate cache;
this should be in alphabetic order. I gave this comment in v6 too:
https://lore.kernel.org/linux-iommu/BN9PR11MB5276D8406BF08B853329288C8CB4A@B...
oops, I did miss it. sigh, I paid too much attention to the error code discussion in the end. :( I've revisited the comments and should have done now.
+/**
- enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache
Invalidation
Data Type
- @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for
VTD_S1
- */
+enum iommu_hwpt_invalidate_data_type {
- IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
+};
Defining DATA_VTD_S1 at this point is fine, if there is no usage on DATA_NONE. But following vtd specific definitions should be moved to the later vtd specific patches. they are not used by the common code anyway.
sure.
+/**
- enum iommu_hwpt_vtd_s1_invalidate_flags - Flags for Intel VT-d
stage-1 cache invalidation
- @IOMMU_VTD_INV_FLAGS_LEAF: The LEAF flag indicates whether only
the
leaf PTE caching needs to be invalidated
and other paging structure caches can be
preserved.
"indicates whether the invalidation applies to all-levels page structure cache or just the leaf PTE cache"
done.
- */
+enum iommu_hwpt_vtd_s1_invalidate_flags {
- IOMMU_VTD_INV_FLAGS_LEAF = 1 << 0,
+};
+/**
- enum iommu_hwpt_vtd_s1_invalidate_error - Result of invalidation
"hardware error of invalidation"
done.
- @IOMMU_HWPT_INVALIDATE_VTD_S1_ICE: Invalidation Completion
Error, details
refer to 11.4.7.1 Fault Status Register
of VT-d specification.
- @IOMMU_HWPT_INVALIDATE_VTD_S1_ITE: Invalidation Time-out Error,
details
refer to 11.4.7.1 Fault Status Register
of VT-d specification.
- */
+enum iommu_hwpt_vtd_s1_invalidate_error {
- IOMMU_HWPT_INVALIDATE_VTD_S1_ICE = 1 << 0,
- IOMMU_HWPT_INVALIDATE_VTD_S1_ITE = 1 << 1,
+};
+/**
- struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
(IOMMU_HWPT_INVALIDATE_DATA_VTD_S1)
- @addr: The start address of the addresses to be invalidated. It needs
to be 4KB aligned.
'of the range'
done.
- @npages: Number of contiguous 4K pages to be invalidated.
- @flags: Combination of enum iommu_hwpt_vtd_s1_invalidate_flags
- @inv_error: One of enum iommu_hwpt_vtd_s1_invalidate_error
'@hw_error'
done.
- The Intel VT-d specific invalidation data for user-managed stage-1 cache
- invalidation in nested translation. Userspace uses this structure to
- tell the impacted cache scope after modifying the stage-1 page table.
- Invalidating all the caches related to the page table by setting @addr
- to be 0 and @npages to be U64_MAX.
here should clarify that the invalidation applies to device TLB automatically for VT-d.
done.
- @inv_error is meaningful only if the request is handled by kernel. This
- can be known by checking struct iommu_hwpt_invalidate::req_num
output.
- @inv_error only covers the errors detected by hardware after submitting
the
- invalidation. The software detected errors would go through the normal
- ioctl errno.
- */
+struct iommu_hwpt_vtd_s1_invalidate {
- __aligned_u64 addr;
- __aligned_u64 npages;
- __u32 flags;
- __u32 inv_error;
+};
+/**
- struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
- @size: sizeof(struct iommu_hwpt_invalidate)
- @hwpt_id: HWPT ID of a nested HWPT for cache invalidation
- @reqs_uptr: User pointer to an array having @req_num of cache
invalidation
requests. The request entries in the array are of fixed width
@req_len, and contain a user data structure for invalidation
request specific to the given hardware page table.
Just:
'User pointer to an array of driver-specific cache invalidation requests'
done.
- @req_type: One of enum iommu_hwpt_invalidate_data_type, defining
the data
type of all the entries in the invalidation request array. It
should be a type supported by the hwpt pointed by @hwpt_id.
- @req_len: Length (in bytes) of a request entry in the request array
- @req_num: Input the number of cache invalidation requests in the array.
s/req_len/entry_len, s/req_num/entry_num per the comments in last version.
Output the number of requests successfully handled by kernel.
- @__reserved: Must be 0.
- Invalidate the iommu cache for user-managed page table. Modifications
on a
- user-managed page table should be followed by this operation to sync
cache.
- Each ioctl can support one or more cache invalidation requests in the
array
- that has a total size of @req_len * @req_num.
- An empty invalidation request array by setting @req_num==0 is allowed,
and
- @req_len and @reqs_uptr would be ignored in this case. This can be
used to
- check if the given @req_type is supported or not by kernel.
- */
+struct iommu_hwpt_invalidate {
- __u32 size;
- __u32 hwpt_id;
- __aligned_u64 reqs_uptr;
- __u32 req_type;
- __u32 req_len;
- __u32 req_num;
- __u32 __reserved;
+}; +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
#endif
2.34.1
From: Nicolin Chen nicolinc@nvidia.com
Wrap up the data pointer/num sanity and __iommu_copy_struct_from_user call for iommu drivers to copy driver specific data at a specific location in the struct iommu_user_data_array, and iommu_respond_struct_to_user_array() to copy response to a specific location in the struct iommu_user_data_array.
And expect it to be used in cache_invalidate_user ops for example.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 74 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 5c4a17f13761..cfab934e71a2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -342,6 +342,80 @@ static inline int __iommu_copy_struct_from_user( sizeof(*kdst), \ offsetofend(typeof(*kdst), min_last))
+/** + * __iommu_copy_struct_from_user_array - Copy iommu driver specific user space + * data from an iommu_user_data_array + * @dst_data: Pointer to an iommu driver specific user data that is defined in + * include/uapi/linux/iommufd.h + * @src_array: Pointer to a struct iommu_user_data_array for a user space array + * @data_type: The data type of the @dst_data. Must match with @src_array.type + * @index: Index to the location in the array to copy user data from + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst) + * @min_len: Initial length of user data structure for backward compatibility. + * This should be offsetofend using the last member in the user data + * struct that was initially added to include/uapi/linux/iommufd.h + */ +static inline int +__iommu_copy_struct_from_user_array(void *dst_data, + const struct iommu_user_data_array *src_array, + unsigned int data_type, unsigned int index, + size_t data_len, size_t min_len) +{ + struct iommu_user_data src_data; + + if (WARN_ON(!src_array || index >= src_array->entry_num)) + return -EINVAL; + if (!src_array->entry_num) + return -EINVAL; + src_data.uptr = src_array->uptr + src_array->entry_len * index; + src_data.len = src_array->entry_len; + src_data.type = src_array->type; + + return __iommu_copy_struct_from_user(dst_data, &src_data, data_type, + data_len, min_len); +} + +/** + * iommu_copy_struct_from_user_array - Copy iommu driver specific user space + * data from an iommu_user_data_array + * @kdst: Pointer to an iommu driver specific user data that is defined in + * include/uapi/linux/iommufd.h + * @user_array: Pointer to a struct iommu_user_data_array for a user space + * array + * @data_type: The data type of the @kdst. Must match with @user_array->type + * @index: Index to the location in the array to copy user data from + * @min_last: The last memember of the data structure @kdst points in the + * initial version. + * Return 0 for success, otherwise -error. + */ +#define iommu_copy_struct_from_user_array(kdst, user_array, data_type, \ + index, min_last) \ + __iommu_copy_struct_from_user_array(kdst, user_array, data_type, \ + index, sizeof(*kdst), \ + offsetofend(typeof(*kdst), \ + min_last)) + +/** + * iommu_respond_struct_to_user_array - Copy the response in @ksrc back to + * a specific entry of user array + * @user_array: Pointer to a struct iommu_user_data_array for a user space + * array + * @index: Index to the location in the array to copy response + * @ksrc: Pointer to kernel structure + * @klen: Length of @ksrc struct + * + * This only copies response of one entry (@index) in @user_array. + */ +static inline int +iommu_respond_struct_to_user_array(const struct iommu_user_data_array *array, + unsigned int index, void *ksrc, size_t klen) +{ + if (copy_to_user(array->uptr + array->entry_len * index, + ksrc, min_t(size_t, array->entry_len, klen))) + return -EFAULT; + return 0; +} + /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Nicolin Chen nicolinc@nvidia.com
Wrap up the data pointer/num sanity and __iommu_copy_struct_from_user call for iommu drivers to copy driver specific data at a specific location in the struct iommu_user_data_array, and iommu_respond_struct_to_user_array() to copy response to a specific location in the struct iommu_user_data_array.
And expect it to be used in cache_invalidate_user ops for example.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
From: Nicolin Chen nicolinc@nvidia.com
Add mock_domain_cache_invalidate_user() data structure to support user space selftest program to cover user cache invalidation pathway.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 34 ++++++++++++++ drivers/iommu/iommufd/selftest.c | 67 ++++++++++++++++++++++++++++ 2 files changed, 101 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 7910fbe1962d..15dd74d361f9 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -148,4 +148,38 @@ struct iommu_hwpt_selftest { __u32 iotlb; };
+/* Should not be equal to any defined value in enum iommu_hwpt_invalidate_data_type */ +#define IOMMU_HWPT_INVALIDATE_DATA_SELFTEST 0xdeadbeef +#define IOMMU_HWPT_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef + +/** + * enum iommu_hwpt_invalidate_selftest_result_code - Result of invalidation + * @IOMMU_TEST_INVALIDATE_SUCC: Success + * @IOMMU_TEST_INVALIDATE_FAKE_ERROR: Fake error per test program's request + */ +enum iommu_hwpt_invalidate_selftest_result_code { + IOMMU_TEST_INVALIDATE_SUCC, + IOMMU_TEST_INVALIDATE_FAKE_ERROR, +}; + +/** + * struct iommu_hwpt_invalidate_selftest + * + * @flags: Invalidate flags + * @iotlb_id: Invalidate iotlb entry index + * @code: One of enum iommu_hwpt_invalidate_selftest_result_code + * @__reserved: Must be 0 + * + * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @iotlb_id will be ignored + * @code meaningful only if the request is processed successfully. + */ +struct iommu_hwpt_invalidate_selftest { +#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1ULL << 0) +#define IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR (1ULL << 1) + __u32 flags; + __u32 iotlb_id; + __u32 code; + __u32 __reserved; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 022ef8f55088..0eb4e7a05f4a 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -473,9 +473,76 @@ static void mock_domain_free_nested(struct iommu_domain *domain) kfree(mock_nested); }
+static int +mock_domain_cache_invalidate_user(struct iommu_domain *domain, + struct iommu_user_data_array *array) +{ + struct mock_iommu_domain_nested *mock_nested = + container_of(domain, struct mock_iommu_domain_nested, domain); + u32 error_code = IOMMU_TEST_INVALIDATE_SUCC, processed = 0; + struct iommu_hwpt_invalidate_selftest inv; + int i = 0, j; + int rc = 0; + + if (array->type != IOMMU_HWPT_INVALIDATE_DATA_SELFTEST) { + rc = -EINVAL; + goto out; + } + + for ( ; i < array->entry_num; i++) { + rc = iommu_copy_struct_from_user_array(&inv, array, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + i, __reserved); + if (rc) + break; + + if ((inv.flags & ~(IOMMU_TEST_INVALIDATE_FLAG_ALL | + IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR)) || + inv.__reserved) { + rc = -EOPNOTSUPP; + break; + } + + if ((inv.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) && + (inv.flags & IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR)) { + rc = -EINVAL; + break; + } + + if (inv.iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX) { + rc = -EINVAL; + break; + } + + if (inv.flags & IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR) { + error_code = IOMMU_TEST_INVALIDATE_FAKE_ERROR; + } else if (inv.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) { + /* Invalidate all mock iotlb entries and ignore iotlb_id */ + for (j = 0; j < MOCK_NESTED_DOMAIN_IOTLB_NUM; j++) + mock_nested->iotlb[j] = 0; + } else { + mock_nested->iotlb[inv.iotlb_id] = 0; + } + + /* error code only cover mock iommu errors */ + inv.code = error_code; + rc = iommu_respond_struct_to_user_array(array, i, (void *)&inv, + sizeof(inv)); + if (rc) + break; + + processed++; + } + +out: + array->entry_num = processed; + return rc; +} + static struct iommu_domain_ops domain_nested_ops = { .free = mock_domain_free_nested, .attach_dev = mock_domain_nop_attach, + .cache_invalidate_user = mock_domain_cache_invalidate_user, };
static inline struct iommufd_hw_pagetable *
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
if ((inv.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) &&
(inv.flags &
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR)) {
rc = -EINVAL;
break;
}
a nit. is there a reason why the two flags can not be set together?
in concept a mock iommu error could occur in either invalidate-one or invalidate-all.
otherwise,
Reviewed-by: Kevin Tian kevin.tian@intel.com
On 2023/12/22 11:39, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
if ((inv.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) &&
(inv.flags &
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR)) {
rc = -EINVAL;
break;
}
a nit. is there a reason why the two flags can not be set together?
in concept a mock iommu error could occur in either invalidate-one or invalidate-all.
I see. I'm ok to relax this check and remove the selftest case as well.
otherwise,
Reviewed-by: Kevin Tian kevin.tian@intel.com
From: Nicolin Chen nicolinc@nvidia.com
Allow to test whether IOTLB has been invalidated or not.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 5 ++++ drivers/iommu/iommufd/selftest.c | 26 +++++++++++++++++++ tools/testing/selftests/iommu/iommufd.c | 4 +++ tools/testing/selftests/iommu/iommufd_utils.h | 24 +++++++++++++++++ 4 files changed, 59 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 15dd74d361f9..c20ca8acb789 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -21,6 +21,7 @@ enum { IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS, IOMMU_TEST_OP_DIRTY, + IOMMU_TEST_OP_MD_CHECK_IOTLB, };
enum { @@ -121,6 +122,10 @@ struct iommu_test_cmd { __aligned_u64 uptr; __aligned_u64 out_nr_dirty; } dirty; + struct { + __u32 id; + __u32 iotlb; + } check_iotlb; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 0eb4e7a05f4a..f4b474e5043c 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -860,6 +860,28 @@ static int iommufd_test_md_check_refs(struct iommufd_ucmd *ucmd, return 0; }
+static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd, + u32 mockpt_id, unsigned int iotlb_id, + u32 iotlb) +{ + struct mock_iommu_domain_nested *mock_nested; + struct iommufd_hw_pagetable *hwpt; + int rc = 0; + + hwpt = get_md_pagetable_nested(ucmd, mockpt_id, &mock_nested); + if (IS_ERR(hwpt)) + return PTR_ERR(hwpt); + + mock_nested = container_of(hwpt->domain, + struct mock_iommu_domain_nested, domain); + + if (iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX || + mock_nested->iotlb[iotlb_id] != iotlb) + rc = -EINVAL; + iommufd_put_object(ucmd->ictx, &hwpt->obj); + return rc; +} + struct selftest_access { struct iommufd_access *access; struct file *file; @@ -1341,6 +1363,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd) return iommufd_test_md_check_refs( ucmd, u64_to_user_ptr(cmd->check_refs.uptr), cmd->check_refs.length, cmd->check_refs.refs); + case IOMMU_TEST_OP_MD_CHECK_IOTLB: + return iommufd_test_md_check_iotlb(ucmd, cmd->id, + cmd->check_iotlb.id, + cmd->check_iotlb.iotlb); case IOMMU_TEST_OP_CREATE_ACCESS: return iommufd_test_create_access(ucmd, cmd->id, cmd->create_access.flags); diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 6ed328c863c4..c8763b880a16 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -330,6 +330,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) &nested_hwpt_id[1], IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0], + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1], + IOMMU_TEST_IOTLB_DEFAULT);
/* Negative test: a nested hwpt on top of a nested hwpt */ test_err_hwpt_alloc_nested(EINVAL, self->device_id, diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index ad9202335656..fe0a0f566b67 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -195,6 +195,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ hwpt_id, data_type, data, data_len))
+#define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected) \ + ({ \ + struct iommu_test_cmd test_cmd = { \ + .size = sizeof(test_cmd), \ + .op = IOMMU_TEST_OP_MD_CHECK_IOTLB, \ + .id = hwpt_id, \ + .check_iotlb = { \ + .id = iotlb_id, \ + .iotlb = expected, \ + }, \ + }; \ + ASSERT_EQ(0, \ + ioctl(self->fd, \ + _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_IOTLB), \ + &test_cmd)); \ + }) + +#define test_cmd_hwpt_check_iotlb_all(hwpt_id, expected) \ + ({ \ + int i; \ + for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++) \ + test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ + }) + static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id) {
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Nicolin Chen nicolinc@nvidia.com
Allow to test whether IOTLB has been invalidated or not.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
From: Nicolin Chen nicolinc@nvidia.com
Add test cases for the IOMMU_HWPT_INVALIDATE ioctl and verify it by using the new IOMMU_TEST_OP_MD_CHECK_IOTLB.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- tools/testing/selftests/iommu/iommufd.c | 166 ++++++++++++++++++ tools/testing/selftests/iommu/iommufd_utils.h | 33 ++++ 2 files changed, 199 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index c8763b880a16..6aba797a3d4b 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -116,6 +116,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_destroy, IOMMU_DESTROY, id); TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO, __reserved); TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC, __reserved); + TEST_LENGTH(iommu_hwpt_invalidate, IOMMU_HWPT_INVALIDATE, __reserved); TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC, out_ioas_id); TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES, out_iova_alignment); @@ -271,7 +272,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) struct iommu_hwpt_selftest data = { .iotlb = IOMMU_TEST_IOTLB_DEFAULT, }; + struct iommu_hwpt_invalidate_selftest inv_reqs[2] = {}; uint32_t nested_hwpt_id[2] = {}; + uint32_t num_inv; uint32_t parent_hwpt_id = 0; uint32_t parent_hwpt_id_not_work = 0; uint32_t test_hwpt_id = 0; @@ -344,6 +347,169 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, parent_hwpt_id));
+ /* hwpt_invalidate only supports a user-managed hwpt (nested) */ + num_inv = 1; + test_err_hwpt_invalidate(ENOENT, parent_hwpt_id, inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Check req_type by passing zero-length array */ + num_inv = 0; + test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: Invalid req_type */ + num_inv = 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST_INVALID, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: structure size sanity */ + num_inv = 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs) + 1, &num_inv); + assert(!num_inv); + + num_inv = 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + 1, &num_inv); + assert(!num_inv); + + /* Negative test: invalid flag is passed */ + num_inv = 1; + inv_reqs[0].flags = 0xffffffff; + test_err_hwpt_invalidate(EOPNOTSUPP, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + num_inv = 1; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL | + IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].__reserved = 0x1234; + test_err_hwpt_invalidate(EOPNOTSUPP, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid iotlb_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].__reserved = 0; + inv_reqs[0].iotlb_id = MOCK_NESTED_DOMAIN_IOTLB_ID_MAX + 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: trigger error */ + num_inv = 1; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR; + inv_reqs[0].iotlb_id = 0; + test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + assert(inv_reqs[0].code == IOMMU_TEST_INVALIDATE_FAKE_ERROR); + + /* + * Invalidate the 1st iotlb entry but fail the 2nd request + * - mock driver error, the error code field is meaningful, + * the ioctl returns 0. + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].iotlb_id = 0; + inv_reqs[1].flags = IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR; + inv_reqs[1].iotlb_id = 1; + test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 2); + assert(inv_reqs[0].code == IOMMU_TEST_INVALIDATE_SUCC); + assert(inv_reqs[1].code == IOMMU_TEST_INVALIDATE_FAKE_ERROR); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0, 0); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 1, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 2, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 3, + IOMMU_TEST_IOTLB_DEFAULT); + + /* + * Invalidate the 1st iotlb entry but fail the 2nd request + * - ioctl error, the error code field is meaningless + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].iotlb_id = 0; + inv_reqs[1].flags = 0; + inv_reqs[1].iotlb_id = MOCK_NESTED_DOMAIN_IOTLB_ID_MAX + 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + assert(inv_reqs[0].code == IOMMU_TEST_INVALIDATE_SUCC); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0, 0); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 1, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 2, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 3, + IOMMU_TEST_IOTLB_DEFAULT); + + /* Invalidate the 2nd iotlb entry and verify */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].iotlb_id = 1; + test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(inv_reqs[0].code == IOMMU_TEST_INVALIDATE_SUCC); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0, 0); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 1, 0); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 2, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 3, + IOMMU_TEST_IOTLB_DEFAULT); + + /* Invalidate the 3rd and 4th iotlb entries and verify */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].iotlb_id = 2; + inv_reqs[1].flags = 0; + inv_reqs[1].iotlb_id = 3; + test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 2); + assert(inv_reqs[0].code == IOMMU_TEST_INVALIDATE_SUCC); + assert(inv_reqs[1].code == IOMMU_TEST_INVALIDATE_SUCC); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0], 0); + + /* Invalidate all iotlb entries for nested_hwpt_id[1] and verify */ + num_inv = 1; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL; + test_cmd_hwpt_invalidate(nested_hwpt_id[1], inv_reqs, + IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + assert(inv_reqs[0].code == IOMMU_TEST_INVALIDATE_SUCC); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1], 0); + /* Attach device to nested_hwpt_id[0] that then will be busy */ test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id[0]); EXPECT_ERRNO(EBUSY, diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index fe0a0f566b67..86f3f66c97f0 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -219,6 +219,39 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ })
+static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, + uint32_t req_type, uint32_t lreq, + uint32_t *nreqs) +{ + struct iommu_hwpt_invalidate cmd = { + .size = sizeof(cmd), + .hwpt_id = hwpt_id, + .req_type = req_type, + .reqs_uptr = (uint64_t)reqs, + .req_len = lreq, + .req_num = *nreqs, + }; + int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd); + *nreqs = cmd.req_num; + return rc; +} + +#define test_cmd_hwpt_invalidate(hwpt_id, reqs, req_type, lreq, nreqs) \ + ({ \ + ASSERT_EQ(0, \ + _test_cmd_hwpt_invalidate(self->fd, hwpt_id, reqs, \ + req_type, \ + lreq, nreqs)); \ + }) +#define test_err_hwpt_invalidate(_errno, hwpt_id, reqs, req_type, lreq, \ + nreqs) \ + ({ \ + EXPECT_ERRNO(_errno, \ + _test_cmd_hwpt_invalidate(self->fd, hwpt_id, \ + reqs, req_type, \ + lreq, nreqs)); \ + }) + static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id) {
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Nicolin Chen nicolinc@nvidia.com
Add test cases for the IOMMU_HWPT_INVALIDATE ioctl and verify it by using the new IOMMU_TEST_OP_MD_CHECK_IOTLB.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
overall this look good:
Reviewed-by: Kevin Tian kevin.tian@intel.com
with two nits:
num_inv = 1;
inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL |
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR;
test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0],
inv_reqs,
IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
sizeof(*inv_reqs), &num_inv);
assert(!num_inv);
this may need adjustment upon whether we want to allow two flags together.
and let's add a test for below code for completeness:
+ if (cmd->req_num && (!cmd->reqs_uptr || !cmd->req_len)) { + rc = -EINVAL; + goto out; + }
On 2023/12/22 12:09, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Nicolin Chen nicolinc@nvidia.com
Add test cases for the IOMMU_HWPT_INVALIDATE ioctl and verify it by using the new IOMMU_TEST_OP_MD_CHECK_IOTLB.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
overall this look good:
Reviewed-by: Kevin Tian kevin.tian@intel.com
with two nits:
num_inv = 1;
inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL |
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR;
test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0],
inv_reqs,
IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
sizeof(*inv_reqs), &num_inv);
assert(!num_inv);
this may need adjustment upon whether we want to allow two flags together.
it was removed :)
and let's add a test for below code for completeness:
- if (cmd->req_num && (!cmd->reqs_uptr || !cmd->req_len)) {
rc = -EINVAL;
goto out;
- }
done.
From: Lu Baolu baolu.lu@linux.intel.com
This allows qi_submit_sync() to return back faults to callers.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/intel/dmar.c | 29 ++++++++++++++++++----------- drivers/iommu/intel/iommu.h | 2 +- drivers/iommu/intel/irq_remapping.c | 2 +- drivers/iommu/intel/pasid.c | 2 +- drivers/iommu/intel/svm.c | 6 +++--- 5 files changed, 24 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index 23cb80d62a9a..a3e0cb720e06 100644 --- a/drivers/iommu/intel/dmar.c +++ b/drivers/iommu/intel/dmar.c @@ -1267,7 +1267,8 @@ static void qi_dump_fault(struct intel_iommu *iommu, u32 fault) (unsigned long long)desc->qw1); }
-static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index) +static int qi_check_fault(struct intel_iommu *iommu, int index, + int wait_index, u32 *fsts) { u32 fault; int head, tail; @@ -1278,8 +1279,12 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index) return -EAGAIN;
fault = readl(iommu->reg + DMAR_FSTS_REG); - if (fault & (DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE)) + fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE; + if (fault) { + if (fsts) + *fsts |= fault; qi_dump_fault(iommu, fault); + }
/* * If IQE happens, the head points to the descriptor associated @@ -1342,9 +1347,11 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index) * time, a wait descriptor will be appended to each submission to ensure * hardware has completed the invalidation before return. Wait descriptors * can be part of the submission but it will not be polled for completion. + * If callers are interested in the QI faults that occur during the handling + * of requests, the QI faults are saved in @fault. */ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc, - unsigned int count, unsigned long options) + unsigned int count, unsigned long options, u32 *fault) { struct q_inval *qi = iommu->qi; s64 devtlb_start_ktime = 0; @@ -1430,7 +1437,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc, * a deadlock where the interrupt context can wait indefinitely * for free slots in the queue. */ - rc = qi_check_fault(iommu, index, wait_index); + rc = qi_check_fault(iommu, index, wait_index, fault); if (rc) break;
@@ -1476,7 +1483,7 @@ void qi_global_iec(struct intel_iommu *iommu) desc.qw3 = 0;
/* should never fail */ - qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm, @@ -1490,7 +1497,7 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm, desc.qw2 = 0; desc.qw3 = 0;
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, @@ -1514,7 +1521,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, desc.qw2 = 0; desc.qw3 = 0;
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, @@ -1545,7 +1552,7 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, desc.qw2 = 0; desc.qw3 = 0;
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
/* PASID-based IOTLB invalidation */ @@ -1586,7 +1593,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, QI_EIOTLB_AM(mask); }
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
/* PASID-based device IOTLB Invalidate */ @@ -1639,7 +1646,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, desc.qw1 |= QI_DEV_EIOTLB_SIZE; }
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, @@ -1649,7 +1656,7 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE; - qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
/* diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index ce030c5b5772..c6de958e4f54 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -881,7 +881,7 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, u32 pasid);
int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc, - unsigned int count, unsigned long options); + unsigned int count, unsigned long options, u32 *fault); /* * Options used in qi_submit_sync: * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8. diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 29b9e55dcf26..f834afa3672d 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -153,7 +153,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int index, int mask) desc.qw2 = 0; desc.qw3 = 0;
- return qi_submit_sync(iommu, &desc, 1, 0); + return qi_submit_sync(iommu, &desc, 1, 0, NULL); }
static int modify_irte(struct irq_2_iommu *irq_iommu, diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index 74e8e4c17e81..67f924760ba8 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -467,7 +467,7 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu, desc.qw2 = 0; desc.qw3 = 0;
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
static void diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index ac12f76c1212..660d049ad5b6 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -543,7 +543,7 @@ void intel_drain_pasid_prq(struct device *dev, u32 pasid) QI_DEV_IOTLB_PFSID(info->pfsid); qi_retry: reinit_completion(&iommu->prq_complete); - qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN); + qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN, NULL); if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) { wait_for_completion(&iommu->prq_complete); goto qi_retry; @@ -646,7 +646,7 @@ static void handle_bad_prq_event(struct intel_iommu *iommu, desc.qw3 = 0; }
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); }
static irqreturn_t prq_event_thread(int irq, void *d) @@ -811,7 +811,7 @@ int intel_svm_page_response(struct device *dev, ktime_to_ns(ktime_get()) - prm->private_data[0]); }
- qi_submit_sync(iommu, &desc, 1, 0); + qi_submit_sync(iommu, &desc, 1, 0, NULL); } out: return ret;
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
- fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE;
- if (fault) {
if (fsts)
*fsts |= fault;
do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
the rest looks good:
Reviewed-by: Kevin Tian kevin.tian@intel.com
On 2023/12/22 12:23, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
- fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE;
- if (fault) {
if (fsts)
*fsts |= fault;
do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
not quite get. do you mean the fault should not be cleared in the caller side?
the rest looks good:
Reviewed-by: Kevin Tian kevin.tian@intel.com
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:03 PM
On 2023/12/22 12:23, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
- fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE;
- if (fault) {
if (fsts)
*fsts |= fault;
do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
not quite get. do you mean the fault should not be cleared in the caller side?
I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
On 2023/12/26 12:13, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:03 PM
On 2023/12/22 12:23, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
- fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE;
- if (fault) {
if (fsts)
*fsts |= fault;
do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
not quite get. do you mean the fault should not be cleared in the caller side?
I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
I guess no such a reason. :) let me modify it.
On 2023/12/26 14:15, Yi Liu wrote:
On 2023/12/26 12:13, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:03 PM
On 2023/12/22 12:23, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+ fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE; + if (fault) { + if (fsts) + *fsts |= fault;
do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
not quite get. do you mean the fault should not be cleared in the caller side?
I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
I guess no such a reason. :) let me modify it.
hmmm, replied too soon. The qi_check_fault() would be called multiple times in one invalidation circle as qi_submit_sync() needs to see if any fault happened before the hw writes back QI_DONE to the wait descriptor. There can be ICE which may eventually result in ITE. So caller of qi_check_fault() would continue to wait for QI_DONE. So qi_check_fault() returns 0 to let qi_submit_sync() go on though ICE detected. If we use '*fsts = fault;', then ICE would be missed since the input fsts pointer is the same in one qi_submit_sync() call.
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 4:44 PM
On 2023/12/26 14:15, Yi Liu wrote:
On 2023/12/26 12:13, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:03 PM
On 2023/12/22 12:23, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+ fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE; + if (fault) { + if (fsts) + *fsts |= fault;
do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
not quite get. do you mean the fault should not be cleared in the caller side?
I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
I guess no such a reason. :) let me modify it.
hmmm, replied too soon. The qi_check_fault() would be called multiple times in one invalidation circle as qi_submit_sync() needs to see if any fault happened before the hw writes back QI_DONE to the wait descriptor. There can be ICE which may eventually result in ITE. So caller of qi_check_fault() would continue to wait for QI_DONE. So qi_check_fault() returns 0 to let qi_submit_sync() go on though ICE detected. If we use '*fsts = fault;', then ICE would be missed since the input fsts pointer is the same in one qi_submit_sync() call.
ok, that makes sense then.
-----Original Message----- From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 4:44 PM Subject: Re: [PATCH v7 7/9] iommu/vt-d: Allow qi_submit_sync() to return the QI faults
On 2023/12/26 14:15, Yi Liu wrote:
On 2023/12/26 12:13, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:03 PM
On 2023/12/22 12:23, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+ fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE; + if (fault) { + if (fsts) + *fsts |= fault;
do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
not quite get. do you mean the fault should not be cleared in the caller side?
I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
I guess no such a reason. :) let me modify it.
hmmm, replied too soon. The qi_check_fault() would be called multiple times in one invalidation circle as qi_submit_sync() needs to see if any fault happened before the hw writes back QI_DONE to the wait descriptor. There can be ICE which may eventually result in ITE. So caller of qi_check_fault() would continue to wait for QI_DONE. So qi_check_fault() returns 0 to let qi_submit_sync() go on though ICE detected. If we use '*fsts = fault;', then ICE would be missed since the input fsts pointer is the same in one qi_submit_sync() call.
Is it necessary to return fault to user if qi_check_fault() return -EAGAIN and a restart run succeeds?
Thanks Zhenzhong
On 12/27/2023 5:06 PM, Duan, Zhenzhong wrote:
-----Original Message----- From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 4:44 PM Subject: Re: [PATCH v7 7/9] iommu/vt-d: Allow qi_submit_sync() to return the QI faults
On 2023/12/26 14:15, Yi Liu wrote:
On 2023/12/26 12:13, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:03 PM
On 2023/12/22 12:23, Tian, Kevin wrote:
> From: Liu, Yi L yi.l.liu@intel.com > Sent: Thursday, December 21, 2023 11:40 PM > > + fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE; > + if (fault) { > + if (fsts) > + *fsts |= fault; do we expect the fault to be accumulated? otherwise it's clearer to just do direct assignment instead of asking for the caller to clear the variable before invocation.
not quite get. do you mean the fault should not be cleared in the caller side?
I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
I guess no such a reason. :) let me modify it.
hmmm, replied too soon. The qi_check_fault() would be called multiple times in one invalidation circle as qi_submit_sync() needs to see if any fault happened before the hw writes back QI_DONE to the wait descriptor. There can be ICE which may eventually result in ITE. So caller of qi_check_fault() would continue to wait for QI_DONE. So qi_check_fault() returns 0 to let qi_submit_sync() go on though ICE detected. If we use '*fsts = fault;', then ICE would be missed since the input fsts pointer is the same in one qi_submit_sync() call.
Is it necessary to return fault to user if qi_check_fault() return -EAGAIN and a restart run succeeds?
Issue a device-TLB invalidation to no response device there is possibility
will be trapped there loop for ITE , never get return.
Thanks,
Ethan
Thanks Zhenzhong
On 2023/12/27 17:33, Ethan Zhao wrote:
On 12/27/2023 5:06 PM, Duan, Zhenzhong wrote:
-----Original Message----- From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 4:44 PM Subject: Re: [PATCH v7 7/9] iommu/vt-d: Allow qi_submit_sync() to return the QI faults
On 2023/12/26 14:15, Yi Liu wrote:
On 2023/12/26 12:13, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:03 PM
On 2023/12/22 12:23, Tian, Kevin wrote: >> From: Liu, Yi L yi.l.liu@intel.com >> Sent: Thursday, December 21, 2023 11:40 PM >> >> + fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE; >> + if (fault) { >> + if (fsts) >> + *fsts |= fault; > do we expect the fault to be accumulated? otherwise it's clearer to > just do direct assignment instead of asking for the caller to clear > the variable before invocation. not quite get. do you mean the fault should not be cleared in the caller side?
I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
I guess no such a reason. :) let me modify it.
hmmm, replied too soon. The qi_check_fault() would be called multiple times in one invalidation circle as qi_submit_sync() needs to see if any fault happened before the hw writes back QI_DONE to the wait descriptor. There can be ICE which may eventually result in ITE. So caller of qi_check_fault() would continue to wait for QI_DONE. So qi_check_fault() returns 0 to let qi_submit_sync() go on though ICE detected. If we use '*fsts = fault;', then ICE would be missed since the input fsts pointer is the same in one qi_submit_sync() call.
Is it necessary to return fault to user if qi_check_fault() return -EAGAIN and a restart run succeeds?
no need if a restart succeeds. I would add a *fault = 0 per the restart.
Issue a device-TLB invalidation to no response device there is possibility
will be trapped there loop for ITE , never get return.
yes. This the implementation today, in future I think we may need a kind of timeout mechanism, so that it can return and report the error to user. In concept, in nested translation, the page table is owned by userspace, so it makes more sense to let userspace know it and take proper action.
From: Liu, Yi L yi.l.liu@intel.com Sent: Wednesday, December 27, 2023 10:13 PM
On 2023/12/27 17:33, Ethan Zhao wrote:
On 12/27/2023 5:06 PM, Duan, Zhenzhong wrote:
-----Original Message----- From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 4:44 PM Subject: Re: [PATCH v7 7/9] iommu/vt-d: Allow qi_submit_sync() to
return
the QI faults
On 2023/12/26 14:15, Yi Liu wrote:
On 2023/12/26 12:13, Tian, Kevin wrote:
> From: Liu, Yi L yi.l.liu@intel.com > Sent: Tuesday, December 26, 2023 12:03 PM > > On 2023/12/22 12:23, Tian, Kevin wrote: >>> From: Liu, Yi L yi.l.liu@intel.com >>> Sent: Thursday, December 21, 2023 11:40 PM >>> >>> + fault &= DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE; >>> + if (fault) { >>> + if (fsts) >>> + *fsts |= fault; >> do we expect the fault to be accumulated? otherwise it's clearer to >> just do direct assignment instead of asking for the caller to clear >> the variable before invocation. > not quite get. do you mean the fault should not be cleared in the
caller
> side? > I meant:
if (fsts) *fsts = fault;
unless there is a reason to *OR* the original value.
I guess no such a reason. :) let me modify it.
hmmm, replied too soon. The qi_check_fault() would be called multiple times in one invalidation circle as qi_submit_sync() needs to see if any fault happened before the hw writes back QI_DONE to the wait
descriptor.
There can be ICE which may eventually result in ITE. So caller of qi_check_fault() would continue to wait for QI_DONE. So qi_check_fault() returns 0 to let qi_submit_sync() go on though ICE detected. If we use '*fsts = fault;', then ICE would be missed since the input fsts pointer is the same in one qi_submit_sync() call.
Is it necessary to return fault to user if qi_check_fault() return -EAGAIN and a restart run succeeds?
no need if a restart succeeds. I would add a *fault = 0 per the restart.
Issue a device-TLB invalidation to no response device there is possibility
will be trapped there loop for ITE , never get return.
yes. This the implementation today, in future I think we may need a kind of timeout mechanism, so that it can return and report the error to user. In concept, in nested translation, the page table is owned by userspace, so it makes more sense to let userspace know it and take proper action.
it doesn't make sense to retry upon an invalidation request from userspace. if retry is required that is the policy of guest iommu driver. Also it's not good to introduce a uapi flag which won't be set by current driver.
this can be solved by a simple change in qi_check_fault():
if (qi->desc_status[wait_index] == QI_ABORT) - return -EAGAIN; + return fsts ? -ETIMEDOUT : -EAGAIN;
because if the caller wants to know the fault reason the implication is that the caller will decide how to cope with the fault. It is incorrect for qi_check_fault() to decide.
From: Lu Baolu baolu.lu@linux.intel.com
This makes the pasid based cache invalidation to return QI faults to callers. This is needed when usersapce wants to invalidate cache after modifying the stage-1 page table used in nested translation.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/intel/dmar.c | 13 +++++++------ drivers/iommu/intel/iommu.c | 12 ++++++------ drivers/iommu/intel/iommu.h | 6 +++--- drivers/iommu/intel/pasid.c | 12 +++++++----- drivers/iommu/intel/svm.c | 8 ++++---- 5 files changed, 27 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index a3e0cb720e06..83e37487f46a 100644 --- a/drivers/iommu/intel/dmar.c +++ b/drivers/iommu/intel/dmar.c @@ -1525,7 +1525,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, }
void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, - u16 qdep, u64 addr, unsigned mask) + u16 qdep, u64 addr, unsigned mask, u32 *fault) { struct qi_desc desc;
@@ -1552,12 +1552,12 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, desc.qw2 = 0; desc.qw3 = 0;
- qi_submit_sync(iommu, &desc, 1, 0, NULL); + qi_submit_sync(iommu, &desc, 1, 0, fault); }
/* PASID-based IOTLB invalidation */ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, - unsigned long npages, bool ih) + unsigned long npages, bool ih, u32 *fault) { struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
@@ -1593,12 +1593,13 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, QI_EIOTLB_AM(mask); }
- qi_submit_sync(iommu, &desc, 1, 0, NULL); + qi_submit_sync(iommu, &desc, 1, 0, fault); }
/* PASID-based device IOTLB Invalidate */ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, - u32 pasid, u16 qdep, u64 addr, unsigned int size_order) + u32 pasid, u16 qdep, u64 addr, + unsigned int size_order, u32 *fault) { unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1); struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0}; @@ -1646,7 +1647,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, desc.qw1 |= QI_DEV_EIOTLB_SIZE; }
- qi_submit_sync(iommu, &desc, 1, 0, NULL); + qi_submit_sync(iommu, &desc, 1, 0, fault); }
void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 897159dba47d..68e494f1d03a 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1462,7 +1462,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info, sid = info->bus << 8 | info->devfn; qdep = info->ats_qdep; qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, - qdep, addr, mask); + qdep, addr, mask, NULL); quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_NO_PASID, qdep); }
@@ -1490,7 +1490,7 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain, PCI_DEVID(info->bus, info->devfn), info->pfsid, dev_pasid->pasid, info->ats_qdep, addr, - mask); + mask, NULL); } spin_unlock_irqrestore(&domain->lock, flags); } @@ -1505,10 +1505,10 @@ static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
spin_lock_irqsave(&domain->lock, flags); list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) - qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih); + qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih, NULL);
if (!list_empty(&domain->devices)) - qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr, npages, ih); + qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr, npages, ih, NULL); spin_unlock_irqrestore(&domain->lock, flags); }
@@ -5195,10 +5195,10 @@ void quirk_extra_dev_tlb_flush(struct device_domain_info *info, sid = PCI_DEVID(info->bus, info->devfn); if (pasid == IOMMU_NO_PASID) { qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, - qdep, address, mask); + qdep, address, mask, NULL); } else { qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid, - pasid, qdep, address, mask); + pasid, qdep, address, mask, NULL); } }
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index c6de958e4f54..ce9bd08dcd05 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -866,14 +866,14 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did, void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, unsigned int size_order, u64 type); void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, - u16 qdep, u64 addr, unsigned mask); + u16 qdep, u64 addr, unsigned mask, u32 *fault);
void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, - unsigned long npages, bool ih); + unsigned long npages, bool ih, u32 *fault);
void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, u32 pasid, u16 qdep, u64 addr, - unsigned int size_order); + unsigned int size_order, u32 *fault); void quirk_extra_dev_tlb_flush(struct device_domain_info *info, unsigned long address, unsigned long pages, u32 pasid, u16 qdep); diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index 67f924760ba8..4a7fe551d8a6 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -492,9 +492,11 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu, * efficient to flush devTLB specific to the PASID. */ if (pasid == IOMMU_NO_PASID) - qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT); + qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, + 64 - VTD_PAGE_SHIFT, NULL); else - qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT); + qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, + 64 - VTD_PAGE_SHIFT, NULL); }
void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev, @@ -521,7 +523,7 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev, pasid_cache_invalidation_with_pasid(iommu, did, pasid);
if (pgtt == PASID_ENTRY_PGTT_PT || pgtt == PASID_ENTRY_PGTT_FL_ONLY) - qi_flush_piotlb(iommu, did, pasid, 0, -1, 0); + qi_flush_piotlb(iommu, did, pasid, 0, -1, 0, NULL); else iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
@@ -543,7 +545,7 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
if (cap_caching_mode(iommu->cap)) { pasid_cache_invalidation_with_pasid(iommu, did, pasid); - qi_flush_piotlb(iommu, did, pasid, 0, -1, 0); + qi_flush_piotlb(iommu, did, pasid, 0, -1, 0, NULL); } else { iommu_flush_write_buffer(iommu); } @@ -834,7 +836,7 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu, * Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions */ pasid_cache_invalidation_with_pasid(iommu, did, pasid); - qi_flush_piotlb(iommu, did, pasid, 0, -1, 0); + qi_flush_piotlb(iommu, did, pasid, 0, -1, 0, NULL);
/* Device IOTLB doesn't need to be flushed in caching mode. */ if (!cap_caching_mode(iommu->cap)) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 660d049ad5b6..bf7b4c5c21f4 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -179,11 +179,11 @@ static void __flush_svm_range_dev(struct intel_svm *svm, if (WARN_ON(!pages)) return;
- qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih); + qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih, NULL); if (info->ats_enabled) { qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid, svm->pasid, sdev->qdep, address, - order_base_2(pages)); + order_base_2(pages), NULL); quirk_extra_dev_tlb_flush(info, address, order_base_2(pages), svm->pasid, sdev->qdep); } @@ -225,11 +225,11 @@ static void intel_flush_svm_all(struct intel_svm *svm) list_for_each_entry_rcu(sdev, &svm->devs, list) { info = dev_iommu_priv_get(sdev->dev);
- qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 0); + qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 0, NULL); if (info->ats_enabled) { qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid, svm->pasid, sdev->qdep, - 0, 64 - VTD_PAGE_SHIFT); + 0, 64 - VTD_PAGE_SHIFT, NULL); quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT, svm->pasid, sdev->qdep); }
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Lu Baolu baolu.lu@linux.intel.com
This makes the pasid based cache invalidation to return QI faults to callers.
it's not just for pasid cache invalidation, e.g. there is also change on qi_flush_dev_iotlb().
This is needed when usersapce wants to invalidate cache after modifying the stage-1 page table used in nested translation.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On 2023/12/22 14:04, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
From: Lu Baolu baolu.lu@linux.intel.com
This makes the pasid based cache invalidation to return QI faults to callers.
it's not just for pasid cache invalidation, e.g. there is also change on qi_flush_dev_iotlb().
yes, it is. May be as below:
This makes the pasid based cache invalidation and device TLB invalidation to return QI faults to callers.
Then the subject may be:
iommu/vt-d: Convert stage-1 cache invalidation to return QI fault
This is needed when usersapce wants to invalidate cache after modifying the stage-1 page table used in nested translation.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
From: Lu Baolu baolu.lu@linux.intel.com
This implements the .cache_invalidate_user() callback to support iotlb flush for nested domain.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/intel/nested.c | 116 +++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index b5a5563ab32c..c665e2647045 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -73,9 +73,125 @@ static void intel_nested_domain_free(struct iommu_domain *domain) kfree(to_dmar_domain(domain)); }
+static void nested_flush_pasid_iotlb(struct intel_iommu *iommu, + struct dmar_domain *domain, u64 addr, + unsigned long npages, bool ih) +{ + u16 did = domain_id_iommu(domain, iommu); + unsigned long flags; + + spin_lock_irqsave(&domain->lock, flags); + if (!list_empty(&domain->devices)) + qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr, + npages, ih, NULL); + spin_unlock_irqrestore(&domain->lock, flags); +} + +static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr, + unsigned mask, u32 *fault) +{ + struct device_domain_info *info; + unsigned long flags; + u16 sid, qdep; + + spin_lock_irqsave(&domain->lock, flags); + list_for_each_entry(info, &domain->devices, link) { + if (!info->ats_enabled) + continue; + sid = info->bus << 8 | info->devfn; + qdep = info->ats_qdep; + qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, + qdep, addr, mask, fault); + quirk_extra_dev_tlb_flush(info, addr, mask, + IOMMU_NO_PASID, qdep); + } + spin_unlock_irqrestore(&domain->lock, flags); +} + +static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr, + unsigned long npages, u32 *error) +{ + struct iommu_domain_info *info; + unsigned long i; + unsigned mask; + u32 fault = 0; + + if (npages == U64_MAX) + mask = 64 - VTD_PAGE_SHIFT; + else + mask = ilog2(__roundup_pow_of_two(npages)); + + xa_for_each(&domain->iommu_array, i, info) { + nested_flush_pasid_iotlb(info->iommu, domain, addr, npages, 0); + + if (domain->has_iotlb_device) + continue; + + nested_flush_dev_iotlb(domain, addr, mask, &fault); + if (fault & (DMA_FSTS_ITE | DMA_FSTS_ICE)) + break; + } + + if (fault & DMA_FSTS_ICE) + *error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ICE; + if (fault & DMA_FSTS_ITE) + *error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ITE; +} + +static int intel_nested_cache_invalidate_user(struct iommu_domain *domain, + struct iommu_user_data_array *array) +{ + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct iommu_hwpt_vtd_s1_invalidate inv_entry; + u32 processed = 0; + int ret = 0; + u32 index; + + if (array->type != IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) { + ret = -EINVAL; + goto out; + } + + for (index = 0; index < array->entry_num; index++) { + ret = iommu_copy_struct_from_user_array(&inv_entry, array, + IOMMU_HWPT_INVALIDATE_DATA_VTD_S1, + index, inv_error); + if (ret) + break; + + if (inv_entry.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) { + ret = -EOPNOTSUPP; + break; + } + + if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) || + ((inv_entry.npages == U64_MAX) && inv_entry.addr)) { + ret = -EINVAL; + break; + } + + inv_entry.inv_error = 0; + intel_nested_flush_cache(dmar_domain, inv_entry.addr, + inv_entry.npages, &inv_entry.inv_error); + + ret = iommu_respond_struct_to_user_array(array, index, + (void *)&inv_entry, + sizeof(inv_entry)); + if (ret) + break; + + processed++; + } + +out: + array->entry_num = processed; + return ret; +} + static const struct iommu_domain_ops intel_nested_domain_ops = { .attach_dev = intel_nested_attach_dev, .free = intel_nested_domain_free, + .cache_invalidate_user = intel_nested_cache_invalidate_user, };
struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
On 12/21/2023 11:39 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
This implements the .cache_invalidate_user() callback to support iotlb flush for nested domain.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/nested.c | 116 +++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index b5a5563ab32c..c665e2647045 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -73,9 +73,125 @@ static void intel_nested_domain_free(struct iommu_domain *domain) kfree(to_dmar_domain(domain)); } +static void nested_flush_pasid_iotlb(struct intel_iommu *iommu,
struct dmar_domain *domain, u64 addr,
unsigned long npages, bool ih)
+{
- u16 did = domain_id_iommu(domain, iommu);
- unsigned long flags;
- spin_lock_irqsave(&domain->lock, flags);
- if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr,
npages, ih, NULL);
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
unsigned mask, u32 *fault)
+{
- struct device_domain_info *info;
- unsigned long flags;
- u16 sid, qdep;
- spin_lock_irqsave(&domain->lock, flags);
- list_for_each_entry(info, &domain->devices, link) {
if (!info->ats_enabled)
continue;
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
qdep, addr, mask, fault);
quirk_extra_dev_tlb_flush(info, addr, mask,
IOMMU_NO_PASID, qdep);
- }
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
unsigned long npages, u32 *error)
+{
- struct iommu_domain_info *info;
- unsigned long i;
- unsigned mask;
- u32 fault = 0;
- if (npages == U64_MAX)
mask = 64 - VTD_PAGE_SHIFT;
- else
mask = ilog2(__roundup_pow_of_two(npages));
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr, npages, 0);
if (domain->has_iotlb_device)
continue;
Shouldn't this be if (!domain->has_iotlb_device)?
nested_flush_dev_iotlb(domain, addr, mask, &fault);
if (fault & (DMA_FSTS_ITE | DMA_FSTS_ICE))
break;
- }
- if (fault & DMA_FSTS_ICE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ICE;
- if (fault & DMA_FSTS_ITE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ITE;
+}
+static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
struct iommu_user_data_array *array)
+{
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct iommu_hwpt_vtd_s1_invalidate inv_entry;
- u32 processed = 0;
- int ret = 0;
- u32 index;
- if (array->type != IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) {
ret = -EINVAL;
goto out;
- }
- for (index = 0; index < array->entry_num; index++) {
ret = iommu_copy_struct_from_user_array(&inv_entry, array,
IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
index, inv_error);
if (ret)
break;
if (inv_entry.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) {
ret = -EOPNOTSUPP;
break;
}
if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
ret = -EINVAL;
break;
}
inv_entry.inv_error = 0;
intel_nested_flush_cache(dmar_domain, inv_entry.addr,
inv_entry.npages, &inv_entry.inv_error);
ret = iommu_respond_struct_to_user_array(array, index,
(void *)&inv_entry,
sizeof(inv_entry));
if (ret)
break;
processed++;
- }
+out:
- array->entry_num = processed;
- return ret;
+}
- static const struct iommu_domain_ops intel_nested_domain_ops = { .attach_dev = intel_nested_attach_dev, .free = intel_nested_domain_free,
- .cache_invalidate_user = intel_nested_cache_invalidate_user, };
struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
From: Yang, Weijiang weijiang.yang@intel.com Sent: Friday, December 22, 2023 11:56 AM
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
if (domain->has_iotlb_device)
continue;
Shouldn't this be if (!domain->has_iotlb_device)?
yes that is wrong.
actually it's weird to put domain check in a loop of domain->iommu_array.
that check along with devtlb flush should be done out of that loop.
On Dec 22, 2023, at 14:47, Tian, Kevin kevin.tian@intel.com wrote:
From: Yang, Weijiang weijiang.yang@intel.com Sent: Friday, December 22, 2023 11:56 AM
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
if (domain->has_iotlb_device)
continue;
Shouldn't this be if (!domain->has_iotlb_device)?
yes that is wrong.
actually it's weird to put domain check in a loop of domain->iommu_array.
that check along with devtlb flush should be done out of that loop.
Maybe adding a bool, set it out of the loop, check the bool in the loop.
From: Liu, Yi L yi.l.liu@intel.com Sent: Friday, December 22, 2023 3:02 PM
On Dec 22, 2023, at 14:47, Tian, Kevin kevin.tian@intel.com wrote:
From: Yang, Weijiang weijiang.yang@intel.com Sent: Friday, December 22, 2023 11:56 AM
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
if (domain->has_iotlb_device)
continue;
Shouldn't this be if (!domain->has_iotlb_device)?
yes that is wrong.
actually it's weird to put domain check in a loop of domain->iommu_array.
that check along with devtlb flush should be done out of that loop.
Maybe adding a bool, set it out of the loop, check the bool in the loop.
the point is that dev iotlb doesn't rely on info->iommu:
nested_flush_dev_iotlb(domain, addr, mask, &fault);
then why do it in the loop of info->iommu?
On Dec 22, 2023, at 15:12, Tian, Kevin kevin.tian@intel.com wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Friday, December 22, 2023 3:02 PM
On Dec 22, 2023, at 14:47, Tian, Kevin kevin.tian@intel.com wrote:
From: Yang, Weijiang weijiang.yang@intel.com Sent: Friday, December 22, 2023 11:56 AM
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
if (domain->has_iotlb_device)
continue;
Shouldn't this be if (!domain->has_iotlb_device)?
yes that is wrong.
actually it's weird to put domain check in a loop of domain->iommu_array.
that check along with devtlb flush should be done out of that loop.
Maybe adding a bool, set it out of the loop, check the bool in the loop.
the point is that dev iotlb doesn't rely on info->iommu:
nested_flush_dev_iotlb(domain, addr, mask, &fault);
then why do it in the loop of info->iommu?
yes. It should have another device loop instead.
On 2023/12/22 19:59, Liu, Yi L wrote:
On Dec 22, 2023, at 15:12, Tian, Kevin kevin.tian@intel.com wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Friday, December 22, 2023 3:02 PM
On Dec 22, 2023, at 14:47, Tian, Kevin kevin.tian@intel.com wrote:
From: Yang, Weijiang weijiang.yang@intel.com Sent: Friday, December 22, 2023 11:56 AM
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
if (domain->has_iotlb_device)
continue;
Shouldn't this be if (!domain->has_iotlb_device)?
yes that is wrong.
actually it's weird to put domain check in a loop of domain->iommu_array.
that check along with devtlb flush should be done out of that loop.
Maybe adding a bool, set it out of the loop, check the bool in the loop.
the point is that dev iotlb doesn't rely on info->iommu:
nested_flush_dev_iotlb(domain, addr, mask, &fault);
then why do it in the loop of info->iommu?
yes. It should have another device loop instead.
let me move the device tlb related code out of the info->iommu loop.
On Dec 22, 2023, at 11:56, Yang, Weijiang weijiang.yang@intel.com wrote:
On 12/21/2023 11:39 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
This implements the .cache_invalidate_user() callback to support iotlb flush for nested domain.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/nested.c | 116 +++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index b5a5563ab32c..c665e2647045 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -73,9 +73,125 @@ static void intel_nested_domain_free(struct iommu_domain *domain) kfree(to_dmar_domain(domain)); } +static void nested_flush_pasid_iotlb(struct intel_iommu *iommu,
struct dmar_domain *domain, u64 addr,
unsigned long npages, bool ih)
+{
- u16 did = domain_id_iommu(domain, iommu);
- unsigned long flags;
- spin_lock_irqsave(&domain->lock, flags);
- if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr,
npages, ih, NULL);
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
unsigned mask, u32 *fault)
+{
- struct device_domain_info *info;
- unsigned long flags;
- u16 sid, qdep;
- spin_lock_irqsave(&domain->lock, flags);
- list_for_each_entry(info, &domain->devices, link) {
if (!info->ats_enabled)
continue;
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
qdep, addr, mask, fault);
quirk_extra_dev_tlb_flush(info, addr, mask,
IOMMU_NO_PASID, qdep);
- }
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
unsigned long npages, u32 *error)
+{
- struct iommu_domain_info *info;
- unsigned long i;
- unsigned mask;
- u32 fault = 0;
- if (npages == U64_MAX)
mask = 64 - VTD_PAGE_SHIFT;
- else
mask = ilog2(__roundup_pow_of_two(npages));
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr, npages, 0);
if (domain->has_iotlb_device)
continue;
Shouldn't this be if (!domain->has_iotlb_device)?
oops, yes it is.
nested_flush_dev_iotlb(domain, addr, mask, &fault);
if (fault & (DMA_FSTS_ITE | DMA_FSTS_ICE))
break;
- }
- if (fault & DMA_FSTS_ICE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ICE;
- if (fault & DMA_FSTS_ITE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ITE;
+}
+static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
struct iommu_user_data_array *array)
+{
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct iommu_hwpt_vtd_s1_invalidate inv_entry;
- u32 processed = 0;
- int ret = 0;
- u32 index;
- if (array->type != IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) {
ret = -EINVAL;
goto out;
- }
- for (index = 0; index < array->entry_num; index++) {
ret = iommu_copy_struct_from_user_array(&inv_entry, array,
IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
index, inv_error);
if (ret)
break;
if (inv_entry.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) {
ret = -EOPNOTSUPP;
break;
}
if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
ret = -EINVAL;
break;
}
inv_entry.inv_error = 0;
intel_nested_flush_cache(dmar_domain, inv_entry.addr,
inv_entry.npages, &inv_entry.inv_error);
ret = iommu_respond_struct_to_user_array(array, index,
(void *)&inv_entry,
sizeof(inv_entry));
if (ret)
break;
processed++;
- }
+out:
- array->entry_num = processed;
- return ret;
+}
static const struct iommu_domain_ops intel_nested_domain_ops = { .attach_dev = intel_nested_attach_dev, .free = intel_nested_domain_free,
- .cache_invalidate_user = intel_nested_cache_invalidate_user,
}; struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
unsigned long npages, u32 *error)
+{
- struct iommu_domain_info *info;
- unsigned long i;
- unsigned mask;
- u32 fault = 0;
- if (npages == U64_MAX)
mask = 64 - VTD_PAGE_SHIFT;
- else
mask = ilog2(__roundup_pow_of_two(npages));
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
so IOMMU_VTD_INV_FLAGS_LEAF is defined but ignored?
if (domain->has_iotlb_device)
continue;
nested_flush_dev_iotlb(domain, addr, mask, &fault);
if (fault & (DMA_FSTS_ITE | DMA_FSTS_ICE))
break;
here you may add a note that we don't plan to forward invalidation queue error (i.e. IQE) to the caller as it's caused only by driver internal bug.
if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
ret = -EINVAL;
break;
}
why is [non-zero-addr, U64_MAX] an error? Is it explicitly stated to be not supported by underlying helpers?
On 2023/12/22 14:57, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
unsigned long npages, u32 *error)
+{
- struct iommu_domain_info *info;
- unsigned long i;
- unsigned mask;
- u32 fault = 0;
- if (npages == U64_MAX)
mask = 64 - VTD_PAGE_SHIFT;
- else
mask = ilog2(__roundup_pow_of_two(npages));
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
so IOMMU_VTD_INV_FLAGS_LEAF is defined but ignored?
yeah... it is. It is named as ih in the driver code. But it appears only the below code is set ih. When calling iommu_flush_iotlb_psi(), the 5th parameter (ih) may be true.
static int intel_iommu_memory_notifier(struct notifier_block *nb, unsigned long val, void *v) { struct memory_notify *mhp = v; unsigned long start_vpfn = mm_to_dma_pfn(mhp->start_pfn); unsigned long last_vpfn = mm_to_dma_pfn(mhp->start_pfn + mhp->nr_pages - 1);
switch (val) { case MEM_GOING_ONLINE: if (iommu_domain_identity_map(si_domain, start_vpfn, last_vpfn)) { pr_warn("Failed to build identity map for [%lx-%lx]\n", start_vpfn, last_vpfn); return NOTIFY_BAD; } break;
case MEM_OFFLINE: case MEM_CANCEL_ONLINE: { struct dmar_drhd_unit *drhd; struct intel_iommu *iommu; LIST_HEAD(freelist);
domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
rcu_read_lock(); for_each_active_iommu(iommu, drhd) iommu_flush_iotlb_psi(iommu, si_domain, start_vpfn, mhp->nr_pages, list_empty(&freelist), 0); rcu_read_unlock(); put_pages_list(&freelist); } break; }
return NOTIFY_OK; }
if (domain->has_iotlb_device)
continue;
nested_flush_dev_iotlb(domain, addr, mask, &fault);
if (fault & (DMA_FSTS_ITE | DMA_FSTS_ICE))
break;
here you may add a note that we don't plan to forward invalidation queue error (i.e. IQE) to the caller as it's caused only by driver internal bug.
yes.
if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
ret = -EINVAL;
break;
}
why is [non-zero-addr, U64_MAX] an error? Is it explicitly stated to be not supported by underlying helpers?
no such limitation by underlying helpers. But in such case, the addr+npages*PAGE_SIZE would exceed U64_MAX, this seems a bit strange. But I'm fine to relax the check since the underlying helper only checks npages when determining paid-selective or not.
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, December 26, 2023 12:52 PM
if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
ret = -EINVAL;
break;
}
why is [non-zero-addr, U64_MAX] an error? Is it explicitly stated to be not supported by underlying helpers?
no such limitation by underlying helpers. But in such case, the addr+npages*PAGE_SIZE would exceed U64_MAX, this seems a bit strange. But I'm fine to relax the check since the underlying helper only checks npages when determining paid-selective or not.
I overlooked npages as end. let's keep the check.
On 2023/12/26 12:51, Yi Liu wrote:
On 2023/12/22 14:57, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, December 21, 2023 11:40 PM
+static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr, + unsigned long npages, u32 *error) +{ + struct iommu_domain_info *info; + unsigned long i; + unsigned mask; + u32 fault = 0;
+ if (npages == U64_MAX) + mask = 64 - VTD_PAGE_SHIFT; + else + mask = ilog2(__roundup_pow_of_two(npages));
+ xa_for_each(&domain->iommu_array, i, info) { + nested_flush_pasid_iotlb(info->iommu, domain, addr, npages, 0);
so IOMMU_VTD_INV_FLAGS_LEAF is defined but ignored?
yeah... it is. It is named as ih in the driver code. But it appears only the below code is set ih. When calling iommu_flush_iotlb_psi(), the 5th parameter (ih) may be true.
static int intel_iommu_memory_notifier(struct notifier_block *nb, unsigned long val, void *v) { struct memory_notify *mhp = v; unsigned long start_vpfn = mm_to_dma_pfn(mhp->start_pfn); unsigned long last_vpfn = mm_to_dma_pfn(mhp->start_pfn + mhp->nr_pages - 1);
switch (val) { case MEM_GOING_ONLINE: if (iommu_domain_identity_map(si_domain, start_vpfn, last_vpfn)) { pr_warn("Failed to build identity map for [%lx-%lx]\n", start_vpfn, last_vpfn); return NOTIFY_BAD; } break;
case MEM_OFFLINE: case MEM_CANCEL_ONLINE: { struct dmar_drhd_unit *drhd; struct intel_iommu *iommu; LIST_HEAD(freelist);
domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
rcu_read_lock(); for_each_active_iommu(iommu, drhd) iommu_flush_iotlb_psi(iommu, si_domain, start_vpfn, mhp->nr_pages, list_empty(&freelist), 0); rcu_read_unlock(); put_pages_list(&freelist); } break; }
return NOTIFY_OK; }
I passed this flag to the intel_nested_flush_cache() now as the helper accepts an ih parameter.
-----Original Message----- From: Liu, Yi L yi.l.liu@intel.com Subject: [PATCH v7 9/9] iommu/vt-d: Add iotlb flush for nested domain
From: Lu Baolu baolu.lu@linux.intel.com
This implements the .cache_invalidate_user() callback to support iotlb flush for nested domain.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/nested.c | 116 +++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index b5a5563ab32c..c665e2647045 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -73,9 +73,125 @@ static void intel_nested_domain_free(struct iommu_domain *domain) kfree(to_dmar_domain(domain)); }
+static void nested_flush_pasid_iotlb(struct intel_iommu *iommu,
struct dmar_domain *domain, u64 addr,
unsigned long npages, bool ih)
+{
- u16 did = domain_id_iommu(domain, iommu);
- unsigned long flags;
- spin_lock_irqsave(&domain->lock, flags);
- if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr,
npages, ih, NULL);
Is it optimal to check if domain attached to iommu before trigger flush? Or the check is redundant if intel_nested_flush_cache() is the only call site.
Thanks Zhenzhong
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
unsigned mask, u32 *fault)
+{
- struct device_domain_info *info;
- unsigned long flags;
- u16 sid, qdep;
- spin_lock_irqsave(&domain->lock, flags);
- list_for_each_entry(info, &domain->devices, link) {
if (!info->ats_enabled)
continue;
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
qdep, addr, mask, fault);
quirk_extra_dev_tlb_flush(info, addr, mask,
IOMMU_NO_PASID, qdep);
- }
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
unsigned long npages, u32 *error)
+{
- struct iommu_domain_info *info;
- unsigned long i;
- unsigned mask;
- u32 fault = 0;
- if (npages == U64_MAX)
mask = 64 - VTD_PAGE_SHIFT;
- else
mask = ilog2(__roundup_pow_of_two(npages));
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
if (domain->has_iotlb_device)
continue;
nested_flush_dev_iotlb(domain, addr, mask, &fault);
if (fault & (DMA_FSTS_ITE | DMA_FSTS_ICE))
break;
- }
- if (fault & DMA_FSTS_ICE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ICE;
- if (fault & DMA_FSTS_ITE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ITE;
+}
+static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
struct iommu_user_data_array
*array) +{
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct iommu_hwpt_vtd_s1_invalidate inv_entry;
- u32 processed = 0;
- int ret = 0;
- u32 index;
- if (array->type != IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) {
ret = -EINVAL;
goto out;
- }
- for (index = 0; index < array->entry_num; index++) {
ret = iommu_copy_struct_from_user_array(&inv_entry,
array,
- IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
index, inv_error);
if (ret)
break;
if (inv_entry.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) {
ret = -EOPNOTSUPP;
break;
}
if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
ret = -EINVAL;
break;
}
inv_entry.inv_error = 0;
intel_nested_flush_cache(dmar_domain, inv_entry.addr,
inv_entry.npages,
&inv_entry.inv_error);
ret = iommu_respond_struct_to_user_array(array, index,
(void *)&inv_entry,
sizeof(inv_entry));
if (ret)
break;
processed++;
- }
+out:
- array->entry_num = processed;
- return ret;
+}
static const struct iommu_domain_ops intel_nested_domain_ops = { .attach_dev = intel_nested_attach_dev, .free = intel_nested_domain_free,
- .cache_invalidate_user = intel_nested_cache_invalidate_user,
};
struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
*parent,
2.34.1
On 2023/12/27 17:27, Duan, Zhenzhong wrote:
-----Original Message----- From: Liu, Yi L yi.l.liu@intel.com Subject: [PATCH v7 9/9] iommu/vt-d: Add iotlb flush for nested domain
From: Lu Baolu baolu.lu@linux.intel.com
This implements the .cache_invalidate_user() callback to support iotlb flush for nested domain.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/nested.c | 116 +++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index b5a5563ab32c..c665e2647045 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -73,9 +73,125 @@ static void intel_nested_domain_free(struct iommu_domain *domain) kfree(to_dmar_domain(domain)); }
+static void nested_flush_pasid_iotlb(struct intel_iommu *iommu,
struct dmar_domain *domain, u64 addr,
unsigned long npages, bool ih)
+{
- u16 did = domain_id_iommu(domain, iommu);
- unsigned long flags;
- spin_lock_irqsave(&domain->lock, flags);
- if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr,
npages, ih, NULL);
Is it optimal to check if domain attached to iommu before trigger flush? Or the check is redundant if intel_nested_flush_cache() is the only call site.
I think it is possible that userspace issue an invalidation on a hwpt which does not have any device attached.. Though this is something stupid. So checking if any device attached before flushing still makes sense.
Thanks Zhenzhong
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
unsigned mask, u32 *fault)
+{
- struct device_domain_info *info;
- unsigned long flags;
- u16 sid, qdep;
- spin_lock_irqsave(&domain->lock, flags);
- list_for_each_entry(info, &domain->devices, link) {
if (!info->ats_enabled)
continue;
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
qdep, addr, mask, fault);
quirk_extra_dev_tlb_flush(info, addr, mask,
IOMMU_NO_PASID, qdep);
- }
- spin_unlock_irqrestore(&domain->lock, flags);
+}
+static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
unsigned long npages, u32 *error)
+{
- struct iommu_domain_info *info;
- unsigned long i;
- unsigned mask;
- u32 fault = 0;
- if (npages == U64_MAX)
mask = 64 - VTD_PAGE_SHIFT;
- else
mask = ilog2(__roundup_pow_of_two(npages));
- xa_for_each(&domain->iommu_array, i, info) {
nested_flush_pasid_iotlb(info->iommu, domain, addr,
npages, 0);
if (domain->has_iotlb_device)
continue;
nested_flush_dev_iotlb(domain, addr, mask, &fault);
if (fault & (DMA_FSTS_ITE | DMA_FSTS_ICE))
break;
- }
- if (fault & DMA_FSTS_ICE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ICE;
- if (fault & DMA_FSTS_ITE)
*error |= IOMMU_HWPT_INVALIDATE_VTD_S1_ITE;
+}
+static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
struct iommu_user_data_array
*array) +{
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct iommu_hwpt_vtd_s1_invalidate inv_entry;
- u32 processed = 0;
- int ret = 0;
- u32 index;
- if (array->type != IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) {
ret = -EINVAL;
goto out;
- }
- for (index = 0; index < array->entry_num; index++) {
ret = iommu_copy_struct_from_user_array(&inv_entry,
array,
- IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
index, inv_error);
if (ret)
break;
if (inv_entry.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) {
ret = -EOPNOTSUPP;
break;
}
if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
ret = -EINVAL;
break;
}
inv_entry.inv_error = 0;
intel_nested_flush_cache(dmar_domain, inv_entry.addr,
inv_entry.npages,
&inv_entry.inv_error);
ret = iommu_respond_struct_to_user_array(array, index,
(void *)&inv_entry,
sizeof(inv_entry));
if (ret)
break;
processed++;
- }
+out:
- array->entry_num = processed;
- return ret;
+}
static const struct iommu_domain_ops intel_nested_domain_ops = { .attach_dev = intel_nested_attach_dev, .free = intel_nested_domain_free,
- .cache_invalidate_user = intel_nested_cache_invalidate_user,
};
struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
*parent,
2.34.1
linux-kselftest-mirror@lists.linaro.org