Following the previous vIOMMU series, this adds another vDEVICE structure, representing the association from an iommufd_device to an iommufd_viommu. This gives the whole architecture a new "v" layer:

 _______________________________________________________________________
|                     iommufd (with vIOMMU/vDEVICE)                     |
|                                                                       |
|                      _____________       _____________                |
|                     |             |     |             |               |
|                     |   vIOMMU    |<----|   vDEVICE   |<-------|      |
|                     |             |     |_____________|        |      |
|                     |_____________|                            |      |
|                            |                                   |      |
|  ______              ______v______       _____________      ___|____  |
| |      |            |             |     |             |    |        | |
| | IOAS |<-----------|(HWPT_PAGING)|<----| HWPT_NESTED |<---| DEVICE | |
| |______|            |_____________|     |_____________|    |________| |
|_____|______________________|___________________|_______________|_____|
      |                      |                   |               |
 _____v______           _____v______        _____v______      ___v__
|   struct   |   PFN   |  (paging)  |      |  (nested)  |    |struct|
|iommu_device|-------->|iommu_domain|<-----|iommu_domain|<---|device|
|____________| storage |____________|      |____________|    |______|
This vDEVICE object is used to collect and store all vIOMMU-related device information/attributes in a VM. As an initial series for vDEVICE, add only the virt_id to the vDEVICE, which is a vIOMMU-specific device ID in a VM: e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d to a Context Table. This virt_id helps IOMMU drivers link the vID to the pID of the device on the physical IOMMU instance. This is essential for vIOMMU-based invalidation, where a request carries a device's vID for a device cache flush, e.g. an ATC invalidation.

Therefore, with this vDEVICE object, support vIOMMU-based invalidation by reusing IOMMUFD_CMD_HWPT_INVALIDATE against a vIOMMU object to flush caches with driver-specific data.
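To make the uAPI flow concrete, a rough userspace sketch (illustrative only; error handling is omitted, viommu_id/dev_id come from the earlier IOMMU_VIOMMU_ALLOC and device binding steps, and 0x99 is an arbitrary example virt_id):

	struct iommu_vdevice_alloc alloc = {
		.size = sizeof(alloc),
		.viommu_id = viommu_id,
		.dev_id = dev_id,
		.virt_id = 0x99,	/* e.g. the guest's vSID for this device */
	};

	/* Associate the device with the vIOMMU under virt_id 0x99 */
	ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &alloc);
	/* alloc.out_vdevice_id now holds the vDEVICE object handle */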
As for the implementation of the series, add driver support in ARM SMMUv3 for a real-world use case.
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v5
For testing, try this "with-rmr" branch: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v5-with-rmr
Pairing QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p2-v5
Changelog
v5
 * Dropped driver-allocated vDEVICE support
 * Changed vdev_to_dev helper to iommufd_viommu_find_dev
v4
 https://lore.kernel.org/all/cover.1729555967.git.nicolinc@nvidia.com/
 * Added missing brackets in switch-case
 * Fixed the unreleased idev refcount issue
 * Reworked the iommufd_vdevice_alloc allocator
 * Dropped support for IOMMU_VIOMMU_TYPE_DEFAULT
 * Added missing TEST_LENGTH and fail_nth coverages
 * Added a verification to the driver-allocated vDEVICE object
 * Added an iommufd_vdevice_abort for a missing mutex protection
 * Added a u64 structure arm_vsmmu_invalidation_cmd for user command
   conversion
v3
 https://lore.kernel.org/all/cover.1728491532.git.nicolinc@nvidia.com/
 * Added Jason's Reviewed-by
 * Split this invalidation part out of the part-1 series
 * Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl
 * Reduced viommu_api functions by allowing drivers to access viommu
   and vdevice structure directly
 * Dropped vdevs_rwsem by using xa_lock instead
 * Dropped arm_smmu_cache_invalidate_user
v2
 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
 * Limited vdev_id to one per idev
 * Added a rw_sem to protect the vdev_id list
 * Reworked driver-level APIs with proper lockings
 * Added a new viommu_api file for IOMMUFD_DRIVER config
 * Dropped useless iommu_dev point from the viommu structure
 * Added missing index numbers to new types in the uAPI header
 * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
 * Reworked mock_viommu_cache_invalidate() using the new iommu helper
 * Reordered details of set/unset_vdev_id handlers for proper lockings
v1
 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks! Nicolin
Jason Gunthorpe (2):
  iommu: Add iommu_copy_struct_from_full_user_array helper
  iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED

Nicolin Chen (11):
  iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl
  iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
  iommu/viommu: Add cache_invalidate to iommufd_viommu_ops
  iommufd/hw_pagetable: Enforce invalidation op on vIOMMU-based hwpt_nested
  iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
  iommufd/viommu: Add iommufd_viommu_find_dev helper
  iommufd/selftest: Add mock_viommu_cache_invalidate
  iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
  iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
  Documentation: userspace-api: iommufd: Update vDEVICE
  iommu/arm-smmu-v3: Add arm_vsmmu_cache_invalidate

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   9 +-
 drivers/iommu/iommufd/iommufd_private.h       |  20 ++
 drivers/iommu/iommufd/iommufd_test.h          |  30 +++
 include/linux/iommu.h                         |  48 ++++-
 include/linux/iommufd.h                       |  21 ++
 include/uapi/linux/iommufd.h                  |  61 +++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  83 +++++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 161 +++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  32 ++-
 drivers/iommu/iommufd/device.c                |  11 +
 drivers/iommu/iommufd/driver.c                |  13 ++
 drivers/iommu/iommufd/hw_pagetable.c          |  36 +++-
 drivers/iommu/iommufd/main.c                  |   7 +
 drivers/iommu/iommufd/selftest.c              |  98 ++++++++-
 drivers/iommu/iommufd/viommu.c                | 101 +++++++++
 tools/testing/selftests/iommu/iommufd.c       | 204 +++++++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |   4 +
 Documentation/userspace-api/iommufd.rst       |  41 +++-
 18 files changed, 942 insertions(+), 38 deletions(-)
Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device, i.e. an iommufd_device (idev) object, against an iommufd_viommu (vIOMMU) object in a VM. This vDEVICE object (and its structure) holds all the information and attributes of the device in the VM, with regard to the vIOMMU.
As an initial patch, add a per-vIOMMU virtual ID. This can be:
 - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
 - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
 - Virtual ID on a nested Intel VT-D IOMMU, an index to a Context Table
Potentially, this vDEVICE structure would hold some vData for Confidential Compute Architecture (CCA). Use this virt_id to index a "vdevs" xarray that belongs to a vIOMMU object.
Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection of an iommufd_device object and an iommufd_viommu object, require both as the ioctl inputs and take refcounts in the ioctl handler.
Then, let the idev structure hold the allocated vdev pointer with proper locking protection.
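As a sketch of what this per-vIOMMU indexing enables (modeled on the iommufd_viommu_find_dev helper that later patches of this series add; its body is quoted in the review discussion further below), a driver holding the vdevs xa_lock can translate a guest-visible virtual ID back to the physical device when handling an invalidation request:

	struct device *dev;

	xa_lock(&viommu->vdevs);
	dev = iommufd_viommu_find_dev(viommu, (unsigned long)virt_id);
	if (dev) {
		/* ... translate vID to pID and flush the device cache ... */
	}
	xa_unlock(&viommu->vdevs);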
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_private.h |  20 +++++
 include/linux/iommufd.h                 |   3 +
 include/uapi/linux/iommufd.h            |  26 ++++++
 drivers/iommu/iommufd/device.c          |  11 +++
 drivers/iommu/iommufd/main.c            |   7 ++
 drivers/iommu/iommufd/viommu.c          | 101 ++++++++++++++++++++++++
 6 files changed, 168 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 8c9ab35eaea5..365cf5a56cdf 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -391,6 +391,7 @@ struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_group *igroup;
+	struct iommufd_vdevice *vdev;
 	struct list_head group_item;
 	/* always the physical device */
 	struct device *dev;
@@ -505,8 +506,27 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
 }

+static inline struct iommufd_viommu *
+iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_VIOMMU),
+			    struct iommufd_viommu, obj);
+}
+
 int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_viommu_destroy(struct iommufd_object *obj);
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_vdevice_destroy(struct iommufd_object *obj);
+void iommufd_vdevice_abort(struct iommufd_object *obj);
+
+struct iommufd_vdevice {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct iommufd_device *idev;
+	struct iommufd_viommu *viommu;
+	u64 id; /* per-vIOMMU virtual ID */
+};
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 083ceb209704..e6cd288e8b83 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -31,6 +31,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_ACCESS,
 	IOMMUFD_OBJ_FAULT,
 	IOMMUFD_OBJ_VIOMMU,
+	IOMMUFD_OBJ_VDEVICE,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -89,6 +90,8 @@ struct iommufd_viommu {

 	const struct iommufd_viommu_ops *ops;

+	struct xarray vdevs;
+
 	unsigned int type;
 };

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 56c742106a45..b699ecb7aa9c 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -52,6 +52,7 @@ enum {
 	IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
 	IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
 	IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f,
+	IOMMUFD_CMD_VDEVICE_ALLOC = 0x90,
 };
 /**
@@ -896,4 +897,29 @@ struct iommu_viommu_alloc {
 	__u32 out_viommu_id;
 };
 #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
+
+/**
+ * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
+ * @size: sizeof(struct iommu_vdevice_alloc)
+ * @viommu_id: vIOMMU ID to associate with the virtual device
+ * @dev_id: The pyhsical device to allocate a virtual instance on the vIOMMU
+ * @__reserved: Must be 0
+ * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
+ *           of AMD IOMMU, and vID of a nested Intel VT-d to a Context Table.
+ * @out_vdevice_id: Output virtual instance ID for the allocated object
+ * @__reserved2: Must be 0
+ *
+ * Allocate a virtual device instance (for a physical device) against a vIOMMU.
+ * This instance holds the device's information (related to its vIOMMU) in a VM.
+ */
+struct iommu_vdevice_alloc {
+	__u32 size;
+	__u32 viommu_id;
+	__u32 dev_id;
+	__u32 __reserved;
+	__aligned_u64 virt_id;
+	__u32 out_vdevice_id;
+	__u32 __reserved2;
+};
+#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC)
 #endif
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 5fd3dd420290..e50113305a9c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
  */
 void iommufd_device_unbind(struct iommufd_device *idev)
 {
+	u32 vdev_id = 0;
+
+	/* idev->vdev object should be destroyed prior, yet just in case.. */
+	mutex_lock(&idev->igroup->lock);
+	if (idev->vdev)
+		vdev_id = idev->vdev->obj.id;
+	mutex_unlock(&idev->igroup->lock);
+	/* Relying on xa_lock against a race with iommufd_destroy() */
+	if (vdev_id)
+		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
+
 	iommufd_object_destroy_user(idev->ictx, &idev->obj);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index ab5ee325d809..696ac9e0e74b 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -322,6 +322,7 @@ union ucmd_buffer {
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
 	struct iommu_viommu_alloc viommu;
+	struct iommu_vdevice_alloc vdev;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -375,6 +376,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
 		 struct iommu_viommu_alloc, out_viommu_id),
+	IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
+		 struct iommu_vdevice_alloc, __reserved2),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -513,6 +516,10 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_VIOMMU] = {
 		.destroy = iommufd_viommu_destroy,
 	},
+	[IOMMUFD_OBJ_VDEVICE] = {
+		.destroy = iommufd_vdevice_destroy,
+		.abort = iommufd_vdevice_abort,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index eb41e15ebab1..2b9a9a80298d 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -12,6 +12,7 @@ void iommufd_viommu_destroy(struct iommufd_object *obj)
 	if (viommu->ops && viommu->ops->free)
 		viommu->ops->free(viommu);
 	refcount_dec(&viommu->hwpt->common.obj.users);
+	xa_destroy(&viommu->vdevs);
 }
 int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -70,6 +71,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	 */
 	viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);

+	xa_init(&viommu->vdevs);
 	refcount_inc(&viommu->hwpt->common.obj.users);

 	cmd->out_viommu_id = viommu->obj.id;
@@ -87,3 +89,102 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(ucmd->ictx, &idev->obj);
 	return rc;
 }
+
+void iommufd_vdevice_abort(struct iommufd_object *obj)
+{
+	struct iommufd_vdevice *old,
+		*vdev = container_of(obj, struct iommufd_vdevice, obj);
+	struct iommufd_viommu *viommu = vdev->viommu;
+	struct iommufd_device *idev = vdev->idev;
+
+	lockdep_assert_held(&idev->igroup->lock);
+
+	old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
+	if (old)
+		WARN_ON(old != vdev);
+
+	refcount_dec(&viommu->obj.users);
+	refcount_dec(&idev->obj.users);
+	idev->vdev = NULL;
+}
+
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_vdevice *vdev =
+		container_of(obj, struct iommufd_vdevice, obj);
+
+	mutex_lock(&vdev->idev->igroup->lock);
+	iommufd_vdevice_abort(obj);
+	mutex_unlock(&vdev->idev->igroup->lock);
+}
+
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_vdevice_alloc *cmd = ucmd->cmd;
+	struct iommufd_vdevice *vdev, *curr;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	u64 virt_id = cmd->virt_id;
+	int rc = 0;
+
+	if (virt_id > ULONG_MAX)
+		return -EINVAL;
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu))
+		return PTR_ERR(viommu);
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev)) {
+		rc = PTR_ERR(idev);
+		goto out_put_viommu;
+	}
+
+	mutex_lock(&idev->igroup->lock);
+	if (idev->vdev) {
+		rc = -EEXIST;
+		goto out_unlock_igroup;
+	}
+
+	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
+	if (IS_ERR(vdev)) {
+		rc = PTR_ERR(vdev);
+		goto out_unlock_igroup;
+	}
+
+	rc = iommufd_verify_unfinalized_object(ucmd->ictx, &vdev->obj);
+	if (rc) {
+		kfree(vdev);
+		goto out_unlock_igroup;
+	}
+
+	vdev->idev = idev;
+	vdev->id = virt_id;
+	vdev->viommu = viommu;
+
+	idev->vdev = vdev;
+	refcount_inc(&idev->obj.users);
+	refcount_inc(&viommu->obj.users);
+
+	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
+	if (curr) {
+		rc = xa_err(curr) ?: -EBUSY;
+		goto out_abort;
+	}
+
+	cmd->out_vdevice_id = vdev->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
+	goto out_unlock_igroup;
+
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_unlock_igroup:
+	mutex_unlock(&idev->igroup->lock);
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+out_put_viommu:
+	iommufd_put_object(ucmd->ictx, &viommu->obj);
+	return rc;
+}
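For completeness, the userspace teardown counterpart (a sketch; it mirrors the test_ioctl_destroy() calls in the selftest patch below, using the generic IOMMU_DESTROY ioctl):

	struct iommu_destroy destroy = {
		.size = sizeof(destroy),
		.id = alloc.out_vdevice_id,
	};

	/* Release the vDEVICE before unbinding the device from iommufd */
	ioctl(iommufd, IOMMU_DESTROY, &destroy);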
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Saturday, October 26, 2024 7:51 AM
>
> +/**
> + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
> + * @size: sizeof(struct iommu_vdevice_alloc)
> + * @viommu_id: vIOMMU ID to associate with the virtual device
> + * @dev_id: The pyhsical device to allocate a virtual instance on the
> + *          vIOMMU
s/pyhsical/physical/, or just say "iommufd device"
> +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_vdevice_alloc *cmd = ucmd->cmd;
> +	struct iommufd_vdevice *vdev, *curr;
> +	struct iommufd_viommu *viommu;
> +	struct iommufd_device *idev;
> +	u64 virt_id = cmd->virt_id;
> +	int rc = 0;
> +
> +	if (virt_id > ULONG_MAX)
> +		return -EINVAL;
> +
> +	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> +	if (IS_ERR(viommu))
> +		return PTR_ERR(viommu);
> +
> +	idev = iommufd_get_device(ucmd, cmd->dev_id);
> +	if (IS_ERR(idev)) {
> +		rc = PTR_ERR(idev);
> +		goto out_put_viommu;
> +	}
> +
> +	mutex_lock(&idev->igroup->lock);
> +	if (idev->vdev) {
> +		rc = -EEXIST;
> +		goto out_unlock_igroup;
> +	}
> +
> +	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
> +	if (IS_ERR(vdev)) {
> +		rc = PTR_ERR(vdev);
> +		goto out_unlock_igroup;
> +	}
also need to check that the device and the viommu are associated with the same physical iommu.
> +	rc = iommufd_verify_unfinalized_object(ucmd->ictx, &vdev->obj);
> +	if (rc) {
> +		kfree(vdev);
> +		goto out_unlock_igroup;
> +	}
> +
> +	vdev->idev = idev;
> +	vdev->id = virt_id;
> +	vdev->viommu = viommu;
> +
> +	idev->vdev = vdev;
> +	refcount_inc(&idev->obj.users);
> +	refcount_inc(&viommu->obj.users);
> +
> +	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
> +	if (curr) {
> +		rc = xa_err(curr) ?: -EBUSY;
> +		goto out_abort;
> +	}
> +
> +	cmd->out_vdevice_id = vdev->obj.id;
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +	if (rc)
> +		goto out_abort;
> +	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
> +	goto out_unlock_igroup;
> +
> +out_abort:
> +	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
> +out_unlock_igroup:
> +	mutex_unlock(&idev->igroup->lock);
> +	iommufd_put_object(ucmd->ictx, &idev->obj);
> +out_put_viommu:
> +	iommufd_put_object(ucmd->ictx, &viommu->obj);
> +	return rc;
> +}
--
2.43.0
On Mon, Oct 28, 2024 at 03:11:32AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen nicolinc@nvidia.com
> > Sent: Saturday, October 26, 2024 7:51 AM
> >
> > +/**
> > + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
> > + * @size: sizeof(struct iommu_vdevice_alloc)
> > + * @viommu_id: vIOMMU ID to associate with the virtual device
> > + * @dev_id: The pyhsical device to allocate a virtual instance on the
> > + *          vIOMMU
>
> s/pyhsical/physical/, or just say "iommufd device"
Ack for "physical", aligning with other @dev_id lines.
> > +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_vdevice_alloc *cmd = ucmd->cmd;
> > +	struct iommufd_vdevice *vdev, *curr;
> > +	struct iommufd_viommu *viommu;
> > +	struct iommufd_device *idev;
> > +	u64 virt_id = cmd->virt_id;
> > +	int rc = 0;
> > +
> > +	if (virt_id > ULONG_MAX)
> > +		return -EINVAL;
> > +
> > +	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> > +	if (IS_ERR(viommu))
> > +		return PTR_ERR(viommu);
> > +
> > +	idev = iommufd_get_device(ucmd, cmd->dev_id);
> > +	if (IS_ERR(idev)) {
> > +		rc = PTR_ERR(idev);
> > +		goto out_put_viommu;
> > +	}
> > +
> > +	mutex_lock(&idev->igroup->lock);
> > +	if (idev->vdev) {
> > +		rc = -EEXIST;
> > +		goto out_unlock_igroup;
> > +	}
> > +
> > +	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
> > +	if (IS_ERR(vdev)) {
> > +		rc = PTR_ERR(vdev);
> > +		goto out_unlock_igroup;
> > +	}
>
> also need to check that the device and the viommu are associated with
> the same physical iommu.
Ack. Will add this prior to mutex_lock(&idev->igroup->lock);
+	if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
Thanks! Nicolin
On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> +/**
> + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
> + * @size: sizeof(struct iommu_vdevice_alloc)
> + * @viommu_id: vIOMMU ID to associate with the virtual device
> + * @dev_id: The pyhsical device to allocate a virtual instance on the vIOMMU
> + * @__reserved: Must be 0
> + * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
> + *           of AMD IOMMU, and vID of a nested Intel VT-d to a Context Table.
> + * @out_vdevice_id: Output virtual instance ID for the allocated object
How about:

 @out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTROY
> + * Allocate a virtual device instance (for a physical device) against a
> + * vIOMMU. This instance holds the device's information (related to its
> + * vIOMMU) in a VM.
> + */
> +struct iommu_vdevice_alloc {
> +	__u32 size;
> +	__u32 viommu_id;
> +	__u32 dev_id;
> +	__u32 __reserved;
> +	__aligned_u64 virt_id;
> +	__u32 out_vdevice_id;
> +	__u32 __reserved2;
Let's not have two u32 reserveds; put the out_vdevice_id above virt_id
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 5fd3dd420290..e50113305a9c 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
>   */
>  void iommufd_device_unbind(struct iommufd_device *idev)
>  {
> +	u32 vdev_id = 0;
> +
> +	/* idev->vdev object should be destroyed prior, yet just in case.. */
> +	mutex_lock(&idev->igroup->lock);
> +	if (idev->vdev)

Then should it have a WARN_ON here?

> +		vdev_id = idev->vdev->obj.id;
> +	mutex_unlock(&idev->igroup->lock);
> +	/* Relying on xa_lock against a race with iommufd_destroy() */
> +	if (vdev_id)
> +		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
That doesn't seem right, iommufd_object_remove() should never be used to destroy an object that userspace created with an IOCTL, in fact that just isn't allowed.
Ugh, there is worse here, we can't hold a long term reference on a kernel owned object:
+	idev->vdev = vdev;
+	refcount_inc(&idev->obj.users);
As it prevents the kernel from disconnecting it.
I came up with this that seems like it will work. Maybe we will need to improve it later. Instead of using the idev, just keep the raw struct device. We can hold a refcount on the struct device without races. There is no need for the idev igroup lock since the xa_lock does everything we need.
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e50113305a9c47..5fd3dd42029015 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -277,17 +277,6 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
  */
 void iommufd_device_unbind(struct iommufd_device *idev)
 {
-	u32 vdev_id = 0;
-
-	/* idev->vdev object should be destroyed prior, yet just in case.. */
-	mutex_lock(&idev->igroup->lock);
-	if (idev->vdev)
-		vdev_id = idev->vdev->obj.id;
-	mutex_unlock(&idev->igroup->lock);
-	/* Relying on xa_lock against a race with iommufd_destroy() */
-	if (vdev_id)
-		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
-
 	iommufd_object_destroy_user(idev->ictx, &idev->obj);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 9849474f429f98..6e870bce2a0cd0 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -46,6 +46,6 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 	lockdep_assert_held(&viommu->vdevs.xa_lock);

 	vdev = xa_load(&viommu->vdevs, vdev_id);
-	return vdev ? vdev->idev->dev : NULL;
+	return vdev ? vdev->dev : NULL;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 365cf5a56cdf20..275f954235940c 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -152,9 +152,6 @@ static inline void iommufd_put_object(struct iommufd_ctx *ictx,
 	wake_up_interruptible_all(&ictx->destroy_wait);
 }

-int iommufd_verify_unfinalized_object(struct iommufd_ctx *ictx,
-				      struct iommufd_object *to_verify);
-
 void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj);
 void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
 				      struct iommufd_object *obj);
@@ -391,7 +388,6 @@ struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_group *igroup;
-	struct iommufd_vdevice *vdev;
 	struct list_head group_item;
 	/* always the physical device */
 	struct device *dev;
@@ -523,7 +519,7 @@ void iommufd_vdevice_abort(struct iommufd_object *obj);
 struct iommufd_vdevice {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
-	struct iommufd_device *idev;
+	struct device *dev;
 	struct iommufd_viommu *viommu;
 	u64 id; /* per-vIOMMU virtual ID */
 };
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 696ac9e0e74b89..c90fe15af98be4 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -43,9 +43,10 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
 {
 	void *old;

-	old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL);
+	old = xa_cmpxchg(&ictx->objects, obj->id, XA_ZERO_ENTRY, obj,
+			 GFP_KERNEL);
 	/* obj->id was returned from xa_alloc() so the xa_store() cannot fail */
-	WARN_ON(old);
+	WARN_ON(old != XA_ZERO_ENTRY);
 }

 /* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */
@@ -89,26 +90,6 @@ struct iommufd_object *iommufd_get_object(struct iommufd_ctx *ictx, u32 id,
 	return obj;
 }

-int iommufd_verify_unfinalized_object(struct iommufd_ctx *ictx,
-				      struct iommufd_object *to_verify)
-{
-	XA_STATE(xas, &ictx->objects, 0);
-	struct iommufd_object *obj;
-	int rc = 0;
-
-	if (!to_verify || !to_verify->id)
-		return -EINVAL;
-	xas.xa_index = to_verify->id;
-
-	xa_lock(&ictx->objects);
-	obj = xas_load(&xas);
-	/* Being an unfinalized object, the loaded obj is a reserved space */
-	if (obj != XA_ZERO_ENTRY)
-		rc = -ENOENT;
-	xa_unlock(&ictx->objects);
-	return rc;
-}
-
 static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
 					     struct iommufd_object *to_destroy)
 {
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 2b9a9a80298d8e..e7385676f17659 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -55,12 +55,6 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 		goto out_put_hwpt;
 	}

-	rc = iommufd_verify_unfinalized_object(ucmd->ictx, &viommu->obj);
-	if (rc) {
-		kfree(viommu);
-		goto out_put_hwpt;
-	}
-
 	viommu->type = cmd->type;
 	viommu->ictx = ucmd->ictx;
 	viommu->hwpt = hwpt_paging;
@@ -95,27 +89,18 @@ void iommufd_vdevice_abort(struct iommufd_object *obj)
 	struct iommufd_vdevice *old,
 		*vdev = container_of(obj, struct iommufd_vdevice, obj);
 	struct iommufd_viommu *viommu = vdev->viommu;
-	struct iommufd_device *idev = vdev->idev;
-
-	lockdep_assert_held(&idev->igroup->lock);

 	old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
 	if (old)
 		WARN_ON(old != vdev);

 	refcount_dec(&viommu->obj.users);
-	refcount_dec(&idev->obj.users);
-	idev->vdev = NULL;
+	put_device(vdev->dev);
 }

 void iommufd_vdevice_destroy(struct iommufd_object *obj)
 {
-	struct iommufd_vdevice *vdev =
-		container_of(obj, struct iommufd_vdevice, obj);
-
-	mutex_lock(&vdev->idev->igroup->lock);
 	iommufd_vdevice_abort(obj);
-	mutex_unlock(&vdev->idev->igroup->lock);
 }

 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -140,30 +125,16 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 		goto out_put_viommu;
 	}

-	mutex_lock(&idev->igroup->lock);
-	if (idev->vdev) {
-		rc = -EEXIST;
-		goto out_unlock_igroup;
-	}
-
 	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
 	if (IS_ERR(vdev)) {
 		rc = PTR_ERR(vdev);
 		goto out_unlock_igroup;
 	}

-	rc = iommufd_verify_unfinalized_object(ucmd->ictx, &vdev->obj);
-	if (rc) {
-		kfree(vdev);
-		goto out_unlock_igroup;
-	}
-
-	vdev->idev = idev;
 	vdev->id = virt_id;
+	vdev->dev = idev->dev;
+	get_device(idev->dev);
 	vdev->viommu = viommu;
-
-	idev->vdev = vdev;
-	refcount_inc(&idev->obj.users);
 	refcount_inc(&viommu->obj.users);

 	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
@@ -182,7 +153,6 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 out_abort:
 	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
 out_unlock_igroup:
-	mutex_unlock(&idev->igroup->lock);
 	iommufd_put_object(ucmd->ictx, &idev->obj);
 out_put_viommu:
 	iommufd_put_object(ucmd->ictx, &viommu->obj);
On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 5fd3dd420290..e50113305a9c 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> >   */
> >  void iommufd_device_unbind(struct iommufd_device *idev)
> >  {
> > +	u32 vdev_id = 0;
> > +
> > +	/* idev->vdev object should be destroyed prior, yet just in case.. */
> > +	mutex_lock(&idev->igroup->lock);
> > +	if (idev->vdev)
>
> Then should it have a WARN_ON here?
It'd be a user space mistake that forgot to call the destroy ioctl to the object, in which case I recall kernel shouldn't WARN_ON?
> > +		vdev_id = idev->vdev->obj.id;
> > +	mutex_unlock(&idev->igroup->lock);
> > +	/* Relying on xa_lock against a race with iommufd_destroy() */
> > +	if (vdev_id)
> > +		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
>
> That doesn't seem right, iommufd_object_remove() should never be used
> to destroy an object that userspace created with an IOCTL, in fact
> that just isn't allowed.
It was for our auto destroy feature: if user space forgot to destroy the object while trying to unplug the device from the VM, this saves the day.
> Ugh, there is worse here, we can't hold a long term reference on a
> kernel owned object:
>
> +	idev->vdev = vdev;
> +	refcount_inc(&idev->obj.users);
>
> As it prevents the kernel from disconnecting it.
Hmm, mind elaborating? I think the iommufd_fops_release() would xa_for_each the object list that destroys the vdev object first then this idev (and viommu too)?
> I came up with this that seems like it will work. Maybe we will need
> to improve it later. Instead of using the idev, just keep the raw
> struct device. We can hold a refcount on the struct device without
> races. There is no need for the idev igroup lock since the xa_lock
> does everything we need.
OK. If user space forgot to destroy its vdev while unplugging the device, it would not be allowed to hotplug another device (or the same device) back to the same slot having the same RID, since the RID on the vIOMMU would be occupied by the undestroyed vdev.
If we decide to do so, I think we should highlight this somewhere in the doc.
Thanks Nicolin
On Tue, Oct 29, 2024 at 10:29:56AM -0700, Nicolin Chen wrote:
> On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > index 5fd3dd420290..e50113305a9c 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > >   */
> > >  void iommufd_device_unbind(struct iommufd_device *idev)
> > >  {
> > > +	u32 vdev_id = 0;
> > > +
> > > +	/* idev->vdev object should be destroyed prior, yet just in case.. */
> > > +	mutex_lock(&idev->igroup->lock);
> > > +	if (idev->vdev)
> >
> > Then should it have a WARN_ON here?
>
> It'd be a user space mistake that forgot to call the destroy ioctl to
> the object, in which case I recall kernel shouldn't WARN_ON?
But you can't get here because:
	refcount_inc(&idev->obj.users);
And kernel doesn't destroy objects with elevated ref counts?
> > > +		vdev_id = idev->vdev->obj.id;
> > > +	mutex_unlock(&idev->igroup->lock);
> > > +	/* Relying on xa_lock against a race with iommufd_destroy() */
> > > +	if (vdev_id)
> > > +		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> >
> > That doesn't seem right, iommufd_object_remove() should never be used
> > to destroy an object that userspace created with an IOCTL, in fact
> > that just isn't allowed.
>
> It was for our auto destroy feature.
It was for our auto destroy feature.
auto domains are "hidden" hwpts that are kernel managed. They are not "userspace created".
"Usespace created" objects are ones that userspace is expected to call destroy on.
If you destroy them behind the scenes in the kerenl then the objecd ID can be reallocated for something else and when userspace does DESTROY on the ID it thought was still allocated it will malfunction.
So, only userspace can destroy objects that userspace created.
> If user space forgot to destroy the object while trying to unplug the
> device from VM. This saves the day.
No, it should/does fail destroy of the VIOMMU object because the users refcount is elevated.
> > Ugh, there is worse here, we can't hold a long term reference on a
> > kernel owned object:
> >
> > +	idev->vdev = vdev;
> > +	refcount_inc(&idev->obj.users);
> >
> > As it prevents the kernel from disconnecting it.
>
> Hmm, mind elaborating? I think the iommufd_fops_release() would
> xa_for_each the object list that destroys the vdev object first then
> this idev (and viommu too)?
iommufd_device_unbind() can't fail, and if the object can't be destroyed because it has an elevated long term refcount it WARN's:
	ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);

	/*
	 * If there is a bug and we couldn't destroy the object then we did put
	 * back the caller's users refcount and will eventually try to free it
	 * again during close.
	 */
	WARN_ON(ret);
So you cannot take long term references on kernel owned objects. Only userspace owned objects.
> OK. If user space forgot to destroy its vdev while unplugging the
> device, it would not be allowed to hotplug another device (or the same
> device) back to the same slot having the same RID, since the RID on
> the vIOMMU would be occupied by the undestroyed vdev.
Yes, that seems correct and obvious to me. Until the vdev is explicitly destroyed the ID is in-use.
Good userspace should destroy the iommufd vDEVICE object before closing the VFIO file descriptor.
If it doesn't, then the VDEVICE object remains even though the VFIO it was linked to is gone.
Jason
On Tue, Oct 29, 2024 at 03:48:01PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 10:29:56AM -0700, Nicolin Chen wrote:
> > On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > > index 5fd3dd420290..e50113305a9c 100644
> > > > --- a/drivers/iommu/iommufd/device.c
> > > > +++ b/drivers/iommu/iommufd/device.c
> > > > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > > >   */
> > > >  void iommufd_device_unbind(struct iommufd_device *idev)
> > > >  {
> > > > +	u32 vdev_id = 0;
> > > > +
> > > > +	/* idev->vdev object should be destroyed prior, yet just in case.. */
> > > > +	mutex_lock(&idev->igroup->lock);
> > > > +	if (idev->vdev)
> > >
> > > Then should it have a WARN_ON here?
> >
> > It'd be a user space mistake that forgot to call the destroy ioctl to
> > the object, in which case I recall kernel shouldn't WARN_ON?
>
> But you can't get here because:
>
> 	refcount_inc(&idev->obj.users);
>
> And kernel doesn't destroy objects with elevated ref counts?
Hmm, this is not a ->destroy() but iommufd_device_unbind called by VFIO. And we actually ran into this routine when QEMU didn't destroy vdev. So, I added this chunk.
The iommufd_object_remove(vdev_id) here would destroy the vdev where its destroy() does refcount_dec(&idev->obj.users). Then, the following iommufd_object_destroy_user(.., &idev->obj) will succeed.
With that said, let's just mandate userspace to destroy vdev.
> > > > +		vdev_id = idev->vdev->obj.id;
> > > > +	mutex_unlock(&idev->igroup->lock);
> > > > +	/* Relying on xa_lock against a race with iommufd_destroy() */
> > > > +	if (vdev_id)
> > > > +		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> > >
> > > That doesn't seem right, iommufd_object_remove() should never be
> > > used to destroy an object that userspace created with an IOCTL, in
> > > fact that just isn't allowed.
> >
> > It was for our auto destroy feature.
>
> auto domains are "hidden" hwpts that are kernel managed. They are not
> "userspace created".
>
> "Userspace created" objects are ones that userspace is expected to
> call destroy on.
OK. I misunderstood that.
> If you destroy them behind the scenes in the kernel then the object ID
> can be reallocated for something else and when userspace does DESTROY
> on the ID it thought was still allocated it will malfunction.
>
> So, only userspace can destroy objects that userspace created.
I see. That makes sense.
> > If user space forgot to destroy the object while trying to unplug
> > the device from VM. This saves the day.
>
> No, it should/does fail destroy of the VIOMMU object because the users
> refcount is elevated.
The vIOMMU object is refcount_dec also from the unbind() calling remove(). But anyway, we aligned that userspace should destroy it explicitly.
> > > Ugh, there is worse here, we can't hold a long term reference on a
> > > kernel owned object:
> > >
> > > +	idev->vdev = vdev;
> > > +	refcount_inc(&idev->obj.users);
> > >
> > > As it prevents the kernel from disconnecting it.
> >
> > Hmm, mind elaborating? I think the iommufd_fops_release() would
> > xa_for_each the object list that destroys the vdev object first then
> > this idev (and viommu too)?
>
> iommufd_device_unbind() can't fail, and if the object can't be
> destroyed because it has an elevated long term refcount it WARN's:
>
> 	ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
>
> 	/*
> 	 * If there is a bug and we couldn't destroy the object then we did put
> 	 * back the caller's users refcount and will eventually try to free it
> 	 * again during close.
> 	 */
> 	WARN_ON(ret);
>
> So you cannot take long term references on kernel owned objects. Only
> userspace owned objects.
OK. I think I had got this part. Gao ran into this WARN_ON at v3, so I added iommufd_object_remove(vdev_id) in unbind() prior to this iommufd_object_destroy_user(idev->ictx, &idev->obj).
> > OK. If user space forgot to destroy its vdev while unplugging the
> > device, it would not be allowed to hotplug another device (or the
> > same device) back to the same slot having the same RID, since the
> > RID on the vIOMMU would be occupied by the undestroyed vdev.
>
> Yes, that seems correct and obvious to me. Until the vdev is
> explicitly destroyed the ID is in-use.
>
> Good userspace should destroy the iommufd vDEVICE object before
> closing the VFIO file descriptor.
>
> If it doesn't, then the VDEVICE object remains even though the VFIO it
> was linked to is gone.
I see.
Thanks Nicolin
On Tue, Oct 29, 2024 at 12:30:00PM -0700, Nicolin Chen wrote:
> > iommufd_device_unbind() can't fail, and if the object can't be
> > destroyed because it has an elevated long term refcount it WARN's:
> >
> > 	ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
> >
> > 	/*
> > 	 * If there is a bug and we couldn't destroy the object then we did put
> > 	 * back the caller's users refcount and will eventually try to free it
> > 	 * again during close.
> > 	 */
> > 	WARN_ON(ret);
> >
> > So you cannot take long term references on kernel owned objects. Only
> > userspace owned objects.
>
> OK. I think I had got this part. Gao ran into this WARN_ON at v3, so I
> added iommufd_object_remove(vdev_id) in unbind() prior to this
> iommufd_object_destroy_user(idev->ictx, &idev->obj).
Oh I see, so the fix to that is to not take a longterm reference, not to try to destroy a vdev.
The alternative could be to try to unlink the idev from the vdev and leave a zombie vdev, but that didn't look so nice to implement. If we need it we can do it later.
Jason
Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage of IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc TEST_F to cover the IOMMU_VDEVICE_ALLOC ioctl.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 tools/testing/selftests/iommu/iommufd_utils.h | 27 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 20 ++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        |  4 +++
 3 files changed, 51 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index ca09308dad6a..5b17d7b2ac5c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -790,3 +790,30 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id,
 	EXPECT_ERRNO(_errno,                                              \
 		     _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \
 					    type, 0, viommu_id))
+
+static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id,
+				   __u64 virt_id, __u32 *vdev_id)
+{
+	struct iommu_vdevice_alloc cmd = {
+		.size = sizeof(cmd),
+		.dev_id = idev_id,
+		.viommu_id = viommu_id,
+		.virt_id = virt_id,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	if (vdev_id)
+		*vdev_id = cmd.out_vdevice_id;
+	return 0;
+}
+
+#define test_cmd_vdevice_alloc(viommu_id, idev_id, virt_id, vdev_id)       \
+	ASSERT_EQ(0, _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \
+					     virt_id, vdev_id))
+#define test_err_vdevice_alloc(_errno, viommu_id, idev_id, virt_id, vdev_id) \
+	EXPECT_ERRNO(_errno,                                                  \
+		     _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id,    \
+					     virt_id, vdev_id))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index b48b22d33ad4..93255403dee4 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -129,6 +129,7 @@ TEST_F(iommufd, cmd_length)
 	TEST_LENGTH(iommu_option, IOMMU_OPTION, val64);
 	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved);
 	TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id);
+	TEST_LENGTH(iommu_vdevice_alloc, IOMMU_VDEVICE_ALLOC, __reserved2);
 #undef TEST_LENGTH
 }
@@ -2473,4 +2474,23 @@ TEST_F(iommufd_viommu, viommu_auto_destroy)
 {
 }

+TEST_F(iommufd_viommu, vdevice_alloc)
+{
+	uint32_t viommu_id = self->viommu_id;
+	uint32_t dev_id = self->device_id;
+	uint32_t vdev_id = 0;
+
+	if (dev_id) {
+		/* Set vdev_id to 0x99, unset it, and set to 0x88 */
+		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+		test_err_vdevice_alloc(EEXIST, viommu_id, dev_id, 0x99,
+				       &vdev_id);
+		test_ioctl_destroy(vdev_id);
+		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id);
+		test_ioctl_destroy(vdev_id);
+	} else {
+		test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL);
+	}
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index e9a980b7729b..28f11b26f836 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -583,6 +583,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	uint32_t idev_id;
 	uint32_t hwpt_id;
 	uint32_t viommu_id;
+	uint32_t vdev_id;
 	__u64 iova;

 	self->fd = open("/dev/iommu", O_RDWR);
@@ -635,6 +636,9 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 				    IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id))
 		return -1;

+	if (_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, 0, &vdev_id))
+		return -1;
+
 	return 0;
 }
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Saturday, October 26, 2024 7:51 AM
>
> Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage
> of IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc
> TEST_F to cover the IOMMU_VDEVICE_ALLOC ioctl.
>
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:50:31PM -0700, Nicolin Chen wrote:
> Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage
> of IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc
> TEST_F to cover the IOMMU_VDEVICE_ALLOC ioctl.
>
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
> ---
>  tools/testing/selftests/iommu/iommufd_utils.h | 27 +++++++++++++++++++
>  tools/testing/selftests/iommu/iommufd.c       | 20 ++++++++++++++
>  .../selftests/iommu/iommufd_fail_nth.c        |  4 +++
>  3 files changed, 51 insertions(+)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
This per-vIOMMU cache_invalidate op is like the cache_invalidate_user op in struct iommu_domain_ops, but wider, supporting device cache (e.g. PCI ATC invalidations).
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 include/linux/iommufd.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index e6cd288e8b83..0287a6d00192 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -15,6 +15,7 @@ struct device;
 struct file;
 struct iommu_group;
 struct iommu_user_data;
+struct iommu_user_data_array;
 struct iommufd_access;
 struct iommufd_ctx;
 struct iommufd_device;
@@ -104,12 +105,21 @@ struct iommufd_viommu {
  *                       must be defined in include/uapi/linux/iommufd.h.
  *                       It must fully initialize the new iommu_domain before
  *                       returning. Upon failure, ERR_PTR must be returned.
+ * @cache_invalidate: Flush hardware cache used by a vIOMMU. It can be used for
+ *                    any IOMMU hardware specific cache: TLB and device cache.
+ *                    The @array passes in the cache invalidation requests, in
+ *                    form of a driver data structure. A driver must update the
+ *                    array->entry_num to report the number of handled requests.
+ *                    The data structure of the array entry must be defined in
+ *                    include/uapi/linux/iommufd.h
  */
 struct iommufd_viommu_ops {
 	void (*free)(struct iommufd_viommu *viommu);
 	struct iommu_domain *(*alloc_domain_nested)(
 		struct iommufd_viommu *viommu,
 		const struct iommu_user_data *user_data);
+	int (*cache_invalidate)(struct iommufd_viommu *viommu,
+				struct iommu_user_data_array *array);
 };
#if IS_ENABLED(CONFIG_IOMMUFD)
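To illustrate how a driver might implement the new op, a minimal sketch (the my_* names and the fixed-size command buffer are hypothetical; the mock and ARM SMMUv3 implementations come in later patches of this series, and the bulk-copy helper used here is added by a following patch too):

	static int my_viommu_cache_invalidate(struct iommufd_viommu *viommu,
					      struct iommu_user_data_array *array)
	{
		struct my_inv_cmd cmds[32];
		unsigned int i;
		int rc;

		/* Handle at most one buffer's worth of requests per call */
		if (array->entry_num > ARRAY_SIZE(cmds))
			array->entry_num = ARRAY_SIZE(cmds);
		rc = iommu_copy_struct_from_full_user_array(cmds, sizeof(cmds[0]),
							    array, MY_INV_DATA_TYPE);
		if (rc)
			return rc;

		for (i = 0; i != array->entry_num; i++) {
			rc = my_invalidate_one(viommu, &cmds[i]);
			if (rc)
				break;
		}
		/* Report the number of handled requests back to user space */
		array->entry_num = i;
		return rc;
	}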
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Saturday, October 26, 2024 7:51 AM
>
> This per-vIOMMU cache_invalidate op is like the cache_invalidate_user
> op in struct iommu_domain_ops, but wider, supporting device cache
> (e.g. PCI ATC invalidations).
>
> Reviewed-by: Jason Gunthorpe jgg@nvidia.com
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
A vIOMMU-based hwpt_nested requires a cache invalidation op too, either using the one in iommu_domain_ops or the one in viommu_ops. Enforce that upon the allocated hwpt_nested.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/hw_pagetable.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 1df5d40c93df..fd260a67b82c 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -302,7 +302,9 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
 	}
 	hwpt->domain->owner = viommu->iommu_dev->ops;

-	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
+	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
+			 (!viommu->ops->cache_invalidate &&
+			  !hwpt->domain->ops->cache_invalidate_user))) {
 		rc = -EINVAL;
 		goto out_abort;
 	}
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Saturday, October 26, 2024 7:51 AM
>
> @@ -302,7 +302,9 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
>  	}
>  	hwpt->domain->owner = viommu->iommu_dev->ops;
>
> -	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
> +	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
> +			 (!viommu->ops->cache_invalidate &&
> +			  !hwpt->domain->ops->cache_invalidate_user))) {
>  		rc = -EINVAL;
>  		goto out_abort;
>  	}
According to patch5, cache invalidate in viommu only uses viommu->ops->cache_invalidate. Is it intended here to allow a nested hwpt created via a viommu to still support the old method?
On Tue, Oct 29, 2024 at 08:22:43AM +0000, Tian, Kevin wrote:
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Saturday, October 26, 2024 7:51 AM
>
> > @@ -302,7 +302,9 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
> >  	}
> >  	hwpt->domain->owner = viommu->iommu_dev->ops;
> >
> > -	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
> > +	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
> > +			 (!viommu->ops->cache_invalidate &&
> > +			  !hwpt->domain->ops->cache_invalidate_user))) {
> >  		rc = -EINVAL;
> >  		goto out_abort;
> >  	}
>
> According to patch5, cache invalidate in viommu only uses
> viommu->ops->cache_invalidate. Is it intended here to allow a nested
> hwpt created via a viommu to still support the old method?
I think that is reasonable?
Jason
> From: Jason Gunthorpe jgg@nvidia.com
> Sent: Wednesday, October 30, 2024 12:05 AM
>
> On Tue, Oct 29, 2024 at 08:22:43AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen nicolinc@nvidia.com
> > Sent: Saturday, October 26, 2024 7:51 AM
> >
> > > @@ -302,7 +302,9 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
> > >  	}
> > >  	hwpt->domain->owner = viommu->iommu_dev->ops;
> > >
> > > -	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
> > > +	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
> > > +			 (!viommu->ops->cache_invalidate &&
> > > +			  !hwpt->domain->ops->cache_invalidate_user))) {
> > >  		rc = -EINVAL;
> > >  		goto out_abort;
> > >  	}
> >
> > According to patch5, cache invalidate in viommu only uses
> > viommu->ops->cache_invalidate. Is it intended here to allow a nested
> > hwpt created via a viommu to still support the old method?
>
> I think that is reasonable?
Yes, just want to confirm.
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:50:33PM -0700, Nicolin Chen wrote:
A vIOMMU-based hwpt_nested requires a cache invalidation op too, either using the one in iommu_domain_ops or the one in viommu_ops. Enforce that upon the allocated hwpt_nested.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
 drivers/iommu/iommufd/hw_pagetable.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
With a vIOMMU object, user space can flush any IOMMU related cache that can be directed via a vIOMMU object. It is similar to the IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB, e.g. device/descriptor cache.

Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id, and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers can define different structures for vIOMMU invalidations vs. HWPT ones.
Update the uAPI, kdoc, and selftest case accordingly.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 include/uapi/linux/iommufd.h            |  9 ++++---
 drivers/iommu/iommufd/hw_pagetable.c    | 32 +++++++++++++++++++------
 tools/testing/selftests/iommu/iommufd.c |  4 ++--
 3 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index b699ecb7aa9c..c2c5f49fdf17 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -730,7 +730,7 @@ struct iommu_hwpt_vtd_s1_invalidate {
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
- * @hwpt_id: ID of a nested HWPT for cache invalidation
+ * @hwpt_id: ID of a nested HWPT or a vIOMMU, for cache invalidation
  * @data_uptr: User pointer to an array of driver-specific cache invalidation
  *             data.
  * @data_type: One of enum iommu_hwpt_invalidate_data_type, defining the data
@@ -741,8 +741,11 @@ struct iommu_hwpt_vtd_s1_invalidate {
  *             Output the number of requests successfully handled by kernel.
  * @__reserved: Must be 0.
  *
- * Invalidate the iommu cache for user-managed page table. Modifications on a
- * user-managed page table should be followed by this operation to sync cache.
+ * Invalidate iommu cache for user-managed page table or vIOMMU. Modifications
+ * on a user-managed page table should be followed by this operation, if a HWPT
+ * is passed in via @hwpt_id. Other caches, such as device cache or descriptor
+ * cache can be flushed if a vIOMMU is passed in via the @hwpt_id field.
+ *
  * Each ioctl can support one or more cache invalidation requests in the array
  * that has a total size of @entry_len * @entry_num.
  *
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fd260a67b82c..5301ba69fb8a 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -483,7 +483,7 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
 		.entry_len = cmd->entry_len,
 		.entry_num = cmd->entry_num,
 	};
-	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_object *pt_obj;
 	u32 done_num = 0;
 	int rc;

@@ -497,17 +497,35 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
 		goto out;
 	}

-	hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id);
-	if (IS_ERR(hwpt)) {
-		rc = PTR_ERR(hwpt);
+	pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY);
+	if (IS_ERR(pt_obj)) {
+		rc = PTR_ERR(pt_obj);
 		goto out;
 	}
+	if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
+		struct iommufd_hw_pagetable *hwpt =
+			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+
+		rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
+							      &data_array);
+	} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
+		struct iommufd_viommu *viommu =
+			container_of(pt_obj, struct iommufd_viommu, obj);
+
+		if (!viommu->ops || !viommu->ops->cache_invalidate) {
+			rc = -EOPNOTSUPP;
+			goto out_put_pt;
+		}
+		rc = viommu->ops->cache_invalidate(viommu, &data_array);
+	} else {
+		rc = -EINVAL;
+		goto out_put_pt;
+	}

-	rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
-						      &data_array);
 	done_num = data_array.entry_num;

-	iommufd_put_object(ucmd->ictx, &hwpt->obj);
+out_put_pt:
+	iommufd_put_object(ucmd->ictx, pt_obj);
 out:
 	cmd->entry_num = done_num;
 	if (iommufd_ucmd_respond(ucmd, sizeof(*cmd)))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 93255403dee4..44fbc7e5aa2e 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -362,9 +362,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 	EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, parent_hwpt_id));

-	/* hwpt_invalidate only supports a user-managed hwpt (nested) */
+	/* hwpt_invalidate does not support a parent hwpt */
 	num_inv = 1;
-	test_err_hwpt_invalidate(ENOENT, parent_hwpt_id, inv_reqs,
+	test_err_hwpt_invalidate(EINVAL, parent_hwpt_id, inv_reqs,
 				 IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
 				 sizeof(*inv_reqs), &num_inv);
 	assert(!num_inv);
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Saturday, October 26, 2024 7:51 AM
>
> With a vIOMMU object, user space can flush any IOMMU related cache
> that can be directed via a vIOMMU object. It is similar to the
> IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB,
> e.g. device/descriptor cache.
>
> Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a
> viommu_id, and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU
> invalidations. Drivers can define different structures for vIOMMU
> invalidations vs. HWPT ones.
>
> Update the uAPI, kdoc, and selftest case accordingly.
>
> Reviewed-by: Jason Gunthorpe jgg@nvidia.com
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:50:34PM -0700, Nicolin Chen wrote:
> @@ -497,17 +497,35 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
>  		goto out;
>  	}
>
> -	hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id);
> -	if (IS_ERR(hwpt)) {
> -		rc = PTR_ERR(hwpt);
> +	pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY);
> +	if (IS_ERR(pt_obj)) {
> +		rc = PTR_ERR(pt_obj);
>  		goto out;
>  	}
> +	if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
> +		struct iommufd_hw_pagetable *hwpt =
> +			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
> +
> +		rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
> +							      &data_array);
> +	} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
> +		struct iommufd_viommu *viommu =
> +			container_of(pt_obj, struct iommufd_viommu, obj);
> +
> +		if (!viommu->ops || !viommu->ops->cache_invalidate) {
> +			rc = -EOPNOTSUPP;
> +			goto out_put_pt;
> +		}
> +		rc = viommu->ops->cache_invalidate(viommu, &data_array);
> +	} else {
> +		rc = -EINVAL;
> +		goto out_put_pt;
> +	}
Given the test in iommufd_viommu_alloc_hwpt_nested() is:
	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
			 (!viommu->ops->cache_invalidate &&
			  !hwpt->domain->ops->cache_invalidate_user))) {
We will crash if the user passes a viommu allocated domain as IOMMUFD_OBJ_HWPT_NESTED since the above doesn't check it.
I suggest we put the required if (ops..) -EOPNOTSUPP above and remove the ops->cache_invalidate checks from both WARN_ONs.
Jason
On Tue, Oct 29, 2024 at 04:09:41PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:50:34PM -0700, Nicolin Chen wrote:
> > @@ -497,17 +497,35 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> >  		goto out;
> >  	}
> >
> > -	hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id);
> > -	if (IS_ERR(hwpt)) {
> > -		rc = PTR_ERR(hwpt);
> > +	pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY);
> > +	if (IS_ERR(pt_obj)) {
> > +		rc = PTR_ERR(pt_obj);
> >  		goto out;
> >  	}
> > +	if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
> > +		struct iommufd_hw_pagetable *hwpt =
> > +			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
> > +
> > +		rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
> > +							      &data_array);
> > +	} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
> > +		struct iommufd_viommu *viommu =
> > +			container_of(pt_obj, struct iommufd_viommu, obj);
> > +
> > +		if (!viommu->ops || !viommu->ops->cache_invalidate) {
> > +			rc = -EOPNOTSUPP;
> > +			goto out_put_pt;
> > +		}
> > +		rc = viommu->ops->cache_invalidate(viommu, &data_array);
> > +	} else {
> > +		rc = -EINVAL;
> > +		goto out_put_pt;
> > +	}
>
> Given the test in iommufd_viommu_alloc_hwpt_nested() is:
>
> 	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
> 			 (!viommu->ops->cache_invalidate &&
> 			  !hwpt->domain->ops->cache_invalidate_user))) {
>
> We will crash if the user passes a viommu allocated domain as
> IOMMUFD_OBJ_HWPT_NESTED since the above doesn't check it.
Ah, that was missed.
> I suggest we put the required if (ops..) -EOPNOTSUPP above and remove
> the ops->cache_invalidate checks from both WARN_ONs.
Ack. I will add hwpt->domain->ops check:
---------------------------------------------------------------------
	if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
		struct iommufd_hw_pagetable *hwpt =
			container_of(pt_obj, struct iommufd_hw_pagetable, obj);

		if (!hwpt->domain->ops ||
		    !hwpt->domain->ops->cache_invalidate_user) {
			rc = -EOPNOTSUPP;
			goto out_put_pt;
		}
		rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
							      &data_array);
	} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
		struct iommufd_viommu *viommu =
			container_of(pt_obj, struct iommufd_viommu, obj);

		if (!viommu->ops || !viommu->ops->cache_invalidate) {
			rc = -EOPNOTSUPP;
			goto out_put_pt;
		}
		rc = viommu->ops->cache_invalidate(viommu, &data_array);
	} else {
---------------------------------------------------------------------
Thanks
Nicolin
From: Jason Gunthorpe jgg@nvidia.com
The iommu_copy_struct_from_user_array helper can be used to copy a single entry from a user array which might not be efficient if the array is big.
Add a new iommu_copy_struct_from_full_user_array to copy the entire user array at once. Update the existing iommu_copy_struct_from_user_array kdoc accordingly.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 include/linux/iommu.h | 48 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 14f24b5cd16f..3c3d5d0849d7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -493,7 +493,9 @@ static inline int __iommu_copy_struct_from_user_array(
  * @index: Index to the location in the array to copy user data from
  * @min_last: The last member of the data structure @kdst points in the
  *            initial version.
- * Return 0 for success, otherwise -error.
+ *
+ * Copy a single entry from a user array. Return 0 for success, otherwise
+ * -error.
  */
 #define iommu_copy_struct_from_user_array(kdst, user_array, data_type, index, \
 					  min_last)                            \
@@ -501,6 +503,50 @@ static inline int __iommu_copy_struct_from_user_array(
 		kdst, user_array, data_type, index, sizeof(*(kdst)),          \
 		offsetofend(typeof(*(kdst)), min_last))
+/**
+ * iommu_copy_struct_from_full_user_array - Copy iommu driver specific user
+ *         space data from an iommu_user_data_array
+ * @kdst: Pointer to an iommu driver specific user data that is defined in
+ *        include/uapi/linux/iommufd.h
+ * @kdst_entry_size: sizeof(*kdst)
+ * @user_array: Pointer to a struct iommu_user_data_array for a user space
+ *              array
+ * @data_type: The data type of the @kdst. Must match with @user_array->type
+ *
+ * Copy the entire user array. kdst must have room for kdst_entry_size *
+ * user_array->entry_num bytes. Return 0 for success, otherwise -error.
+ */
+static inline int
+iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
+				       struct iommu_user_data_array *user_array,
+				       unsigned int data_type)
+{
+	unsigned int i;
+	int ret;
+
+	if (user_array->type != data_type)
+		return -EINVAL;
+	if (!user_array->entry_num)
+		return -EINVAL;
+	if (likely(user_array->entry_len == kdst_entry_size)) {
+		if (copy_from_user(kdst, user_array->uptr,
+				   user_array->entry_num *
+					   user_array->entry_len))
+			return -EFAULT;
+	}
+
+	/* Copy item by item */
+	for (i = 0; i != user_array->entry_num; i++) {
+		ret = copy_struct_from_user(
+			kdst + kdst_entry_size * i, kdst_entry_size,
+			user_array->uptr + user_array->entry_len * i,
+			user_array->entry_len);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
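For illustration, a driver-side caller would typically bounce the whole array through a kernel buffer like below (a sketch only; my_entry and MY_DATA_TYPE are hypothetical placeholders, mirroring how the mock and SMMUv3 patches later in this series use the helper):

	struct my_entry *cmds;	/* hypothetical driver-specific entry format */
	int rc;

	cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
	if (!cmds)
		return -ENOMEM;
	rc = iommu_copy_struct_from_full_user_array(cmds, sizeof(*cmds),
						    array, MY_DATA_TYPE);
	if (!rc)
		/* ... process cmds[0..array->entry_num - 1] ... */;
	kfree(cmds);
	return rc;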
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:51 AM
From: Jason Gunthorpe jgg@nvidia.com
The iommu_copy_struct_from_user_array helper can be used to copy a single entry from a user array which might not be efficient if the array is big.
Add a new iommu_copy_struct_from_full_user_array to copy the entire user array at once. Update the existing iommu_copy_struct_from_user_array kdoc accordingly.
what about:
iommu_copy_struct_from_user_array_single()
iommu_copy_struct_from_user_array_full()
?
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Tue, Oct 29, 2024 at 08:24:52AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:51 AM
From: Jason Gunthorpe jgg@nvidia.com
The iommu_copy_struct_from_user_array helper can be used to copy a single entry from a user array which might not be efficient if the array is big.
Add a new iommu_copy_struct_from_full_user_array to copy the entire user array at once. Update the existing iommu_copy_struct_from_user_array kdoc accordingly.
what about:
iommu_copy_struct_from_user_array_single()
iommu_copy_struct_from_user_array_full()
I am okay with that, yet might prefer doing it in a separate patch, as I am trying to reduce the number of changes in this series since we are likely merging three series at once :)
Thanks
Nicolin
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
This avoids a bigger trouble of exposing struct iommufd_device and struct iommufd_vdevice in the public header.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 include/linux/iommufd.h        |  8 ++++++++
 drivers/iommu/iommufd/driver.c | 13 +++++++++++++
 2 files changed, 21 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 0287a6d00192..dc174d02132b 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -184,6 +184,8 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx)
 struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     size_t size,
 					     enum iommufd_object_type type);
+struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
+				       unsigned long vdev_id);
 #else /* !CONFIG_IOMMUFD_DRIVER */
 static inline struct iommufd_object *
 _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
@@ -191,6 +193,12 @@ _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
+
+static inline struct device *
+iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
+{
+	return NULL;
+}
 #endif /* CONFIG_IOMMUFD_DRIVER */
 /*
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index c0876d3f91c7..3604443f82a1 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -36,3 +36,16 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, IOMMUFD);
+
+/* Caller should xa_lock(&viommu->vdevs) to protect the return value */
+struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
+				       unsigned long vdev_id)
+{
+	struct iommufd_vdevice *vdev;
+
+	lockdep_is_held(&viommu->vdevs.xa_lock);
+
+	vdev = xa_load(&viommu->vdevs, vdev_id);
+	return vdev ? vdev->idev->dev : NULL;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, IOMMUFD);
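For reference, a caller is expected to wrap the lookup in the xarray lock, since the returned pointer is only stable while the lock is held (sketch):

	struct device *dev;

	xa_lock(&viommu->vdevs);
	dev = iommufd_viommu_find_dev(viommu, vdev_id);
	if (dev) {
		/* use dev only while the xa_lock is held */
	}
	xa_unlock(&viommu->vdevs);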
On Sat, 26 Oct 2024 at 07:51, Nicolin Chen nicolinc@nvidia.com wrote:
This avoids a bigger trouble of exposing struct iommufd_device and struct iommufd_vdevice in the public header.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
 include/linux/iommufd.h        |  8 ++++++++
 drivers/iommu/iommufd/driver.c | 13 +++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 0287a6d00192..dc174d02132b 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -184,6 +184,8 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx)
 struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     size_t size,
 					     enum iommufd_object_type type);
+struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
+				       unsigned long vdev_id);
 #else /* !CONFIG_IOMMUFD_DRIVER */
 static inline struct iommufd_object *
 _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
@@ -191,6 +193,12 @@ _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
+
+static inline struct device *
+iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
+{
+	return NULL;
+}
 #endif /* CONFIG_IOMMUFD_DRIVER */

 /*
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index c0876d3f91c7..3604443f82a1 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -36,3 +36,16 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, IOMMUFD);
+
+/* Caller should xa_lock(&viommu->vdevs) to protect the return value */
+struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
+				       unsigned long vdev_id)
+{
+	struct iommufd_vdevice *vdev;
+
+	lockdep_is_held(&viommu->vdevs.xa_lock);
+
+	vdev = xa_load(&viommu->vdevs, vdev_id);
+	return vdev ? vdev->idev->dev : NULL;
+}

Got this error?

ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: drivers/iommu/iommufd/driver.o: in function `iommufd_viommu_find_dev':
linux/drivers/iommu/iommufd/driver.c:47: undefined reference to `lockdep_is_held'
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [/home/linaro/iommufd/linux/Makefile:1166: vmlinux] Error 2
make: *** [Makefile:224: __sub-make] Error 2
On Sun, Oct 27, 2024 at 11:02:31PM +0800, Zhangfei Gao wrote:
On Sat, 26 Oct 2024 at 07:51, Nicolin Chen nicolinc@nvidia.com wrote:
+/* Caller should xa_lock(&viommu->vdevs) to protect the return value */
+struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
+				       unsigned long vdev_id)
+{
+	struct iommufd_vdevice *vdev;
+
+	lockdep_is_held(&viommu->vdevs.xa_lock);
+
+	vdev = xa_load(&viommu->vdevs, vdev_id);
+	return vdev ? vdev->idev->dev : NULL;
+}

Got this error?

ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: drivers/iommu/iommufd/driver.o: in function `iommufd_viommu_find_dev':
linux/drivers/iommu/iommufd/driver.c:47: undefined reference to `lockdep_is_held'
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [/home/linaro/iommufd/linux/Makefile:1166: vmlinux] Error 2
make: *** [Makefile:224: __sub-make] Error 2
Should fix it with:

-	lockdep_is_held(&viommu->vdevs.xa_lock);
+	lockdep_assert_held(&viommu->vdevs.xa_lock);

lockdep_is_held() is only defined when CONFIG_LOCKDEP is enabled, so calling it unconditionally breaks non-lockdep builds; lockdep_assert_held() is a macro that compiles away instead.
Thanks
Nicolin
From: Nicolin Chen nicolinc@nvidia.com Sent: Monday, October 28, 2024 6:49 AM
On Sun, Oct 27, 2024 at 11:02:31PM +0800, Zhangfei Gao wrote:
On Sat, 26 Oct 2024 at 07:51, Nicolin Chen nicolinc@nvidia.com wrote:
+/* Caller should xa_lock(&viommu->vdevs) to protect the return value */
+struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
+				       unsigned long vdev_id)
+{
+	struct iommufd_vdevice *vdev;
+
+	lockdep_is_held(&viommu->vdevs.xa_lock);
+
+	vdev = xa_load(&viommu->vdevs, vdev_id);
+	return vdev ? vdev->idev->dev : NULL;
+}

Got this error?

ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: drivers/iommu/iommufd/driver.o: in function `iommufd_viommu_find_dev':
linux/drivers/iommu/iommufd/driver.c:47: undefined reference to `lockdep_is_held'
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [/home/linaro/iommufd/linux/Makefile:1166: vmlinux] Error 2
make: *** [Makefile:224: __sub-make] Error 2
Should fix it with:
-	lockdep_is_held(&viommu->vdevs.xa_lock);
+	lockdep_assert_held(&viommu->vdevs.xa_lock);
with that,
Reviewed-by: Kevin Tian kevin.tian@intel.com
Similar to the coverage of cache_invalidate_user for iotlb invalidation, add a device cache and a viommu_cache_invalidate function to test it out.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_test.h | 25 +++++++++
 drivers/iommu/iommufd/selftest.c     | 76 +++++++++++++++++++++++++++-
 2 files changed, 100 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index edced4ac7cd3..46558f83e734 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -54,6 +54,11 @@ enum {
 	MOCK_NESTED_DOMAIN_IOTLB_NUM = 4,
 };

+enum {
+	MOCK_DEV_CACHE_ID_MAX = 3,
+	MOCK_DEV_CACHE_NUM = 4,
+};
+
 struct iommu_test_cmd {
 	__u32 size;
 	__u32 op;
@@ -152,6 +157,7 @@ struct iommu_test_hw_info {
 /* Should not be equal to any defined value in enum iommu_hwpt_data_type */
 #define IOMMU_HWPT_DATA_SELFTEST 0xdead
 #define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef
+#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad

 /**
  * struct iommu_hwpt_selftest
@@ -182,4 +188,23 @@ struct iommu_hwpt_invalidate_selftest {

 #define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef

+/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef
+
+/**
+ * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU
+ *                                           (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST)
+ * @flags: Invalidate flags
+ * @cache_id: Invalidate cache entry index
+ *
+ * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @cache_id will be ignored
+ */
+struct iommu_viommu_invalidate_selftest {
+#define IOMMU_TEST_INVALIDATE_FLAG_ALL	(1 << 0)
+	__u32 flags;
+	__u32 vdev_id;
+	__u32 cache_id;
+};
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 33a0fcc0eff7..01556854f2f2 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -163,6 +163,7 @@ struct mock_dev {
 	struct device dev;
 	unsigned long flags;
 	int id;
+	u32 cache[MOCK_DEV_CACHE_NUM];
 };
 static inline struct mock_dev *to_mock_dev(struct device *dev)
@@ -606,9 +607,80 @@ mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu,
 	return &mock_nested->domain;
 }
+static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu,
+					struct iommu_user_data_array *array)
+{
+	struct iommu_viommu_invalidate_selftest *cmds;
+	struct iommu_viommu_invalidate_selftest *cur;
+	struct iommu_viommu_invalidate_selftest *end;
+	int rc;
+
+	/* A zero-length array is allowed to validate the array type */
+	if (array->entry_num == 0 &&
+	    array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) {
+		array->entry_num = 0;
+		return 0;
+	}
+
+	cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+	if (!cmds)
+		return -ENOMEM;
+	cur = cmds;
+	end = cmds + array->entry_num;
+
+	static_assert(sizeof(*cmds) == 3 * sizeof(u32));
+	rc = iommu_copy_struct_from_full_user_array(
+		cmds, sizeof(*cmds), array,
+		IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST);
+	if (rc)
+		goto out;
+
+	while (cur != end) {
+		struct mock_dev *mdev;
+		struct device *dev;
+		int i;
+
+		if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+			rc = -EOPNOTSUPP;
+			goto out;
+		}
+
+		if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		xa_lock(&viommu->vdevs);
+		dev = iommufd_viommu_find_dev(viommu,
+					      (unsigned long)cur->vdev_id);
+		if (!dev) {
+			xa_unlock(&viommu->vdevs);
+			rc = -EINVAL;
+			goto out;
+		}
+		mdev = container_of(dev, struct mock_dev, dev);
+
+		if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+			/* Invalidate all cache entries and ignore cache_id */
+			for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+				mdev->cache[i] = 0;
+		} else {
+			mdev->cache[cur->cache_id] = 0;
+		}
+		xa_unlock(&viommu->vdevs);
+
+		cur++;
+	}
+out:
+	array->entry_num = cur - cmds;
+	kfree(cmds);
+	return rc;
+}
+
 static struct iommufd_viommu_ops mock_viommu_ops = {
 	.free = mock_viommu_free,
 	.alloc_domain_nested = mock_viommu_alloc_domain_nested,
+	.cache_invalidate = mock_viommu_cache_invalidate,
 };
 static struct iommufd_viommu *mock_viommu_alloc(struct device *dev,
@@ -779,7 +851,7 @@ static void mock_dev_release(struct device *dev)
 static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 {
 	struct mock_dev *mdev;
-	int rc;
+	int rc, i;

 	if (dev_flags &
 	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
@@ -793,6 +865,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 	mdev->flags = dev_flags;
 	mdev->dev.release = mock_dev_release;
 	mdev->dev.bus = &iommufd_mock_bus_type.bus;
+	for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+		mdev->cache[i] = IOMMU_TEST_DEV_CACHE_DEFAULT;

 	rc = ida_alloc(&mock_dev_ida, GFP_KERNEL);
 	if (rc < 0)
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:51 AM
Similar to the coverage of cache_invalidate_user for iotlb invalidation, add a device cache and a viommu_cache_invalidate function to test it out.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb, IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_test.h          |  5 ++++
 tools/testing/selftests/iommu/iommufd_utils.h | 24 +++++++++++++++++++
 drivers/iommu/iommufd/selftest.c              | 22 +++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       |  7 +++++-
 4 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 46558f83e734..a6b7a163f636 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -23,6 +23,7 @@ enum {
 	IOMMU_TEST_OP_DIRTY,
 	IOMMU_TEST_OP_MD_CHECK_IOTLB,
 	IOMMU_TEST_OP_TRIGGER_IOPF,
+	IOMMU_TEST_OP_DEV_CHECK_CACHE,
 };

 enum {
@@ -140,6 +141,10 @@ struct iommu_test_cmd {
 			__u32 perm;
 			__u64 addr;
 		} trigger_iopf;
+		struct {
+			__u32 id;
+			__u32 cache;
+		} check_dev_cache;
 	};
 	__u32 last;
 };
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 5b17d7b2ac5c..3ae6cb5eed7d 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -234,6 +234,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_i
 			test_cmd_hwpt_check_iotlb(hwpt_id, i, expected);       \
 	})

+#define test_cmd_dev_check_cache(device_id, cache_id, expected)               \
+	({                                                                     \
+		struct iommu_test_cmd test_cmd = {                             \
+			.size = sizeof(test_cmd),                              \
+			.op = IOMMU_TEST_OP_DEV_CHECK_CACHE,                   \
+			.id = device_id,                                       \
+			.check_dev_cache = {                                   \
+				.id = cache_id,                                \
+				.cache = expected,                             \
+			},                                                     \
+		};                                                             \
+		ASSERT_EQ(0, ioctl(self->fd,                                   \
+				   _IOMMU_TEST_CMD(                            \
+					   IOMMU_TEST_OP_DEV_CHECK_CACHE),     \
+				   &test_cmd));                                \
+	})
+
+#define test_cmd_dev_check_cache_all(device_id, expected)                     \
+	({                                                                     \
+		int c;                                                         \
+		for (c = 0; c < MOCK_DEV_CACHE_NUM; c++)                       \
+			test_cmd_dev_check_cache(device_id, c, expected);      \
+	})
+
 static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs,
 				     uint32_t data_type, uint32_t lreq,
 				     uint32_t *nreqs)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 01556854f2f2..7bb47b4a6d63 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -1122,6 +1122,24 @@ static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd,
 	return rc;
 }

+static int iommufd_test_dev_check_cache(struct iommufd_ucmd *ucmd, u32 idev_id,
+					unsigned int cache_id, u32 cache)
+{
+	struct iommufd_device *idev;
+	struct mock_dev *mdev;
+	int rc = 0;
+
+	idev = iommufd_get_device(ucmd, idev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	mdev = container_of(idev->dev, struct mock_dev, dev);
+
+	if (cache_id > MOCK_DEV_CACHE_ID_MAX || mdev->cache[cache_id] != cache)
+		rc = -EINVAL;
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+	return rc;
+}
+
 struct selftest_access {
 	struct iommufd_access *access;
 	struct file *file;
@@ -1631,6 +1649,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 		return iommufd_test_md_check_iotlb(ucmd, cmd->id,
 						   cmd->check_iotlb.id,
 						   cmd->check_iotlb.iotlb);
+	case IOMMU_TEST_OP_DEV_CHECK_CACHE:
+		return iommufd_test_dev_check_cache(ucmd, cmd->id,
+						    cmd->check_dev_cache.id,
+						    cmd->check_dev_cache.cache);
 	case IOMMU_TEST_OP_CREATE_ACCESS:
 		return iommufd_test_create_access(ucmd, cmd->id,
 						  cmd->create_access.flags);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 44fbc7e5aa2e..ff0181e5db48 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -222,6 +222,8 @@ FIXTURE_SETUP(iommufd_ioas)
 	for (i = 0; i != variant->mock_domains; i++) {
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
 				     &self->hwpt_id, &self->device_id);
+		test_cmd_dev_check_cache_all(self->device_id,
+					     IOMMU_TEST_DEV_CACHE_DEFAULT);
 		self->base_iova = MOCK_APERTURE_START;
 	}
 }
@@ -1386,9 +1388,12 @@ FIXTURE_SETUP(iommufd_mock_domain)

 	ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);

-	for (i = 0; i != variant->mock_domains; i++)
+	for (i = 0; i != variant->mock_domains; i++) {
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i],
 				     &self->hwpt_ids[i], &self->idev_ids[i]);
+		test_cmd_dev_check_cache_all(self->idev_ids[0],
+					     IOMMU_TEST_DEV_CACHE_DEFAULT);
+	}
 	self->hwpt_id = self->hwpt_ids[0];
self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:51 AM
Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb, IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
Add a viommu_cache test function to cover vIOMMU invalidations using the updated IOMMU_HWPT_INVALIDATE ioctl, which now allows passing in a vIOMMU via its hwpt_id field.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 tools/testing/selftests/iommu/iommufd_utils.h |  32 ++++
 tools/testing/selftests/iommu/iommufd.c       | 173 ++++++++++++++++++
 2 files changed, 205 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3ae6cb5eed7d..aa458c80ad30 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -289,6 +289,38 @@ static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs,
 					      data_type, lreq, nreqs));        \
 	})
+static int _test_cmd_viommu_invalidate(int fd, __u32 viommu_id, void *reqs,
+				       uint32_t data_type, uint32_t lreq,
+				       uint32_t *nreqs)
+{
+	struct iommu_hwpt_invalidate cmd = {
+		.size = sizeof(cmd),
+		.hwpt_id = viommu_id,
+		.data_type = data_type,
+		.data_uptr = (uint64_t)reqs,
+		.entry_len = lreq,
+		.entry_num = *nreqs,
+	};
+	int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd);
+	*nreqs = cmd.entry_num;
+	return rc;
+}
+
+#define test_cmd_viommu_invalidate(viommu, reqs, lreq, nreqs)                 \
+	({                                                                     \
+		ASSERT_EQ(0,                                                   \
+			  _test_cmd_viommu_invalidate(self->fd, viommu, reqs,  \
+					IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, \
+					lreq, nreqs));                         \
+	})
+#define test_err_viommu_invalidate(_errno, viommu_id, reqs, data_type, lreq,  \
+				   nreqs)                                      \
+	({                                                                     \
+		EXPECT_ERRNO(_errno, _test_cmd_viommu_invalidate(              \
+					     self->fd, viommu_id, reqs,        \
+					     data_type, lreq, nreqs));         \
+	})
+
 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
 					 unsigned int ioas_id)
 {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index ff0181e5db48..9a7d4b9c44f6 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2498,4 +2498,177 @@ TEST_F(iommufd_viommu, vdevice_alloc)
 	}
 }
+TEST_F(iommufd_viommu, vdevice_cache)
+{
+	struct iommu_viommu_invalidate_selftest inv_reqs[2] = {};
+	uint32_t viommu_id = self->viommu_id;
+	uint32_t dev_id = self->device_id;
+	uint32_t vdev_id = 0;
+	uint32_t num_inv;
+
+	if (dev_id) {
+		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+
+		test_cmd_dev_check_cache_all(dev_id,
+					     IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+		/* Check data_type by passing zero-length array */
+		num_inv = 0;
+		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+					   sizeof(*inv_reqs), &num_inv);
+		assert(!num_inv);
+
+		/* Negative test: Invalid data_type */
+		num_inv = 1;
+		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
+				sizeof(*inv_reqs), &num_inv);
+		assert(!num_inv);
+
+		/* Negative test: structure size sanity */
+		num_inv = 1;
+		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				sizeof(*inv_reqs) + 1, &num_inv);
+		assert(!num_inv);
+
+		num_inv = 1;
+		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				1, &num_inv);
+		assert(!num_inv);
+
+		/* Negative test: invalid flag is passed */
+		num_inv = 1;
+		inv_reqs[0].flags = 0xffffffff;
+		inv_reqs[0].vdev_id = 0x99;
+		test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				sizeof(*inv_reqs), &num_inv);
+		assert(!num_inv);
+
+		/* Negative test: invalid data_uptr when array is not empty */
+		num_inv = 1;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x99;
+		test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				sizeof(*inv_reqs), &num_inv);
+		assert(!num_inv);
+
+		/* Negative test: invalid entry_len when array is not empty */
+		num_inv = 1;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x99;
+		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				0, &num_inv);
+		assert(!num_inv);
+
+		/* Negative test: invalid cache_id */
+		num_inv = 1;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x99;
+		inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				sizeof(*inv_reqs), &num_inv);
+		assert(!num_inv);
+
+		/* Negative test: invalid vdev_id */
+		num_inv = 1;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x9;
+		inv_reqs[0].cache_id = 0;
+		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				sizeof(*inv_reqs), &num_inv);
+		assert(!num_inv);
+
+		/*
+		 * Invalidate the 1st cache entry but fail the 2nd request
+		 * due to invalid flags configuration in the 2nd request.
+		 */
+		num_inv = 2;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x99;
+		inv_reqs[0].cache_id = 0;
+		inv_reqs[1].flags = 0xffffffff;
+		inv_reqs[1].vdev_id = 0x99;
+		inv_reqs[1].cache_id = 1;
+		test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				sizeof(*inv_reqs), &num_inv);
+		assert(num_inv == 1);
+		test_cmd_dev_check_cache(dev_id, 0, 0);
+		test_cmd_dev_check_cache(dev_id, 1,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+		test_cmd_dev_check_cache(dev_id, 2,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+		test_cmd_dev_check_cache(dev_id, 3,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+		/*
+		 * Invalidate the 1st cache entry but fail the 2nd request
+		 * due to invalid cache_id configuration in the 2nd request.
+		 */
+		num_inv = 2;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x99;
+		inv_reqs[0].cache_id = 0;
+		inv_reqs[1].flags = 0;
+		inv_reqs[1].vdev_id = 0x99;
+		inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				sizeof(*inv_reqs), &num_inv);
+		assert(num_inv == 1);
+		test_cmd_dev_check_cache(dev_id, 0, 0);
+		test_cmd_dev_check_cache(dev_id, 1,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+		test_cmd_dev_check_cache(dev_id, 2,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+		test_cmd_dev_check_cache(dev_id, 3,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+		/* Invalidate the 2nd cache entry and verify */
+		num_inv = 1;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x99;
+		inv_reqs[0].cache_id = 1;
+		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+					   sizeof(*inv_reqs), &num_inv);
+		assert(num_inv == 1);
+		test_cmd_dev_check_cache(dev_id, 0, 0);
+		test_cmd_dev_check_cache(dev_id, 1, 0);
+		test_cmd_dev_check_cache(dev_id, 2,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+		test_cmd_dev_check_cache(dev_id, 3,
+					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+		/* Invalidate the 3rd and 4th cache entries and verify */
+		num_inv = 2;
+		inv_reqs[0].flags = 0;
+		inv_reqs[0].vdev_id = 0x99;
+		inv_reqs[0].cache_id = 2;
+		inv_reqs[1].flags = 0;
+		inv_reqs[1].vdev_id = 0x99;
+		inv_reqs[1].cache_id = 3;
+		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+					   sizeof(*inv_reqs), &num_inv);
+		assert(num_inv == 2);
+		test_cmd_dev_check_cache_all(dev_id, 0);
+
+		/* Invalidate all cache entries for nested_dev_id[1] and verify */
+		num_inv = 1;
+		inv_reqs[0].vdev_id = 0x99;
+		inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
+		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+					   sizeof(*inv_reqs), &num_inv);
+		assert(num_inv == 1);
+		test_cmd_dev_check_cache_all(dev_id, 0);
+		test_ioctl_destroy(vdev_id);
+	}
+}
+
 TEST_HARNESS_MAIN
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:51 AM
Add a viommu_cache test function to cover vIOMMU invalidations using the updated IOMMU_HWPT_INVALIDATE ioctl, which now allows passing in a vIOMMU via its hwpt_id field.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
With the introduction of the new object and its infrastructure, update the doc and the vIOMMU graph to reflect that.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 Documentation/userspace-api/iommufd.rst | 41 +++++++++++++++++++------
 1 file changed, 32 insertions(+), 9 deletions(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
index 92d16efad5b0..3c27cc92c2cb 100644
--- a/Documentation/userspace-api/iommufd.rst
+++ b/Documentation/userspace-api/iommufd.rst
@@ -94,6 +94,19 @@ Following IOMMUFD objects are exposed to userspace:
   backed by corresponding vIOMMU objects, in which case a guest OS would do
   the "dispatch" naturally instead of VMM trappings.
+- IOMMUFD_OBJ_VDEVICE, representing a virtual device for an IOMMUFD_OBJ_DEVICE
+  against an IOMMUFD_OBJ_VIOMMU. This virtual device holds the device's virtual
+  information or attributes (related to the vIOMMU) in a VM. An immediate vDATA
+  example can be the virtual ID of the device on a vIOMMU, which is a unique ID
+  that VMM assigns to the device for a translation channel/port of the vIOMMU,
+  e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d to a
+  Context Table. Potential use cases of some advanced security information can
+  be forwarded via this object too, such as security level or realm information
+  in a Confidential Compute Architecture. A VMM should create a vDEVICE object
+  to forward all the device information in a VM, when it connects a device to a
+  vIOMMU, which is a separate ioctl call from attaching the same device to an
+  HWPT_PAGING that the vIOMMU holds.
+
 All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
 The diagrams below show relationships between user-visible objects and kernel
@@ -133,16 +146,16 @@ creating the objects and links::
   |____________|           |____________|                         |______|
   _______________________________________________________________________
- |                     iommufd (with vIOMMU)                             |
+ |                  iommufd (with vIOMMU/vDEVICE)                        |
  |                                                                       |
- |                             [5]                                       |
- |                        _____________                                  |
- |                       |             |                                 |
- |      |----------------|    vIOMMU   |                                 |
- |      |                |             |                                 |
- |      |                |             |                                 |
- |      |      [1]       |             |          [4]            [2]     |
- |      |    ______      |             |     _____________     ________  |
+ |                             [5]                 [6]                   |
+ |                        _____________       _____________              |
+ |                       |             |     |             |             |
+ |      |----------------|    vIOMMU   |<---|   vDEVICE   |<----|        |
+ |      |                |             |    |_____________|     |        |
+ |      |                |             |                        |        |
+ |      |      [1]       |             |          [4]     |     [2]      |
+ |      |    ______      |             |     _____________   _|______    |
  |      |   |      |     |        [3]  |    |             | |        |   |
  |      |   | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE |  |
  |      |   |______|     |_____________|   |_____________|   |________|  |
@@ -215,6 +228,15 @@ creating the objects and links::
    the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used
    as a nesting parent object to allocate an HWPT_NESTED object described above.
+6. IOMMUFD_OBJ_VDEVICE can be only manually created via the IOMMU_VDEVICE_ALLOC
+   uAPI, provided a viommu_id for an iommufd_viommu object and a dev_id for an
+   iommufd_device object. The vDEVICE object will be the binding between these
+   two parent objects. Another @virt_id will be also set via the uAPI providing
+   the iommufd core an index to store the vDEVICE object to a vDEVICE array per
+   vIOMMU. If necessary, the IOMMU driver may choose to implement a
+   vdevice_alloc op to init its HW for virtualization feature related to a
+   vDEVICE. Successful completion of this operation sets up the linkages
+   between vIOMMU and device.
+
 A device can only bind to an iommufd due to DMA ownership claim and attach to
 at most one IOAS object (no support of PASID yet).

@@ -228,6 +250,7 @@ User visible objects are backed by following datastructures:
 - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING.
 - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
 - iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
+- iommufd_vdevice for IOMMUFD_OBJ_VDEVICE
Several terminologies when looking at these datastructures:
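For illustration, step [6] above boils down to one ioctl from the VMM side (a sketch, assuming an already-allocated viommu_id and dev_id; field names follow the uAPI added earlier in this series):

	struct iommu_vdevice_alloc cmd = {
		.size = sizeof(cmd),
		.viommu_id = viommu_id,
		.dev_id = dev_id,
		.virt_id = 0x99,	/* e.g. the vSID on ARM SMMUv3 */
	};

	if (ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &cmd))
		return -1;
	vdev_id = cmd.out_vdevice_id;	/* destroyed later via IOMMU_DESTROY */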
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:51 AM
With the introduction of the new object and its infrastructure, update the doc and the vIOMMU graph to reflect that.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
Implement the vIOMMU's cache_invalidate op for user space to invalidate the IOTLB entries, Device ATS and CD entries that are still cached by hardware.
Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation entries that are simply in the native format of a 128-bit TLBI command. Scan those commands against the permitted command list and fix their VMID/SID fields.
Co-developed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Eric Auger eric.auger@redhat.com
Co-developed-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h      |   5 +
 include/uapi/linux/iommufd.h                     |  24 ++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c        | 130 ++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c      |   6 +-
 4 files changed, 162 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 5a025d310dbe..8bd740f537ee 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -529,6 +529,7 @@ struct arm_smmu_cmdq_ent {
 	#define CMDQ_OP_TLBI_NH_ALL	0x10
 	#define CMDQ_OP_TLBI_NH_ASID	0x11
 	#define CMDQ_OP_TLBI_NH_VA	0x12
+	#define CMDQ_OP_TLBI_NH_VAA	0x13
 	#define CMDQ_OP_TLBI_EL2_ALL	0x20
 	#define CMDQ_OP_TLBI_EL2_ASID	0x21
 	#define CMDQ_OP_TLBI_EL2_VA	0x22
@@ -949,6 +950,10 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state);
 void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
 				  const struct arm_smmu_ste *target);

+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+				struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
+				bool sync);
+
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
 bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c2c5f49fdf17..8e66e2fde1dd 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -686,9 +686,11 @@ struct iommu_hwpt_get_dirty_bitmap {
 * enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation
 *                                        Data Type
 * @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
 */
 enum iommu_hwpt_invalidate_data_type {
	IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 = 0,
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 = 1,
 };

 /**
@@ -727,6 +729,28 @@ struct iommu_hwpt_vtd_s1_invalidate {
	__u32 __reserved;
 };

+/**
+ * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cache invalidation
+ *         (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3)
+ * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ.
+ *       Must be little-endian.
+ *
+ * Supported command list only when passing in a vIOMMU via @hwpt_id:
+ *	CMDQ_OP_TLBI_NSNH_ALL
+ *	CMDQ_OP_TLBI_NH_VA
+ *	CMDQ_OP_TLBI_NH_VAA
+ *	CMDQ_OP_TLBI_NH_ALL
+ *	CMDQ_OP_TLBI_NH_ASID
+ *	CMDQ_OP_ATC_INV
+ *	CMDQ_OP_CFGI_CD
+ *	CMDQ_OP_CFGI_CD_ALL
+ *
+ * -EIO will be returned if the command is not supported.
+ */
+struct iommu_viommu_arm_smmuv3_invalidate {
+	__aligned_le64 cmd[2];
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index abb6d2868376..2479074db820 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -184,9 +184,131 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu,
	return &nested_domain->domain;
 }

+static int arm_vsmmu_vsid_to_sid(struct arm_vsmmu *vsmmu, u32 vsid, u32 *sid)
+{
+	struct arm_smmu_master *master;
+	struct device *dev;
+	int ret = 0;
+
+	xa_lock(&vsmmu->core.vdevs);
+	dev = iommufd_viommu_find_dev(&vsmmu->core, (unsigned long)vsid);
+	if (!dev) {
+		ret = -EIO;
+		goto unlock;
+	}
+	master = dev_iommu_priv_get(dev);
+
+	/* At this moment, iommufd only supports PCI device that has one SID */
+	if (sid)
+		*sid = master->streams[0].id;
+unlock:
+	xa_unlock(&vsmmu->core.vdevs);
+	return ret;
+}
+
+/* This is basically iommu_viommu_arm_smmuv3_invalidate in u64 for conversion */
+struct arm_vsmmu_invalidation_cmd {
+	u64 cmd[2];
+};
+
+/*
+ * Convert, in place, the raw invalidation command into an internal format that
+ * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are
+ * stored in CPU endian.
+ *
+ * Enforce the VMID or SID on the command.
+ */
+static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
+				      struct arm_vsmmu_invalidation_cmd *cmd)
+{
+	/* Commands are le64 stored in u64 */
+	cmd->cmd[0] = le64_to_cpu((__force __le64)cmd->cmd[0]);
+	cmd->cmd[1] = le64_to_cpu((__force __le64)cmd->cmd[1]);
+
+	switch (cmd->cmd[0] & CMDQ_0_OP) {
+	case CMDQ_OP_TLBI_NSNH_ALL:
+		/* Convert to NH_ALL */
+		cmd->cmd[0] = CMDQ_OP_TLBI_NH_ALL |
+			      FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+		cmd->cmd[1] = 0;
+		break;
+	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_NH_VAA:
+	case CMDQ_OP_TLBI_NH_ALL:
+	case CMDQ_OP_TLBI_NH_ASID:
+		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
+		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+		break;
+	case CMDQ_OP_ATC_INV:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL: {
+		u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]);
+
+		if (arm_vsmmu_vsid_to_sid(vsmmu, vsid, &sid))
+			return -EIO;
+		cmd->cmd[0] &= ~CMDQ_CFGI_0_SID;
+		cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid);
+		break;
+	}
+	default:
+		return -EIO;
+	}
+	return 0;
+}
+
+static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+				      struct iommu_user_data_array *array)
+{
+	struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+	struct arm_smmu_device *smmu = vsmmu->smmu;
+	struct arm_vsmmu_invalidation_cmd *last;
+	struct arm_vsmmu_invalidation_cmd *cmds;
+	struct arm_vsmmu_invalidation_cmd *cur;
+	struct arm_vsmmu_invalidation_cmd *end;
+	int ret;
+
+	cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+	if (!cmds)
+		return -ENOMEM;
+	cur = cmds;
+	end = cmds + array->entry_num;
+
+	static_assert(sizeof(*cmds) == 2 * sizeof(u64));
+	ret = iommu_copy_struct_from_full_user_array(
+		cmds, sizeof(*cmds), array,
+		IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3);
+	if (ret)
+		goto out;
+
+	last = cmds;
+	while (cur != end) {
+		ret = arm_vsmmu_convert_user_cmd(vsmmu, cur);
+		if (ret)
+			goto out;
+
+		/* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
+		cur++;
+		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+			continue;
+
+		/* FIXME always uses the main cmdq rather than trying to group by type */
+		ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, last->cmd,
+						  cur - last, true);
+		if (ret) {
+			cur--;
+			goto out;
+		}
+		last = cur;
+	}
+out:
+	array->entry_num = cur - cmds;
+	kfree(cmds);
+	return ret;
+}
+
 static const struct iommufd_viommu_ops arm_vsmmu_ops = {
	.alloc_domain_nested = arm_vsmmu_alloc_domain_nested,
+	.cache_invalidate = arm_vsmmu_cache_invalidate,
 };

 struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
@@ -206,6 +328,14 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
	if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
		return ERR_PTR(-EOPNOTSUPP);

+	/*
+	 * FORCE_SYNC is not set with FEAT_NESTING. Some study of the exact HW
+	 * defect is needed to determine if arm_vsmmu_cache_invalidate() needs
+	 * any change to remove this.
+	 */
+	if (WARN_ON(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+		return ERR_PTR(-EOPNOTSUPP);
+
	/*
	 * Must support some way to prevent the VM from bypassing the cache
	 * because VFIO currently does not do any cache maintenance.
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8215c49d3bac..d1abfb42d828 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -766,9 +766,9 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds,
  * insert their own list of commands then all of the commands from one
  * CPU will appear before any of the commands from the other CPU.
  */
-static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
-				       struct arm_smmu_cmdq *cmdq,
-				       u64 *cmds, int n, bool sync)
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+				struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
+				bool sync)
 {
	u64 cmd_sync[CMDQ_ENT_DWORDS];
	u32 prod;
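To see the path end to end, a VMM would forward a guest CMDQ entry roughly as below (a sketch; guest_cmd0/guest_cmd1 are made-up placeholders for a guest-format command, and the ioctl fields mirror the selftest helpers earlier in this series):

	struct iommu_viommu_arm_smmuv3_invalidate inv = {
		/* VMID/SID fields get fixed up by the kernel */
		.cmd = { htole64(guest_cmd0), htole64(guest_cmd1) },
	};
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = viommu_id,	/* a vIOMMU ID, not an HWPT ID */
		.data_type = IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
		.data_uptr = (uint64_t)&inv,
		.entry_len = sizeof(inv),
		.entry_num = 1,
	};

	if (ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd))
		return -1;	/* cmd.entry_num = entries consumed */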
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:51 AM
Implement the vIOMMU's cache_invalidate op for user space to invalidate the IOTLB entries, Device ATS and CD entries that are still cached by hardware.
Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation
there are three 'iommu' in the name. What about:
arm_vsmmuv3_invalidate?
From: Jason Gunthorpe jgg@nvidia.com
Now, ATC invalidation can be done with the vIOMMU invalidation op. A guest owned IOMMU_DOMAIN_NESTED can do an ATS too. Allow it to pass in the EATS field via the vSTE words.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h      |  4 ++-
 include/uapi/linux/iommufd.h                     |  2 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c        | 31 ++++++++++++++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c      | 26 +++++++++++++---
 4 files changed, 53 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8bd740f537ee..af25f092303f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -305,7 +305,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
 #define STRTAB_STE_1_NESTING_ALLOWED                          \
	cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \
		    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \
-		    STRTAB_STE_1_S1STALLD)
+		    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_EATS)

 /*
  * Context descriptors.
@@ -838,6 +838,7 @@ struct arm_smmu_domain {
 struct arm_smmu_nested_domain {
	struct iommu_domain domain;
	struct arm_vsmmu *vsmmu;
+	bool enable_ats : 1;

	__le64 ste[2];
 };
@@ -879,6 +880,7 @@ struct arm_smmu_master_domain {
	struct list_head devices_elm;
	struct arm_smmu_master *master;
	ioasid_t ssid;
+	bool nested_ats_flush : 1;
 };

 static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 8e66e2fde1dd..056ba05a8022 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -404,7 +404,7 @@ struct iommu_hwpt_vtd_s1 {
 *        a user stage-1 Context Descriptor Table. Must be little-endian.
 *        Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec)
 *        - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax
- *        - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
+ *        - word-1: EATS, S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
 *
 * -EIO will be returned if @ste is not legal or contains any non-allowed field.
 * Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 2479074db820..c0c5cd807d34 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -96,8 +96,6 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
		.master = master,
		.old_domain = iommu_get_domain_for_dev(dev),
		.ssid = IOMMU_NO_PASID,
-		/* Currently invalidation of ATC is not supported */
-		.disable_ats = true,
	};
	struct arm_smmu_ste ste;
	int ret;
@@ -108,6 +106,15 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
		return -EBUSY;

	mutex_lock(&arm_smmu_asid_lock);
+	/*
+	 * The VM has to control the actual ATS state at the PCI device because
+	 * we forward the invalidations directly from the VM. If the VM doesn't
+	 * think ATS is on it will not generate ATC flushes and the ATC will
+	 * become incoherent. Since we can't access the actual virtual PCI ATS
+	 * config bit here base this off the EATS value in the STE. If the EATS
+	 * is set then the VM must generate ATC flushes.
+	 */
+	state.disable_ats = !nested_domain->enable_ats;
	ret = arm_smmu_attach_prepare(&state, domain);
	if (ret) {
		mutex_unlock(&arm_smmu_asid_lock);
@@ -132,8 +139,10 @@ static const struct iommu_domain_ops arm_smmu_nested_ops = {
	.free = arm_smmu_domain_nested_free,
 };

-static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
+static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg,
+				  bool *enable_ats)
 {
+	unsigned int eats;
	unsigned int cfg;

	if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
@@ -150,6 +159,18 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
	if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS &&
	    cfg != STRTAB_STE_0_CFG_S1_TRANS)
		return -EIO;
+
+	/*
+	 * Only Full ATS or ATS UR is supported
+	 * The EATS field will be set by arm_smmu_make_nested_domain_ste()
+	 */
+	eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg->ste[1]));
+	arg->ste[1] &= ~cpu_to_le64(STRTAB_STE_1_EATS);
+	if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS)
+		return -EIO;
+
+	if (cfg == STRTAB_STE_0_CFG_S1_TRANS)
+		*enable_ats = (eats == STRTAB_STE_1_EATS_TRANS);
	return 0;
 }

@@ -160,6 +181,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu,
	struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
	struct arm_smmu_nested_domain *nested_domain;
	struct iommu_hwpt_arm_smmuv3 arg;
+	bool enable_ats = false;
	int ret;

	ret = iommu_copy_struct_from_user(&arg, user_data,
@@ -167,7 +189,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu,
	if (ret)
		return ERR_PTR(ret);

-	ret = arm_smmu_validate_vste(&arg);
+	ret = arm_smmu_validate_vste(&arg, &enable_ats);
	if (ret)
		return ERR_PTR(ret);

@@ -177,6 +199,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu,

	nested_domain->domain.type = IOMMU_DOMAIN_NESTED;
	nested_domain->domain.ops = &arm_smmu_nested_ops;
+	nested_domain->enable_ats = enable_ats;
	nested_domain->vsmmu = vsmmu;
	nested_domain->ste[0] = arg.ste[0];
	nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d1abfb42d828..10b4dbc8d027 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2107,7 +2107,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
		if (!master->ats_enabled)
			continue;

-		arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd);
+		if (master_domain->nested_ats_flush) {
+			/*
+			 * If a S2 used as a nesting parent is changed we have
+			 * no option but to completely flush the ATC.
+			 */
+			arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd);
+		} else {
+			arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size,
+						&cmd);
+		}

		for (i = 0; i < master->num_streams; i++) {
			cmd.atc.sid = master->streams[i].id;
@@ -2631,7 +2640,7 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 static struct arm_smmu_master_domain *
 arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
			    struct arm_smmu_master *master,
-			    ioasid_t ssid)
+			    ioasid_t ssid, bool nested_ats_flush)
 {
	struct arm_smmu_master_domain *master_domain;

@@ -2640,7 +2649,8 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
	list_for_each_entry(master_domain, &smmu_domain->devices,
			    devices_elm) {
		if (master_domain->master == master &&
-		    master_domain->ssid == ssid)
+		    master_domain->ssid == ssid &&
+		    master_domain->nested_ats_flush == nested_ats_flush)
			return master_domain;
	}
	return NULL;
@@ -2671,13 +2681,18 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master,
 {
	struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain);
	struct arm_smmu_master_domain *master_domain;
+	bool nested_ats_flush = false;
	unsigned long flags;

	if (!smmu_domain)
		return;

+	if (domain->type == IOMMU_DOMAIN_NESTED)
+		nested_ats_flush = to_smmu_nested_domain(domain)->enable_ats;
+
	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid);
+	master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid,
+						    nested_ats_flush);
	if (master_domain) {
		list_del(&master_domain->devices_elm);
		kfree(master_domain);
@@ -2744,6 +2759,9 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
		return -ENOMEM;
	master_domain->master = master;
	master_domain->ssid = state->ssid;
+	if (new_domain->type == IOMMU_DOMAIN_NESTED)
+		master_domain->nested_ats_flush =
+			to_smmu_nested_domain(new_domain)->enable_ats;

	/*
	 * During prepare we want the current smmu_domain and new
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
Following the previous vIOMMU series, this adds another vDEVICE structure, representing the association from an iommufd_device to an iommufd_viommu. This gives the whole architecture a new "v" layer:
| iommufd (with vIOMMU/vDEVICE) | | _____________ _____________ | | | | | | | | |----------------| vIOMMU |<---| vDEVICE |<------| | | | | | |_____________| | | | | ______ | | _____________ ___|____ | | | | | | | | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | | |______| |_____________| |_____________| |________| | |______|________|______________|__________________|_______________|_____| | | | | | ______v_____ | ______v_____ ______v_____ ___v__ | struct | | PFN | (paging) | | (nested) | |struct| |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| |____________| storage|____________| |____________| |______|
This vDEVICE object is used to collect and store all vIOMMU-related device information/attributes in a VM. As an initial series for vDEVICE, add only the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM: e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d
s/vID/vRID/ for VT-d
to a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID of the device against the physical IOMMU instance. This is essential for a vIOMMU-based invalidation, where the request contains a device's vID for a device cache flush, e.g. ATC invalidation.
probably connect this to vCMDQ passthrough? otherwise for sw-based invalidation the userspace can always replace vID with pID before submitting the request.
Therefore, with this vDEVICE object, support a vIOMMU-based invalidation, by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache with a given driver data.
As for the implementation of the series, add driver support in ARM SMMUv3 for a real world use case.
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v5
For testing, try this "with-rmr" branch: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v5-with-rmr Pairing QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p2-v5
Changelog v5
- Dropped driver-allocated vDEVICE support
- Changed vdev_to_dev helper to iommufd_viommu_find_dev
v4 https://lore.kernel.org/all/cover.1729555967.git.nicolinc@nvidia.com/
- Added missing brackets in switch-case
- Fixed the unreleased idev refcount issue
- Reworked the iommufd_vdevice_alloc allocator
- Dropped support for IOMMU_VIOMMU_TYPE_DEFAULT
- Added missing TEST_LENGTH and fail_nth coverages
- Added a verification to the driver-allocated vDEVICE object
- Added an iommufd_vdevice_abort for a missing mutex protection
- Added a u64 structure arm_vsmmu_invalidation_cmd for user command conversion
v3 https://lore.kernel.org/all/cover.1728491532.git.nicolinc@nvidia.com/
- Added Jason's Reviewed-by
- Split this invalidation part out of the part-1 series
- Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl
- Reduced viommu_api functions by allowing drivers to access viommu and vdevice structure directly
- Dropped vdevs_rwsem by using xa_lock instead
- Dropped arm_smmu_cache_invalidate_user
v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
- Limited vdev_id to one per idev
- Added a rw_sem to protect the vdev_id list
- Reworked driver-level APIs with proper lockings
- Added a new viommu_api file for IOMMUFD_DRIVER config
- Dropped useless iommu_dev point from the viommu structure
- Added missing index numbers to new types in the uAPI header
- Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
- Reworked mock_viommu_cache_invalidate() using the new iommu helper
- Reordered details of set/unset_vdev_id handlers for proper lockings
v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Jason Gunthorpe (2):
  iommu: Add iommu_copy_struct_from_full_user_array helper
  iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
Nicolin Chen (11):
  iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl
  iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
  iommu/viommu: Add cache_invalidate to iommufd_viommu_ops
  iommufd/hw_pagetable: Enforce invalidation op on vIOMMU-based hwpt_nested
  iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
  iommufd/viommu: Add iommufd_viommu_find_dev helper
  iommufd/selftest: Add mock_viommu_cache_invalidate
  iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
  iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
  Documentation: userspace-api: iommufd: Update vDEVICE
  iommu/arm-smmu-v3: Add arm_vsmmu_cache_invalidate
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   9 +-
 drivers/iommu/iommufd/iommufd_private.h       |  20 ++
 drivers/iommu/iommufd/iommufd_test.h          |  30 +++
 include/linux/iommu.h                         |  48 ++++-
 include/linux/iommufd.h                       |  21 ++
 include/uapi/linux/iommufd.h                  |  61 +++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  83 +++++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 161 +++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  32 ++-
 drivers/iommu/iommufd/device.c                |  11 +
 drivers/iommu/iommufd/driver.c                |  13 ++
 drivers/iommu/iommufd/hw_pagetable.c          |  36 +++-
 drivers/iommu/iommufd/main.c                  |   7 +
 drivers/iommu/iommufd/selftest.c              |  98 ++++++++-
 drivers/iommu/iommufd/viommu.c                | 101 +++++++++
 tools/testing/selftests/iommu/iommufd.c       | 204 +++++++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |   4 +
 Documentation/userspace-api/iommufd.rst       |  41 +++-
 18 files changed, 942 insertions(+), 38 deletions(-)
-- 2.43.0
to a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID of the device against the physical IOMMU instance. This is essential for a vIOMMU-based invalidation, where the request contains a device's vID for a device cache flush, e.g. ATC invalidation.
probably connect this to vCMDQ passthrough? otherwise for sw-based invalidation the userspace can always replace vID with pID before submitting the request.
You can't just do that, the ID in the invalidation command has to be validated by the kernel.
At that point you may as well just use the vID instead of inventing a new means to validate raw pIDs.
VT-D avoided this so far because it is pushing the invalidation through the device object and the device object provides the ID directly. No IDs in commands, no switching devices during a command batch.
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Monday, October 28, 2024 10:17 PM
to a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID of the device against the physical IOMMU instance. This is essential for a vIOMMU-based invalidation, where the request contains a device's vID for a device cache flush, e.g. ATC invalidation.
probably connect this to vCMDQ passthrough? otherwise for sw-based invalidation the userspace can always replace vID with pID before submitting the request.
You can't just do that, the ID in the invalidation command has to be validated by the kernel.
Sure, the ID must be validated to match the iommufd_device, but not necessarily by going through a vID indirectly.
At that point you may as well just use the vID instead of inventing a new means to validate raw pIDs.
w/o the vCMDQ stuff, validating the raw pID sounds like the natural way, while the vID is more like a new means and not a mandatory one.
I'm fine with this design but just didn't feel the above description is accurate.