This series introduces a new VIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including nested IO page table support. Yet there are limitations to an HWPT-based structure when supporting some advanced HW-accelerated features, such as CMDQV on NVIDIA Grace and the HW-accelerated vIOMMU on AMD. Even in a multi-IOMMU environment, it is not straightforward for nested HWPTs to share the same parent HWPT (stage-2 IO pagetable) with the HWPT infrastructure alone.
The new VIOMMU object is an additional layer, between the nested HWPT and its parent HWPT, giving both the IOMMUFD core and an IOMMU driver an additional structure to support HW-accelerated features:

                      ----------------------------
  ----------------    |         | paging_hwpt0   |
  | hwpt_nested0 |--->| viommu0 ------------------
  ----------------    |         | HW-accel feats |
                      ----------------------------
On a multi-IOMMU system, the VIOMMU object can be instantiated per vIOMMU in a guest VM, while holding the same parent HWPT to share the stage-2 IO pagetable. Each VIOMMU then only needs to allocate its own VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:

                      ----------------------------
  ----------------    |         | paging_hwpt0   |
  | hwpt_nested0 |--->| viommu0 ------------------
  ----------------    |         | VMID0          |
                      ----------------------------
                      ----------------------------
  ----------------    |         | paging_hwpt0   |
  | hwpt_nested1 |--->| viommu1 ------------------
  ----------------    |         | VMID1          |
                      ----------------------------
As an initial part-1, add ioctls to support a VIOMMU-based invalidation:
    IOMMUFD_CMD_VIOMMU_ALLOC to allocate a VIOMMU object
    IOMMUFD_CMD_VIOMMU_SET/UNSET_VDEV_ID to set/clear a device's virtual ID
    IOMMUFD_CMD_VIOMMU_INVALIDATE to flush cache by a given driver data
Worth noting that the VDEV_ID is for a per-VIOMMU device list, for drivers to look up a device's physical instance from its virtual ID in a VM. It is essential for a VIOMMU-based invalidation, where the request contains a device's virtual ID for its device cache flush, e.g. an ATC invalidation.
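For reference, a minimal userspace sketch of the intended flow (the fd and ID variables are placeholders; only the two commands whose uapi structures appear later in this series are shown):

	struct iommu_viommu_alloc alloc = {
		.size = sizeof(alloc),
		.type = IOMMU_VIOMMU_TYPE_DEFAULT,
		.dev_id = dev_id,		/* a bound device's ID */
		.hwpt_id = parent_hwpt_id,	/* a NEST_PARENT HWPT */
	};
	ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &alloc);

	/* Link the device's virtual ID (e.g. a vSID) to its physical one */
	struct iommu_viommu_set_vdev_id set_vdev = {
		.size = sizeof(set_vdev),
		.viommu_id = alloc.out_viommu_id,
		.dev_id = dev_id,
		.vdev_id = virt_id,		/* the ID the VM uses */
	};
	ioctl(iommufd, IOMMU_VIOMMU_SET_VDEV_ID, &set_vdev);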
As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT type for a core-allocated, core-managed VIOMMU object, allowing drivers to simply hook the default viommu ops for viommu-based invalidation alone. Also provide some viommu helpers to drivers for VDEV_ID translation and parent domain lookup. Introduce an IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 for a real-world use case. This adds support for arm-smmu-v3's CMDQ_OP_ATC_INV and CMDQ_OP_CFGI_CD/ALL commands, supplementing HWPT-based invalidations.
In the future, drivers will also be able to choose a driver-managed type to hold their own structures, by adding a new type to enum iommu_viommu_type. More VIOMMU-based structures and ioctls will be introduced in part-2/3 to support a driver-managed VIOMMU, e.g. a VQUEUE object for a HW-accelerated queue and a VIRQ (or VEVENT) object for IRQ injections. Note that we repurposed the VIOMMU object from an earlier RFC discussion; for a reference: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v1
Thanks!
Nicolin
Jason Gunthorpe (1):
  iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
Nicolin Chen (15):
  iommufd/viommu: Add IOMMUFD_OBJ_VIOMMU and IOMMU_VIOMMU_ALLOC ioctl
  iommu: Pass in a viommu pointer to domain_alloc_user op
  iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
  iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
  iommufd/viommu: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctl
  iommufd/selftest: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID test coverage
  iommufd/viommu: Add cache_invalidate for IOMMU_VIOMMU_TYPE_DEFAULT
  iommufd/viommu: Add IOMMU_VIOMMU_INVALIDATE ioctl
  iommufd/viommu: Make iommufd_viommu_find_device a public API
  iommufd/selftest: Add mock_viommu_invalidate_user op
  iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
  iommufd/selftest: Add coverage for IOMMU_VIOMMU_INVALIDATE ioctl
  iommufd/viommu: Add iommufd_viommu_to_parent_domain helper
  iommu/arm-smmu-v3: Extract an __arm_smmu_cache_invalidate_user helper
  iommu/arm-smmu-v3: Add viommu cache invalidation support
 drivers/iommu/amd/iommu.c                     |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  90 +++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   2 +
 drivers/iommu/intel/iommu.c                   |   1 +
 drivers/iommu/iommufd/Makefile                |   3 +-
 drivers/iommu/iommufd/device.c                |   9 +
 drivers/iommu/iommufd/hw_pagetable.c          |  27 +-
 drivers/iommu/iommufd/iommufd_private.h       |  37 +++
 drivers/iommu/iommufd/iommufd_test.h          |  30 ++
 drivers/iommu/iommufd/main.c                  |  15 +
 drivers/iommu/iommufd/selftest.c              |  88 +++++-
 drivers/iommu/iommufd/viommu.c                | 249 +++++++++++++++++
 include/linux/iommu.h                         |   6 +
 include/linux/iommufd.h                       |  35 +++
 include/uapi/linux/iommufd.h                  | 139 ++++++++-
 tools/testing/selftests/iommu/iommufd.c       | 263 +++++++++++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 126 +++++++++
 17 files changed, 1095 insertions(+), 26 deletions(-)
 create mode 100644 drivers/iommu/iommufd/viommu.c
Add a new IOMMUFD_OBJ_VIOMMU with an iommufd_viommu structure to represent a vIOMMU instance in user space, backed by a physical IOMMU for its HW-accelerated virtualization features, such as nested translation support for a multi-viommu-instance VM, the NVIDIA CMDQ-Virtualization extension for ARM SMMUv3, and the AMD Hardware Accelerated Virtualized IOMMU (vIOMMU).
Also, add a new ioctl for user space to allocate a viommu. It must be based on a nesting parent HWPT, so take a refcount on it.
As an initial version, support a viommu of IOMMU_VIOMMU_TYPE_DEFAULT type. The IOMMUFD core can use this viommu to store a virtual device ID lookup table, added in a following patch.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/Makefile          |  3 +-
 drivers/iommu/iommufd/iommufd_private.h | 13 +++++
 drivers/iommu/iommufd/main.c            |  6 ++
 drivers/iommu/iommufd/viommu.c          | 75 +++++++++++++++++++++++++
 include/uapi/linux/iommufd.h            | 30 ++++++++++
 5 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/iommufd/viommu.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index cf4605962bea..df490e836b30 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -7,7 +7,8 @@ iommufd-y := \
 	ioas.o \
 	main.o \
 	pages.o \
-	vfio_compat.o
+	vfio_compat.o \
+	viommu.o
 
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 5d3768d77099..0e3755408911 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -131,6 +131,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_IOAS,
 	IOMMUFD_OBJ_ACCESS,
 	IOMMUFD_OBJ_FAULT,
+	IOMMUFD_OBJ_VIOMMU,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -526,6 +527,18 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
 }
 
+struct iommufd_viommu {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct iommu_device *iommu_dev;
+	struct iommufd_hwpt_paging *hwpt;
+
+	unsigned int type;
+};
+
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_viommu_destroy(struct iommufd_object *obj);
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index b5f5d27ee963..288ee51b6829 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -333,6 +333,7 @@ union ucmd_buffer {
 	struct iommu_ioas_unmap unmap;
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
+	struct iommu_viommu_alloc viommu;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -384,6 +385,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 val64),
 	IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
 		 __reserved),
+	IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
+		 struct iommu_viommu_alloc, out_viommu_id),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -519,6 +522,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_FAULT] = {
 		.destroy = iommufd_fault_destroy,
 	},
+	[IOMMUFD_OBJ_VIOMMU] = {
+		.destroy = iommufd_viommu_destroy,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
new file mode 100644
index 000000000000..35ad6a77c9c1
--- /dev/null
+++ b/drivers/iommu/iommufd/viommu.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+
+#include <linux/iommufd.h>
+
+#include "iommufd_private.h"
+
+void iommufd_viommu_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_viommu *viommu =
+		container_of(obj, struct iommufd_viommu, obj);
+
+	refcount_dec(&viommu->hwpt->common.obj.users);
+}
+
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_alloc *cmd = ucmd->cmd;
+	struct iommufd_hwpt_paging *hwpt_paging;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt_paging)) {
+		rc = PTR_ERR(hwpt_paging);
+		goto out_put_idev;
+	}
+
+	if (!hwpt_paging->nest_parent) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	if (cmd->type != IOMMU_VIOMMU_TYPE_DEFAULT) {
+		rc = -EOPNOTSUPP;
+		goto out_put_hwpt;
+	}
+
+	viommu = iommufd_object_alloc(ucmd->ictx, viommu, IOMMUFD_OBJ_VIOMMU);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_hwpt;
+	}
+
+	viommu->type = cmd->type;
+	viommu->ictx = ucmd->ictx;
+	viommu->hwpt = hwpt_paging;
+	viommu->iommu_dev = idev->dev->iommu->iommu_dev;
+
+	refcount_inc(&viommu->hwpt->common.obj.users);
+
+	cmd->out_viommu_id = viommu->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &viommu->obj);
+	goto out_put_hwpt;
+
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &viommu->obj);
+out_put_hwpt:
+	iommufd_put_object(ucmd->ictx, &hwpt_paging->common.obj);
+out_put_idev:
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+	return rc;
+}
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index d42e0471156f..ed6cdc2e0be1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -51,6 +51,7 @@ enum {
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c,
 	IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
 	IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
+	IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f,
 };
@@ -876,4 +877,33 @@ struct iommu_fault_alloc {
 	__u32 out_fault_fd;
 };
 #define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
+
+/**
+ * enum iommu_viommu_type - Virtual IOMMU Type
+ * @IOMMU_VIOMMU_TYPE_DEFAULT: Core-managed VIOMMU type
+ */
+enum iommu_viommu_type {
+	IOMMU_VIOMMU_TYPE_DEFAULT,
+};
+
+/**
+ * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
+ * @size: sizeof(struct iommu_viommu_alloc)
+ * @flags: Must be 0
+ * @type: Type of the virtual IOMMU. Must be defined in enum iommu_viommu_type
+ * @dev_id: The device to allocate this virtual IOMMU for
+ * @hwpt_id: ID of a nesting parent HWPT to associate to
+ * @out_viommu_id: Output virtual IOMMU ID for the allocated object
+ *
+ * Allocate a virtual IOMMU object that holds a (shared) nesting parent HWPT
+ */
+struct iommu_viommu_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 type;
+	__u32 dev_id;
+	__u32 hwpt_id;
+	__u32 out_viommu_id;
+};
+#define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
 #endif
On Wed, Aug 07, 2024 at 01:10:42PM -0700, Nicolin Chen wrote:
@@ -876,4 +877,33 @@ struct iommu_fault_alloc {
 	__u32 out_fault_fd;
 };
 #define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
+
+/**
+ * enum iommu_viommu_type - Virtual IOMMU Type
+ * @IOMMU_VIOMMU_TYPE_DEFAULT: Core-managed VIOMMU type
+ */
+enum iommu_viommu_type {
+	IOMMU_VIOMMU_TYPE_DEFAULT,
This needs a "= 0x0". I fixed locally along with others in this series.
Nicolin
On Wed, Aug 07, 2024 at 01:10:42PM -0700, Nicolin Chen wrote:
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_alloc *cmd = ucmd->cmd;
+	struct iommufd_hwpt_paging *hwpt_paging;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt_paging)) {
+		rc = PTR_ERR(hwpt_paging);
+		goto out_put_idev;
+	}
+
+	if (!hwpt_paging->nest_parent) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	if (cmd->type != IOMMU_VIOMMU_TYPE_DEFAULT) {
+		rc = -EOPNOTSUPP;
+		goto out_put_hwpt;
+	}
+
+	viommu = iommufd_object_alloc(ucmd->ictx, viommu, IOMMUFD_OBJ_VIOMMU);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_hwpt;
+	}
+
+	viommu->type = cmd->type;
+	viommu->ictx = ucmd->ictx;
+	viommu->hwpt = hwpt_paging;
+	viommu->iommu_dev = idev->dev->iommu->iommu_dev;
Pedantically this is troublesome because we don't have any lifetime control on this pointer.
iommu unplug is fairly troubled on real HW, but the selftest does do it.
At least for this series the value isn't used, so let's remove it.
I don't have an easy solution in mind though later as surely we will need this when we start to create more iommu bound objects. I'm pretty sure syzkaller would eventually find such a UAF using the iommufd selftest framework.
Jason
On Thu, Aug 15, 2024 at 03:11:17PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:42PM -0700, Nicolin Chen wrote:
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_alloc *cmd = ucmd->cmd;
+	struct iommufd_hwpt_paging *hwpt_paging;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt_paging)) {
+		rc = PTR_ERR(hwpt_paging);
+		goto out_put_idev;
+	}
+
+	if (!hwpt_paging->nest_parent) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	if (cmd->type != IOMMU_VIOMMU_TYPE_DEFAULT) {
+		rc = -EOPNOTSUPP;
+		goto out_put_hwpt;
+	}
+
+	viommu = iommufd_object_alloc(ucmd->ictx, viommu, IOMMUFD_OBJ_VIOMMU);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_hwpt;
+	}
+
+	viommu->type = cmd->type;
+	viommu->ictx = ucmd->ictx;
+	viommu->hwpt = hwpt_paging;
+	viommu->iommu_dev = idev->dev->iommu->iommu_dev;
Pedantically this is troublesome because we don't have any lifetime control on this pointer.
iommu unplug is fairly troubled on real HW, but the selftest does do it.
At least for this series the value isn't used, so let's remove it.
I recall one of my local versions had a validation using that, but it's not that crucial either. Will drop it.
I don't have an easy solution in mind though later as surely we will need this when we start to create more iommu bound objects. I'm pretty sure syzkaller would eventually find such a UAF using the iommufd selftest framework.
Would adding a user count in struct iommu_device help?
Thanks Nicolin
On Thu, Aug 15, 2024 at 11:20:35AM -0700, Nicolin Chen wrote:
I don't have an easy solution in mind though later as surely we will need this when we start to create more iommu bound objects. I'm pretty sure syzkaller would eventually find such a UAF using the iommufd selftest framework.
Would adding a user count in struct iommu_device help?
Yeah, maybe something like that. It is a real corner case though.
Jason
On Wed, Aug 07, 2024 at 01:10:42PM -0700, Nicolin Chen wrote:
@@ -876,4 +877,33 @@ struct iommu_fault_alloc {
 	__u32 out_fault_fd;
 };
 #define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
+
+/**
+ * enum iommu_viommu_type - Virtual IOMMU Type
+ * @IOMMU_VIOMMU_TYPE_DEFAULT: Core-managed VIOMMU type
+ */
+enum iommu_viommu_type {
+	IOMMU_VIOMMU_TYPE_DEFAULT,
= 0 here now
Jason
With a viommu object wrapping a potentially shareable S2 domain, a nested domain should be allocated by associating with a viommu instead. A driver can store this viommu pointer somewhere, so as to later use it when calling the viommu helpers for virtual device ID lookup and viommu invalidation.
For drivers without viommu support, keep the parent domain input, which would otherwise just be viommu->hwpt->common.domain.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/amd/iommu.c                   | 1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
 drivers/iommu/intel/iommu.c                 | 1 +
 drivers/iommu/iommufd/hw_pagetable.c        | 5 +++--
 drivers/iommu/iommufd/selftest.c            | 1 +
 include/linux/iommu.h                       | 2 ++
 6 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b19e8c0f48fa..e31f7a5fc650 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2432,6 +2432,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
 static struct iommu_domain *
 amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
 			    struct iommu_domain *parent,
+			    struct iommufd_viommu *viommu,
 			    const struct iommu_user_data *user_data)
 {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3962d5a55519..558cf3bb24e0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3403,6 +3403,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
 static struct iommu_domain *
 arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
 			   struct iommu_domain *parent,
+			   struct iommufd_viommu *viommu,
 			   const struct iommu_user_data *user_data)
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9ff8b83c19a3..0590528799d8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3703,6 +3703,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 static struct iommu_domain *
 intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 			      struct iommu_domain *parent,
+			      struct iommufd_viommu *viommu,
 			      const struct iommu_user_data *user_data)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index aefde4443671..c21bb59c4022 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -137,7 +137,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 
 	if (ops->domain_alloc_user) {
 		hwpt->domain = ops->domain_alloc_user(idev->dev, flags, NULL,
-						      user_data);
+						      NULL, user_data);
 		if (IS_ERR(hwpt->domain)) {
 			rc = PTR_ERR(hwpt->domain);
 			hwpt->domain = NULL;
@@ -239,7 +239,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 
 	hwpt->domain = ops->domain_alloc_user(idev->dev,
 					      flags & ~IOMMU_HWPT_FAULT_ID_VALID,
-					      parent->common.domain, user_data);
+					      parent->common.domain,
+					      NULL, user_data);
 	if (IS_ERR(hwpt->domain)) {
 		rc = PTR_ERR(hwpt->domain);
 		hwpt->domain = NULL;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 642ae135ada9..a165b162c88f 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -320,6 +320,7 @@ __mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent,
 static struct iommu_domain *
 mock_domain_alloc_user(struct device *dev, u32 flags,
 		       struct iommu_domain *parent,
+		       struct iommufd_viommu *viommu,
 		       const struct iommu_user_data *user_data)
 {
 	struct mock_iommu_domain *mock_parent;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 20a40fd6c4ba..0b71b2b24ede 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -42,6 +42,7 @@ struct notifier_block;
 struct iommu_sva;
 struct iommu_dma_cookie;
 struct iommu_fault_param;
+struct iommufd_viommu;
 
 #define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
 #define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
@@ -611,6 +612,7 @@ struct iommu_ops {
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
 	struct iommu_domain *(*domain_alloc_user)(
 		struct device *dev, u32 flags, struct iommu_domain *parent,
+		struct iommufd_viommu *viommu,
 		const struct iommu_user_data *user_data);
 	struct iommu_domain *(*domain_alloc_paging)(struct device *dev);
 	struct iommu_domain *(*domain_alloc_sva)(struct device *dev,
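[Editorial illustration only: a hedged sketch of how a driver with viommu support might use the new parameter. The mydrv_nested_domain struct and its fields are hypothetical, not from any in-tree driver:]

	struct mydrv_nested_domain {
		struct iommu_domain domain;
		struct iommufd_viommu *viommu;	/* kept for later viommu helpers */
	};

	static struct iommu_domain *
	mydrv_domain_alloc_user(struct device *dev, u32 flags,
				struct iommu_domain *parent,
				struct iommufd_viommu *viommu,
				const struct iommu_user_data *user_data)
	{
		struct mydrv_nested_domain *nested;

		/* ... validate flags, parent and user_data as before ... */
		nested = kzalloc(sizeof(*nested), GFP_KERNEL);
		if (!nested)
			return ERR_PTR(-ENOMEM);
		/*
		 * @viommu is non-NULL when the nested domain is allocated via
		 * a VIOMMU; stash it so invalidations can later call the
		 * viommu helpers, e.g. virtual device ID lookup. It may be
		 * NULL when allocated from a plain parent HWPT.
		 */
		nested->viommu = viommu;
		return &nested->domain;
	}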
Now a VIOMMU can wrap a shareable nested parent HWPT. So it can act like a nested parent HWPT when allocating a nested HWPT.
Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.
Also, associate the viommu with the nested HWPT being allocated.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 24 ++++++++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 include/uapi/linux/iommufd.h            | 12 ++++++------
 3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index c21bb59c4022..06adbcc304bc 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -57,6 +57,9 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
 		container_of(obj, struct iommufd_hwpt_nested, common.obj);
 
 	__iommufd_hwpt_destroy(&hwpt_nested->common);
+
+	if (hwpt_nested->viommu)
+		refcount_dec(&hwpt_nested->viommu->obj.users);
 	refcount_dec(&hwpt_nested->parent->common.obj.users);
 }
 
@@ -213,6 +216,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
  */
 static struct iommufd_hwpt_nested *
 iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
+			  struct iommufd_viommu *viommu,
 			  struct iommufd_hwpt_paging *parent,
 			  struct iommufd_device *idev, u32 flags,
 			  const struct iommu_user_data *user_data)
@@ -234,13 +238,16 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 		return ERR_CAST(hwpt_nested);
 	hwpt = &hwpt_nested->common;
 
+	if (viommu)
+		refcount_inc(&viommu->obj.users);
+	hwpt_nested->viommu = viommu;
 	refcount_inc(&parent->common.obj.users);
 	hwpt_nested->parent = parent;
 
 	hwpt->domain = ops->domain_alloc_user(idev->dev,
 					      flags & ~IOMMU_HWPT_FAULT_ID_VALID,
 					      parent->common.domain,
-					      NULL, user_data);
+					      viommu, user_data);
 	if (IS_ERR(hwpt->domain)) {
 		rc = PTR_ERR(hwpt->domain);
 		hwpt->domain = NULL;
@@ -307,7 +314,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		struct iommufd_hwpt_nested *hwpt_nested;
 
 		hwpt_nested = iommufd_hwpt_nested_alloc(
-			ucmd->ictx,
+			ucmd->ictx, NULL,
 			container_of(pt_obj, struct iommufd_hwpt_paging,
 				     common.obj),
 			idev, cmd->flags, &user_data);
@@ -316,6 +323,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 			goto out_unlock;
 		}
 		hwpt = &hwpt_nested->common;
+	} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
+		struct iommufd_hwpt_nested *hwpt_nested;
+		struct iommufd_viommu *viommu;
+
+		viommu = container_of(pt_obj, struct iommufd_viommu, obj);
+		hwpt_nested = iommufd_hwpt_nested_alloc(
+			ucmd->ictx, viommu, viommu->hwpt, idev,
+			cmd->flags, &user_data);
+		if (IS_ERR(hwpt_nested)) {
+			rc = PTR_ERR(hwpt_nested);
+			goto out_unlock;
+		}
+		hwpt = &hwpt_nested->common;
 	} else {
 		rc = -EINVAL;
 		goto out_put_pt;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 0e3755408911..443575fd3dd4 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -313,6 +313,7 @@ struct iommufd_hwpt_paging {
 struct iommufd_hwpt_nested {
 	struct iommufd_hw_pagetable common;
 	struct iommufd_hwpt_paging *parent;
+	struct iommufd_viommu *viommu;
 };
 
 static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index ed6cdc2e0be1..0e384331a9c8 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -430,7 +430,7 @@ enum iommu_hwpt_data_type {
  * @size: sizeof(struct iommu_hwpt_alloc)
  * @flags: Combination of enum iommufd_hwpt_alloc_flags
  * @dev_id: The device to allocate this HWPT for
- * @pt_id: The IOAS or HWPT to connect this HWPT to
+ * @pt_id: The IOAS or HWPT or VIOMMU to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
  * @data_type: One of enum iommu_hwpt_data_type
@@ -449,11 +449,11 @@ enum iommu_hwpt_data_type {
  * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a
  * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags.
  *
- * A user-managed nested HWPT will be created from a given parent HWPT via
- * @pt_id, in which the parent HWPT must be allocated previously via the
- * same ioctl from a given IOAS (@pt_id). In this case, the @data_type
- * must be set to a pre-defined type corresponding to an I/O page table
- * type supported by the underlying IOMMU hardware.
+ * A user-managed nested HWPT will be created from a given VIOMMU (wrapping a
+ * parent HWPT) or a parent HWPT via @pt_id, in which the parent HWPT must be
+ * allocated previously via the same ioctl from a given IOAS (@pt_id). In this
+ * case, the @data_type must be set to a pre-defined type corresponding to an
+ * I/O page table type supported by the underlying IOMMU hardware.
  *
  * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and
  * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
Use IOMMU_VIOMMU_TYPE_DEFAULT to cover the new IOMMU_VIOMMU_ALLOC ioctl.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 35 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 28 +++++++++++++++
 2 files changed, 63 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 6343f4053bd4..5c770e94f299 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -554,6 +554,41 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 	}
 }
 
+TEST_F(iommufd_ioas, viommu_default)
+{
+	uint32_t dev_id = self->device_id;
+	uint32_t viommu_id = 0;
+	uint32_t hwpt_id = 0;
+
+	if (dev_id) {
+		/* Negative test -- invalid hwpt */
+		test_err_viommu_alloc(ENOENT, dev_id, hwpt_id,
+				      IOMMU_VIOMMU_TYPE_DEFAULT, &viommu_id);
+
+		/* Negative test -- not a nested parent hwpt */
+		test_cmd_hwpt_alloc(dev_id, self->ioas_id, 0, &hwpt_id);
+		test_err_viommu_alloc(EINVAL, dev_id, hwpt_id,
+				      IOMMU_VIOMMU_TYPE_DEFAULT, &viommu_id);
+		test_ioctl_destroy(hwpt_id);
+
+		/* Allocate a nested parent HWPT */
+		test_cmd_hwpt_alloc(dev_id, self->ioas_id,
+				    IOMMU_HWPT_ALLOC_NEST_PARENT,
+				    &hwpt_id);
+		/* Negative test -- unsupported viommu type */
+		test_err_viommu_alloc(EOPNOTSUPP, dev_id, hwpt_id,
+				      0xdead, &viommu_id);
+		/* Allocate a default type of viommu */
+		test_cmd_viommu_alloc(dev_id, hwpt_id,
+				      IOMMU_VIOMMU_TYPE_DEFAULT, &viommu_id);
+		test_ioctl_destroy(viommu_id);
+		test_ioctl_destroy(hwpt_id);
+	} else {
+		test_err_viommu_alloc(ENOENT, dev_id, hwpt_id,
+				      IOMMU_VIOMMU_TYPE_DEFAULT, &viommu_id);
+	}
+}
+
 TEST_F(iommufd_ioas, hwpt_attach)
 {
 	/* Create a device attached directly to a hwpt */
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index bc4556a48d82..810783d31ca5 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -771,3 +771,31 @@ static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
 
 #define test_cmd_trigger_iopf(device_id, fault_fd) \
 	ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
+
+static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id,
+				  __u32 type, __u32 flags, __u32 *viommu_id)
+{
+	struct iommu_viommu_alloc cmd = {
+		.size = sizeof(cmd),
+		.flags = flags,
+		.type = type,
+		.dev_id = device_id,
+		.hwpt_id = hwpt_id,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	if (viommu_id)
+		*viommu_id = cmd.out_viommu_id;
+	return 0;
+}
+
+#define test_cmd_viommu_alloc(device_id, hwpt_id, type, viommu_id) \
+	ASSERT_EQ(0, _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \
+					    type, 0, viommu_id))
+#define test_err_viommu_alloc(_errno, device_id, hwpt_id, type, viommu_id) \
+	EXPECT_ERRNO(_errno, _test_cmd_viommu_alloc(self->fd, device_id, \
+						    hwpt_id, type, 0, \
+						    viommu_id))
Introduce a pair of new ioctls to set/unset a per-viommu virtual device id that should be linked to a physical device id via a struct device pointer.
Continue supporting IOMMU_VIOMMU_TYPE_DEFAULT for a core-managed viommu. Provide a lookup function for drivers to load the device pointer by a virtual device id.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          |   9 ++
 drivers/iommu/iommufd/iommufd_private.h |  20 ++++
 drivers/iommu/iommufd/main.c            |   6 ++
 drivers/iommu/iommufd/viommu.c          | 118 ++++++++++++++++++++++++
 include/uapi/linux/iommufd.h            |  40 ++++++++
 5 files changed, 193 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 5fd3dd420290..ed29bc606f5e 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -135,7 +135,14 @@ void iommufd_device_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
+	struct iommufd_vdev_id *vdev_id, *curr;
 
+	list_for_each_entry(vdev_id, &idev->vdev_id_list, idev_item) {
+		curr = xa_cmpxchg(&vdev_id->viommu->vdev_ids, vdev_id->vdev_id,
+				  vdev_id, NULL, GFP_KERNEL);
+		WARN_ON(curr != vdev_id);
+		kfree(vdev_id);
+	}
 	iommu_device_release_dma_owner(idev->dev);
 	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
@@ -217,6 +224,8 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	idev->igroup = igroup;
 	mutex_init(&idev->iopf_lock);
 
+	INIT_LIST_HEAD(&idev->vdev_id_list);
+
 	/*
 	 * If the caller fails after this success it must call
 	 * iommufd_unbind_device() which is safe since we hold this refcount.
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 443575fd3dd4..10c63972b9ab 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -417,6 +417,7 @@ struct iommufd_device {
 	struct iommufd_ctx *ictx;
 	struct iommufd_group *igroup;
 	struct list_head group_item;
+	struct list_head vdev_id_list;
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
@@ -533,12 +534,31 @@ struct iommufd_viommu {
 	struct iommufd_ctx *ictx;
 	struct iommu_device *iommu_dev;
 	struct iommufd_hwpt_paging *hwpt;
+	struct xarray vdev_ids;
 
 	unsigned int type;
 };
 
+struct iommufd_vdev_id {
+	struct iommufd_viommu *viommu;
+	struct device *dev;
+	u64 vdev_id;
+
+	struct list_head idev_item;
+};
+
+static inline struct iommufd_viommu *
+iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_VIOMMU),
+			    struct iommufd_viommu, obj);
+}
+
 int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_viommu_destroy(struct iommufd_object *obj);
+int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd);
+int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd);
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 288ee51b6829..199ad90fa36b 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -334,6 +334,8 @@ union ucmd_buffer {
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
 	struct iommu_viommu_alloc viommu;
+	struct iommu_viommu_set_vdev_id set_vdev_id;
+	struct iommu_viommu_unset_vdev_id unset_vdev_id;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -387,6 +389,10 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
 		 struct iommu_viommu_alloc, out_viommu_id),
+	IOCTL_OP(IOMMU_VIOMMU_SET_VDEV_ID, iommufd_viommu_set_vdev_id,
+		 struct iommu_viommu_set_vdev_id, vdev_id),
+	IOCTL_OP(IOMMU_VIOMMU_UNSET_VDEV_ID, iommufd_viommu_unset_vdev_id,
+		 struct iommu_viommu_unset_vdev_id, vdev_id),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 35ad6a77c9c1..05a688a471db 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -10,7 +10,14 @@ void iommufd_viommu_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_viommu *viommu =
 		container_of(obj, struct iommufd_viommu, obj);
+	struct iommufd_vdev_id *vdev_id;
+	unsigned long index;
 
+	xa_for_each(&viommu->vdev_ids, index, vdev_id) {
+		list_del(&vdev_id->idev_item);
+		kfree(vdev_id);
+	}
+	xa_destroy(&viommu->vdev_ids);
 	refcount_dec(&viommu->hwpt->common.obj.users);
 }
 
@@ -73,3 +80,114 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(ucmd->ictx, &idev->obj);
 	return rc;
 }
+
+int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_hwpt_nested *hwpt_nested;
+	struct iommufd_vdev_id *vdev_id, *curr;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	if (cmd->vdev_id > ULONG_MAX)
+		return -EINVAL;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	hwpt = idev->igroup->hwpt;
+
+	if (hwpt == NULL || hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+	hwpt_nested = container_of(hwpt, struct iommufd_hwpt_nested, common);
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_idev;
+	}
+
+	if (hwpt_nested->viommu != viommu) {
+		rc = -EINVAL;
+		goto out_put_viommu;
+	}
+
+	vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL);
+	if (IS_ERR(vdev_id)) {
+		rc = PTR_ERR(vdev_id);
+		goto out_put_viommu;
+	}
+
+	vdev_id->viommu = viommu;
+	vdev_id->dev = idev->dev;
+	vdev_id->vdev_id = cmd->vdev_id;
+
+	curr = xa_cmpxchg(&viommu->vdev_ids, cmd->vdev_id,
+			  NULL, vdev_id, GFP_KERNEL);
+	if (curr) {
+		rc = xa_err(curr) ? : -EBUSY;
+		goto out_free_vdev_id;
+	}
+
+	list_add_tail(&vdev_id->idev_item, &idev->vdev_id_list);
+	goto out_put_viommu;
+
+out_free_vdev_id:
+	kfree(vdev_id);
+out_put_viommu:
+	iommufd_put_object(ucmd->ictx, &viommu->obj);
+out_put_idev:
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+	return rc;
+}
+
+static struct device *
+iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id)
+{
+	struct iommufd_vdev_id *vdev_id;
+
+	xa_lock(&viommu->vdev_ids);
+	vdev_id = xa_load(&viommu->vdev_ids, (unsigned long)id);
+	xa_unlock(&viommu->vdev_ids);
+	if (!vdev_id || vdev_id->vdev_id != id)
+		return NULL;
+	return vdev_id->dev;
+}
+
+int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_vdev_id *vdev_id;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_idev;
+	}
+
+	if (idev->dev != iommufd_viommu_find_device(viommu, cmd->vdev_id)) {
+		rc = -EINVAL;
+		goto out_put_viommu;
+	}
+
+	vdev_id = xa_erase(&viommu->vdev_ids, cmd->vdev_id);
+	list_del(&vdev_id->idev_item);
+	kfree(vdev_id);
+
+out_put_viommu:
+	iommufd_put_object(ucmd->ictx, &viommu->obj);
+out_put_idev:
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+	return rc;
+}
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 0e384331a9c8..d5e72682ba57 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -52,6 +52,8 @@ enum {
 	IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
 	IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
 	IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f,
+	IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90,
+	IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91,
 };
@@ -906,4 +908,42 @@ struct iommu_viommu_alloc {
 	__u32 out_viommu_id;
 };
 #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
+
+/**
+ * struct iommu_viommu_set_vdev_id - ioctl(IOMMU_VIOMMU_SET_VDEV_ID)
+ * @size: sizeof(struct iommu_viommu_set_vdev_id)
+ * @viommu_id: viommu ID to associate with the device to store its virtual ID
+ * @dev_id: device ID to set its virtual ID
+ * @__reserved: Must be 0
+ * @vdev_id: Virtual device ID
+ *
+ * Set a viommu-specific virtual ID of a device
+ */
+struct iommu_viommu_set_vdev_id {
+	__u32 size;
+	__u32 viommu_id;
+	__u32 dev_id;
+	__u32 __reserved;
+	__aligned_u64 vdev_id;
+};
+#define IOMMU_VIOMMU_SET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_SET_VDEV_ID)
+
+/**
+ * struct iommu_viommu_unset_vdev_id - ioctl(IOMMU_VIOMMU_UNSET_VDEV_ID)
+ * @size: sizeof(struct iommu_viommu_unset_vdev_id)
+ * @viommu_id: viommu ID associated with the device to delete its virtual ID
+ * @dev_id: device ID to unset its virtual ID
+ * @__reserved: Must be 0
+ * @vdev_id: Virtual device ID (for verification)
+ *
+ * Unset a viommu-specific virtual ID of a device
+ */
+struct iommu_viommu_unset_vdev_id {
+	__u32 size;
+	__u32 viommu_id;
+	__u32 dev_id;
+	__u32 __reserved;
+	__aligned_u64 vdev_id;
+};
+#define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID)
 #endif
On Wed, Aug 07, 2024 at 01:10:46PM -0700, Nicolin Chen wrote:
@@ -135,7 +135,14 @@ void iommufd_device_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
+	struct iommufd_vdev_id *vdev_id, *curr;
 
+	list_for_each_entry(vdev_id, &idev->vdev_id_list, idev_item) {
+		curr = xa_cmpxchg(&vdev_id->viommu->vdev_ids, vdev_id->vdev_id,
+				  vdev_id, NULL, GFP_KERNEL);
+		WARN_ON(curr != vdev_id);
+		kfree(vdev_id);
+	}
Kevin already pointed out during the RFC review that we should probably do one vdev_id per idev, and Jason was okay with either way. I didn't plan to change this part until this week, for the VIRQ series.

My rethinking is that an idev is attached to one (and only one) nested HWPT, and the nested HWPT is associated with one (and only one) VIOMMU object. So it's unlikely we can have a second vdev_id, i.e. idev->vdev_id is enough.
This helps us to build a device-based virq report function:

+void iommufd_device_report_virq(struct device *dev, unsigned int data_type,
+				void *data_ptr, size_t data_len);
I built a link from device to viommu reusing Baolu's work:

  struct device -> struct iommu_group -> struct iommu_attach_handle ->
  struct iommufd_attach_handle -> struct iommufd_device (idev) ->
  struct iommufd_vdev_id (idev->vdev_id)
The vdev_id struct holds the viommu and the virtual ID, allowing us to add another two helpers:

+struct iommufd_viommu *iommufd_device_get_viommu(struct device *dev);
+u64 iommufd_device_get_virtual_id(struct device *dev);
A driver that reports event/irq per device can use these helpers to report virq via the core-managed VIOMMU object. (If a driver has some non-per-device type of IRQs, it would have to allocate a driver-managed VIOMMU object instead.)
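[Editorial illustration only: a hedged sketch of such a per-device report path, using the helpers proposed above; the event type constant is a placeholder:]

	static void mydrv_report_event(struct device *dev, void *evt, size_t len)
	{
		struct iommufd_viommu *viommu = iommufd_device_get_viommu(dev);

		if (!viommu)
			return;	/* device not linked to a vIOMMU */
		/* MYDRV_VIRQ_TYPE is a placeholder for a driver-defined type */
		iommufd_device_report_virq(dev, MYDRV_VIRQ_TYPE, evt, len);
	}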
I have both a revised VIOMMU series and a new VIRQ series ready. Will send in the following days after some testing/polishing.
Thanks Nicolin
On Wed, Aug 14, 2024 at 10:09:22AM -0700, Nicolin Chen wrote:
This helps us to build a device-based virq report function:

+void iommufd_device_report_virq(struct device *dev, unsigned int data_type,
+				void *data_ptr, size_t data_len);
I built a link from device to viommu reusing Baolu's work:

  struct device -> struct iommu_group -> struct iommu_attach_handle ->
  struct iommufd_attach_handle -> struct iommufd_device (idev) ->
  struct iommufd_vdev_id (idev->vdev_id)
That makes sense, the vdev id would be 1:1 with the struct device, and the iommufd_device is also supposed to be 1:1 with the struct device.
Jason
On Wed, Aug 07, 2024 at 01:10:46PM -0700, Nicolin Chen wrote:
+int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_hwpt_nested *hwpt_nested;
+	struct iommufd_vdev_id *vdev_id, *curr;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	if (cmd->vdev_id > ULONG_MAX)
+		return -EINVAL;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	hwpt = idev->igroup->hwpt;
+
+	if (hwpt == NULL || hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+	hwpt_nested = container_of(hwpt, struct iommufd_hwpt_nested, common);
This doesn't seem like a necessary check, the attached hwpt can change after this is established, so this can't be an invariant we enforce.
If you want to do 1:1 then somehow directly check if the idev is already linked to a viommu.
+static struct device *
+iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id)
+{
+	struct iommufd_vdev_id *vdev_id;
+
+	xa_lock(&viommu->vdev_ids);
+	vdev_id = xa_load(&viommu->vdev_ids, (unsigned long)id);
+	xa_unlock(&viommu->vdev_ids);
This lock doesn't do anything
+	if (!vdev_id || vdev_id->vdev_id != id)
+		return NULL;
And this is unlocked
+	return vdev_id->dev;
+}
This isn't good.. We can't return the struct device pointer here as there is no locking for it anymore. We can't even know it is still probed to VFIO anymore.
It has to work by having the iommu driver directly access the xarray, and then, entirely under the spinlock, the iommu driver can translate the vSID to the pSID, let go, and push the invalidation to HW. No races.
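[Editorial illustration only: a minimal sketch of that pattern, where mydrv_dev_to_sid() is a hypothetical driver helper:]

	xa_lock(&viommu->vdev_ids);
	vdev_id = xa_load(&viommu->vdev_ids, vsid);
	if (!vdev_id) {
		xa_unlock(&viommu->vdev_ids);
		return -EINVAL;
	}
	psid = mydrv_dev_to_sid(vdev_id->dev);	/* hypothetical translation */
	xa_unlock(&viommu->vdev_ids);
	/* now safe to push the ATC invalidation to HW using psid */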
+int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_vdev_id *vdev_id;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_idev;
+	}
+	if (idev->dev != iommufd_viommu_find_device(viommu, cmd->vdev_id)) {
Swap the order around != to be more kernely
+		rc = -EINVAL;
+		goto out_put_viommu;
+	}
+
+	vdev_id = xa_erase(&viommu->vdev_ids, cmd->vdev_id);
And this whole thing needs to be done under the xa_lock too.
	xa_lock(&viommu->vdev_ids);
	vdev_id = xa_load(&viommu->vdev_ids, cmd->vdev_id);
	if (!vdev_id || vdev_id->vdev_id != cmd->vdev_id (????) ||
	    vdev_id->dev != idev->dev)
		err
	__xa_erase(&viommu->vdev_ids, cmd->vdev_id);
	xa_unlock(&viommu->vdev_ids);
Jason
On Thu, Aug 15, 2024 at 04:08:48PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:46PM -0700, Nicolin Chen wrote:
+int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_hwpt_nested *hwpt_nested;
+	struct iommufd_vdev_id *vdev_id, *curr;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	if (cmd->vdev_id > ULONG_MAX)
+		return -EINVAL;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	hwpt = idev->igroup->hwpt;
+
+	if (hwpt == NULL || hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+	hwpt_nested = container_of(hwpt, struct iommufd_hwpt_nested, common);
This doesn't seem like a necessary check, the attached hwpt can change after this is established, so this can't be an invariant we enforce.
If you want to do 1:1 then somehow directly check if the idev is already linked to a viommu.
But idev can't link to a viommu without a proxy hwpt_nested? Even the stage-2 only configuration should have an identity hwpt_nested right?
+static struct device *
+iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id)
+{
+	struct iommufd_vdev_id *vdev_id;
+
+	xa_lock(&viommu->vdev_ids);
+	vdev_id = xa_load(&viommu->vdev_ids, (unsigned long)id);
+	xa_unlock(&viommu->vdev_ids);
This lock doesn't do anything
+	if (!vdev_id || vdev_id->vdev_id != id)
+		return NULL;
And this is unlocked
+	return vdev_id->dev;
+}
This isn't good.. We can't return the struct device pointer here as there is no locking for it anymore. We can't even know it is still probed to VFIO anymore.
It has to work by having the iommu driver directly access the xarray, and then, entirely under the spinlock, the iommu driver can translate the vSID to the pSID, let go, and push the invalidation to HW. No races.
Maybe the iommufd_viommu_invalidate ioctl handler should hold that xa_lock around the viommu->ops->cache_invalidate, and then add lock assert in iommufd_viommu_find_device?
+int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_vdev_id *vdev_id;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_idev;
+	}
+	if (idev->dev != iommufd_viommu_find_device(viommu, cmd->vdev_id)) {
Swap the order around != to be more kernely
Ack.
+		rc = -EINVAL;
+		goto out_put_viommu;
+	}
+
+	vdev_id = xa_erase(&viommu->vdev_ids, cmd->vdev_id);
And this whole thing needs to be done under the xa_lock too.
	xa_lock(&viommu->vdev_ids);
	vdev_id = xa_load(&viommu->vdev_ids, cmd->vdev_id);
	if (!vdev_id || vdev_id->vdev_id != cmd->vdev_id (????) ||
	    vdev_id->dev != idev->dev)
		err
	__xa_erase(&viommu->vdev_ids, cmd->vdev_id);
	xa_unlock(&viommu->vdev_ids);
I've changed to xa_cmpxchg() in my local tree. Would it be simpler?
Thanks Nicolin
On Thu, Aug 15, 2024 at 12:46:29PM -0700, Nicolin Chen wrote:
+static struct device *
+iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id)
+{
+	struct iommufd_vdev_id *vdev_id;
+
+	xa_lock(&viommu->vdev_ids);
+	vdev_id = xa_load(&viommu->vdev_ids, (unsigned long)id);
+	xa_unlock(&viommu->vdev_ids);
This lock doesn't do anything
+	if (!vdev_id || vdev_id->vdev_id != id)
+		return NULL;
And this is unlocked
+	return vdev_id->dev;
+}
This isn't good.. We can't return the struct device pointer here as there is no locking for it anymore. We can't even know it is still probed to VFIO anymore.
It has to work by having the iommu driver directly access the xarray, and then, entirely under the spinlock, the iommu driver can translate the vSID to the pSID, let go, and push the invalidation to HW. No races.
Maybe the iommufd_viommu_invalidate ioctl handler should hold that xa_lock around the viommu->ops->cache_invalidate, and then add lock assert in iommufd_viommu_find_device?
xa_lock/spinlock might be too heavy. We can have a mutex to wrap around viommu ioctl handlers..
On Thu, Aug 15, 2024 at 12:53:04PM -0700, Nicolin Chen wrote:
Maybe the iommufd_viommu_invalidate ioctl handler should hold that xa_lock around the viommu->ops->cache_invalidate, and then add lock assert in iommufd_viommu_find_device?
xa_lock/spinlock might be too heavy. We can have a mutex to wrap around viommu ioctl handlers..
A rw semaphore might be reasonable.
Jason
On Thu, Aug 15, 2024 at 12:46:24PM -0700, Nicolin Chen wrote:
On Thu, Aug 15, 2024 at 04:08:48PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:46PM -0700, Nicolin Chen wrote:
+int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_hwpt_nested *hwpt_nested;
+	struct iommufd_vdev_id *vdev_id, *curr;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	if (cmd->vdev_id > ULONG_MAX)
+		return -EINVAL;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	hwpt = idev->igroup->hwpt;
+
+	if (hwpt == NULL || hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+	hwpt_nested = container_of(hwpt, struct iommufd_hwpt_nested, common);
This doesn't seem like a necessary check, the attached hwpt can change after this is established, so this can't be an invariant we enforce.
If you want to do 1:1 then somehow directly check if the idev is already linked to a viommu.
But idev can't link to a viommu without a proxy hwpt_nested?
Why not? The idev becomes linked to the viommu when the dev id is set
Unless we are also going to enforce the idev is always attached to a nested then I don't think we need to check it here.
Things will definitely not entirely work as expected if the vdev is directly attached to the s2 or a blocking, but it won't harm anything.
the stage-2 only configuration should have an identity hwpt_nested right?
Yes, that is the right way to use the API
It has to work by having the iommu driver directly access the xarray, and then, entirely under the spinlock, the iommu driver can translate the vSID to the pSID, let go, and push the invalidation to HW. No races.
Maybe the iommufd_viommu_invalidate ioctl handler should hold that xa_lock around the viommu->ops->cache_invalidate, and then add lock assert in iommufd_viommu_find_device?
That doesn't seem like a great idea, you can't do copy_from_user under a spinlock.
	xa_lock(&viommu->vdev_ids);
	vdev_id = xa_load(&viommu->vdev_ids, cmd->vdev_id);
	if (!vdev_id || vdev_id->vdev_id != cmd->vdev_id (????) ||
	    vdev_id->dev != idev->dev)
		err
	__xa_erase(&viommu->vdev_ids, cmd->vdev_id);
	xa_unlock(&viommu->vdev_ids);
I've changed to xa_cmpxchg() in my local tree. Would it be simpler?
No, that is still not right, you can't take the vdev_id outside the lock at all. Even for cmpxchg, because the vdev_id could have been freed and reallocated by another thread.
You must combine the validation of the vdev_id with the erase under a single critical region.
Jason
On Thu, Aug 15, 2024 at 08:41:19PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 12:46:24PM -0700, Nicolin Chen wrote:
On Thu, Aug 15, 2024 at 04:08:48PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:46PM -0700, Nicolin Chen wrote:
+int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_hwpt_nested *hwpt_nested;
+	struct iommufd_vdev_id *vdev_id, *curr;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	if (cmd->vdev_id > ULONG_MAX)
+		return -EINVAL;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	hwpt = idev->igroup->hwpt;
+
+	if (hwpt == NULL || hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+	hwpt_nested = container_of(hwpt, struct iommufd_hwpt_nested, common);
This doesn't seem like a necessary check, the attached hwpt can change after this is established, so this can't be an invariant we enforce.
If you want to do 1:1 then somehow directly check if the idev is already linked to a viommu.
But idev can't link to a viommu without a proxy hwpt_nested?
Why not? The idev becomes linked to the viommu when the dev id is set
Unless we are also going to enforce the idev is always attached to a nested then I don't think we need to check it here.
Things will definitely not entirely work as expected if the vdev is directly attached to the s2 or a blocking, but it won't harm anything.
My view is that, the moment there is a VIOMMU object, that must be a nested IOMMU case, so there must be a nested hwpt. Blocking domain would be a hwpt_nested too (vSTE=Abort) as we previously concluded.
Then, in a nested case, it feels odd that an idev is attached to an S2 hwpt..
That being said, I think we can still do that with validations: if idev->hwpt is nested, compare the input viommu vs. idev->hwpt->viommu; if idev->hwpt is paging, compare the input viommu->hwpt vs. idev->hwpt.
the stage-2 only configuration should have an identity hwpt_nested right?
Yes, that is the right way to use the API
It has to work by having the iommu driver directly access the xarray, and then, entirely under the spinlock, the iommu driver can translate the vSID to the pSID, let go, and push the invalidation to HW. No races.
Maybe the iommufd_viommu_invalidate ioctl handler should hold that xa_lock around the viommu->ops->cache_invalidate, and then add lock assert in iommufd_viommu_find_device?
That doesn't seem like a great idea, you can't do copy_from_user under a spinlock.
	xa_lock(&viommu->vdev_ids);
	vdev_id = xa_load(&viommu->vdev_ids, cmd->vdev_id);
	if (!vdev_id || vdev_id->vdev_id != cmd->vdev_id (????) ||
	    vdev_id->dev != idev->dev)
		err
	__xa_erase(&viommu->vdev_ids, cmd->vdev_id);
	xa_unlock(&viommu->vdev_ids);
I've changed to xa_cmpxchg() in my local tree. Would it be simpler?
No, that is still not right, you can't take the vdev_id outside the lock at all. Even for cmpxchg, because the vdev_id could have been freed and reallocated by another thread.
You must combine the validation of the vdev_id with the erase under a single critical region.
Yea, we need a wider lock to keep the vdev_id list and its data pointers unchanged. I'll try the rw semaphore that you suggested in the other mail.
This complicates things overall, especially with the VIRQ work that involves interrupt-context polling of vdev_id, where a semaphore/mutex won't fit very well. Perhaps it would need a driver-level bottom half routine to call those helpers with locks. I am glad that you noticed the problem early.
Thanks! Nicolin
On Thu, Aug 15, 2024 at 05:21:57PM -0700, Nicolin Chen wrote:
Why not? The idev becomes linked to the viommu when the dev id is set
Unless we are also going to enforce the idev is always attached to a nested then I don't think we need to check it here.
Things will definitely not entirely work as expected if the vdev is directly attached to the s2 or a blocking, but it won't harm anything.
My view is that, the moment there is a VIOMMU object, that must be a nested IOMMU case, so there must be a nested hwpt. Blocking domain would be a hwpt_nested too (vSTE=Abort) as we previously concluded.
I'm not sure other vendors can do that vSTE=Abort/Bypass thing though yet..
Then, in a nested case, it feels odd that an idev is attached to an S2 hwpt..
That being said, I think we can still do that with validations: if idev->hwpt is nested, compare the input viommu vs. idev->hwpt->viommu; if idev->hwpt is paging, compare the input viommu->hwpt vs. idev->hwpt.
But again, if you don't continuously validate those invariants in all the other attach paths, it is sort of pointless to check them since userspace can still violate things.
This complicates things overall, especially with the VIRQ work that involves interrupt-context polling of vdev_id, where a semaphore/mutex won't fit very well. Perhaps it would need a driver-level bottom half routine to call those helpers with locks. I am glad that you noticed the problem early.
I think you have to show the xarray to the driver and the driver can use the spinlock to access it safely. Keeping it hidden in the core code is causing all these locking problems.
Jason
On Mon, Aug 19, 2024 at 02:33:32PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 05:21:57PM -0700, Nicolin Chen wrote:
Why not? The idev becomes linked to the viommu when the dev id is set
Unless we are also going to enforce the idev is always attached to a nested then I don't think we need to check it here.
Things will definitely not entirely work as expected if the vdev is directly attached to the s2 or a blocking, but it won't harm anything.
My view is that, the moment there is a VIOMMU object, that must be a nested IOMMU case, so there must be a nested hwpt. Blocking domain would be a hwpt_nested too (vSTE=Abort) as we previously concluded.
I'm not sure other vendors can do that vSTE=Abort/Bypass thing though yet..
Then, in a nested case, it feels odd that an idev is attached to an S2 hwpt..
That being said, I think we can still do that with validations: if idev->hwpt is nested, compare the input viommu vs. idev->hwpt->viommu; if idev->hwpt is paging, compare the input viommu->hwpt vs. idev->hwpt.
But again, if you don't continuously validate those invariants in all the other attach paths, it is sort of pointless to check them since userspace can still violate things.
Hmm, would that be unsafe? I start to wonder if we should allow an attach to viommu and put validations on that?
This complicates things overall, especially with the VIRQ work that involves interrupt-context polling of vdev_id, where a semaphore/mutex won't fit very well. Perhaps it would need a driver-level bottom half routine to call those helpers with locks. I am glad that you noticed the problem early.
I think you have to show the xarray to the driver and the driver can use the spinlock to access it safely. Keeping it hidden in the core code is causing all these locking problems.
Yea, I just figured that out... You have been right. I was able to get rid of the locking problem with the invalidation API. But then the irq path became a headache: drivers would only know the dev pointer, so anything that the dev could convert to would be unsafe, since the driver cannot grab the idev/viommu locks until after it converts.
Thanks Nicolin
On Mon, Aug 19, 2024 at 11:10:03AM -0700, Nicolin Chen wrote:
On Mon, Aug 19, 2024 at 02:33:32PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 05:21:57PM -0700, Nicolin Chen wrote:
Why not? The idev becomes linked to the viommu when the dev id is set
Unless we are also going to enforce that the idev is always attached to a nested domain, I don't think we need to check it here.
Things will definitely not entirely work as expected if the vdev is directly attached to the s2 or a blocking domain, but it won't harm anything.
My view is that the moment there is a VIOMMU object, it must be a nested IOMMU case, so there must be a nested hwpt. A blocking domain would be a hwpt_nested too (vSTE=Abort), as we previously concluded.
I'm not sure other vendors can do that vSTE=Abort/Bypass thing yet though..
Then, in a nested case, it feels odd that an idev is attached to an S2 hwpt..
That being said, I think we can still do that with validations: if idev->hwpt is nested, compare the input viommu vs. idev->hwpt->viommu; if idev->hwpt is paging, compare the input viommu->hwpt vs. idev->hwpt.
But again, if you don't consistently validate those invariants in all the other attach paths, it is sort of pointless to check them here, since userspace can still violate things.
Hmm, would that be unsafe? I'm starting to wonder if we should allow an attach to the viommu and put validations on that?
I don't think it is unsafe to mismatch things; if a device is disconnected from its VIOMMU then the HW should isolate it the same as anything else
It doesn't matter if the VIOMMU has a devid mapping for the device when it is not currently part of the viommu configuration.
IOW it is not the devid ioctl that causes the device to join the VIOMMU, it is the attach of the nest.
Jason
A core-managed VIOMMU maintains an xarray that maps virtual IDs to mock_devs.
Add test cases to cover the new IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctls.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- tools/testing/selftests/iommu/iommufd.c | 35 +++++++++++++++- tools/testing/selftests/iommu/iommufd_utils.h | 42 +++++++++++++++++++ 2 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 5c770e94f299..2c11525ae754 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -556,9 +556,15 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
TEST_F(iommufd_ioas, viommu_default) { + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + uint32_t nested_hwpt_id = 0, hwpt_id = 0; uint32_t dev_id = self->device_id; uint32_t viommu_id = 0; - uint32_t hwpt_id = 0; + uint32_t device2; + uint32_t stdev2; + uint32_t hwpt2;
if (dev_id) { /* Negative test -- invalid hwpt */ @@ -575,17 +581,42 @@ TEST_F(iommufd_ioas, viommu_default) test_cmd_hwpt_alloc(dev_id, self->ioas_id, IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id); + test_cmd_mock_domain_replace(self->stdev_id, hwpt_id); + /* Negative test -- unsupported viommu type */ test_err_viommu_alloc(EOPNOTSUPP, dev_id, hwpt_id, 0xdead, &viommu_id); - /* Allocate a default type of viommu */ + + /* Allocate a default type of viommu and a nested hwpt on top */ test_cmd_viommu_alloc(dev_id, hwpt_id, IOMMU_VIOMMU_TYPE_DEFAULT, &viommu_id); + test_cmd_hwpt_alloc_nested(self->device_id, viommu_id, 0, + &nested_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); + test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id); + + /* Set vdev_id to 0x99, unset it, and set to 0x88 */ + test_cmd_viommu_set_vdev_id(viommu_id, dev_id, 0x99); + test_err_viommu_set_vdev_id(EBUSY, viommu_id, dev_id, 0x99); + test_err_viommu_unset_vdev_id(EINVAL, viommu_id, dev_id, 0x88); + test_cmd_viommu_unset_vdev_id(viommu_id, dev_id, 0x99); + test_cmd_viommu_set_vdev_id(viommu_id, dev_id, 0x88); + + /* Negative test -- a device attached to a different HWPT */ + test_cmd_mock_domain(self->ioas_id, &stdev2, &hwpt2, &device2); + test_err_viommu_set_vdev_id(EINVAL, viommu_id, device2, 0xaa); + test_ioctl_destroy(stdev2); + + test_cmd_mock_domain_replace(self->stdev_id, hwpt_id); + test_ioctl_destroy(nested_hwpt_id); + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); test_ioctl_destroy(viommu_id); test_ioctl_destroy(hwpt_id); } else { test_err_viommu_alloc(ENOENT, dev_id, hwpt_id, IOMMU_VIOMMU_TYPE_DEFAULT, &viommu_id); + test_err_viommu_set_vdev_id(ENOENT, viommu_id, dev_id, 0x99); } }
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 810783d31ca5..1b494752a87f 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -799,3 +799,45 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, EXPECT_ERRNO(_errno, _test_cmd_viommu_alloc(self->fd, device_id, \ hwpt_id, type, 0, \ viommu_id)) + +static int _test_cmd_viommu_set_vdev_id(int fd, __u32 viommu_id, + __u32 idev_id, __u64 vdev_id) +{ + struct iommu_viommu_set_vdev_id cmd = { + .size = sizeof(cmd), + .dev_id = idev_id, + .viommu_id = viommu_id, + .vdev_id = vdev_id, + }; + + return ioctl(fd, IOMMU_VIOMMU_SET_VDEV_ID, &cmd); +} + +#define test_cmd_viommu_set_vdev_id(viommu_id, idev_id, vdev_id) \ + ASSERT_EQ(0, _test_cmd_viommu_set_vdev_id(self->fd, viommu_id, \ + idev_id, vdev_id)) +#define test_err_viommu_set_vdev_id(_errno, viommu_id, idev_id, vdev_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_viommu_set_vdev_id(self->fd, viommu_id, \ + idev_id, vdev_id)) + +static int _test_cmd_viommu_unset_vdev_id(int fd, __u32 viommu_id, + __u32 idev_id, __u64 vdev_id) +{ + struct iommu_viommu_unset_vdev_id cmd = { + .size = sizeof(cmd), + .dev_id = idev_id, + .viommu_id = viommu_id, + .vdev_id = vdev_id, + }; + + return ioctl(fd, IOMMU_VIOMMU_UNSET_VDEV_ID, &cmd); +} + +#define test_cmd_viommu_unset_vdev_id(viommu_id, idev_id, vdev_id) \ + ASSERT_EQ(0, _test_cmd_viommu_unset_vdev_id(self->fd, viommu_id, \ + idev_id, vdev_id)) +#define test_err_viommu_unset_vdev_id(_errno, viommu_id, idev_id, vdev_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_viommu_unset_vdev_id(self->fd, viommu_id, \ + idev_id, vdev_id))
Add a default_viommu_ops with a new op for cache invalidation, similar to the cache_invalidate_user op in struct iommu_domain_ops, but wider. An IOMMU driver that allocated a nested domain with a core-managed viommu is able to use the same viommu pointer for this cache invalidation API.
ARM SMMUv3 for example supports IOTLB and ATC device cache invalidations. The IOTLB invalidation is per-VMID, currently held by a parent S2 domain. The ATC invalidation is per device (Stream ID), which should be translated via a virtual device ID lookup table. Either case fits the viommu context.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 2 ++ drivers/iommu/iommufd/viommu.c | 3 +++ include/linux/iommu.h | 4 ++++ include/linux/iommufd.h | 19 +++++++++++++++++++ 4 files changed, 28 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 10c63972b9ab..65023211db47 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -536,6 +536,8 @@ struct iommufd_viommu { struct iommufd_hwpt_paging *hwpt; struct xarray vdev_ids;
+ const struct iommufd_viommu_ops *ops; + unsigned int type; };
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 05a688a471db..209d419fe5cd 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -27,6 +27,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_viommu *viommu; struct iommufd_device *idev; + struct iommu_domain *domain; int rc;
if (cmd->flags) @@ -46,6 +47,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) rc = -EINVAL; goto out_put_hwpt; } + domain = hwpt_paging->common.domain;
if (cmd->type != IOMMU_VIOMMU_TYPE_DEFAULT) { rc = -EOPNOTSUPP; @@ -61,6 +63,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) viommu->type = cmd->type; viommu->ictx = ucmd->ictx; viommu->hwpt = hwpt_paging; + viommu->ops = domain->ops->default_viommu_ops; viommu->iommu_dev = idev->dev->iommu->iommu_dev;
refcount_inc(&viommu->hwpt->common.obj.users); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0b71b2b24ede..c884b2b9f8ea 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -680,6 +680,8 @@ struct iommu_ops { * array->entry_num to report the number of handled * invalidation requests. The driver data structure * must be defined in include/uapi/linux/iommufd.h + * @default_viommu_ops: Driver can choose to use a default core-allocated core- + * managed viommu object by providing a default viommu ops. * @iova_to_phys: translate iova to physical address * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, * including no-snoop TLPs on PCIe or other platform @@ -712,6 +714,8 @@ struct iommu_domain_ops { phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
+ const struct iommufd_viommu_ops *default_viommu_ops; + bool (*enforce_cache_coherency)(struct iommu_domain *domain); int (*set_pgtable_quirks)(struct iommu_domain *domain, unsigned long quirks); diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index c2f2f6b9148e..7da5140fa1b0 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -17,6 +17,25 @@ struct iommufd_ctx; struct iommufd_access; struct file; struct iommu_group; +struct iommufd_viommu; +struct iommu_user_data_array; + +/** + * struct iommufd_viommu_ops - viommu specific operations + * @cache_invalidate: Flush hardware cache used by a viommu. It can be used for + * any IOMMU hardware specific cache as long as a viommu has + * enough information to identify it: for example, a VMID or + * a vdev_id lookup table. + * The @array passes in the cache invalidation requests, in + * form of a driver data structure. A driver must update the + * array->entry_num to report the number of handled requests. + * The data structure of the array entry must be defined in + * include/uapi/linux/iommufd.h + */ +struct iommufd_viommu_ops { + int (*cache_invalidate)(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array); +};
struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, struct device *dev, u32 *id);
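To make the wiring concrete, here is a minimal hedged sketch of how a driver could hook the default viommu ops; my_viommu_cache_invalidate and my_domain_ops are hypothetical names, not part of this series:

static int my_viommu_cache_invalidate(struct iommufd_viommu *viommu,
				      struct iommu_user_data_array *array)
{
	/* Walk array->entry_num entries, handle each request, and report
	 * the number actually handled back via array->entry_num.
	 */
	return 0;
}

static const struct iommufd_viommu_ops my_viommu_ops = {
	.cache_invalidate = my_viommu_cache_invalidate,
};

/* Hung off the driver's paging-domain ops, so that a default-type viommu
 * allocated on top of such a domain picks these ops up.
 */
static const struct iommu_domain_ops my_domain_ops = {
	.default_viommu_ops = &my_viommu_ops,
};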
Add a new IOMMU_VIOMMU_INVALIDATE ioctl, similar to IOMMU_HWPT_INVALIDATE. This is used by user space to flush any IOMMU specific cache that can be directed via a viommu object.
Add IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 as the initial data type, whose support is added in one of the following patches.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/main.c | 3 ++ drivers/iommu/iommufd/viommu.c | 45 +++++++++++++++++++++++++ include/uapi/linux/iommufd.h | 37 ++++++++++++++++++++ 4 files changed, 86 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 65023211db47..57c4045f3788 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -561,6 +561,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd); int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd); +int iommufd_viommu_invalidate(struct iommufd_ucmd *ucmd);
#ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 199ad90fa36b..85d4d01f9ef6 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -334,6 +334,7 @@ union ucmd_buffer { struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; struct iommu_viommu_alloc viommu; + struct iommu_viommu_invalidate viommu_cache; struct iommu_viommu_set_vdev_id set_vdev_id; struct iommu_viommu_unset_vdev_id unset_vdev_id; #ifdef CONFIG_IOMMUFD_TEST @@ -389,6 +390,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), + IOCTL_OP(IOMMU_VIOMMU_INVALIDATE, iommufd_viommu_invalidate, + struct iommu_viommu_invalidate, entry_num), IOCTL_OP(IOMMU_VIOMMU_SET_VDEV_ID, iommufd_viommu_set_vdev_id, struct iommu_viommu_set_vdev_id, vdev_id), IOCTL_OP(IOMMU_VIOMMU_UNSET_VDEV_ID, iommufd_viommu_unset_vdev_id, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 209d419fe5cd..ada9f968b9f6 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -194,3 +194,48 @@ int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &idev->obj); return rc; } + +int iommufd_viommu_invalidate(struct iommufd_ucmd *ucmd) +{ + struct iommu_viommu_invalidate *cmd = ucmd->cmd; + struct iommu_user_data_array data_array = { + .type = cmd->data_type, + .uptr = u64_to_user_ptr(cmd->data_uptr), + .entry_len = cmd->entry_len, + .entry_num = cmd->entry_num, + }; + struct iommufd_viommu *viommu; + u32 done_num = 0; + int rc; + + if (cmd->__reserved) { + rc = -EOPNOTSUPP; + goto out; + } + + if (cmd->entry_num && (!cmd->data_uptr || !cmd->entry_len)) { + rc = -EINVAL; + goto out; + } + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + if (!viommu->ops || !viommu->ops->cache_invalidate) { + rc = -EOPNOTSUPP; + goto out_put_viommu; + } + + rc = viommu->ops->cache_invalidate(viommu, &data_array); + + done_num = data_array.entry_num; + +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); +out: + cmd->entry_num = done_num; + if (iommufd_ucmd_respond(ucmd, sizeof(*cmd))) + return -EFAULT; + return rc; +} diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index d5e72682ba57..998b3f2cd2b5 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -54,6 +54,7 @@ enum { IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91, + IOMMUFD_CMD_VIOMMU_INVALIDATE = 0x92, };
/** @@ -946,4 +947,40 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID) + +/** + * enum iommu_viommu_invalidate_data_type - VIOMMU Cache Invalidate Data Type + * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3 + */ +enum iommu_viommu_invalidate_data_type { + IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3, +}; + +/** + * struct iommu_viommu_invalidate - ioctl(IOMMU_VIOMMU_INVALIDATE) + * @size: sizeof(struct iommu_viommu_invalidate) + * @viommu_id: viommu ID for cache invalidation + * @data_uptr: User pointer to an array of driver-specific viommu cache + * invalidation data + * @data_type: One of enum iommu_viommu_invalidate_data_type, defining the data + * type of all the entries in the invalidation request array. + * @entry_len: Length (in bytes) of a request entry in the request array + * @entry_num: Input the number of cache invalidation requests in the array. + * Output the number of requests successfully handled by kernel. + * @__reserved: Must be 0 + * + * Invalidate an iommu HW cache used by a viommu in the user space. + * Each ioctl can support one or more cache invalidation requests in the array + * that has a total size of @entry_len * @entry_num. + */ +struct iommu_viommu_invalidate { + __u32 size; + __u32 viommu_id; + __aligned_u64 data_uptr; + __u32 data_type; + __u32 entry_len; + __u32 entry_num; + __u32 __reserved; +}; +#define IOMMU_VIOMMU_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_INVALIDATE) #endif
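For reference, a minimal hedged user-space sketch of the expected flow, assuming iommufd is an open /dev/iommu fd, viommu_id came back from IOMMU_VIOMMU_ALLOC, and the usual <linux/iommufd.h> and <sys/ioctl.h> includes:

	struct iommu_hwpt_arm_smmuv3_invalidate req = {};	/* one 128-bit command */
	struct iommu_viommu_invalidate cmd = {
		.size = sizeof(cmd),
		.viommu_id = viommu_id,
		.data_uptr = (uintptr_t)&req,
		.data_type = IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
		.entry_len = sizeof(req),
		.entry_num = 1,
	};
	int rc = ioctl(iommufd, IOMMU_VIOMMU_INVALIDATE, &cmd);

	/* On return, cmd.entry_num reports how many entries the kernel
	 * handled, whether rc indicates success or a partial failure.
	 */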
On Wed, Aug 07, 2024 at 01:10:49PM -0700, Nicolin Chen wrote:
@@ -946,4 +947,40 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID)
+/**
+ * enum iommu_viommu_invalidate_data_type - VIOMMU Cache Invalidate Data Type
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
+ */
+enum iommu_viommu_invalidate_data_type {
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+};
=1 here I think. Let's try to avoid 0 for the types..
And this shouldn't be in this patch
But also we can probably just reuse enum iommu_hwpt_invalidate_data_type here?
+struct iommu_viommu_invalidate {
+	__u32 size;
+	__u32 viommu_id;
+	__aligned_u64 data_uptr;
+	__u32 data_type;
+	__u32 entry_len;
+	__u32 entry_num;
+	__u32 __reserved;
+};
+#define IOMMU_VIOMMU_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_INVALIDATE)
I wonder if we should use IOMMU_HWPT_INVALIDATE? the hwpt_id can tell which mode it is in. The ioctl becomes badly named but these have identical arguments.
Jason
On Thu, Aug 15, 2024 at 08:24:05PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:49PM -0700, Nicolin Chen wrote:
@@ -946,4 +947,40 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID)
+/**
+ * enum iommu_viommu_invalidate_data_type - VIOMMU Cache Invalidate Data Type
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
+ */
+enum iommu_viommu_invalidate_data_type {
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+};
=1 here I think. Let's try to avoid 0 for the types..
And this shouldn't be in this patch
But also we can probably just reuse enum iommu_hwpt_invalidate_data_type here?
Would that force IOMMU drivers to implement both hwpt and viommu invalidations? SMMUv3 driver would implement both anyway though..
+struct iommu_viommu_invalidate {
+	__u32 size;
+	__u32 viommu_id;
+	__aligned_u64 data_uptr;
+	__u32 data_type;
+	__u32 entry_len;
+	__u32 entry_num;
+	__u32 __reserved;
+};
+#define IOMMU_VIOMMU_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_INVALIDATE)
I wonder if we should use IOMMU_HWPT_INVALIDATE? the hwpt_id can tell which mode it is in. The ioctl becomes badly named but these have identical arguments.
Mostly they would be identical. So, I think that's doable.
If someday we need something hwpt-specific or viommu-specific, we can always duplicate a different structure.
Thanks Nicolin
On Thu, Aug 15, 2024 at 04:51:39PM -0700, Nicolin Chen wrote:
On Thu, Aug 15, 2024 at 08:24:05PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:49PM -0700, Nicolin Chen wrote:
@@ -946,4 +947,40 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID)
+/**
+ * enum iommu_viommu_invalidate_data_type - VIOMMU Cache Invalidate Data Type
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
+ */
+enum iommu_viommu_invalidate_data_type {
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+};
=1 here I think. Let's try to avoid 0 for the types..
And this shouldn't be in this patch
But also we can probably just reuse enum iommu_hwpt_invalidate_data_type here?
Would that force IOMMU drivers to implement both hwpt and viommu invalidations? SMMUv3 driver would implement both anyway though..
I wouldn't say force, just that they have to use a consistent numbering if they do choose to do both.
Jason
On Mon, Aug 19, 2024 at 02:30:56PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 04:51:39PM -0700, Nicolin Chen wrote:
On Thu, Aug 15, 2024 at 08:24:05PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:49PM -0700, Nicolin Chen wrote:
@@ -946,4 +947,40 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID)
+/**
+ * enum iommu_viommu_invalidate_data_type - VIOMMU Cache Invalidate Data Type
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
+ */
+enum iommu_viommu_invalidate_data_type {
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+};
=1 here I think. Let's try to avoid 0 for the types..
And this shouldn't be in this patch
But also we can probably just reuse enum iommu_hwpt_invalidate_data_type here?
Would that force IOMMU drivers to implement both hwpt and viommu invalidations? SMMUv3 driver would implement both anyway though..
I wouldn't say force, just that they have to use a consistent numbering if they do choose to do both.
But if we duplicate a driver type for two IOCTLs, that assumes our ABI supports both IOCTLs? No?
Thanks Nicolin
On Mon, Aug 19, 2024 at 10:49:56AM -0700, Nicolin Chen wrote:
On Mon, Aug 19, 2024 at 02:30:56PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 04:51:39PM -0700, Nicolin Chen wrote:
On Thu, Aug 15, 2024 at 08:24:05PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:49PM -0700, Nicolin Chen wrote:
@@ -946,4 +947,40 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID)
+/**
+ * enum iommu_viommu_invalidate_data_type - VIOMMU Cache Invalidate Data Type
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
+ */
+enum iommu_viommu_invalidate_data_type {
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+};
=1 here I think. Let's try to avoid 0 for the types..
And this shouldn't be in this patch
But also we can probably just reuse enum iommu_hwpt_invalidate_data_type here?
Would that force IOMMU drivers to implement both hwpt and viommu invalidations? SMMUv3 driver would implement both anyway though..
I wouldn't say force, just that they have to use a consistent numbering if they do choose to do both.
But if we duplicate a driver type for two IOCTLs, that assumes our ABI supports both IOCTLs? No?
No, it is just a numbering system to label the struct layout.
Jason
On Mon, Aug 19, 2024 at 03:20:35PM -0300, Jason Gunthorpe wrote:
On Mon, Aug 19, 2024 at 10:49:56AM -0700, Nicolin Chen wrote:
On Mon, Aug 19, 2024 at 02:30:56PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 04:51:39PM -0700, Nicolin Chen wrote:
On Thu, Aug 15, 2024 at 08:24:05PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:49PM -0700, Nicolin Chen wrote:
@@ -946,4 +947,40 @@ struct iommu_viommu_unset_vdev_id { __aligned_u64 vdev_id; }; #define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID)
+/**
+ * enum iommu_viommu_invalidate_data_type - VIOMMU Cache Invalidate Data Type
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
+ */
+enum iommu_viommu_invalidate_data_type {
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+};
=1 here I think. Let's try to avoid 0 for the types..
And this shouldn't be in this patch
But also we can probably just reuse enum iommu_hwpt_invalidate_data_type here?
Would that force IOMMU drivers to implement both hwpt and viommu invalidations? SMMUv3 driver would implement both anyway though..
I wouldn't say force, just that they have to use a consistent numbering if they do choose to do both.
But if we duplicate a driver type for two IOCTLs, that assumes our ABI supports both IOCTLs? No?
No, it is just a numbering system to label the struct layout.
OK. Let's merge them then.
Nicolin
Now, an IOMMU driver can use a viommu to invalidate a device cache via the viommu cache_invalidate op. Very likely, this would require the driver to look up the physical device (or just its ID) from a virtual device ID. Since the core already has a helper, make it public.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/viommu.c | 2 +- include/linux/iommufd.h | 8 ++++++++ 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index ada9f968b9f6..2304f199b033 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -148,7 +148,7 @@ int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) return rc; }
-static struct device * +struct device * iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id) { struct iommufd_vdev_id *vdev_id; diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 7da5140fa1b0..e0d7a53acbd5 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -89,6 +89,8 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova, int iommufd_vfio_compat_ioas_get_id(struct iommufd_ctx *ictx, u32 *out_ioas_id); int iommufd_vfio_compat_ioas_create(struct iommufd_ctx *ictx); int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx); +struct device * +iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id); #else /* !CONFIG_IOMMUFD */ static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file) { @@ -129,5 +131,11 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx) { return -EOPNOTSUPP; } + +static inline struct device * +iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id) +{ + return NULL; +} #endif /* CONFIG_IOMMUFD */ #endif
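As a hedged illustration, a driver's cache_invalidate callback would use the helper roughly like this, where vdev_id is whatever virtual ID the guest put into its invalidation request:

	struct device *dev = iommufd_viommu_find_device(viommu, vdev_id);

	if (!dev)
		return -EINVAL;	/* no such virtual ID on this viommu */
	/* dev is the physical device; the driver can derive its own
	 * per-device data from it, e.g. via dev_iommu_priv_get(dev).
	 * The lifetime/locking of this pointer is discussed further
	 * down the thread.
	 */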
Similar to the coverage of cache_invalidate_user for iotlb invalidation, add a device cache and an invalidation op to test IOMMU_VIOMMU_INVALIDATE ioctl.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 25 +++++++++++ drivers/iommu/iommufd/selftest.c | 63 +++++++++++++++++++++++++++- 2 files changed, 87 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 0bb30275a92f..da824b58927f 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -55,6 +55,11 @@ enum { MOCK_NESTED_DOMAIN_IOTLB_NUM = 4, };
+enum { + MOCK_DEV_CACHE_ID_MAX = 3, + MOCK_DEV_CACHE_NUM = 4, +}; + struct iommu_test_cmd { __u32 size; __u32 op; @@ -156,6 +161,7 @@ struct iommu_test_hw_info { /* Should not be equal to any defined value in enum iommu_hwpt_data_type */ #define IOMMU_HWPT_DATA_SELFTEST 0xdead #define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef +#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad
/** * struct iommu_hwpt_selftest @@ -184,4 +190,23 @@ struct iommu_hwpt_invalidate_selftest { __u32 iotlb_id; };
+/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */ +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef + +/** + * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU + * (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) + * @flags: Invalidate flags + * @cache_id: Invalidate cache entry index + * + * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @cache_id will be ignored + */ +struct iommu_viommu_invalidate_selftest { +#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0) + __u32 flags; + __u32 vdev_id; + __u32 cache_id; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index a165b162c88f..4858c74c67be 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -141,6 +141,7 @@ struct mock_dev { struct device dev; unsigned long flags; int id; + u32 cache[MOCK_DEV_CACHE_NUM]; };
struct selftest_obj { @@ -560,6 +561,61 @@ static void mock_dev_get_resv_regions(struct device *dev, iommu_dma_get_resv_regions(dev, head); }
+static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array) +{ + struct iommu_viommu_invalidate_selftest inv; + u32 processed = 0; + int i = 0, j; + int rc = 0; + + if (array->type != IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) { + rc = -EINVAL; + goto out; + } + + for ( ; i < array->entry_num; i++) { + struct mock_dev *mdev; + struct device *dev; + + rc = iommu_copy_struct_from_user_array(&inv, array, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, i, cache_id); + if (rc) + break; + + if (inv.flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) { + rc = -EOPNOTSUPP; + break; + } + + if (inv.cache_id > MOCK_DEV_CACHE_ID_MAX) { + rc = -EINVAL; + break; + } + + dev = iommufd_viommu_find_device(viommu, inv.vdev_id); + if (!dev) { + rc = -EINVAL; + break; + } + mdev = container_of(dev, struct mock_dev, dev); + + if (inv.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) { + /* Invalidate all cache entries and ignore cache_id */ + for (j = 0; j < MOCK_DEV_CACHE_NUM; j++) + mdev->cache[j] = 0; + } else { + mdev->cache[inv.cache_id] = 0; + } + + processed++; + } + +out: + array->entry_num = processed; + return rc; +} + static const struct iommu_ops mock_ops = { /* * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type() @@ -587,6 +643,9 @@ static const struct iommu_ops mock_ops = { .map_pages = mock_domain_map_pages, .unmap_pages = mock_domain_unmap_pages, .iova_to_phys = mock_domain_iova_to_phys, + .default_viommu_ops = &(struct iommufd_viommu_ops){ + .cache_invalidate = mock_viommu_cache_invalidate, + }, }, };
@@ -722,7 +781,7 @@ static void mock_dev_release(struct device *dev) static struct mock_dev *mock_dev_create(unsigned long dev_flags) { struct mock_dev *mdev; - int rc; + int rc, i;
if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA)) @@ -737,6 +796,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags) mdev->flags = dev_flags; mdev->dev.release = mock_dev_release; mdev->dev.bus = &iommufd_mock_bus_type.bus; + for (i = 0; i < MOCK_DEV_CACHE_NUM; i++) + mdev->cache[i] = IOMMU_TEST_DEV_CACHE_DEFAULT;
rc = ida_alloc(&mock_dev_ida, GFP_KERNEL); if (rc < 0)
Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb, IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 5 ++++ drivers/iommu/iommufd/selftest.c | 24 +++++++++++++++++++ tools/testing/selftests/iommu/iommufd.c | 7 +++++- tools/testing/selftests/iommu/iommufd_utils.h | 24 +++++++++++++++++++ 4 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index da824b58927f..8c21c37a41cb 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -24,6 +24,7 @@ enum { IOMMU_TEST_OP_MD_CHECK_IOTLB, IOMMU_TEST_OP_TRIGGER_IOPF, IOMMU_TEST_OP_MD_CHECK_SW_MSI, + IOMMU_TEST_OP_DEV_CHECK_CACHE, };
enum { @@ -144,6 +145,10 @@ struct iommu_test_cmd { struct { __u32 stdev_id; } check_sw_msi; + struct { + __u32 id; + __u32 cache; + } check_dev_cache; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 4858c74c67be..a79584fe8331 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -1103,6 +1103,26 @@ static int iommufd_test_md_check_sw_msi(struct iommufd_ucmd *ucmd, return rc; }
+static int iommufd_test_dev_check_cache(struct iommufd_ucmd *ucmd, + u32 idev_id, unsigned int cache_id, + u32 cache) +{ + struct iommufd_device *idev; + struct mock_dev *mdev; + int rc = 0; + + idev = iommufd_get_device(ucmd, idev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + mdev = container_of(idev->dev, struct mock_dev, dev); + + if (cache_id > MOCK_DEV_CACHE_ID_MAX || + mdev->cache[cache_id] != cache) + rc = -EINVAL; + iommufd_put_object(ucmd->ictx, &idev->obj); + return rc; +} + struct selftest_access { struct iommufd_access *access; struct file *file; @@ -1616,6 +1636,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd) case IOMMU_TEST_OP_MD_CHECK_SW_MSI: return iommufd_test_md_check_sw_msi(ucmd, cmd->id, cmd->check_sw_msi.stdev_id); + case IOMMU_TEST_OP_DEV_CHECK_CACHE: + return iommufd_test_dev_check_cache(ucmd, cmd->id, + cmd->check_dev_cache.id, + cmd->check_dev_cache.cache); case IOMMU_TEST_OP_CREATE_ACCESS: return iommufd_test_create_access(ucmd, cmd->id, cmd->create_access.flags); diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 2c11525ae754..ce2e8c9ede9e 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -220,6 +220,8 @@ FIXTURE_SETUP(iommufd_ioas) for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_id, &self->hwpt_id, &self->device_id); + test_cmd_dev_check_cache_all(self->device_id, + IOMMU_TEST_DEV_CACHE_DEFAULT); self->base_iova = MOCK_APERTURE_START; } } @@ -1450,9 +1452,12 @@ FIXTURE_SETUP(iommufd_mock_domain)
ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);
- for (i = 0; i != variant->mock_domains; i++) + for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i], &self->hwpt_ids[i], &self->idev_ids[i]); + test_cmd_dev_check_cache_all(self->idev_ids[0], + IOMMU_TEST_DEV_CACHE_DEFAULT); + } self->hwpt_id = self->hwpt_ids[0];
self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS; diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 1b494752a87f..a2b9a6bbcfcc 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -243,6 +243,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_i test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ })
+#define test_cmd_dev_check_cache(device_id, cache_id, expected) \ + ({ \ + struct iommu_test_cmd test_cmd = { \ + .size = sizeof(test_cmd), \ + .op = IOMMU_TEST_OP_DEV_CHECK_CACHE, \ + .id = device_id, \ + .check_dev_cache = { \ + .id = cache_id, \ + .cache = expected, \ + }, \ + }; \ + ASSERT_EQ(0, \ + ioctl(self->fd, \ + _IOMMU_TEST_CMD(IOMMU_TEST_OP_DEV_CHECK_CACHE),\ + &test_cmd)); \ + }) + +#define test_cmd_dev_check_cache_all(device_id, expected) \ + ({ \ + int c; \ + for (c = 0; c < MOCK_DEV_CACHE_NUM; c++) \ + test_cmd_dev_check_cache(device_id, c, expected); \ + }) + static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, uint32_t data_type, uint32_t lreq, uint32_t *nreqs)
Add a viommu_dev_cache test function to cover the new IOMMU_VIOMMU_INVALIDATE ioctl, with positive and negative cases similar to the existing iotlb ones.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- tools/testing/selftests/iommu/iommufd.c | 190 ++++++++++++++++++ tools/testing/selftests/iommu/iommufd_utils.h | 32 +++ 2 files changed, 222 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index ce2e8c9ede9e..b8f20eeddd5b 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -622,6 +622,196 @@ TEST_F(iommufd_ioas, viommu_default) } }
+TEST_F(iommufd_ioas, viommu_dev_cache) +{ + struct iommu_viommu_invalidate_selftest inv_reqs[2] = {}; + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + uint32_t nested_hwpt_id = 0, hwpt_id = 0; + uint32_t dev_id = self->device_id; + uint32_t viommu_id = 0; + uint32_t num_inv; + + if (dev_id) { + test_cmd_hwpt_alloc(dev_id, self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id); + test_cmd_viommu_alloc(dev_id, hwpt_id, + IOMMU_VIOMMU_TYPE_DEFAULT, &viommu_id); + test_cmd_hwpt_alloc_nested(self->device_id, viommu_id, 0, + &nested_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); + test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id); + test_cmd_viommu_set_vdev_id(viommu_id, dev_id, 0x99); + + test_cmd_dev_check_cache_all(dev_id, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Check data_type by passing zero-length array */ + num_inv = 0; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: Invalid data_type */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: structure size sanity */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs) + 1, &num_inv); + assert(!num_inv); + + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 1, &num_inv); + assert(!num_inv); + + /* Negative test: invalid flag is passed */ + num_inv = 1; + inv_reqs[0].flags = 0xffffffff; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid data_uptr when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, NULL, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid entry_len when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 0, &num_inv); + assert(!num_inv); + + /* Negative test: invalid cache_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid vdev_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x9; + inv_reqs[0].cache_id = 0; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid flags configuration in the 2nd request. 
+ */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0xffffffff; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 1; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid cache_id configuration in the 2nd request. + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 2nd cache entry and verify */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 1; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, 0); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 3rd and 4th cache entries and verify */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 2; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 3; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 2); + test_cmd_dev_check_cache_all(dev_id, 0); + + /* Invalidate all cache entries for nested_dev_id[1] and verify */ + num_inv = 1; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache_all(dev_id, 0); + + test_cmd_mock_domain_replace(self->stdev_id, hwpt_id); + test_ioctl_destroy(nested_hwpt_id); + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(viommu_id); + test_ioctl_destroy(hwpt_id); + } +} + TEST_F(iommufd_ioas, hwpt_attach) { /* Create a device attached directly to a hwpt */ diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index a2b9a6bbcfcc..69c51151238a 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -298,6 +298,38 @@ static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, data_type, lreq, nreqs)); \ })
+static int _test_cmd_viommu_invalidate(int fd, __u32 viommu_id, void *reqs, + uint32_t data_type, uint32_t lreq, + uint32_t *nreqs) +{ + struct iommu_viommu_invalidate cmd = { + .size = sizeof(cmd), + .viommu_id = viommu_id, + .data_type = data_type, + .data_uptr = (uint64_t)reqs, + .entry_len = lreq, + .entry_num = *nreqs, + }; + int rc = ioctl(fd, IOMMU_VIOMMU_INVALIDATE, &cmd); + *nreqs = cmd.entry_num; + return rc; +} + +#define test_cmd_viommu_invalidate(viommu, reqs, lreq, nreqs) \ + ({ \ + ASSERT_EQ(0, \ + _test_cmd_viommu_invalidate(self->fd, viommu, reqs, \ + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, \ + lreq, nreqs)); \ + }) +#define test_err_viommu_invalidate(_errno, viommu_id, reqs, data_type, lreq, \ + nreqs) \ + ({ \ + EXPECT_ERRNO(_errno, _test_cmd_viommu_invalidate( \ + self->fd, viommu_id, reqs, \ + data_type, lreq, nreqs)); \ + }) + static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id) {
Allow an IOMMU driver to convert a core-managed viommu to its nested parent domain, to access the info that the domain holds.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/viommu.c | 8 ++++++++ include/linux/iommufd.h | 8 ++++++++ 2 files changed, 16 insertions(+)
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 2304f199b033..39a7c04c4132 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -239,3 +239,11 @@ int iommufd_viommu_invalidate(struct iommufd_ucmd *ucmd) return -EFAULT; return rc; } + +struct iommu_domain * +iommufd_viommu_to_parent_domain(struct iommufd_viommu *viommu) +{ + if (!viommu || !viommu->hwpt) + return NULL; + return viommu->hwpt->common.domain; +} diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index e0d7a53acbd5..c0ce12cb1374 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -91,6 +91,8 @@ int iommufd_vfio_compat_ioas_create(struct iommufd_ctx *ictx); int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx); struct device * iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id); +struct iommu_domain * +iommufd_viommu_to_parent_domain(struct iommufd_viommu *viommu); #else /* !CONFIG_IOMMUFD */ static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file) { @@ -137,5 +139,11 @@ iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id) { return NULL; } + +static inline struct iommu_domain * +iommufd_viommu_to_parent_domain(struct iommufd_viommu *viommu) +{ + return NULL; +} #endif /* CONFIG_IOMMUFD */ #endif
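A hedged sketch of how a driver's viommu invalidation path might combine this helper with the device lookup; the arm-smmu-v3 patch later in the series does essentially this (my_cache_invalidate is a hypothetical name):

static int my_cache_invalidate(struct iommufd_viommu *viommu,
			       struct iommu_user_data_array *array)
{
	struct iommu_domain *s2 = iommufd_viommu_to_parent_domain(viommu);

	if (!s2)
		return -EINVAL;
	/* Use the S2 domain (e.g. its VMID) plus per-entry vdev_id lookups
	 * via iommufd_viommu_find_device() to build and issue the HW
	 * invalidation commands.
	 */
	return 0;
}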
Extract a helper accepting an s2_parent input directly so the following patch can take advantage of it for viommu's cache invalidate callback, which doesn't have a nested domain but a viommu object linked to an S2 parent domain.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 ++++++++++++++------- 1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 558cf3bb24e0..ec76377d505c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3227,10 +3227,10 @@ static void arm_smmu_domain_nested_free(struct iommu_domain *domain) * Enforce the VMID on the command. */ static int -arm_smmu_convert_user_cmd(struct arm_smmu_nested_domain *nested_domain, +arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent, struct iommu_hwpt_arm_smmuv3_invalidate *cmd) { - u16 vmid = nested_domain->s2_parent->s2_cfg.vmid; + u16 vmid = s2_parent->s2_cfg.vmid;
cmd->cmd[0] = le64_to_cpu(cmd->cmd[0]); cmd->cmd[1] = le64_to_cpu(cmd->cmd[1]); @@ -3255,12 +3255,10 @@ arm_smmu_convert_user_cmd(struct arm_smmu_nested_domain *nested_domain, return 0; }
-static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain, - struct iommu_user_data_array *array) +static int __arm_smmu_cache_invalidate_user(struct arm_smmu_domain *s2_parent, + struct iommu_user_data_array *array) { - struct arm_smmu_nested_domain *nested_domain = - container_of(domain, struct arm_smmu_nested_domain, domain); - struct arm_smmu_device *smmu = nested_domain->s2_parent->smmu; + struct arm_smmu_device *smmu = s2_parent->smmu; struct iommu_hwpt_arm_smmuv3_invalidate *last_batch; struct iommu_hwpt_arm_smmuv3_invalidate *cmds; struct iommu_hwpt_arm_smmuv3_invalidate *cur; @@ -3282,7 +3280,7 @@ static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
last_batch = cmds; while (cur != end) { - ret = arm_smmu_convert_user_cmd(nested_domain, cur); + ret = arm_smmu_convert_user_cmd(s2_parent, cur); if (ret) goto out;
@@ -3305,6 +3303,16 @@ static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain, return ret; }
+static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain, + struct iommu_user_data_array *array) +{ + struct arm_smmu_nested_domain *nested_domain = + container_of(domain, struct arm_smmu_nested_domain, domain); + + return __arm_smmu_cache_invalidate_user( + nested_domain->s2_parent, array); +} + static struct iommu_domain * arm_smmu_get_msi_mapping_domain(struct iommu_domain *domain) {
Add an arm_smmu_viommu_cache_invalidate() function for user space to issue cache invalidation commands via a viommu.
The viommu invalidation takes the same native 128-bit command format as the hwpt invalidation. Thus, reuse the same driver data structure, but widen it to accept CMDQ_OP_ATC_INV and CMDQ_OP_CFGI_CD{_ALL}.
Scan the commands against the supported list and fix up the VMIDs and SIDs.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 54 +++++++++++++++++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 + include/uapi/linux/iommufd.h | 20 ++++++++ 3 files changed, 70 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ec76377d505c..be4f849f1a48 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3219,15 +3219,32 @@ static void arm_smmu_domain_nested_free(struct iommu_domain *domain) kfree(container_of(domain, struct arm_smmu_nested_domain, domain)); }
+static int arm_smmu_convert_viommu_vdev_id(struct iommufd_viommu *viommu, + u32 vdev_id, u32 *sid) +{ + struct arm_smmu_master *master; + struct device *dev; + + dev = iommufd_viommu_find_device(viommu, vdev_id); + if (!dev) + return -EIO; + master = dev_iommu_priv_get(dev); + + if (sid) + *sid = master->streams[0].id; + return 0; +} + /* * Convert, in place, the raw invalidation command into an internal format that * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are * stored in CPU endian. * - * Enforce the VMID on the command. + * Enforce the VMID or the SID on the command. */ static int arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent, + struct iommufd_viommu *viommu, struct iommu_hwpt_arm_smmuv3_invalidate *cmd) { u16 vmid = s2_parent->s2_cfg.vmid; @@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent, cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID; cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid); break; + case CMDQ_OP_ATC_INV: + case CMDQ_OP_CFGI_CD: + case CMDQ_OP_CFGI_CD_ALL: + if (viommu) { + u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); + + if (arm_smmu_convert_viommu_vdev_id(viommu, vsid, &sid)) + return -EIO; + cmd->cmd[0] &= ~CMDQ_CFGI_0_SID; + cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid); + break; + } + fallthrough; default: return -EIO; } @@ -3256,8 +3286,11 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent, }
static int __arm_smmu_cache_invalidate_user(struct arm_smmu_domain *s2_parent, + struct iommufd_viommu *viommu, struct iommu_user_data_array *array) { + unsigned int type = viommu ? IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 : + IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3; struct arm_smmu_device *smmu = s2_parent->smmu; struct iommu_hwpt_arm_smmuv3_invalidate *last_batch; struct iommu_hwpt_arm_smmuv3_invalidate *cmds; @@ -3273,14 +3306,13 @@ static int __arm_smmu_cache_invalidate_user(struct arm_smmu_domain *s2_parent,
static_assert(sizeof(*cmds) == 2 * sizeof(u64)); ret = iommu_copy_struct_from_full_user_array( - cmds, sizeof(*cmds), array, - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3); + cmds, sizeof(*cmds), array, type); if (ret) goto out;
last_batch = cmds; while (cur != end) { - ret = arm_smmu_convert_user_cmd(s2_parent, cur); + ret = arm_smmu_convert_user_cmd(s2_parent, viommu, cur); if (ret) goto out;
@@ -3310,7 +3342,7 @@ static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain, container_of(domain, struct arm_smmu_nested_domain, domain);
return __arm_smmu_cache_invalidate_user( - nested_domain->s2_parent, array); + nested_domain->s2_parent, NULL, array); }
static struct iommu_domain * @@ -3812,6 +3844,15 @@ static int arm_smmu_def_domain_type(struct device *dev) return 0; }
+static int arm_smmu_viommu_cache_invalidate(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array) +{ + struct iommu_domain *domain = iommufd_viommu_to_parent_domain(viommu); + + return __arm_smmu_cache_invalidate_user( + to_smmu_domain(domain), viommu, array); +} + static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, @@ -3842,6 +3883,9 @@ static struct iommu_ops arm_smmu_ops = { .iotlb_sync = arm_smmu_iotlb_sync, .iova_to_phys = arm_smmu_iova_to_phys, .free = arm_smmu_domain_free_paging, + .default_viommu_ops = &(const struct iommufd_viommu_ops) { + .cache_invalidate = arm_smmu_viommu_cache_invalidate, + } } };
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 3f7442f0167e..a3fb08e0a195 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -10,6 +10,7 @@
#include <linux/bitfield.h> #include <linux/iommu.h> +#include <linux/iommufd.h> #include <linux/kernel.h> #include <linux/mmzone.h> #include <linux/sizes.h> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 998b3f2cd2b5..416b9a18e6bb 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -956,6 +956,26 @@ enum iommu_viommu_invalidate_data_type { IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3, };
+/** + * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cache invalidation + * (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3) + * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ. + * Must be little-endian. + * + * Supported command list: + * CMDQ_OP_TLBI_NSNH_ALL + * CMDQ_OP_TLBI_NH_VA + * CMDQ_OP_TLBI_NH_VAA + * CMDQ_OP_TLBI_NH_ALL + * CMDQ_OP_TLBI_NH_ASID + * CMDQ_OP_ATC_INV + * CMDQ_OP_CFGI_CD + * CMDQ_OP_CFGI_CD_ALL + * + * -EIO will be returned if the command is not supported. + */ +#define iommu_viommu_arm_smmuv3_invalidate iommu_hwpt_arm_smmuv3_invalidate + /** * struct iommu_viommu_invalidate - ioctl(IOMMU_VIOMMU_INVALIDATE) * @size: sizeof(struct iommu_viommu_invalidate)
On Wed, Aug 07, 2024 at 01:10:56PM -0700, Nicolin Chen wrote:
Add an arm_smmu_viommu_cache_invalidate() function for user space to issue cache invalidation commands via a viommu.
The viommu invalidation takes the same native 128-bit command format as the hwpt invalidation. Thus, reuse the same driver data structure, but widen it to accept CMDQ_OP_ATC_INV and CMDQ_OP_CFGI_CD{_ALL}.
Scan the commands against the supported list and fix up the VMIDs and SIDs.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 54 +++++++++++++++++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 + include/uapi/linux/iommufd.h | 20 ++++++++ 3 files changed, 70 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ec76377d505c..be4f849f1a48 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3219,15 +3219,32 @@ static void arm_smmu_domain_nested_free(struct iommu_domain *domain) kfree(container_of(domain, struct arm_smmu_nested_domain, domain)); }
+static int arm_smmu_convert_viommu_vdev_id(struct iommufd_viommu *viommu,
+					   u32 vdev_id, u32 *sid)
+{
+	struct arm_smmu_master *master;
+	struct device *dev;
+
+	dev = iommufd_viommu_find_device(viommu, vdev_id);
+	if (!dev)
+		return -EIO;
+	master = dev_iommu_priv_get(dev);
+
+	if (sid)
+		*sid = master->streams[0].id;
See this is the thing that needs to be locked right
xa_lock()
dev = xa_load()
master = dev_iommu_priv_get(dev);
*sid = master->streams[0].id;
xa_unlock()
Then no risk of dev going away under us.
@@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent, cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID; cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid); break;
+	case CMDQ_OP_ATC_INV:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
Oh, I didn't catch on that CD was needing this too.. :\
That makes the other op much more useless than I expected. I really wanted to break these two series apart.
Maybe we need to drop the hwpt invalidation from the other series and aim to merge this all together through the iommufd tree.
Jason
On Thu, Aug 15, 2024 at 08:36:35PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:56PM -0700, Nicolin Chen wrote:
+static int arm_smmu_convert_viommu_vdev_id(struct iommufd_viommu *viommu,
+					   u32 vdev_id, u32 *sid)
+{
+	struct arm_smmu_master *master;
+	struct device *dev;
+
+	dev = iommufd_viommu_find_device(viommu, vdev_id);
+	if (!dev)
+		return -EIO;
+	master = dev_iommu_priv_get(dev);
+
+	if (sid)
+		*sid = master->streams[0].id;
See this is the thing that needs to be locked right
xa_lock()
dev = xa_load()
master = dev_iommu_priv_get(dev);
*sid = master->streams[0].id;
xa_unlock()
Then no risk of dev going away under us.
Yea, I figured that out.
Though only the driver would know whether it eventually accesses the vdev_id list, I'd like to keep things in the direction of having a core-managed VIOMMU object (IOMMU_VIOMMU_TYPE_DEFAULT), so the viommu invalidation handler could take a lock at its top level to protect any potential access to the vdev_id list.
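Roughly, a hedged sketch of the top-level locking I have in mind in the core handler; the rwsem name is hypothetical:

	/* In iommufd_viommu_invalidate(), around the driver dispatch */
	down_read(&viommu->vdev_ids_rwsem);
	rc = viommu->ops->cache_invalidate(viommu, &data_array);
	up_read(&viommu->vdev_ids_rwsem);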
@@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent, cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID; cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid); break;
+	case CMDQ_OP_ATC_INV:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
Oh, I didn't catch on that CD was needing this too.. :\
Well, viommu cache has a very wide range :)
That makes the other op much more useless than I expected. I really wanted to break these two series apart.
HWPT invalidate and VIOMMU invalidate are somewhat duplicated in both concept and implementation for SMMUv3. It's not a problem to have both, but practically I can't think of a reason why a VMM wouldn't simply stick to the wider VIOMMU invalidate uAPI alone..
Maybe we need to drop the hwpt invalidation from the other series and
Yea, the hwpt invalidate is just one patch in your series, it's easy to move if we want to.
aim to merge this all together through the iommufd tree.
I have been hoping for that, as you can see those driver patches are included here :)
And there will be another two series that I'd like to go through the IOMMUFD tree as well:
VIOMMU part-1 (ALLOC/SET_VDEV_ID/INVALIDATE) + smmu user cache invalidate
VIOMMU part-2 (VIRQ) + smmu virtual IRQ handling
VIOMMU part-3 (VQUEUE) + CMDQV user-space support
Thanks Nicolin
On Thu, Aug 15, 2024 at 05:50:06PM -0700, Nicolin Chen wrote:
Though only the driver knows whether it will eventually access the vdev_id list, I'd like to keep having a core-managed VIOMMU object (IOMMU_VIOMMU_TYPE_DEFAULT), so the viommu invalidation handler can take a lock at its top level to protect any potential access to the vdev_id list.
It is a bit tortured to keep the xarray hidden. It would be better to find a way to expose the right struct to the driver.
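One shape that "exposing the right struct" could take, sketched below; all member names are hypothetical:

/*
 * A driver-visible viommu object, so drivers can xa_lock() the vdev_id
 * list themselves instead of going through a core lookup helper.
 */
struct iommufd_viommu {
	struct iommufd_ctx *ictx;
	struct iommu_domain *parent_domain;	/* shared stage-2 HWPT */
	struct xarray vdev_ids;			/* vdev_id -> struct device * */
};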
@@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
 		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
 		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid);
 		break;
+	case CMDQ_OP_ATC_INV:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
Oh, I didn't catch on that CD was needing this too.. :\
Well, viommu cache has a very wide range :)
That makes the other op much more useless than I expected. I really wanted to break these two series apart.
HWPT invalidate and VIOMMU invalidate are somewhat duplicated in both concept and implementation for SMMUv3. It's not a problem to have both, but practically I can't think of a reason why a VMM wouldn't simply stick to the wider VIOMMU invalidate uAPI alone.
Maybe we need to drop the hwpt invalidation from the other series and
Yea, the hwpt invalidate is just one patch in your series, it's easy to move if we want to.
aim to merge this all together through the iommufd tree.
I have been hoping for that, as you can see those driver patches are included here :)
Well, this series has to go through iommufd of course
I was hoping Will could take the nesting enablement and we'd do the viommu next window.
But nesting enablement without viommu is a lot less useful than I had thought :(
So maybe Will acks the nesting patches and we take the bunch together.
Jason
On Mon, Aug 19, 2024 at 02:36:15PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 05:50:06PM -0700, Nicolin Chen wrote:
Though only the driver knows whether it will eventually access the vdev_id list, I'd like to keep having a core-managed VIOMMU object (IOMMU_VIOMMU_TYPE_DEFAULT), so the viommu invalidation handler can take a lock at its top level to protect any potential access to the vdev_id list.
It is a bit tortured to keep the xarray hidden. It would be better to find a way to expose the right struct to the driver.
Yes. I will try adding set/unset_vdev_id to the default viommu ops.
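For reference, a guess at what those ops could look like; the callback names and signatures are not the final interface:

struct iommufd_viommu_ops {
	/* Called on IOMMU_VIOMMU_SET/UNSET_VDEV_ID respectively */
	int (*set_vdev_id)(struct iommufd_viommu *viommu,
			   struct device *dev, u64 vdev_id);
	void (*unset_vdev_id)(struct iommufd_viommu *viommu,
			      struct device *dev, u64 vdev_id);
	int (*cache_invalidate)(struct iommufd_viommu *viommu,
				struct iommu_user_data_array *array);
};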
@@ -3249,6 +3266,19 @@ arm_smmu_convert_user_cmd(struct arm_smmu_domain *s2_parent,
 		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
 		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vmid);
 		break;
+	case CMDQ_OP_ATC_INV:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
Oh, I didn't catch on that CD was needing this too.. :\
Well, viommu cache has a very wide range :)
That makes the other op much more useless than I expected. I really wanted to break these two series apart.
HWPT invalidate and VIOMMU invalidate are somewhat duplicated in both concept and implementation for SMMUv3. It's not a problem to have both, but practically I can't think of a reason why a VMM wouldn't simply stick to the wider VIOMMU invalidate uAPI alone.
Maybe we need to drop the hwpt invalidation from the other series and
Yea, the hwpt invalidate is just one patch in your series, it's easy to move if we want to.
aim to merge this all together through the iommufd tree.
I have been hoping for that, as you can see those driver patches are included here :)
Well, this series has to go through iommufd of course
I was hoping Will could take the nesting enablement and we'd do the viommu next window.
But nesting enablement without viommu is a lot less useful than I had thought :(
Actually, without viommu, the hwpt cache invalidate alone could still support the non-SVA case?
Though we still have the blocker at the msi mapping... It still requires a solution, even for the viommu series.
Thanks Nicolin
On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:
But nesting enablement without viommu is a lot less useful than I had thought :(
Actually, without viommu, the hwpt cache invalidate alone could still support the non-SVA case?
That is what I thought, but doesn't the guest still have to invalidate the CD table entry # 0?
Though we still have the blocker at the msi mapping... It still requires a solution, even for the viommu series.
Yes, small steps. The point of this step was to get the nested paging only (without msi, pri, etc, etc)
Jason
On Mon, Aug 19, 2024 at 03:28:11PM -0300, Jason Gunthorpe wrote:
On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:
But nesting enablement without viommu is a lot less useful than I had thought :(
Actually, without viommu, the hwpt cache invalidate alone could still support the non-SVA case?
That is what I thought, but doesn't the guest still have to invalidate the CD table entry # 0?
I recall it doesn't. The CD cache invalidation is required in the viommu invalidation for an SVA case, where we need a PASID number to specify the CD for a substream. But the CD for the default substream is only changed during a vSTE setup, and the host knows the PASID number (=0)?
Though we still have the blocker at the msi mapping... It still requires a solution, even for the viommu series.
Yes, small steps. The point of this step was to get the nested paging only (without msi, pri, etc, etc)
Ack.
Thanks Nicolin
On Mon, Aug 19, 2024 at 11:38:22AM -0700, Nicolin Chen wrote:
On Mon, Aug 19, 2024 at 03:28:11PM -0300, Jason Gunthorpe wrote:
On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:
But nesting enablement without viommu is a lot less useful than I had thought :(
Actually, without viommu, the hwpt cache invalidate alone could still support the non-SVA case?
That is what I thought, but doesn't the guest still have to invalidate the CD table entry # 0?
I recall it doesn't. The CD cache invalidation is required in the viommu invalidation for an SVA case, where we need a PASID number to specify the CD for a substream. But the CD for the default substream is only changed during a vSTE setup, and the host knows the PASID number (=0)?
I think that would subtly assume certain things about how the driver does ordering, i.e. that CD table entry 0 is set up with the S1 before we load it into the STE.
Yes, the Linux driver does that now, but I don't think anyone should rely on that..
Jason
On Mon, Aug 19, 2024 at 03:47:16PM -0300, Jason Gunthorpe wrote:
On Mon, Aug 19, 2024 at 11:38:22AM -0700, Nicolin Chen wrote:
On Mon, Aug 19, 2024 at 03:28:11PM -0300, Jason Gunthorpe wrote:
On Mon, Aug 19, 2024 at 11:19:39AM -0700, Nicolin Chen wrote:
But nesting enablement without viommu is a lot less useful than I had thought :(
Actually, without viommu, the hwpt cache invalidate alone could still support the non-SVA case?
That is what I thought, but doesn't the guest still have to invalidate the CD table entry # 0?
I recall it doesn't. The CD cache invalidation is required in the viommu invalidation for an SVA case, where we need a PASID number to specify the CD for a substream. But the CD for the default substream is only changed during a vSTE setup, and the host knows the PASID number (=0)?
I think that would subtly assume certain things about how the driver does ordering, i.e. that CD table entry 0 is set up with the S1 before we load it into the STE.
Yes, the Linux driver does that now, but I don't think anyone should rely on that..
Oh that's true...
Thanks Nicolin
From: Jason Gunthorpe <jgg@nvidia.com>
Now, ATC invalidation can be done with the VIOMMU invalidation op. A guest-owned IOMMU_DOMAIN_NESTED can do ATS too. Allow it to pass in the EATS field via the vSTE words.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 ++++++++++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index be4f849f1a48..ce84f0c04022 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3189,8 +3189,6 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
 		.master = master,
 		.old_domain = iommu_get_domain_for_dev(dev),
 		.ssid = IOMMU_NO_PASID,
-		/* Currently invalidation of ATC is not supported */
-		.disable_ats = true,
 	};
 	struct arm_smmu_ste ste;
 	int ret;
@@ -3200,6 +3198,15 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
 		return -EINVAL;

 	mutex_lock(&arm_smmu_asid_lock);
+	/*
+	 * The VM has to control the actual ATS state at the PCI device because
+	 * we forward the invalidations directly from the VM. If the VM doesn't
+	 * think ATS is on it will not generate ATC flushes and the ATC will
+	 * become incoherent. Since we can't access the actual virtual PCI ATS
+	 * config bit here base this off the EATS value in the STE. If the EATS
+	 * is set then the VM must generate ATC flushes.
+	 */
+	state.disable_ats = !nested_domain->enable_ats;
 	ret = arm_smmu_attach_prepare(&state, domain);
 	if (ret) {
 		mutex_unlock(&arm_smmu_asid_lock);
@@ -3420,8 +3427,9 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
 	    cfg != STRTAB_STE_0_CFG_S1_TRANS)
 		return ERR_PTR(-EIO);

+	/* Only Full ATS or ATS UR is supported */
 	eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg.ste[1]));
-	if (eats != STRTAB_STE_1_EATS_ABT)
+	if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS)
 		return ERR_PTR(-EIO);

 	if (cfg != STRTAB_STE_0_CFG_S1_TRANS)
@@ -3434,6 +3442,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
 	nested_domain->domain.type = IOMMU_DOMAIN_NESTED;
 	nested_domain->domain.ops = &arm_smmu_nested_ops;
 	nested_domain->s2_parent = smmu_parent;
+	nested_domain->enable_ats = eats == STRTAB_STE_1_EATS_TRANS;
 	nested_domain->ste[0] = arg.ste[0];
 	nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index a3fb08e0a195..65f90b00e16d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -811,6 +811,7 @@ struct arm_smmu_domain {
 struct arm_smmu_nested_domain {
 	struct iommu_domain domain;
 	struct arm_smmu_domain *s2_parent;
+	u8 enable_ats : 1;

 	__le64 ste[2];
 };
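To see how a VMM would drive this, a hedged userspace-side sketch follows; the helper name and the reuse of the kernel's STRTAB_STE_* field macros are assumptions for illustration, not the verbatim uAPI:

/*
 * Hypothetical VMM helper: build the two vSTE words passed to
 * IOMMU_HWPT_ALLOC. EATS=TRANS tells the kernel the guest enabled ATS,
 * so it clears disable_ats and relies on the guest to issue
 * CMDQ_OP_ATC_INV through the viommu invalidation path.
 */
static void vmm_build_vste(__u64 ste[2], __u64 s1ctxptr, bool guest_ats_on)
{
	/* Word 0: valid, stage-1 translate, guest CD table pointer */
	ste[0] = STRTAB_STE_0_V |
		 FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
		 (s1ctxptr & STRTAB_STE_0_S1CTXPTR_MASK);

	/* Word 1: EATS reflects the guest's virtual PCI ATS state */
	ste[1] = FIELD_PREP(STRTAB_STE_1_EATS,
			    guest_ats_on ? STRTAB_STE_1_EATS_TRANS :
					   STRTAB_STE_1_EATS_ABT);
}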