This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including nested IO page table support. Yet, there are limitations for an HWPT-based structure to support some advanced HW-accelerated features, such as CMDQV on NVIDIA Grace and HW-accelerated vIOMMU on AMD. Even in a multi-IOMMU environment, it is not straightforward for nested HWPTs to share the same parent HWPT (stage-2 IO pagetable) with the HWPT infrastructure alone: a parent HWPT typically holds one stage-2 IO pagetable and tags it with only one ID in the cache entries. When sharing one large stage-2 IO pagetable across physical IOMMU instances, that one ID may not always be available on every IOMMU instance. In other words, it is ideal for SW to have a different container for the stage-2 IO pagetable, so that it can hold another ID that is available. This container will also be able to hold some advanced features.
For this "different container", add vIOMMU, an additional layer to hold extra virtualization information: _______________________________________________________________________ | iommufd (with vIOMMU) | | _____________ | | | | | | |----------------| vIOMMU | | | | ______ | | _____________ ________ | | | | | | | | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | | |______| |_____________| |_____________| |________| | | | | | | | | |______|________|______________|__________________|_______________|_____| | | | | | ______v_____ | ______v_____ ______v_____ ___v__ | struct | | PFN | (paging) | | (nested) | |struct| |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| |____________| storage|____________| |____________| |______|
The vIOMMU object should be seen as a slice of a physical IOMMU instance that is passed to or shared with a VM. That slice can hold some HW/SW resources:
 - Security namespace for guest-owned ID, e.g. guest-controlled cache tags
 - Access to a sharable nesting parent pagetable across physical IOMMUs
 - Non-affiliated event reporting (e.g. an invalidation queue error)
 - Virtualization of various platform IDs, e.g. RIDs and others
 - Delivery of paravirtualized invalidation
 - Direct assigned invalidation queues
 - Direct assigned interrupts
On a multi-IOMMU system, one vIOMMU object must be instantiated for each physical IOMMU that has a slice passed (via a device) to a guest VM, while all of them can hold the same shareable parent HWPT. Each vIOMMU then just needs to allocate its own individual ID to tag its own cache:

                     ----------------------------
 ----------------   |         |  paging_hwpt0   |
 | hwpt_nested0 |--->| viommu0 ------------------
 ----------------   |         |      IDx        |
                     ----------------------------
                     ----------------------------
 ----------------   |         |  paging_hwpt0   |
 | hwpt_nested1 |--->| viommu1 ------------------
 ----------------   |         |      IDy        |
                     ----------------------------
As the initial part-1, add the IOMMUFD_CMD_VIOMMU_ALLOC ioctl for the allocation only, and implement it in the arm-smmu-v3 driver as a real-world use case.
More vIOMMU-based structs and ioctls will be introduced in follow-up series to support vDEVICE, vIRQ (vEVENT) and vQUEUE objects. The vIOMMU object is repurposed from an earlier RFC; just for reference: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v5 (a pairing QEMU branch for testing will be provided with the part-2 series)
Changelog v5 * Added "Reviewed-by" from Kevin * Reworked iommufd_viommu_alloc helper * Revised the uAPI kdoc for vIOMMU object * Revised comments for pluggable iommu_dev * Added a couple of cleanup patches for selftest * Renamed domain_alloc_nested op to alloc_domain_nested * Updated a few commit messages to reflect the latest series * Renamed iommufd_hwpt_nested_alloc_for_viommu to iommufd_viommu_alloc_hwpt_nested, and added flag validation v4 https://lore.kernel.org/all/cover.1729553811.git.nicolinc@nvidia.com/ * Added "Reviewed-by" from Jason * Dropped IOMMU_VIOMMU_TYPE_DEFAULT support * Dropped iommufd_object_alloc_elm renamings * Renamed iommufd's viommu_api.c to driver.c * Reworked iommufd_viommu_alloc helper * Added a separate iommufd_hwpt_nested_alloc_for_viommu function for hwpt_nested allocations on a vIOMMU, and added comparison between viommu->iommu_dev->ops and dev_iommu_ops(idev->dev) * Replaced s2_parent with vsmmu in arm_smmu_nested_domain * Replaced domain_alloc_user in iommu_ops with domain_alloc_nested in viommu_ops * Replaced wait_queue_head_t with a completion, to delay the unplug of mock_iommu_dev * Corrected documentation graph that was missing struct iommu_device * Added an iommufd_verify_unfinalized_object helper to verify driver- allocated vIOMMU/vDEVICE objects * Added missing test cases for TEST_LENGTH and fail_nth v3 https://lore.kernel.org/all/cover.1728491453.git.nicolinc@nvidia.com/ * Rebased on top of Jason's nesting v3 series https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvidi... * Split the series into smaller parts * Added Jason's Reviewed-by * Added back viommu->iommu_dev * Added support for driver-allocated vIOMMU v.s. core-allocated * Dropped arm_smmu_cache_invalidate_user * Added an iommufd_test_wait_for_users() in selftest * Reworked test code to make viommu an individual FIXTURE * Added missing TEST_LENGTH case for the new ioctl command v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/ * Limited vdev_id to one per idev * Added a rw_sem to protect the vdev_id list * Reworked driver-level APIs with proper lockings * Added a new viommu_api file for IOMMUFD_DRIVER config * Dropped useless iommu_dev point from the viommu structure * Added missing index numnbers to new types in the uAPI header * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one * Reworked mock_viommu_cache_invalidate() using the new iommu helper * Reordered details of set/unset_vdev_id handlers for proper lockings v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks! Nicolin
Nicolin Chen (13):
  iommufd: Move struct iommufd_object to public iommufd header
  iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
  iommufd: Add iommufd_verify_unfinalized_object
  iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
  iommufd: Add alloc_domain_nested op to iommufd_viommu_ops
  iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
  iommufd/selftest: Add container_of helpers
  iommufd/selftest: Prepare for mock_viommu_alloc_domain_nested()
  iommufd/selftest: Add refcount to mock_iommu_device
  iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
  iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
  Documentation: userspace-api: iommufd: Update vIOMMU
  iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
 drivers/iommu/iommufd/Makefile                |   5 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  26 +-
 drivers/iommu/iommufd/iommufd_private.h       |  36 +--
 drivers/iommu/iommufd/iommufd_test.h          |   2 +
 include/linux/iommu.h                         |  14 +
 include/linux/iommufd.h                       |  86 ++++++
 include/uapi/linux/iommufd.h                  |  56 +++-
 tools/testing/selftests/iommu/iommufd_utils.h |  28 ++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     |  80 ++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   9 +-
 drivers/iommu/iommufd/driver.c                |  38 +++
 drivers/iommu/iommufd/hw_pagetable.c          |  71 ++++-
 drivers/iommu/iommufd/main.c                  |  58 ++--
 drivers/iommu/iommufd/selftest.c              | 256 ++++++++++++------
 drivers/iommu/iommufd/viommu.c                |  89 ++++++
 tools/testing/selftests/iommu/iommufd.c       |  87 ++++++
 .../selftests/iommu/iommufd_fail_nth.c        |  11 +
 Documentation/userspace-api/iommufd.rst       |  69 ++++-
 18 files changed, 827 insertions(+), 194 deletions(-)
 create mode 100644 drivers/iommu/iommufd/driver.c
 create mode 100644 drivers/iommu/iommufd/viommu.c
Prepare for an embedded structure design for driver-level iommufd_viommu objects:

// include/linux/iommufd.h
struct iommufd_viommu {
        struct iommufd_object obj;
        ....
};

// Some IOMMU driver
struct iommu_driver_viommu {
        struct iommufd_viommu core;
        ....
};
This requires exposing struct iommufd_object and enum iommufd_object_type, moving them from the core-level private header to the public iommufd header.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 25 +------------------------
 include/linux/iommufd.h                 | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f1d865e6fab6..1bb8c0aaecd1 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -5,8 +5,8 @@
 #define __IOMMUFD_PRIVATE_H

 #include <linux/iommu.h>
+#include <linux/iommufd.h>
 #include <linux/iova_bitmap.h>
-#include <linux/refcount.h>
 #include <linux/rwsem.h>
 #include <linux/uaccess.h>
 #include <linux/xarray.h>
@@ -122,29 +122,6 @@ static inline int iommufd_ucmd_respond(struct iommufd_ucmd *ucmd,
 	return 0;
 }

-enum iommufd_object_type {
-	IOMMUFD_OBJ_NONE,
-	IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE,
-	IOMMUFD_OBJ_DEVICE,
-	IOMMUFD_OBJ_HWPT_PAGING,
-	IOMMUFD_OBJ_HWPT_NESTED,
-	IOMMUFD_OBJ_IOAS,
-	IOMMUFD_OBJ_ACCESS,
-	IOMMUFD_OBJ_FAULT,
-#ifdef CONFIG_IOMMUFD_TEST
-	IOMMUFD_OBJ_SELFTEST,
-#endif
-	IOMMUFD_OBJ_MAX,
-};
-
-/* Base struct for all objects with a userspace ID handle. */
-struct iommufd_object {
-	refcount_t shortterm_users;
-	refcount_t users;
-	enum iommufd_object_type type;
-	unsigned int id;
-};
-
 static inline bool iommufd_lock_obj(struct iommufd_object *obj)
 {
 	if (!refcount_inc_not_zero(&obj->users))
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 30f832a60ccb..22948dd03d67 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -8,6 +8,7 @@

 #include <linux/err.h>
 #include <linux/errno.h>
+#include <linux/refcount.h>
 #include <linux/types.h>

 struct device;
@@ -18,6 +19,29 @@ struct iommufd_ctx;
 struct iommufd_device;
 struct page;

+enum iommufd_object_type {
+	IOMMUFD_OBJ_NONE,
+	IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE,
+	IOMMUFD_OBJ_DEVICE,
+	IOMMUFD_OBJ_HWPT_PAGING,
+	IOMMUFD_OBJ_HWPT_NESTED,
+	IOMMUFD_OBJ_IOAS,
+	IOMMUFD_OBJ_ACCESS,
+	IOMMUFD_OBJ_FAULT,
+#ifdef CONFIG_IOMMUFD_TEST
+	IOMMUFD_OBJ_SELFTEST,
+#endif
+	IOMMUFD_OBJ_MAX,
+};
+
+/* Base struct for all objects with a userspace ID handle. */
+struct iommufd_object {
+	refcount_t shortterm_users;
+	refcount_t users;
+	enum iommufd_object_type type;
+	unsigned int id;
+};
+
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
 void iommufd_device_unbind(struct iommufd_device *idev);
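As a side note on this export, here is a minimal sketch of the embedding pattern it prepares for (all my_* names are hypothetical, not part of this series): a driver wraps the now-public core object in its own structure and recovers it with container_of():

/* Hypothetical driver-side illustration only */
struct my_driver_viommu {
        struct iommufd_viommu core;     /* kept first, matching the helper added later */
        u32 my_hw_vmid;                 /* driver-owned state, e.g. a cache tag */
};

static struct my_driver_viommu *to_my_viommu(struct iommufd_viommu *viommu)
{
        /* Recover the driver structure from the embedded core object */
        return container_of(viommu, struct my_driver_viommu, core);
}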
Add a new IOMMUFD_OBJ_VIOMMU with an iommufd_viommu structure to represent a slice of a physical IOMMU device passed to or shared with a user space VM. This slice, now a vIOMMU object, is a group of virtualization resources of a physical IOMMU, such as:
 - Security namespace for guest-owned ID, e.g. guest-controlled cache tags
 - Access to a sharable nesting parent pagetable across physical IOMMUs
 - Virtualization of various platform IDs, e.g. RIDs and others
 - Delivery of paravirtualized invalidation
 - Direct assigned invalidation queues
 - Direct assigned interrupts
 - Non-affiliated event reporting
Add a new viommu_alloc op in iommu_ops for drivers to allocate their own vIOMMU structures. This allocation also needs a free(), so add struct iommufd_viommu_ops.
To simplify a vIOMMU allocation, provide an iommufd_viommu_alloc() helper. It's suggested that a driver should embed a core-level viommu structure in its driver-level viommu struct and call the iommufd_viommu_alloc() helper, meanwhile implementing a viommu ops:

struct my_driver_viommu {
        struct iommufd_viommu core;
        /* driver-owned properties/features */
        ....
};

static const struct iommufd_viommu_ops my_driver_viommu_ops = {
        .free = my_driver_viommu_free,
        /* future ops for virtualization features */
        ....
};

static struct iommufd_viommu *my_driver_viommu_alloc(...)
{
        struct my_driver_viommu *my_viommu =
                iommufd_viommu_alloc(ictx, my_driver_viommu, core,
                                     my_driver_viommu_ops);
        /* Init my_viommu and related HW features */
        ....
        return &my_viommu->core;
}

static const struct iommu_ops my_driver_iommu_ops = {
        ....
        .viommu_alloc = my_driver_viommu_alloc,
};
To make the Kernel config work between a driver and the iommufd core, move the _iommufd_object_alloc helper into a new driver.c file that builds with CONFIG_IOMMUFD_DRIVER.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/Makefile          |  2 +-
 drivers/iommu/iommufd/iommufd_private.h |  4 --
 include/linux/iommu.h                   | 14 +++++++
 include/linux/iommufd.h                 | 53 +++++++++++++++++++++++++
 drivers/iommu/iommufd/driver.c          | 38 ++++++++++++++++++
 drivers/iommu/iommufd/main.c            | 32 ---------------
 6 files changed, 106 insertions(+), 37 deletions(-)
 create mode 100644 drivers/iommu/iommufd/driver.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index cf4605962bea..435124a8e1f1 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -12,4 +12,4 @@ iommufd-y := \
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o

 obj-$(CONFIG_IOMMUFD) += iommufd.o
-obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
+obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o driver.o
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 1bb8c0aaecd1..5bd41257f2ef 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -202,10 +202,6 @@ iommufd_object_put_and_try_destroy(struct iommufd_ctx *ictx,
 	iommufd_object_remove(ictx, obj, obj->id, 0);
 }

-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
-					     size_t size,
-					     enum iommufd_object_type type);
-
 #define __iommufd_object_alloc(ictx, ptr, type, obj)                           \
 	container_of(_iommufd_object_alloc(                                    \
 			     ictx,                                             \
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4ad9b9ec6c9b..14f24b5cd16f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -42,6 +42,8 @@ struct notifier_block;
 struct iommu_sva;
 struct iommu_dma_cookie;
 struct iommu_fault_param;
+struct iommufd_ctx;
+struct iommufd_viommu;

 #define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
 #define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
@@ -542,6 +544,14 @@ static inline int __iommu_copy_struct_from_user_array(
  * @remove_dev_pasid: Remove any translation configurations of a specific
  *                    pasid, so that any DMA transactions with this pasid
  *                    will be blocked by the hardware.
+ * @viommu_alloc: Allocate an iommufd_viommu on a physical IOMMU instance behind
+ *                the @dev, as the set of virtualization resources shared/passed
+ *                to user space IOMMU instance. And associate it with a nesting
+ *                @parent_domain. The @viommu_type must be defined in the header
+ *                include/uapi/linux/iommufd.h
+ *                It is suggested to call iommufd_viommu_alloc() helper for
+ *                a bundled allocation of the core and the driver structures,
+ *                using the given @ictx pointer.
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @owner: Driver module providing these ops
  * @identity_domain: An always available, always attachable identity
@@ -591,6 +601,10 @@ struct iommu_ops {
 	void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid,
 				 struct iommu_domain *domain);

+	struct iommufd_viommu *(*viommu_alloc)(
+		struct device *dev, struct iommu_domain *parent_domain,
+		struct iommufd_ctx *ictx, unsigned int viommu_type);
+
 	const struct iommu_domain_ops *default_domain_ops;
 	unsigned long pgsize_bitmap;
 	struct module *owner;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 22948dd03d67..4435f21bd833 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -17,6 +17,7 @@ struct iommu_group;
 struct iommufd_access;
 struct iommufd_ctx;
 struct iommufd_device;
+struct iommufd_viommu_ops;
 struct page;

 enum iommufd_object_type {
@@ -28,6 +29,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_IOAS,
 	IOMMUFD_OBJ_ACCESS,
 	IOMMUFD_OBJ_FAULT,
+	IOMMUFD_OBJ_VIOMMU,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -78,6 +80,26 @@ void iommufd_access_detach(struct iommufd_access *access);

 void iommufd_ctx_get(struct iommufd_ctx *ictx);

+struct iommufd_viommu {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct iommu_device *iommu_dev;
+	struct iommufd_hwpt_paging *hwpt;
+
+	const struct iommufd_viommu_ops *ops;
+
+	unsigned int type;
+};
+
+/**
+ * struct iommufd_viommu_ops - vIOMMU specific operations
+ * @free: Free all driver-specific parts of an iommufd_viommu. The memory of the
+ *        vIOMMU will be free-ed by iommufd core after calling this free op.
+ */
+struct iommufd_viommu_ops {
+	void (*free)(struct iommufd_viommu *viommu);
+};
+
 #if IS_ENABLED(CONFIG_IOMMUFD)
 struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
 struct iommufd_ctx *iommufd_ctx_from_fd(int fd);
@@ -135,4 +157,35 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx)
 	return -EOPNOTSUPP;
 }
 #endif /* CONFIG_IOMMUFD */
+
+#if IS_ENABLED(CONFIG_IOMMUFD_DRIVER)
+struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
+					     size_t size,
+					     enum iommufd_object_type type);
+#else /* !CONFIG_IOMMUFD_DRIVER */
+static inline struct iommufd_object *
+_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
+		      enum iommufd_object_type type)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+#endif /* CONFIG_IOMMUFD_DRIVER */
+
+/*
+ * Helpers for IOMMU driver to allocate driver structures that will be freed by
+ * the iommufd core. The free op will be called prior to freeing the memory.
+ */
+#define iommufd_viommu_alloc(ictx, drv_struct, member, viommu_ops)            \
+	({                                                                     \
+		drv_struct *ret;                                               \
+                                                                               \
+		static_assert(__same_type(struct iommufd_viommu,               \
+					  ((drv_struct *)NULL)->member));      \
+		static_assert(offsetof(drv_struct, member.obj) == 0);          \
+		ret = (drv_struct *)_iommufd_object_alloc(                     \
+			ictx, sizeof(drv_struct), IOMMUFD_OBJ_VIOMMU);         \
+		if (!IS_ERR(ret))                                              \
+			ret->member.ops = viommu_ops;                          \
+		ret;                                                           \
+	})
 #endif
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
new file mode 100644
index 000000000000..c0876d3f91c7
--- /dev/null
+++ b/drivers/iommu/iommufd/driver.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+
+#include "iommufd_private.h"
+
+struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
+					     size_t size,
+					     enum iommufd_object_type type)
+{
+	struct iommufd_object *obj;
+	int rc;
+
+	obj = kzalloc(size, GFP_KERNEL_ACCOUNT);
+	if (!obj)
+		return ERR_PTR(-ENOMEM);
+	obj->type = type;
+	/* Starts out bias'd by 1 until it is removed from the xarray */
+	refcount_set(&obj->shortterm_users, 1);
+	refcount_set(&obj->users, 1);
+
+	/*
+	 * Reserve an ID in the xarray but do not publish the pointer yet since
+	 * the caller hasn't initialized it yet. Once the pointer is published
+	 * in the xarray and visible to other threads we can't reliably destroy
+	 * it anymore, so the caller must complete all errorable operations
+	 * before calling iommufd_object_finalize().
+	 */
+	rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY, xa_limit_31b,
+		      GFP_KERNEL_ACCOUNT);
+	if (rc)
+		goto out_free;
+	return obj;
+out_free:
+	kfree(obj);
+	return ERR_PTR(rc);
+}
+EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, IOMMUFD);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index b5f5d27ee963..92bd075108e5 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -29,38 +29,6 @@ struct iommufd_object_ops {
 static const struct iommufd_object_ops iommufd_object_ops[];
 static struct miscdevice vfio_misc_dev;

-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
-					     size_t size,
-					     enum iommufd_object_type type)
-{
-	struct iommufd_object *obj;
-	int rc;
-
-	obj = kzalloc(size, GFP_KERNEL_ACCOUNT);
-	if (!obj)
-		return ERR_PTR(-ENOMEM);
-	obj->type = type;
-	/* Starts out bias'd by 1 until it is removed from the xarray */
-	refcount_set(&obj->shortterm_users, 1);
-	refcount_set(&obj->users, 1);
-
-	/*
-	 * Reserve an ID in the xarray but do not publish the pointer yet since
-	 * the caller hasn't initialized it yet. Once the pointer is published
-	 * in the xarray and visible to other threads we can't reliably destroy
-	 * it anymore, so the caller must complete all errorable operations
-	 * before calling iommufd_object_finalize().
-	 */
-	rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY,
-		      xa_limit_31b, GFP_KERNEL_ACCOUNT);
-	if (rc)
-		goto out_free;
-	return obj;
-out_free:
-	kfree(obj);
-	return ERR_PTR(rc);
-}
-
 /*
  * Allow concurrent access to the object.
 *
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
- @viommu_alloc: Allocate an iommufd_viommu on a physical IOMMU
instance behind
the @dev, as the set of virtualization resources shared/passed
to user space IOMMU instance. And associate it with a nesting
@parent_domain. The @viommu_type must be defined in the
header
include/uapi/linux/iommufd.h
It is suggested to call iommufd_viommu_alloc() helper for
a bundled allocation of the core and the driver structures,
using the given @ictx pointer.
s/suggested/required/?
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:49:42PM -0700, Nicolin Chen wrote:
Add a new IOMMUFD_OBJ_VIOMMU with an iommufd_viommu structure to represent a slice of physical IOMMU device passed to or shared with a user space VM. This slice, now a vIOMMU object, is a group of virtualization resources of a physical IOMMU's, such as:
- Security namespace for guest owned ID, e.g. guest-controlled cache tags
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platforms IDs, e.g. RIDs and others
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
- Non-affiliated event reporting
Add a new viommu_alloc op in iommu_ops, for drivers to allocate their own vIOMMU structures. And this allocation also needs a free(), so add struct iommufd_viommu_ops.
To simplify a vIOMMU allocation, provide a iommufd_viommu_alloc() helper. It's suggested that a driver should embed a core-level viommu structure in its driver-level viommu struct and call the iommufd_viommu_alloc() helper, meanwhile the driver can also implement a viommu ops: struct my_driver_viommu { struct iommufd_viommu core; /* driver-owned properties/features */ .... };
static const struct iommufd_viommu_ops my_driver_viommu_ops = { .free = my_driver_viommu_free, /* future ops for virtualization features */ .... }; static struct iommufd_viommu my_driver_viommu_alloc(...) { struct my_driver_viommu *my_viommu = iommufd_viommu_alloc(ictx, my_driver_viommu, core, my_driver_viommu_ops); /* Init my_viommu and related HW feature */ .... return &my_viommu->core; } static struct iommu_domain_ops my_driver_domain_ops = { .... .viommu_alloc = my_driver_viommu_alloc, };
To make the Kernel config work between a driver and the iommufd core, move the _iommufd_object_alloc helper into a new driver.c file that builds with CONFIG_IOMMUFD_DRIVER.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/Makefile | 2 +- drivers/iommu/iommufd/iommufd_private.h | 4 -- include/linux/iommu.h | 14 +++++++ include/linux/iommufd.h | 53 +++++++++++++++++++++++++ drivers/iommu/iommufd/driver.c | 38 ++++++++++++++++++ drivers/iommu/iommufd/main.c | 32 --------------- 6 files changed, 106 insertions(+), 37 deletions(-) create mode 100644 drivers/iommu/iommufd/driver.c
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
To support driver-allocated vIOMMU objects, it's suggested that IOMMU drivers call the allocator helper. However, there is no guarantee that all drivers will use it and allocate objects properly.
Add a helper for iommufd core to verify if an unfinalized object is at least reserved in the ictx.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  3 +++
 drivers/iommu/iommufd/main.c            | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 5bd41257f2ef..d53c1ca75532 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -152,6 +152,9 @@ static inline void iommufd_put_object(struct iommufd_ctx *ictx,
 		wake_up_interruptible_all(&ictx->destroy_wait);
 }

+int iommufd_verify_unfinalized_object(struct iommufd_ctx *ictx,
+				      struct iommufd_object *to_verify);
+
 void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj);
 void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
 				      struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 92bd075108e5..e244fed1b7ab 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -89,6 +89,26 @@ struct iommufd_object *iommufd_get_object(struct iommufd_ctx *ictx, u32 id,
 	return obj;
 }

+int iommufd_verify_unfinalized_object(struct iommufd_ctx *ictx,
+				      struct iommufd_object *to_verify)
+{
+	XA_STATE(xas, &ictx->objects, 0);
+	struct iommufd_object *obj;
+	int rc = 0;
+
+	if (!to_verify || !to_verify->id)
+		return -EINVAL;
+	xas.xa_index = to_verify->id;
+
+	xa_lock(&ictx->objects);
+	obj = xas_load(&xas);
+	/* Being an unfinalized object, the loaded obj is a reserved space */
+	if (obj != XA_ZERO_ENTRY)
+		rc = -ENOENT;
+	xa_unlock(&ictx->objects);
+	return rc;
+}
+
 static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
 					     struct iommufd_object *to_destroy)
 {
On Fri, Oct 25, 2024 at 04:49:43PM -0700, Nicolin Chen wrote:
To support driver-allocated vIOMMU objects, it's suggested to call the allocator helper in IOMMU dirvers. However, there is no guarantee that drivers will all use it and allocate objects properly.
Add a helper for iommufd core to verify if an unfinalized object is at least reserved in the ictx.
I don't think we need this..
iommufd_object_finalize() already does:
old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL); /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ WARN_ON(old);
So, we could enhance it to be more robust, like this patch is doing:
void iommufd_object_finalize(struct iommufd_ctx *ictx, struct iommufd_object *obj) { void *old;
old = xa_cmpxchg(&ictx->objects, obj->id, XA_ZERO_ENTRY, obj, GFP_KERNEL); /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ WARN_ON(old != XA_ZERO_ENTRY);
It is always a driver bug to not initialize that object, so WARN_ON is OK.
Jason
On Tue, Oct 29, 2024 at 11:49:07AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:43PM -0700, Nicolin Chen wrote:
To support driver-allocated vIOMMU objects, it's suggested to call the allocator helper in IOMMU dirvers. However, there is no guarantee that drivers will all use it and allocate objects properly.
Add a helper for iommufd core to verify if an unfinalized object is at least reserved in the ictx.
I don't think we need this..
iommufd_object_finalize() already does:
old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL); /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ WARN_ON(old);
It feels unsafe to carry on the iommufd_viommu_alloc_ioctl() until iommufd_object_finalize() as the function would touch the returned faulty viommu pointer? E.g. what if the viommu has an even smaller size than struct iommufd_viommu?
So, we could enhance it to be more robust, like this patch is doing:
void iommufd_object_finalize(struct iommufd_ctx *ictx, struct iommufd_object *obj) { void *old;
old = xa_cmpxchg(&ictx->objects, obj->id, XA_ZERO_ENTRY, obj, GFP_KERNEL); /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ WARN_ON(old != XA_ZERO_ENTRY);
It is always a driver bug to not initialize that object, so WARN_ON is OK.
I think we'd need the same change in iommufd_object_abort() too.
Thanks Nicolin
On Tue, Oct 29, 2024 at 09:18:05AM -0700, Nicolin Chen wrote:
On Tue, Oct 29, 2024 at 11:49:07AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:43PM -0700, Nicolin Chen wrote:
To support driver-allocated vIOMMU objects, it's suggested to call the allocator helper in IOMMU dirvers. However, there is no guarantee that drivers will all use it and allocate objects properly.
Add a helper for iommufd core to verify if an unfinalized object is at least reserved in the ictx.
I don't think we need this..
iommufd_object_finalize() already does:
old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL); /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ WARN_ON(old);
It feels unsafe to carry on the iommufd_viommu_alloc_ioctl() until iommufd_object_finalize() as the function would touch the returned faulty viommu pointer? E.g. what if the viommu has an even smaller size than struct iommufd_viommu?
This is Linux just because the output came from a driver doesn't mean we have to validate it somehow. It is reasonable to be helpful and detect driver bugs, but if the driver is buggy it is still OK to crash.
So you don't *have* to check any of this, if the driver didn't use the right function to allocate the memory then it will go bad pretty fast.
Improving the xa_store() is something that will detect more kinds of bugs everywhere, so seems more worthwhile
I think we'd need the same change in iommufd_object_abort() too.
Makes sense
Jason
On Tue, Oct 29, 2024 at 03:55:58PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 29, 2024 at 09:18:05AM -0700, Nicolin Chen wrote:
On Tue, Oct 29, 2024 at 11:49:07AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:43PM -0700, Nicolin Chen wrote:
To support driver-allocated vIOMMU objects, it's suggested to call the allocator helper in IOMMU dirvers. However, there is no guarantee that drivers will all use it and allocate objects properly.
Add a helper for iommufd core to verify if an unfinalized object is at least reserved in the ictx.
I don't think we need this..
iommufd_object_finalize() already does:
old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL); /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ WARN_ON(old);
It feels unsafe to carry on the iommufd_viommu_alloc_ioctl() until iommufd_object_finalize() as the function would touch the returned faulty viommu pointer? E.g. what if the viommu has an even smaller size than struct iommufd_viommu?
This is Linux just because the output came from a driver doesn't mean we have to validate it somehow. It is reasonable to be helpful and detect driver bugs, but if the driver is buggy it is still OK to crash.
So you don't *have* to check any of this, if the driver didn't use the right function to allocate the memory then it will go bad pretty fast.
Improving the xa_store() is something that will detect more kinds of bugs everywhere, so seems more worthwhile
I see. Thanks!
Nicolin
On Tue, Oct 29, 2024 at 03:55:58PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 29, 2024 at 09:18:05AM -0700, Nicolin Chen wrote:
I think we'd need the same change in iommufd_object_abort() too.
Makes sense
I found that xa_cmpxchg() applies xas_result() to its return value, which turns XA_ZERO_ENTRY into NULL and defeats our intended verification.
So, I replaced that further with xas_store:
-----------------------------------------------------------------
@@ -41,20 +41,26 @@ static struct miscdevice vfio_misc_dev;
 void iommufd_object_finalize(struct iommufd_ctx *ictx,
 			     struct iommufd_object *obj)
 {
+	XA_STATE(xas, &ictx->objects, obj->id);
 	void *old;

-	old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL);
-	/* obj->id was returned from xa_alloc() so the xa_store() cannot fail */
-	WARN_ON(old);
+	xa_lock(&ictx->objects);
+	old = xas_store(&xas, obj);
+	xa_unlock(&ictx->objects);
+	/* obj->id was returned from xa_alloc() so the xas_store() cannot fail */
+	WARN_ON(old != XA_ZERO_ENTRY);
 }

 /* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */
 void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
 {
+	XA_STATE(xas, &ictx->objects, obj->id);
 	void *old;

-	old = xa_erase(&ictx->objects, obj->id);
-	WARN_ON(old);
+	xa_lock(&ictx->objects);
+	old = xas_store(&xas, NULL);
+	xa_unlock(&ictx->objects);
+	WARN_ON(old != XA_ZERO_ENTRY);
 	kfree(obj);
 }
-----------------------------------------------------------------
Thanks Nicolin
On Tue, Oct 29, 2024 at 09:05:14PM -0700, Nicolin Chen wrote:
On Tue, Oct 29, 2024 at 03:55:58PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 29, 2024 at 09:18:05AM -0700, Nicolin Chen wrote:
I think we'd need the same change in iommufd_object_abort() too.
Makes sense
I found xa_cmpxchg() does xas_result to its returning value, which turns XA_ZERO_ENTRY into NULL failing our intended verifications.
Oh.. that is annoying, you can't actually tell if cmpxchg failed :\ NULL means success if it was XA_ZERO_ENTRY and failure if it was not populated! Hmm, I might ask Matthew about this
Your version looks OK
Jason
Add a new ioctl for user space to do a vIOMMU allocation. It must be based on a nesting parent HWPT, so take its refcount.
An IOMMU driver wanting to support vIOMMUs must define its IOMMU_VIOMMU_TYPE_ in the uAPI header and implement a viommu_alloc op in its iommu_ops.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/Makefile          |  3 +-
 drivers/iommu/iommufd/iommufd_private.h |  3 +
 include/uapi/linux/iommufd.h            | 40 +++++++++++
 drivers/iommu/iommufd/main.c            |  6 ++
 drivers/iommu/iommufd/viommu.c          | 89 +++++++++++++++++++++++++
 5 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/iommufd/viommu.c

diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 435124a8e1f1..7c207c5f1eb6 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -7,7 +7,8 @@ iommufd-y := \
 	ioas.o \
 	main.o \
 	pages.o \
-	vfio_compat.o
+	vfio_compat.o \
+	viommu.o

 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index d53c1ca75532..9adf8d616796 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -504,6 +504,9 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
 }

+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_viommu_destroy(struct iommufd_object *obj);
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index cd4920886ad0..3d320d069654 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -51,6 +51,7 @@ enum {
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c,
 	IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
 	IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
+	IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f,
 };

 /**
@@ -852,4 +853,43 @@ struct iommu_fault_alloc {
 	__u32 out_fault_fd;
 };
 #define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
+
+/**
+ * enum iommu_viommu_type - Virtual IOMMU Type
+ * @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use
+ */
+enum iommu_viommu_type {
+	IOMMU_VIOMMU_TYPE_DEFAULT = 0,
+};
+
+/**
+ * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
+ * @size: sizeof(struct iommu_viommu_alloc)
+ * @flags: Must be 0
+ * @type: Type of the virtual IOMMU. Must be defined in enum iommu_viommu_type
+ * @dev_id: The device's physical IOMMU will be used to back the virtual IOMMU
+ * @hwpt_id: ID of a nesting parent HWPT to associate to
+ * @out_viommu_id: Output virtual IOMMU ID for the allocated object
+ *
+ * Allocate a virtual IOMMU object, representing the underlying physical IOMMU's
+ * virtualization support that is a security-isolated slice of the real IOMMU HW
+ * that is unique to a specific VM. Operations global to the IOMMU are connected
+ * to the vIOMMU, such as:
+ * - Security namespace for guest owned ID, e.g. guest-controlled cache tags
+ * - Access to a sharable nesting parent pagetable across physical IOMMUs
+ * - Non-affiliated event reporting (e.g. an invalidation queue error)
+ * - Virtualization of various platforms IDs, e.g. RIDs and others
+ * - Delivery of paravirtualized invalidation
+ * - Direct assigned invalidation queues
+ * - Direct assigned interrupts
+ */
+struct iommu_viommu_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 type;
+	__u32 dev_id;
+	__u32 hwpt_id;
+	__u32 out_viommu_id;
+};
+#define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
 #endif
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e244fed1b7ab..ab5ee325d809 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -321,6 +321,7 @@ union ucmd_buffer {
 	struct iommu_ioas_unmap unmap;
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
+	struct iommu_viommu_alloc viommu;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -372,6 +373,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 val64),
 	IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
 		 __reserved),
+	IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
+		 struct iommu_viommu_alloc, out_viommu_id),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -507,6 +510,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_FAULT] = {
 		.destroy = iommufd_fault_destroy,
 	},
+	[IOMMUFD_OBJ_VIOMMU] = {
+		.destroy = iommufd_viommu_destroy,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
new file mode 100644
index 000000000000..eb41e15ebab1
--- /dev/null
+++ b/drivers/iommu/iommufd/viommu.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+
+#include "iommufd_private.h"
+
+void iommufd_viommu_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_viommu *viommu =
+		container_of(obj, struct iommufd_viommu, obj);
+
+	if (viommu->ops && viommu->ops->free)
+		viommu->ops->free(viommu);
+	refcount_dec(&viommu->hwpt->common.obj.users);
+}
+
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_alloc *cmd = ucmd->cmd;
+	struct iommufd_hwpt_paging *hwpt_paging;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	const struct iommu_ops *ops;
+	int rc;
+
+	if (cmd->flags || cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	ops = dev_iommu_ops(idev->dev);
+	if (!ops->viommu_alloc) {
+		rc = -EOPNOTSUPP;
+		goto out_put_idev;
+	}
+
+	hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt_paging)) {
+		rc = PTR_ERR(hwpt_paging);
+		goto out_put_idev;
+	}
+
+	if (!hwpt_paging->nest_parent) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	viommu = ops->viommu_alloc(idev->dev, hwpt_paging->common.domain,
+				   ucmd->ictx, cmd->type);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_hwpt;
+	}
+
+	rc = iommufd_verify_unfinalized_object(ucmd->ictx, &viommu->obj);
+	if (rc) {
+		kfree(viommu);
+		goto out_put_hwpt;
+	}
+
+	viommu->type = cmd->type;
+	viommu->ictx = ucmd->ictx;
+	viommu->hwpt = hwpt_paging;
+	/*
+	 * It is the most likely case that a physical IOMMU is unpluggable. A
+	 * pluggable IOMMU instance (if exists) is responsible for refcounting
+	 * on its own.
+	 */
+	viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
+
+	refcount_inc(&viommu->hwpt->common.obj.users);
+
+	cmd->out_viommu_id = viommu->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &viommu->obj);
+	goto out_put_hwpt;
+
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &viommu->obj);
+out_put_hwpt:
+	iommufd_put_object(ucmd->ictx, &hwpt_paging->common.obj);
+out_put_idev:
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+	return rc;
+}
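For a sense of how this is meant to be used, below is a rough userspace sketch of the allocation flow. It assumes the device is already bound and a nesting parent HWPT was allocated with IOMMU_HWPT_ALLOC_NEST_PARENT; the fd/ids are placeholders, and the type constant is the one added by the arm-smmu-v3 patch later in this series:

#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* iommufd_fd, dev_id and parent_hwpt_id are assumed to already exist */
static int alloc_viommu(int iommufd_fd, __u32 dev_id, __u32 parent_hwpt_id,
                        __u32 *viommu_id)
{
        struct iommu_viommu_alloc cmd = {
                .size = sizeof(cmd),
                .type = IOMMU_VIOMMU_TYPE_ARM_SMMUV3, /* a driver-defined type */
                .dev_id = dev_id,          /* selects the physical IOMMU behind it */
                .hwpt_id = parent_hwpt_id, /* the nesting parent HWPT */
        };

        if (ioctl(iommufd_fd, IOMMU_VIOMMU_ALLOC, &cmd))
                return -1;
        *viommu_id = cmd.out_viommu_id;
        return 0;
}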
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
Add a new ioctl for user space to do a vIOMMU allocation. It must be based on a nesting parent HWPT, so take its refcount.
IOMMU driver wanting to support vIOMMUs must define its IOMMU_VIOMMU_TYPE_ in the uAPI header and implement a viommu_alloc op in its iommu_ops.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
+void iommufd_viommu_destroy(struct iommufd_object *obj) +{
- struct iommufd_viommu *viommu =
container_of(obj, struct iommufd_viommu, obj);
- if (viommu->ops && viommu->ops->free)
viommu->ops->free(viommu);
Ops can't be null and free can't be null, that would mean there is a memory leak.
- refcount_dec(&viommu->hwpt->common.obj.users);
Don't touch viommu after freeing it
Did you run selftests with KASAN?
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) +{
- struct iommu_viommu_alloc *cmd = ucmd->cmd;
- struct iommufd_hwpt_paging *hwpt_paging;
- struct iommufd_viommu *viommu;
- struct iommufd_device *idev;
- const struct iommu_ops *ops;
- int rc;
- if (cmd->flags || cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT)
return -EOPNOTSUPP;
- idev = iommufd_get_device(ucmd, cmd->dev_id);
- if (IS_ERR(idev))
return PTR_ERR(idev);
- ops = dev_iommu_ops(idev->dev);
- if (!ops->viommu_alloc) {
rc = -EOPNOTSUPP;
goto out_put_idev;
- }
- hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
- if (IS_ERR(hwpt_paging)) {
rc = PTR_ERR(hwpt_paging);
goto out_put_idev;
- }
- if (!hwpt_paging->nest_parent) {
rc = -EINVAL;
goto out_put_hwpt;
- }
- viommu = ops->viommu_alloc(idev->dev, hwpt_paging->common.domain,
ucmd->ictx, cmd->type);
- if (IS_ERR(viommu)) {
rc = PTR_ERR(viommu);
goto out_put_hwpt;
- }
Check that ops and ops->free are valid here with a WARN_ON
- rc = iommufd_verify_unfinalized_object(ucmd->ictx, &viommu->obj);
- if (rc) {
kfree(viommu);
goto out_put_hwpt;
- }
- viommu->type = cmd->type;
- viommu->ictx = ucmd->ictx;
- viommu->hwpt = hwpt_paging;
- /*
* It is the most likely case that a physical IOMMU is unpluggable. A
* pluggable IOMMU instance (if exists) is responsible for refcounting
* on its own.
*/
- viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
- refcount_inc(&viommu->hwpt->common.obj.users);
Put this line right after the one storing to viommu_>hwpt
Jason
On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
+void iommufd_viommu_destroy(struct iommufd_object *obj) +{
- struct iommufd_viommu *viommu =
container_of(obj, struct iommufd_viommu, obj);
- if (viommu->ops && viommu->ops->free)
viommu->ops->free(viommu);
Ops can't be null and free can't be null, that would mean there is a memory leak.
Actually, it is just named wrong, it should be called destroy like this op, it doesn't free any memory..
Jason
On Tue, Oct 29, 2024 at 12:36:24PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
+void iommufd_viommu_destroy(struct iommufd_object *obj) +{
- struct iommufd_viommu *viommu =
container_of(obj, struct iommufd_viommu, obj);
- if (viommu->ops && viommu->ops->free)
viommu->ops->free(viommu);
Ops can't be null and free can't be null, that would mean there is a memory leak.
Actually, it is just named wrong, it should be called destroy like this op, it doesn't free any memory..
Well, it frees things if the driver allocated something in its top structure. Yet, "destroy" does sound less confusing. Let's rename it, assuming it can still remain optional as we have here.
Thanks Nicolin
On Tue, Oct 29, 2024 at 08:46:40AM -0700, Nicolin Chen wrote:
On Tue, Oct 29, 2024 at 12:36:24PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
+void iommufd_viommu_destroy(struct iommufd_object *obj) +{
- struct iommufd_viommu *viommu =
container_of(obj, struct iommufd_viommu, obj);
- if (viommu->ops && viommu->ops->free)
viommu->ops->free(viommu);
Ops can't be null and free can't be null, that would mean there is a memory leak.
Actually, it is just named wrong, it should be called destroy like this op, it doesn't free any memory..
Well, it frees if driver allocated something in its top structure. Yet, "destroy" does sound less confusing. Let's rename it, assuming it can still remain to be optional as we have here.
Yes, optional is right, I was confused by the name. The core code uses destroy so I'd stick with that.
Jason
On Tue, Oct 29, 2024 at 12:59:35PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 29, 2024 at 08:46:40AM -0700, Nicolin Chen wrote:
On Tue, Oct 29, 2024 at 12:36:24PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
+void iommufd_viommu_destroy(struct iommufd_object *obj) +{
- struct iommufd_viommu *viommu =
container_of(obj, struct iommufd_viommu, obj);
- if (viommu->ops && viommu->ops->free)
viommu->ops->free(viommu);
Ops can't be null and free can't be null, that would mean there is a memory leak.
Actually, it is just named wrong, it should be called destroy like this op, it doesn't free any memory..
Well, it frees if driver allocated something in its top structure. Yet, "destroy" does sound less confusing. Let's rename it, assuming it can still remain to be optional as we have here.
Yes, optional is right, I was confused by the name. The core code uses destroy so I'd stick with that.
Ack.
Nicolin
On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
+void iommufd_viommu_destroy(struct iommufd_object *obj) +{
- struct iommufd_viommu *viommu =
container_of(obj, struct iommufd_viommu, obj);
- if (viommu->ops && viommu->ops->free)
viommu->ops->free(viommu);
Ops can't be null and free can't be null, that would mean there is a memory leak.
What if a driver doesn't have anything to free? You're suggesting to force the driver to have an empty free function, right? We can do that, if it is preferable:
void arm_vsmmu_free(struct iommufd_viommu *viommu) { }
- refcount_dec(&viommu->hwpt->common.obj.users);
Don't touch viommu after freeing it
Drivers should only free their own stuff without touching the core: "* @free: Free all driver-specific parts of an iommufd_viommu. The memory of the * vIOMMU will be free-ed by iommufd core after calling this free op."
Then, viommu object is freed by the core after ->destroy(), right?
Did you run selftests with kasn?
Yea, all passed with no WARN_ON.
Thanks Nicolin
Allow an IOMMU driver to use a vIOMMU object that holds a nesting parent hwpt/domain to allocate a nested domain.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommufd.h | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 4435f21bd833..083ceb209704 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -14,6 +14,7 @@
 struct device;
 struct file;
 struct iommu_group;
+struct iommu_user_data;
 struct iommufd_access;
 struct iommufd_ctx;
 struct iommufd_device;
@@ -95,9 +96,17 @@ struct iommufd_viommu {
  * struct iommufd_viommu_ops - vIOMMU specific operations
  * @free: Free all driver-specific parts of an iommufd_viommu. The memory of the
  *        vIOMMU will be free-ed by iommufd core after calling this free op.
+ * @alloc_domain_nested: Allocate a IOMMU_DOMAIN_NESTED on a vIOMMU that holds a
+ *                       nesting parent domain (IOMMU_DOMAIN_PAGING). @user_data
+ *                       must be defined in include/uapi/linux/iommufd.h.
+ *                       It must fully initialize the new iommu_domain before
+ *                       returning. Upon failure, ERR_PTR must be returned.
  */
 struct iommufd_viommu_ops {
 	void (*free)(struct iommufd_viommu *viommu);
+	struct iommu_domain *(*alloc_domain_nested)(
+		struct iommufd_viommu *viommu,
+		const struct iommu_user_data *user_data);
 };
#if IS_ENABLED(CONFIG_IOMMUFD)
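To illustrate how a driver might implement this op, here is a hedged sketch; every my_* name is hypothetical, and the user-data handling is only indicated in a comment since the actual data struct is driver-defined in the uAPI header:

/* Hypothetical driver code, for illustration only */
static struct iommu_domain *
my_viommu_alloc_domain_nested(struct iommufd_viommu *viommu,
                              const struct iommu_user_data *user_data)
{
        struct my_driver_viommu *my_viommu =
                container_of(viommu, struct my_driver_viommu, core);
        struct my_nested_domain *nested;

        nested = kzalloc(sizeof(*nested), GFP_KERNEL);
        if (!nested)
                return ERR_PTR(-ENOMEM);

        /*
         * Copy and validate the guest stage-1 configuration from @user_data
         * (e.g. with iommu_copy_struct_from_user()), then fully initialize
         * the domain before returning, as the kdoc above requires.
         */
        nested->domain.type = IOMMU_DOMAIN_NESTED;
        nested->domain.ops = &my_nested_domain_ops;
        nested->parent_viommu = my_viommu;      /* stage-2 comes via the vIOMMU */

        return &nested->domain;
}

static const struct iommufd_viommu_ops my_driver_viommu_ops = {
        .free = my_driver_viommu_free,
        .alloc_domain_nested = my_viommu_alloc_domain_nested,
};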
On Fri, Oct 25, 2024 at 04:49:45PM -0700, Nicolin Chen wrote:
Allow IOMMU driver to use a vIOMMU object that holds a nesting parent hwpt/domain to allocate a nested domain.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
include/linux/iommufd.h | 9 +++++++++ 1 file changed, 9 insertions(+)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Now a vIOMMU holds a shareable nesting parent HWPT. So, it can act like that nesting parent HWPT to allocate a nested HWPT.
Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.
Also, add an iommufd_hwpt_nested_alloc_for_viommu helper to allocate a nested HWPT for a vIOMMU object. Since a vIOMMU object holds the parent hwpt's refcount already, increase the refcount of the vIOMMU only.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 include/uapi/linux/iommufd.h            | 14 ++---
 drivers/iommu/iommufd/hw_pagetable.c    | 71 ++++++++++++++++++++++++-
 3 files changed, 79 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 9adf8d616796..8c9ab35eaea5 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -288,6 +288,7 @@ struct iommufd_hwpt_paging {
 struct iommufd_hwpt_nested {
 	struct iommufd_hw_pagetable common;
 	struct iommufd_hwpt_paging *parent;
+	struct iommufd_viommu *viommu;
 };

 static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 3d320d069654..717659b9fdce 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -430,7 +430,7 @@ enum iommu_hwpt_data_type {
  * @size: sizeof(struct iommu_hwpt_alloc)
  * @flags: Combination of enum iommufd_hwpt_alloc_flags
  * @dev_id: The device to allocate this HWPT for
- * @pt_id: The IOAS or HWPT to connect this HWPT to
+ * @pt_id: The IOAS or HWPT or vIOMMU to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
  * @data_type: One of enum iommu_hwpt_data_type
@@ -449,11 +449,13 @@ enum iommu_hwpt_data_type {
  * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a
  * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags.
  *
- * A user-managed nested HWPT will be created from a given parent HWPT via
- * @pt_id, in which the parent HWPT must be allocated previously via the
- * same ioctl from a given IOAS (@pt_id). In this case, the @data_type
- * must be set to a pre-defined type corresponding to an I/O page table
- * type supported by the underlying IOMMU hardware.
+ * A user-managed nested HWPT will be created from a given vIOMMU (wrapping a
+ * parent HWPT) or a parent HWPT via @pt_id, in which the parent HWPT must be
+ * allocated previously via the same ioctl from a given IOAS (@pt_id). In this
+ * case, the @data_type must be set to a pre-defined type corresponding to an
+ * I/O page table type supported by the underlying IOMMU hardware. The device
+ * via @dev_id and the vIOMMU via @pt_id must be associated to the same IOMMU
+ * instance.
  *
  * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and
  * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index d06bf6e6c19f..1df5d40c93df 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -57,7 +57,10 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
 		container_of(obj, struct iommufd_hwpt_nested, common.obj);

 	__iommufd_hwpt_destroy(&hwpt_nested->common);
-	refcount_dec(&hwpt_nested->parent->common.obj.users);
+	if (hwpt_nested->viommu)
+		refcount_dec(&hwpt_nested->viommu->obj.users);
+	else
+		refcount_dec(&hwpt_nested->parent->common.obj.users);
 }

 void iommufd_hwpt_nested_abort(struct iommufd_object *obj)
@@ -260,6 +263,56 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	return ERR_PTR(rc);
 }

+/**
+ * iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU
+ * @viommu: vIOMMU ojbect to associate the hwpt_nested/domain with
+ * @user_data: user_data pointer. Must be valid
+ *
+ * Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED
+ * hw_pagetable.
+ */
+static struct iommufd_hwpt_nested *
+iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
+				 const struct iommu_user_data *user_data)
+{
+	struct iommufd_hwpt_nested *hwpt_nested;
+	struct iommufd_hw_pagetable *hwpt;
+	int rc;
+
+	if (flags)
+		return ERR_PTR(-EOPNOTSUPP);
+	if (!viommu->ops || !viommu->ops->alloc_domain_nested)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	hwpt_nested = __iommufd_object_alloc(
+		viommu->ictx, hwpt_nested, IOMMUFD_OBJ_HWPT_NESTED, common.obj);
+	if (IS_ERR(hwpt_nested))
+		return ERR_CAST(hwpt_nested);
+	hwpt = &hwpt_nested->common;
+
+	hwpt_nested->viommu = viommu;
+	hwpt_nested->parent = viommu->hwpt;
+	refcount_inc(&viommu->obj.users);
+
+	hwpt->domain = viommu->ops->alloc_domain_nested(viommu, user_data);
+	if (IS_ERR(hwpt->domain)) {
+		rc = PTR_ERR(hwpt->domain);
+		hwpt->domain = NULL;
+		goto out_abort;
+	}
+	hwpt->domain->owner = viommu->iommu_dev->ops;
+
+	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+	return hwpt_nested;
+
+out_abort:
+	iommufd_object_abort_and_destroy(viommu->ictx, &hwpt->obj);
+	return ERR_PTR(rc);
+}
+
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
@@ -316,6 +369,22 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 			goto out_unlock;
 		}
 		hwpt = &hwpt_nested->common;
+	} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
+		struct iommufd_hwpt_nested *hwpt_nested;
+		struct iommufd_viommu *viommu;
+
+		viommu = container_of(pt_obj, struct iommufd_viommu, obj);
+		if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+			rc = -EINVAL;
+			goto out_unlock;
+		}
+		hwpt_nested = iommufd_viommu_alloc_hwpt_nested(
+			viommu, cmd->flags, &user_data);
+		if (IS_ERR(hwpt_nested)) {
+			rc = PTR_ERR(hwpt_nested);
+			goto out_unlock;
+		}
+		hwpt = &hwpt_nested->common;
 	} else {
 		rc = -EINVAL;
 		goto out_put_pt;
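For completeness, a rough userspace sketch of allocating the nested HWPT against a vIOMMU rather than a parent HWPT; the stage-1 data struct and data type are placeholders for whatever the IOMMU driver defines in the uAPI header (e.g. the ARM SMMUv3 ones used elsewhere in this series):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* dev_id and viommu_id are assumed to exist; stage1_data points to a
 * driver-defined iommu_hwpt_*_data struct of length data_len.
 */
static int alloc_nested_hwpt(int iommufd_fd, __u32 dev_id, __u32 viommu_id,
                             __u32 data_type, void *stage1_data, __u32 data_len,
                             __u32 *hwpt_id)
{
        struct iommu_hwpt_alloc cmd = {
                .size = sizeof(cmd),
                .dev_id = dev_id,
                .pt_id = viommu_id,     /* a vIOMMU ID is now accepted here */
                .data_type = data_type,
                .data_len = data_len,
                .data_uptr = (__u64)(uintptr_t)stage1_data,
        };

        if (ioctl(iommufd_fd, IOMMU_HWPT_ALLOC, &cmd))
                return -1;
        *hwpt_id = cmd.out_hwpt_id;
        return 0;
}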
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
Now a vIOMMU holds a shareable nesting parent HWPT. So, it can act like that nesting parent HWPT to allocate a nested HWPT.
Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.
Also, add an iommufd_hwpt_nested_alloc_for_viommu helper to allocate a
the helper name is not updated.
nested HWPT for a vIOMMU object. Since a vIOMMU object holds the parent hwpt's refcount already, increase the refcount of the vIOMMU only.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Sat, 26 Oct 2024 at 07:50, Nicolin Chen nicolinc@nvidia.com wrote:
Now a vIOMMU holds a shareable nesting parent HWPT. So, it can act like that nesting parent HWPT to allocate a nested HWPT.
Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.
Also, add an iommufd_hwpt_nested_alloc_for_viommu helper to allocate a nested HWPT for a vIOMMU object. Since a vIOMMU object holds the parent hwpt's refcount already, increase the refcount of the vIOMMU only.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/iommufd_private.h | 1 + include/uapi/linux/iommufd.h | 14 ++--- drivers/iommu/iommufd/hw_pagetable.c | 71 ++++++++++++++++++++++++- 3 files changed, 79 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 9adf8d616796..8c9ab35eaea5 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -288,6 +288,7 @@ struct iommufd_hwpt_paging { struct iommufd_hwpt_nested { struct iommufd_hw_pagetable common; struct iommufd_hwpt_paging *parent;
struct iommufd_viommu *viommu;
};
static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 3d320d069654..717659b9fdce 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -430,7 +430,7 @@ enum iommu_hwpt_data_type {
- @size: sizeof(struct iommu_hwpt_alloc)
- @flags: Combination of enum iommufd_hwpt_alloc_flags
- @dev_id: The device to allocate this HWPT for
- @pt_id: The IOAS or HWPT to connect this HWPT to
- @pt_id: The IOAS or HWPT or vIOMMU to connect this HWPT to
- @out_hwpt_id: The ID of the new HWPT
- @__reserved: Must be 0
- @data_type: One of enum iommu_hwpt_data_type
@@ -449,11 +449,13 @@ enum iommu_hwpt_data_type {
- IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a
- nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags.
- A user-managed nested HWPT will be created from a given parent HWPT via
- @pt_id, in which the parent HWPT must be allocated previously via the
- same ioctl from a given IOAS (@pt_id). In this case, the @data_type
- must be set to a pre-defined type corresponding to an I/O page table
- type supported by the underlying IOMMU hardware.
- A user-managed nested HWPT will be created from a given vIOMMU (wrapping a
- parent HWPT) or a parent HWPT via @pt_id, in which the parent HWPT must be
- allocated previously via the same ioctl from a given IOAS (@pt_id). In this
- case, the @data_type must be set to a pre-defined type corresponding to an
- I/O page table type supported by the underlying IOMMU hardware. The device
- via @dev_id and the vIOMMU via @pt_id must be associated to the same IOMMU
- instance.
- If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and
- @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index d06bf6e6c19f..1df5d40c93df 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -57,7 +57,10 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj) container_of(obj, struct iommufd_hwpt_nested, common.obj);
__iommufd_hwpt_destroy(&hwpt_nested->common);
refcount_dec(&hwpt_nested->parent->common.obj.users);
if (hwpt_nested->viommu)
refcount_dec(&hwpt_nested->viommu->obj.users);
else
refcount_dec(&hwpt_nested->parent->common.obj.users);
}
void iommufd_hwpt_nested_abort(struct iommufd_object *obj) @@ -260,6 +263,56 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, return ERR_PTR(rc); }
+/**
- iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU
- @viommu: vIOMMU object to associate the hwpt_nested/domain with
- @user_data: user_data pointer. Must be valid
- Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED
- hw_pagetable.
- */
+static struct iommufd_hwpt_nested * +iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
+{
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
if (flags)
return ERR_PTR(-EOPNOTSUPP);
This check should be removed.
When a user page fault is required, IOMMU_HWPT_FAULT_ID_VALID is set:

	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
By the way, has qemu changed compared with v3? I still got a hardware error in this version, in check
Thanks
On Mon, Oct 28, 2024 at 11:24:10AM +0800, Zhangfei Gao wrote:
+/**
- iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU
- @viommu: vIOMMU object to associate the hwpt_nested/domain with
- @user_data: user_data pointer. Must be valid
- Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED
- hw_pagetable.
- */
+static struct iommufd_hwpt_nested * +iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
+{
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
if (flags)
return ERR_PTR(-EOPNOTSUPP);
This check should be removed.
When a user page fault is required, IOMMU_HWPT_FAULT_ID_VALID is set. if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
It can't just be removed..
I suspect that IOMMU_HWPT_FAULT_ID_VALID should not be set on the nested domain but on the parent?
Jason
On Mon, Oct 28, 2024 at 10:03:09AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 28, 2024 at 11:24:10AM +0800, Zhangfei Gao wrote:
+/**
- iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU
- @viommu: vIOMMU object to associate the hwpt_nested/domain with
- @user_data: user_data pointer. Must be valid
- Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED
- hw_pagetable.
- */
+static struct iommufd_hwpt_nested * +iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
+{
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
if (flags)
return ERR_PTR(-EOPNOTSUPP);
This check should be removed.
When a user page fault is required, IOMMU_HWPT_FAULT_ID_VALID is set. if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
It can't just be removed..
I suspect that IOMMU_HWPT_FAULT_ID_VALID should not be set on the nested domain but on the parent?
By giving another look,
In iommufd_hwpt_paging_alloc(), we reject IOMMU_HWPT_FAULT_ID_VALID:

	const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
				IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
	...
	if (flags & ~valid_flags)
		return ERR_PTR(-EOPNOTSUPP);
In iommufd_hwpt_nested_alloc(), we mask the flag away:

	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || !user_data->len ||
	    !ops->domain_alloc_user)
		return ERR_PTR(-EOPNOTSUPP);
	...
	hwpt->domain = ops->domain_alloc_user(idev->dev,
					      flags & ~IOMMU_HWPT_FAULT_ID_VALID,
					      parent->common.domain, user_data);
Then, in the common function it has a section of:

	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
		...
It seems that this IOMMU_HWPT_FAULT_ID_VALID is for nested domains?
So, aligning with that, here we need:

	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || !user_data->len)
		return ERR_PTR(-EOPNOTSUPP);
And it looks like we don't have any detailed documentation w.r.t. this flag and the Fault Queue. We'd likely need to update the uAPI doc, prior to (or along with) the Part-3 vIRQ series.
Thanks Nicolin
On Mon, Oct 28, 2024 at 07:52:10AM -0700, Nicolin Chen wrote:
On Mon, Oct 28, 2024 at 10:03:09AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 28, 2024 at 11:24:10AM +0800, Zhangfei Gao wrote:
+/**
- iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU
- @viommu: vIOMMU object to associate the hwpt_nested/domain with
- @user_data: user_data pointer. Must be valid
- Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED
- hw_pagetable.
- */
+static struct iommufd_hwpt_nested * +iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
+{
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
if (flags)
return ERR_PTR(-EOPNOTSUPP);
This check should be removed.
When a user page fault is required, IOMMU_HWPT_FAULT_ID_VALID is set. if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
It can't just be removed..
I suspect that IOMMU_HWPT_FAULT_ID_VALID should not be set on the nested domain but on the parent?
By giving another look,
In iommufd_hwpt_paging_alloc(), we reject IOMMU_HWPT_FAULT_ID_VALID: const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING; ... if (flags & ~valid_flags) return ERR_PTR(-EOPNOTSUPP);
In iommufd_hwpt_nested_alloc(), we mask the flag away: if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || !user_data->len || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); ... hwpt->domain = ops->domain_alloc_user(idev->dev, flags & ~IOMMU_HWPT_FAULT_ID_VALID, parent->common.domain, user_data);
Then, in the common function it has a section of if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) { ...
It seems that this IOMMU_HWPT_FAULT_ID_VALID is for nested domains?
So, aligning with that, here we need: if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || !user_data->len) return ERR_PTR(-EOPNOTSUPP);
I added a TEST_F for the coverage:
+TEST_F(iommufd_viommu, viommu_alloc_nested_iopf)
+{
+	struct iommu_hwpt_selftest data = {
+		.iotlb = IOMMU_TEST_IOTLB_DEFAULT,
+	};
+	uint32_t viommu_id = self->viommu_id;
+	uint32_t dev_id = self->device_id;
+	uint32_t iopf_hwpt_id;
+	uint32_t fault_id;
+	uint32_t fault_fd;
+
+	if (self->device_id) {
+		test_ioctl_fault_alloc(&fault_id, &fault_fd);
+		test_err_hwpt_alloc_iopf(
+			ENOENT, dev_id, viommu_id, UINT32_MAX,
+			IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+			IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+		test_err_hwpt_alloc_iopf(
+			EOPNOTSUPP, dev_id, viommu_id, fault_id,
+			IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id,
+			IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+		test_cmd_hwpt_alloc_iopf(
+			dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID,
+			&iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
+			sizeof(data));
+
+		test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+		test_cmd_trigger_iopf(dev_id, fault_fd);
+
+		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+		test_ioctl_destroy(iopf_hwpt_id);
+		close(fault_fd);
+		test_ioctl_destroy(fault_id);
+	}
+}
Thanks Nicolin
On Mon, Oct 28, 2024 at 07:52:10AM -0700, Nicolin Chen wrote:
On Mon, Oct 28, 2024 at 10:03:09AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 28, 2024 at 11:24:10AM +0800, Zhangfei Gao wrote:
+/**
- iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU
- @viommu: vIOMMU object to associate the hwpt_nested/domain with
- @user_data: user_data pointer. Must be valid
- Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED
- hw_pagetable.
- */
+static struct iommufd_hwpt_nested * +iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
+{
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
if (flags)
return ERR_PTR(-EOPNOTSUPP);
This check should be removed.
When a user page fault is required, IOMMU_HWPT_FAULT_ID_VALID is set. if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
It can't just be removed..
I suspect that IOMMU_HWPT_FAULT_ID_VALID should not be set on the nested domain but on the parent?
By giving another look,
In iommufd_hwpt_paging_alloc(), we reject IOMMU_HWPT_FAULT_ID_VALID: const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING; ... if (flags & ~valid_flags) return ERR_PTR(-EOPNOTSUPP);
In iommufd_hwpt_nested_alloc(), we mask the flag away: if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || !user_data->len || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); ... hwpt->domain = ops->domain_alloc_user(idev->dev, flags & ~IOMMU_HWPT_FAULT_ID_VALID, parent->common.domain, user_data);
Then, in the common function it has a section of if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) { ...
It seems that this IOMMU_HWPT_FAULT_ID_VALID is for nested domains?
OK, but ARM should be blocking it since it doesn't work there.
I think we made some error here, it should have been passed in flags to the drivers and only intel should have accepted it.
This suggests we should send flags down the viommu alloc domain path too.
Jason
On Tue, Oct 29, 2024 at 12:27:46PM -0300, Jason Gunthorpe wrote:
On Mon, Oct 28, 2024 at 07:52:10AM -0700, Nicolin Chen wrote:
On Mon, Oct 28, 2024 at 10:03:09AM -0300, Jason Gunthorpe wrote: In iommufd_hwpt_paging_alloc(), we reject IOMMU_HWPT_FAULT_ID_VALID: const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING; ... if (flags & ~valid_flags) return ERR_PTR(-EOPNOTSUPP);
In iommufd_hwpt_nested_alloc(), we mask the flag away: if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || !user_data->len || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); ... hwpt->domain = ops->domain_alloc_user(idev->dev, flags & ~IOMMU_HWPT_FAULT_ID_VALID, parent->common.domain, user_data);
Then, in the common function it has a section of if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) { ...
It seems that this IOMMU_HWPT_FAULT_ID_VALID is for nested domains?
OK, but ARM should be blocking it since it doesn't work there.
I think we made some error here, it should have been passed in flags to the drivers and only intel should have accepted it.
Trying to limit changes here since two parts are already quite large, I think a separate series fixing that would be nicer?
This suggests we should send flags down the viommu alloc domain path too.
Ack. Will pass it in.
Thanks Nicolin
On Tue, Oct 29, 2024 at 09:07:38AM -0700, Nicolin Chen wrote:
On Tue, Oct 29, 2024 at 12:27:46PM -0300, Jason Gunthorpe wrote:
On Mon, Oct 28, 2024 at 07:52:10AM -0700, Nicolin Chen wrote:
On Mon, Oct 28, 2024 at 10:03:09AM -0300, Jason Gunthorpe wrote: In iommufd_hwpt_paging_alloc(), we reject IOMMU_HWPT_FAULT_ID_VALID: const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING; ... if (flags & ~valid_flags) return ERR_PTR(-EOPNOTSUPP);
In iommufd_hwpt_nested_alloc(), we mask the flag away: if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || !user_data->len || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); ... hwpt->domain = ops->domain_alloc_user(idev->dev, flags & ~IOMMU_HWPT_FAULT_ID_VALID, parent->common.domain, user_data);
Then, in the common function it has a section of if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) { ...
It seems that this IOMMU_HWPT_FAULT_ID_VALID is for nested domains?
OK, but ARM should be blocking it since it doesn't work there.
I think we made some error here, it should have been passed in flags to the drivers and only intel should have accepted it.
Trying to limit changes here since two parts are already quite large, I think a separate series fixing that would be nicer?
Yes, let's just make a note
This suggests we should send flags down the viommu alloc domain path too.
Ack. Will pass it in.
But this would be nice to get to
Jason
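For illustration of that point, a sketch of what sending the flags down the vIOMMU path could look like; the extra @flags parameter on alloc_domain_nested is an assumption drawn from this discussion, not something this series implements:

/* Assumed future shape: let the driver see and validate the alloc flags */
struct iommufd_viommu_ops {
        void (*free)(struct iommufd_viommu *viommu);
        struct iommu_domain *(*alloc_domain_nested)(
                struct iommufd_viommu *viommu, u32 flags,
                const struct iommu_user_data *user_data);
};

The core-side check in iommufd_viommu_alloc_hwpt_nested() would then only reject flags it cannot handle itself (e.g. everything outside IOMMU_HWPT_FAULT_ID_VALID) and forward the rest to the driver op, mirroring what domain_alloc_user() already does for the non-vIOMMU path.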
On Mon, 28 Oct 2024 at 11:24, Zhangfei Gao zhangfei.gao@linaro.org wrote:
By the way, has qemu changed compared with v3? I still got a hardware error in this version, in check
Found that iommufd_viommu_p2-v5 misses some patches. Simply tested OK with iommufd_viommu_p2-v5-with-rmr (with some hacks).
Thanks
On Mon, Oct 28, 2024 at 10:53:38PM +0800, Zhangfei Gao wrote:
On Mon, 28 Oct 2024 at 11:24, Zhangfei Gao zhangfei.gao@linaro.org wrote:
By the way, has qemu changed compared with v3? I still got a hardware error in this version, in check
Found iommufd_viommu_p2-v5 misses some patches, Simply tested ok with iommufd_viommu_p2-v5-with-rmr, (with some hacks)
I see. I put those RMR patches on top of Part-1&2 in v5, vs. rebasing Part-1&2 on top of the RMR patches.
Thanks for testing!
Nicolin
Use these inline helpers to shorten those container_of lines.
Note that one of them goes back and forth between iommu_domain and mock_iommu_domain, which isn't necessary. So drop its container_of.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/selftest.c | 75 ++++++++++++++++++-------------- 1 file changed, 42 insertions(+), 33 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 540437be168a..322e57ff3605 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -126,12 +126,24 @@ struct mock_iommu_domain { struct xarray pfns; };
+static inline struct mock_iommu_domain * +to_mock_domain(struct iommu_domain *domain) +{ + return container_of(domain, struct mock_iommu_domain, domain); +} + struct mock_iommu_domain_nested { struct iommu_domain domain; struct mock_iommu_domain *parent; u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; };
+static inline struct mock_iommu_domain_nested * +to_mock_nested(struct iommu_domain *domain) +{ + return container_of(domain, struct mock_iommu_domain_nested, domain); +} + enum selftest_obj_type { TYPE_IDEV, }; @@ -142,6 +154,11 @@ struct mock_dev { int id; };
+static inline struct mock_dev *to_mock_dev(struct device *dev) +{ + return container_of(dev, struct mock_dev, dev); +} + struct selftest_obj { struct iommufd_object obj; enum selftest_obj_type type; @@ -155,10 +172,15 @@ struct selftest_obj { }; };
+static inline struct selftest_obj *to_selftest_obj(struct iommufd_object *obj) +{ + return container_of(obj, struct selftest_obj, obj); +} + static int mock_domain_nop_attach(struct iommu_domain *domain, struct device *dev) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev);
if (domain->dirty_ops && (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY)) return -EINVAL; @@ -193,8 +215,7 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type) static int mock_domain_set_dirty_tracking(struct iommu_domain *domain, bool enable) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); unsigned long flags = mock->flags;
if (enable && !domain->dirty_ops) @@ -243,8 +264,7 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain, unsigned long flags, struct iommu_dirty_bitmap *dirty) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); unsigned long end = iova + size; void *ent;
@@ -281,7 +301,7 @@ static const struct iommu_dirty_ops dirty_ops = {
static struct iommu_domain *mock_domain_alloc_paging(struct device *dev) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev); struct mock_iommu_domain *mock;
mock = kzalloc(sizeof(*mock), GFP_KERNEL); @@ -327,7 +347,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
/* must be mock_domain */ if (!parent) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev); bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY; struct iommu_domain *domain; @@ -341,8 +361,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags, if (!domain) return ERR_PTR(-ENOMEM); if (has_dirty_flag) - container_of(domain, struct mock_iommu_domain, domain) - ->domain.dirty_ops = &dirty_ops; + domain->dirty_ops = &dirty_ops; return domain; }
@@ -352,7 +371,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags, if (!parent || parent->ops != mock_ops.default_domain_ops) return ERR_PTR(-EINVAL);
- mock_parent = container_of(parent, struct mock_iommu_domain, domain); + mock_parent = to_mock_domain(parent); if (!mock_parent) return ERR_PTR(-EINVAL);
@@ -366,8 +385,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
static void mock_domain_free(struct iommu_domain *domain) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain);
WARN_ON(!xa_empty(&mock->pfns)); kfree(mock); @@ -378,8 +396,7 @@ static int mock_domain_map_pages(struct iommu_domain *domain, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); unsigned long flags = MOCK_PFN_START_IOVA; unsigned long start_iova = iova;
@@ -430,8 +447,7 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain, size_t pgcount, struct iommu_iotlb_gather *iotlb_gather) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); bool first = true; size_t ret = 0; void *ent; @@ -479,8 +495,7 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain, static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); void *ent;
WARN_ON(iova % MOCK_IO_PAGE_SIZE); @@ -491,7 +506,7 @@ static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain,
static bool mock_domain_capable(struct device *dev, enum iommu_cap cap) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev);
switch (cap) { case IOMMU_CAP_CACHE_COHERENCY: @@ -571,18 +586,14 @@ static const struct iommu_ops mock_ops = {
static void mock_domain_free_nested(struct iommu_domain *domain) { - struct mock_iommu_domain_nested *mock_nested = - container_of(domain, struct mock_iommu_domain_nested, domain); - - kfree(mock_nested); + kfree(to_mock_nested(domain)); }
static int mock_domain_cache_invalidate_user(struct iommu_domain *domain, struct iommu_user_data_array *array) { - struct mock_iommu_domain_nested *mock_nested = - container_of(domain, struct mock_iommu_domain_nested, domain); + struct mock_iommu_domain_nested *mock_nested = to_mock_nested(domain); struct iommu_hwpt_invalidate_selftest inv; u32 processed = 0; int i = 0, j; @@ -657,7 +668,7 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, iommufd_put_object(ucmd->ictx, &hwpt->obj); return ERR_PTR(-EINVAL); } - *mock = container_of(hwpt->domain, struct mock_iommu_domain, domain); + *mock = to_mock_domain(hwpt->domain); return hwpt; }
@@ -675,14 +686,13 @@ get_md_pagetable_nested(struct iommufd_ucmd *ucmd, u32 mockpt_id, iommufd_put_object(ucmd->ictx, &hwpt->obj); return ERR_PTR(-EINVAL); } - *mock_nested = container_of(hwpt->domain, - struct mock_iommu_domain_nested, domain); + *mock_nested = to_mock_nested(hwpt->domain); return hwpt; }
static void mock_dev_release(struct device *dev) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev);
ida_free(&mock_dev_ida, mdev->id); kfree(mdev); @@ -813,7 +823,7 @@ static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd, if (IS_ERR(dev_obj)) return PTR_ERR(dev_obj);
- sobj = container_of(dev_obj, struct selftest_obj, obj); + sobj = to_selftest_obj(dev_obj); if (sobj->type != TYPE_IDEV) { rc = -EINVAL; goto out_dev_obj; @@ -951,8 +961,7 @@ static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd, if (IS_ERR(hwpt)) return PTR_ERR(hwpt);
- mock_nested = container_of(hwpt->domain, - struct mock_iommu_domain_nested, domain); + mock_nested = to_mock_nested(hwpt->domain);
if (iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX || mock_nested->iotlb[iotlb_id] != iotlb) @@ -1431,7 +1440,7 @@ static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
void iommufd_selftest_destroy(struct iommufd_object *obj) { - struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); + struct selftest_obj *sobj = to_selftest_obj(obj);
switch (sobj->type) { case TYPE_IDEV:
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
Use these inline helpers to shorten those container_of lines.
Note that one of them goes back and forth between iommu_domain and mock_iommu_domain, which isn't necessary. So drop its container_of.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:49:47PM -0700, Nicolin Chen wrote:
Use these inline helpers to shorten those container_of lines.
Note that one of them goes back and forth between iommu_domain and mock_iommu_domain, which isn't necessary. So drop its container_of.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/selftest.c | 75 ++++++++++++++++++-------------- 1 file changed, 42 insertions(+), 33 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
A nested domain can now be allocated for a parent domain or for a vIOMMU object. Rework the existing allocators to prepare for the latter case.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/selftest.c | 89 ++++++++++++++++++-------------- 1 file changed, 50 insertions(+), 39 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 322e57ff3605..92d753985640 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -318,55 +318,39 @@ static struct iommu_domain *mock_domain_alloc_paging(struct device *dev) return &mock->domain; }
-static struct iommu_domain * -__mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent, - const struct iommu_hwpt_selftest *user_cfg) +static struct mock_iommu_domain_nested * +__mock_domain_alloc_nested(const struct iommu_user_data *user_data) { struct mock_iommu_domain_nested *mock_nested; - int i; + struct iommu_hwpt_selftest user_cfg; + int rc, i; + + if (user_data->type != IOMMU_HWPT_DATA_SELFTEST) + return ERR_PTR(-EOPNOTSUPP); + + rc = iommu_copy_struct_from_user(&user_cfg, user_data, + IOMMU_HWPT_DATA_SELFTEST, iotlb); + if (rc) + return ERR_PTR(rc);
mock_nested = kzalloc(sizeof(*mock_nested), GFP_KERNEL); if (!mock_nested) return ERR_PTR(-ENOMEM); - mock_nested->parent = mock_parent; mock_nested->domain.ops = &domain_nested_ops; mock_nested->domain.type = IOMMU_DOMAIN_NESTED; for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++) - mock_nested->iotlb[i] = user_cfg->iotlb; - return &mock_nested->domain; + mock_nested->iotlb[i] = user_cfg.iotlb; + return mock_nested; }
static struct iommu_domain * -mock_domain_alloc_user(struct device *dev, u32 flags, - struct iommu_domain *parent, - const struct iommu_user_data *user_data) +mock_domain_alloc_nested(struct iommu_domain *parent, u32 flags, + const struct iommu_user_data *user_data) { + struct mock_iommu_domain_nested *mock_nested; struct mock_iommu_domain *mock_parent; - struct iommu_hwpt_selftest user_cfg; - int rc; - - /* must be mock_domain */ - if (!parent) { - struct mock_dev *mdev = to_mock_dev(dev); - bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; - bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY; - struct iommu_domain *domain; - - if (flags & (~(IOMMU_HWPT_ALLOC_NEST_PARENT | - IOMMU_HWPT_ALLOC_DIRTY_TRACKING))) - return ERR_PTR(-EOPNOTSUPP); - if (user_data || (has_dirty_flag && no_dirty_ops)) - return ERR_PTR(-EOPNOTSUPP); - domain = mock_domain_alloc_paging(dev); - if (!domain) - return ERR_PTR(-ENOMEM); - if (has_dirty_flag) - domain->dirty_ops = &dirty_ops; - return domain; - }
- /* must be mock_domain_nested */ - if (user_data->type != IOMMU_HWPT_DATA_SELFTEST || flags) + if (flags) return ERR_PTR(-EOPNOTSUPP); if (!parent || parent->ops != mock_ops.default_domain_ops) return ERR_PTR(-EINVAL); @@ -375,12 +359,39 @@ mock_domain_alloc_user(struct device *dev, u32 flags, if (!mock_parent) return ERR_PTR(-EINVAL);
- rc = iommu_copy_struct_from_user(&user_cfg, user_data, - IOMMU_HWPT_DATA_SELFTEST, iotlb); - if (rc) - return ERR_PTR(rc); + mock_nested = __mock_domain_alloc_nested(user_data); + if (IS_ERR(mock_nested)) + return ERR_CAST(mock_nested); + mock_nested->parent = mock_parent; + return &mock_nested->domain; +} + +static struct iommu_domain * +mock_domain_alloc_user(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data) +{ + bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; + const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING | + IOMMU_HWPT_ALLOC_NEST_PARENT; + bool no_dirty_ops = to_mock_dev(dev)->flags & + MOCK_FLAGS_DEVICE_NO_DIRTY; + struct iommu_domain *domain; + + if (parent) + return mock_domain_alloc_nested(parent, flags, user_data);
- return __mock_domain_alloc_nested(mock_parent, &user_cfg); + if (user_data) + return ERR_PTR(-EOPNOTSUPP); + if ((flags & ~PAGING_FLAGS) || (has_dirty_flag && no_dirty_ops)) + return ERR_PTR(-EOPNOTSUPP); + + domain = mock_domain_alloc_paging(dev); + if (!domain) + return ERR_PTR(-ENOMEM); + if (has_dirty_flag) + domain->dirty_ops = &dirty_ops; + return domain; }
static void mock_domain_free(struct iommu_domain *domain)
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
A nested domain now can be allocated for a parent domain for a vIOMMU object. Rework the existing allocators to prepare for the latter case.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:49:48PM -0700, Nicolin Chen wrote:
A nested domain now can be allocated for a parent domain for a vIOMMU object. Rework the existing allocators to prepare for the latter case.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/selftest.c | 89 ++++++++++++++++++-------------- 1 file changed, 50 insertions(+), 39 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev.
Track the user count of the iommu_dev. Postpone the exit routine using a completion, if refcount is unbalanced. The refcount inc/dec will be added in the following patch.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/selftest.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 92d753985640..2d33b35da704 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -533,14 +533,17 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
static struct iopf_queue *mock_iommu_iopf_queue;
-static struct iommu_device mock_iommu_device = { -}; +static struct mock_iommu_device { + struct iommu_device iommu_dev; + struct completion complete; + refcount_t users; +} mock_iommu;
static struct iommu_device *mock_probe_device(struct device *dev) { if (dev->bus != &iommufd_mock_bus_type.bus) return ERR_PTR(-ENODEV); - return &mock_iommu_device; + return &mock_iommu.iommu_dev; }
static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt, @@ -1556,24 +1559,27 @@ int __init iommufd_test_init(void) if (rc) goto err_platform;
- rc = iommu_device_sysfs_add(&mock_iommu_device, + rc = iommu_device_sysfs_add(&mock_iommu.iommu_dev, &selftest_iommu_dev->dev, NULL, "%s", dev_name(&selftest_iommu_dev->dev)); if (rc) goto err_bus;
- rc = iommu_device_register_bus(&mock_iommu_device, &mock_ops, + rc = iommu_device_register_bus(&mock_iommu.iommu_dev, &mock_ops, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); if (rc) goto err_sysfs;
+ refcount_set(&mock_iommu.users, 1); + init_completion(&mock_iommu.complete); + mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
return 0;
err_sysfs: - iommu_device_sysfs_remove(&mock_iommu_device); + iommu_device_sysfs_remove(&mock_iommu.iommu_dev); err_bus: bus_unregister(&iommufd_mock_bus_type.bus); err_platform: @@ -1583,6 +1589,15 @@ int __init iommufd_test_init(void) return rc; }
+static void iommufd_test_wait_for_users(void) +{ + if (refcount_dec_and_test(&mock_iommu.users)) + return; + /* Time out waiting for iommu device user count to become 0 */ + WARN_ON(!wait_for_completion_timeout(&mock_iommu.complete, + msecs_to_jiffies(10000))); +} + void iommufd_test_exit(void) { if (mock_iommu_iopf_queue) { @@ -1590,8 +1605,9 @@ void iommufd_test_exit(void) mock_iommu_iopf_queue = NULL; }
- iommu_device_sysfs_remove(&mock_iommu_device); - iommu_device_unregister_bus(&mock_iommu_device, + iommufd_test_wait_for_users(); + iommu_device_sysfs_remove(&mock_iommu.iommu_dev); + iommu_device_unregister_bus(&mock_iommu.iommu_dev, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); bus_unregister(&iommufd_mock_bus_type.bus);
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev.
Track the user count of the iommu_dev. Postpone the exit routine using a completion, if refcount is unbalanced. The refcount inc/dec will be added in the following patch.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:49:49PM -0700, Nicolin Chen wrote:
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev.
Track the user count of the iommu_dev. Postpone the exit routine using a completion, if refcount is unbalanced. The refcount inc/dec will be added in the following patch.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/selftest.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Since this is built into the iommufd module it can't be unloaded without also unloading iommufd, which is impossible as long as any iommufd FDs are open. So I expect that the WARN_ON can never happen.
Jason
On Tue, Oct 29, 2024 at 12:34:38PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:49PM -0700, Nicolin Chen wrote:
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev.
Track the user count of the iommu_dev. Postpone the exit routine using a completion, if refcount is unbalanced. The refcount inc/dec will be added in the following patch.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/selftest.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Since this is built into the iommufd module it can't be unloaded without also unloading iommufd, which is impossible as long as any iommufd FDs are open. So I expect that the WARN_ON can never happen.
Hmm, I assume we still need this patch then?
Could a faulty "--force" possibly trigger it?
Nicolin
On Tue, Oct 29, 2024 at 09:02:58AM -0700, Nicolin Chen wrote:
On Tue, Oct 29, 2024 at 12:34:38PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 25, 2024 at 04:49:49PM -0700, Nicolin Chen wrote:
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev.
Track the user count of the iommu_dev. Postpone the exit routine using a completion, if refcount is unbalanced. The refcount inc/dec will be added in the following patch.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/selftest.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Since this is built into the iommufd module it can't be unloaded without also unloading iommufd, which is impossible as long as any iommufd FDs are open. So I expect that the WARN_ON can never happen.
Hmm, I assume we still need this patch then?
I was thinking about it; it is still a reasonable example of what it might look like.
You might include the above remark as a comment above the WARN_ON though.
Could a faulty "--force" possibly trigger it?
I'm not sure, I suspect not?
Jason
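As a concrete sketch of that suggestion, reusing the function from the patch above with the remark folded into a comment above the WARN_ON:

static void iommufd_test_wait_for_users(void)
{
	if (refcount_dec_and_test(&mock_iommu.users))
		return;
	/*
	 * Time out waiting for the iommu device user count to become 0.
	 *
	 * Note: this should never fire in practice. The selftest is built
	 * into the iommufd module, which cannot be unloaded while any
	 * iommufd FD is open, and the vIOMMU objects holding the refcount
	 * are destroyed before those FDs are closed.
	 */
	WARN_ON(!wait_for_completion_timeout(&mock_iommu.complete,
					     msecs_to_jiffies(10000)));
}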
Implement the viommu alloc/free functions to increase/reduce refcount of its dependent mock iommu device. User space can verify this loop via the IOMMU_VIOMMU_TYPE_SELFTEST.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 2 + drivers/iommu/iommufd/selftest.c | 64 ++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index f4bc23a92f9a..edced4ac7cd3 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -180,4 +180,6 @@ struct iommu_hwpt_invalidate_selftest { __u32 iotlb_id; };
+#define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 2d33b35da704..33a0fcc0eff7 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -134,6 +134,7 @@ to_mock_domain(struct iommu_domain *domain)
struct mock_iommu_domain_nested { struct iommu_domain domain; + struct mock_viommu *mock_viommu; struct mock_iommu_domain *parent; u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; }; @@ -144,6 +145,16 @@ to_mock_nested(struct iommu_domain *domain) return container_of(domain, struct mock_iommu_domain_nested, domain); }
+struct mock_viommu { + struct iommufd_viommu core; + struct mock_iommu_domain *s2_parent; +}; + +static inline struct mock_viommu *to_mock_viommu(struct iommufd_viommu *viommu) +{ + return container_of(viommu, struct mock_viommu, core); +} + enum selftest_obj_type { TYPE_IDEV, }; @@ -569,6 +580,58 @@ static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features fea return 0; }
+static void mock_viommu_free(struct iommufd_viommu *viommu) +{ + struct mock_iommu_device *mock_iommu = container_of( + viommu->iommu_dev, struct mock_iommu_device, iommu_dev); + + if (refcount_dec_and_test(&mock_iommu->users)) + complete(&mock_iommu->complete); + + /* iommufd core frees mock_viommu and viommu */ +} + +static struct iommu_domain * +mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, + const struct iommu_user_data *user_data) +{ + struct mock_viommu *mock_viommu = to_mock_viommu(viommu); + struct mock_iommu_domain_nested *mock_nested; + + mock_nested = __mock_domain_alloc_nested(user_data); + if (IS_ERR(mock_nested)) + return ERR_CAST(mock_nested); + mock_nested->mock_viommu = mock_viommu; + mock_nested->parent = mock_viommu->s2_parent; + return &mock_nested->domain; +} + +static struct iommufd_viommu_ops mock_viommu_ops = { + .free = mock_viommu_free, + .alloc_domain_nested = mock_viommu_alloc_domain_nested, +}; + +static struct iommufd_viommu *mock_viommu_alloc(struct device *dev, + struct iommu_domain *domain, + struct iommufd_ctx *ictx, + unsigned int viommu_type) +{ + struct mock_iommu_device *mock_iommu = + iommu_get_iommu_dev(dev, struct mock_iommu_device, iommu_dev); + struct mock_viommu *mock_viommu; + + if (viommu_type != IOMMU_VIOMMU_TYPE_SELFTEST) + return ERR_PTR(-EOPNOTSUPP); + + mock_viommu = iommufd_viommu_alloc(ictx, struct mock_viommu, core, + &mock_viommu_ops); + if (IS_ERR(mock_viommu)) + return ERR_CAST(mock_viommu); + + refcount_inc(&mock_iommu->users); + return &mock_viommu->core; +} + static const struct iommu_ops mock_ops = { /* * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type() @@ -588,6 +651,7 @@ static const struct iommu_ops mock_ops = { .dev_enable_feat = mock_dev_enable_feat, .dev_disable_feat = mock_dev_disable_feat, .user_pasid_table = true, + .viommu_alloc = mock_viommu_alloc, .default_domain_ops = &(struct iommu_domain_ops){ .free = mock_domain_free,
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
Implement the viommu alloc/free functions to increase/reduce refcount of its dependent mock iommu device. User space can verify this loop via the IOMMU_VIOMMU_TYPE_SELFTEST.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Fri, Oct 25, 2024 at 04:49:50PM -0700, Nicolin Chen wrote:
Implement the viommu alloc/free functions to increase/reduce refcount of its dependent mock iommu device. User space can verify this loop via the IOMMU_VIOMMU_TYPE_SELFTEST.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
drivers/iommu/iommufd/iommufd_test.h | 2 + drivers/iommu/iommufd/selftest.c | 64 ++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Add a new iommufd_viommu FIXTURE and set it up with a vIOMMU object.
Any new vIOMMU feature will be added as a TEST_F under that.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++ tools/testing/selftests/iommu/iommufd.c | 87 +++++++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 11 +++ 3 files changed, 126 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 40f6f14ce136..ca09308dad6a 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -762,3 +762,31 @@ static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
#define test_cmd_trigger_iopf(device_id, fault_fd) \ ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd)) + +static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, + __u32 type, __u32 flags, __u32 *viommu_id) +{ + struct iommu_viommu_alloc cmd = { + .size = sizeof(cmd), + .flags = flags, + .type = type, + .dev_id = device_id, + .hwpt_id = hwpt_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &cmd); + if (ret) + return ret; + if (viommu_id) + *viommu_id = cmd.out_viommu_id; + return 0; +} + +#define test_cmd_viommu_alloc(device_id, hwpt_id, type, viommu_id) \ + ASSERT_EQ(0, _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ + type, 0, viommu_id)) +#define test_err_viommu_alloc(_errno, device_id, hwpt_id, type, viommu_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ + type, 0, viommu_id)) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 4927b9add5ad..b48b22d33ad4 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -128,6 +128,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP, length); TEST_LENGTH(iommu_option, IOMMU_OPTION, val64); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved); + TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id); #undef TEST_LENGTH }
@@ -2386,4 +2387,90 @@ TEST_F(vfio_compat_mock_domain, huge_map) } }
+FIXTURE(iommufd_viommu) +{ + int fd; + uint32_t ioas_id; + uint32_t stdev_id; + uint32_t hwpt_id; + uint32_t nested_hwpt_id; + uint32_t device_id; + uint32_t viommu_id; +}; + +FIXTURE_VARIANT(iommufd_viommu) +{ + unsigned int viommu; +}; + +FIXTURE_SETUP(iommufd_viommu) +{ + self->fd = open("/dev/iommu", O_RDWR); + ASSERT_NE(-1, self->fd); + test_ioctl_ioas_alloc(&self->ioas_id); + test_ioctl_set_default_memory_limit(); + + if (variant->viommu) { + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + + test_cmd_mock_domain(self->ioas_id, &self->stdev_id, NULL, + &self->device_id); + + /* Negative test -- invalid hwpt */ + test_err_viommu_alloc(ENOENT, self->device_id, self->hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); + + /* Negative test -- not a nesting parent hwpt */ + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, 0, + &self->hwpt_id); + test_err_viommu_alloc(EINVAL, self->device_id, self->hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); + test_ioctl_destroy(self->hwpt_id); + + /* Allocate a nesting parent HWP */ + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &self->hwpt_id); + /* Negative test -- unsupported viommu type */ + test_err_viommu_alloc(EOPNOTSUPP, self->device_id, + self->hwpt_id, 0xdead, NULL); + + test_cmd_viommu_alloc(self->device_id, self->hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, + &self->viommu_id); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, self->hwpt_id)); + test_cmd_hwpt_alloc_nested(self->device_id, self->viommu_id, 0, + &self->nested_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, self->viommu_id)); + } else { + test_err_viommu_alloc(ENOENT, self->device_id, self->hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); + } +} + +FIXTURE_TEARDOWN(iommufd_viommu) +{ + teardown_iommufd(self->fd, _metadata); +} + +FIXTURE_VARIANT_ADD(iommufd_viommu, no_viommu) +{ + .viommu = 0, +}; + +FIXTURE_VARIANT_ADD(iommufd_viommu, mock_viommu) +{ + .viommu = 1, +}; + +TEST_F(iommufd_viommu, viommu_auto_destroy) +{ +} + TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index c5d5e69452b0..e9a980b7729b 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -582,6 +582,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) uint32_t stdev_id; uint32_t idev_id; uint32_t hwpt_id; + uint32_t viommu_id; __u64 iova;
self->fd = open("/dev/iommu", O_RDWR); @@ -624,6 +625,16 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL)) return -1; + + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, + IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id, + IOMMU_HWPT_DATA_NONE, 0, 0)) + return -1; + + if (_test_cmd_viommu_alloc(self->fd, idev_id, hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id)) + return -1; + return 0; }
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 26, 2024 7:50 AM
Add a new iommufd_viommu FIXTURE and set it up with a vIOMMU object.
Any new vIOMMU feature will be added as a TEST_F under that.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
With the introduction of the new object and its infrastructure, update the doc to reflect that and add a new graph.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- Documentation/userspace-api/iommufd.rst | 69 ++++++++++++++++++++++++- 1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index 2deba93bf159..92d16efad5b0 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -63,6 +63,37 @@ Following IOMMUFD objects are exposed to userspace: space usually has mappings from guest-level I/O virtual addresses to guest- level physical addresses.
+ - IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance, + passed to or shared with a VM. It may be some HW-accelerated virtualization + features and some SW resources used by the VM. For examples: + * Security namespace for guest owned ID, e.g. guest-controlled cache tags + * Access to a sharable nesting parent pagetable across physical IOMMUs + * Virtualization of various platforms IDs, e.g. RIDs and others + * Delivery of paravirtualized invalidation + * Direct assigned invalidation queues + * Direct assigned interrupts + * Non-affiliated event reporting + Such a vIOMMU object generally has the access to a nesting parent pagetable + to support some HW-accelerated virtualization features. So, a vIOMMU object + must be created given a nesting parent HWPT_PAGING object, and then it would + encapsulate that HWPT_PAGING object. Therefore, a vIOMMU object can be used + to allocate an HWPT_NESTED object in place of the encapsulated HWPT_PAGING. + + .. note:: + + The name "vIOMMU" isn't necessarily identical to a virtualized IOMMU in a + VM. A VM can have one giant virtualized IOMMU running on a machine having + multiple physical IOMMUs, in which case the VMM will dispatch the requests + or configurations from this single virtualized IOMMU instance to multiple + vIOMMU objects created for individual slices of different physical IOMMUs. + In other words, a vIOMMU object is always a representation of one physical + IOMMU, not necessarily of a virtualized IOMMU. For VMMs that want the full + virtualization features from physical IOMMUs, it is suggested to build the + same number of virtualized IOMMUs as the number of physical IOMMUs, so the + passed-through devices would be connected to their own virtualized IOMMUs + backed by corresponding vIOMMU objects, in which case a guest OS would do + the "dispatch" naturally instead of VMM trappings. + All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel @@ -101,6 +132,28 @@ creating the objects and links:: |------------>|iommu_domain|<----|iommu_domain|<----|device| |____________| |____________| |______|
+ _______________________________________________________________________ + | iommufd (with vIOMMU) | + | | + | [5] | + | _____________ | + | | | | + | |----------------| vIOMMU | | + | | | | | + | | | | | + | | [1] | | [4] [2] | + | | ______ | | _____________ ________ | + | | | | | [3] | | | | | | + | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | + | | |______| |_____________| |_____________| |________| | + | | | | | | | + |______|________|______________|__________________|_______________|_____| + | | | | | + ______v_____ | ______v_____ ______v_____ ___v__ + | struct | | PFN | (paging) | | (nested) | |struct| + |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| + |____________| storage|____________| |____________| |______| + 1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can hold multiple IOAS objects. IOAS is the most generic object and does not expose interfaces that are specific to single IOMMU drivers. All operations @@ -132,7 +185,8 @@ creating the objects and links:: flag is set.
4. IOMMUFD_OBJ_HWPT_NESTED can be only manually created via the IOMMU_HWPT_ALLOC - uAPI, provided an hwpt_id via @pt_id to associate the new HWPT_NESTED object + uAPI, provided an hwpt_id or a viommu_id of a vIOMMU object encapsulating a + nesting parent HWPT_PAGING via @pt_id to associate the new HWPT_NESTED object to the corresponding HWPT_PAGING object. The associating HWPT_PAGING object must be a nesting parent manually allocated via the same uAPI previously with an IOMMU_HWPT_ALLOC_NEST_PARENT flag, otherwise the allocation will fail. The @@ -149,6 +203,18 @@ creating the objects and links:: created via the same IOMMU_HWPT_ALLOC uAPI. The difference is at the type of the object passed in via the @pt_id field of struct iommufd_hwpt_alloc.
+5. IOMMUFD_OBJ_VIOMMU can be only manually created via the IOMMU_VIOMMU_ALLOC + uAPI, provided a dev_id (for the device's physical IOMMU to back the vIOMMU) + and an hwpt_id (to associate the vIOMMU to a nesting parent HWPT_PAGING). The + iommufd core will link the vIOMMU object to the struct iommu_device that the + struct device is behind. And an IOMMU driver can implement a viommu_alloc op + to allocate its own vIOMMU data structure embedding the core-level structure + iommufd_viommu and some driver-specific data. If necessary, the driver can + also configure its HW virtualization feature for that vIOMMU (and thus for + the VM). Successful completion of this operation sets up the linkages between + the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used + as a nesting parent object to allocate an HWPT_NESTED object described above. + A device can only bind to an iommufd due to DMA ownership claim and attach to at most one IOAS object (no support of PASID yet).
@@ -161,6 +227,7 @@ User visible objects are backed by following datastructures: - iommufd_device for IOMMUFD_OBJ_DEVICE. - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED. +- iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
Several terminologies when looking at these datastructures:
On Fri, Oct 25, 2024 at 04:49:52PM -0700, Nicolin Chen wrote:
With the introduction of the new object and its infrastructure, update the doc to reflect that and add a new graph.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Documentation/userspace-api/iommufd.rst | 69 ++++++++++++++++++++++++- 1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index 2deba93bf159..92d16efad5b0 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -63,6 +63,37 @@ Following IOMMUFD objects are exposed to userspace: space usually has mappings from guest-level I/O virtual addresses to guest- level physical addresses.
- IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance,
I just found an extra space in this entire paragraph, resulting in the VIOMMU/VDEVICE chunks right-shifted by that one space...
Will fix this in v6 and confirm with htmldocs. Also, I think I forgot to CC doc folks/maillist.. will do that too.
Thanks Nicolin
Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. Implement an arm_vsmmu_alloc() with its viommu op arm_vsmmu_alloc_domain_nested(), to replace arm_smmu_domain_alloc_nesting(). As an initial step, copy the VMID from s2_parent. A later cleanup series is required to move the VMID allocation out of the stage-2 domain allocation routine to this vIOMMU allocation.
After that, replace nested_domain->s2_parent with nested_domain->vsmmu.
Note that the validating conditions for a nested_domain allocation are moved from arm_vsmmu_alloc_domain_nested to arm_vsmmu_alloc, since there is no point in creating a vIOMMU (vsmmu) from the beginning if it would not support a nested_domain.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++--- include/uapi/linux/iommufd.h | 2 + .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 80 ++++++++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +-- 4 files changed, 71 insertions(+), 46 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 956c12637866..5a025d310dbe 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -10,6 +10,7 @@
#include <linux/bitfield.h> #include <linux/iommu.h> +#include <linux/iommufd.h> #include <linux/kernel.h> #include <linux/mmzone.h> #include <linux/sizes.h> @@ -835,7 +836,7 @@ struct arm_smmu_domain {
 struct arm_smmu_nested_domain {
 	struct iommu_domain domain;
-	struct arm_smmu_domain *s2_parent;
+	struct arm_vsmmu *vsmmu;

 	__le64 ste[2];
 };

@@ -1005,21 +1006,22 @@ tegra241_cmdqv_probe(struct arm_smmu_device *smmu)
 }
 #endif /* CONFIG_TEGRA241_CMDQV */

+struct arm_vsmmu {
+	struct iommufd_viommu core;
+	struct arm_smmu_device *smmu;
+	struct arm_smmu_domain *s2_parent;
+	u16 vmid;
+};
+
 #if IS_ENABLED(CONFIG_ARM_SMMU_V3_IOMMUFD)
 void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type);
-struct iommu_domain *
-arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
-			      struct iommu_domain *parent,
-			      const struct iommu_user_data *user_data);
+struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
+				       struct iommu_domain *parent,
+				       struct iommufd_ctx *ictx,
+				       unsigned int viommu_type);
 #else
 #define arm_smmu_hw_info NULL
-static inline struct iommu_domain *
-arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
-			      struct iommu_domain *parent,
-			      const struct iommu_user_data *user_data)
-{
-	return ERR_PTR(-EOPNOTSUPP);
-}
+#define arm_vsmmu_alloc NULL
 #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */

 #endif /* _ARM_SMMU_V3_H */
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 717659b9fdce..56c742106a45 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -859,9 +859,11 @@ struct iommu_fault_alloc {
 /**
  * enum iommu_viommu_type - Virtual IOMMU Type
  * @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use
+ * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type
  */
 enum iommu_viommu_type {
 	IOMMU_VIOMMU_TYPE_DEFAULT = 0,
+	IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1,
 };

 /**
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 44e1b9bef850..abb6d2868376 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -34,7 +34,8 @@ static void arm_smmu_make_nested_cd_table_ste(
 	struct arm_smmu_ste *target, struct arm_smmu_master *master,
 	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
 {
-	arm_smmu_make_s2_domain_ste(target, master, nested_domain->s2_parent,
+	arm_smmu_make_s2_domain_ste(target, master,
+				    nested_domain->vsmmu->s2_parent,
 				    ats_enabled);

 	target->data[0] = cpu_to_le64(STRTAB_STE_0_V |
@@ -75,7 +76,8 @@ static void arm_smmu_make_nested_domain_ste(
 		break;
 	case STRTAB_STE_0_CFG_BYPASS:
 		arm_smmu_make_s2_domain_ste(
-			target, master, nested_domain->s2_parent, ats_enabled);
+			target, master, nested_domain->vsmmu->s2_parent,
+			ats_enabled);
 		break;
 	case STRTAB_STE_0_CFG_ABORT:
 	default:
@@ -100,7 +102,7 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
 	struct arm_smmu_ste ste;
 	int ret;

-	if (nested_domain->s2_parent->smmu != master->smmu)
+	if (nested_domain->vsmmu->smmu != master->smmu)
 		return -EINVAL;
 	if (arm_smmu_ssids_in_use(&master->cd_table))
 		return -EBUSY;
@@ -151,36 +153,15 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
 	return 0;
 }

-struct iommu_domain *
-arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
-			      struct iommu_domain *parent,
+static struct iommu_domain *
+arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu,
 			      const struct iommu_user_data *user_data)
 {
-	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
 	struct arm_smmu_nested_domain *nested_domain;
-	struct arm_smmu_domain *smmu_parent;
 	struct iommu_hwpt_arm_smmuv3 arg;
 	int ret;

-	if (flags || !(master->smmu->features & ARM_SMMU_FEAT_NESTING))
-		return ERR_PTR(-EOPNOTSUPP);
-
-	/*
-	 * Must support some way to prevent the VM from bypassing the cache
-	 * because VFIO currently does not do any cache maintenance.
-	 */
-	if (!arm_smmu_master_canwbs(master) &&
-	    !(master->smmu->features & ARM_SMMU_FEAT_S2FWB))
-		return ERR_PTR(-EOPNOTSUPP);
-
-	/*
-	 * The core code checks that parent was created with
-	 * IOMMU_HWPT_ALLOC_NEST_PARENT
-	 */
-	smmu_parent = to_smmu_domain(parent);
-	if (smmu_parent->smmu != master->smmu)
-		return ERR_PTR(-EINVAL);
-
 	ret = iommu_copy_struct_from_user(&arg, user_data,
 					  IOMMU_HWPT_DATA_ARM_SMMUV3, ste);
 	if (ret)
@@ -196,9 +177,52 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,

 	nested_domain->domain.type = IOMMU_DOMAIN_NESTED;
 	nested_domain->domain.ops = &arm_smmu_nested_ops;
-	nested_domain->s2_parent = smmu_parent;
+	nested_domain->vsmmu = vsmmu;
 	nested_domain->ste[0] = arg.ste[0];
 	nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);

 	return &nested_domain->domain;
 }
+
+
+static const struct iommufd_viommu_ops arm_vsmmu_ops = {
+	.alloc_domain_nested = arm_vsmmu_alloc_domain_nested,
+};
+
+struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
+				       struct iommu_domain *parent,
+				       struct iommufd_ctx *ictx,
+				       unsigned int viommu_type)
+{
+	struct arm_smmu_device *smmu =
+		iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu);
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_domain *s2_parent = to_smmu_domain(parent);
+	struct arm_vsmmu *vsmmu;
+
+	if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/*
+	 * Must support some way to prevent the VM from bypassing the cache
+	 * because VFIO currently does not do any cache maintenance.
+	 */
+	if (!arm_smmu_master_canwbs(master) &&
+	    !(smmu->features & ARM_SMMU_FEAT_S2FWB))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
+				     &arm_vsmmu_ops);
+	if (IS_ERR(vsmmu))
+		return ERR_CAST(vsmmu);
+
+	vsmmu->smmu = smmu;
+	vsmmu->s2_parent = s2_parent;
+	/* FIXME Move VMID allocation from the S2 domain allocation to here */
+	vsmmu->vmid = s2_parent->s2_cfg.vmid;
+
+	return &vsmmu->core;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0d79a1bd9049..8215c49d3bac 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2661,7 +2661,7 @@ to_smmu_domain_devices(struct iommu_domain *domain)
 	    domain->type == IOMMU_DOMAIN_SVA)
 		return to_smmu_domain(domain);
 	if (domain->type == IOMMU_DOMAIN_NESTED)
-		return to_smmu_nested_domain(domain)->s2_parent;
+		return to_smmu_nested_domain(domain)->vsmmu->s2_parent;
 	return NULL;
 }

@@ -3126,13 +3126,9 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
 	struct arm_smmu_domain *smmu_domain;
 	int ret;

-	if (parent)
-		return arm_smmu_domain_alloc_nesting(dev, flags, parent,
-						     user_data);
-
 	if (flags & ~PAGING_FLAGS)
 		return ERR_PTR(-EOPNOTSUPP);
-	if (user_data)
+	if (parent || user_data)
 		return ERR_PTR(-EOPNOTSUPP);

 	smmu_domain = arm_smmu_domain_alloc();
@@ -3541,6 +3537,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.dev_disable_feat = arm_smmu_dev_disable_feature,
 	.page_response = arm_smmu_page_response,
 	.def_domain_type = arm_smmu_def_domain_type,
+	.viommu_alloc = arm_vsmmu_alloc,
 	.user_pasid_table	= 1,
 	.pgsize_bitmap = -1UL, /* Restricted during device attach */
 	.owner = THIS_MODULE,
From: Nicolin Chen <nicolinc@nvidia.com>
Sent: Saturday, October 26, 2024 7:50 AM
Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. Implement an arm_vsmmu_alloc() with its viommu op arm_vsmmu_alloc_domain_nested(), to replace arm_smmu_domain_alloc_nesting(). As an initial step, copy the VMID from the s2_parent. A later cleanup series is required to move the VMID allocation out of the stage-2 domain allocation routine and into this vIOMMU allocation routine.
After that, replace nested_domain->s2_parent with nested_domain->vsmmu.
Note that the validating conditions for a nested_domain allocation are moved from arm_vsmmu_alloc_domain_nested to arm_vsmmu_alloc, since there is no point in creating a vIOMMU (vsmmu) in the first place if it would not support a nested_domain.
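For reference, a minimal userspace sketch (not part of the patch) of how this is expected to be exercised: allocate a nesting-parent HWPT, create an ARM SMMUv3 vIOMMU on it via IOMMU_VIOMMU_ALLOC, then allocate the nested HWPT through that vIOMMU with the guest vSTE as driver data. The dev_id/ioas_id handles and the exact iommu_viommu_alloc/iommu_hwpt_alloc field layouts are assumed from the uAPI in this series; please double-check them against the final linux/iommufd.h.

/*
 * Hedged sketch only: error handling and the real STE contents are elided.
 * Assumes dev_id/ioas_id were already obtained through the usual iommufd
 * device binding and IOAS setup.
 */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int alloc_vsmmu_nested(int iommufd, uint32_t dev_id, uint32_t ioas_id,
			      const uint64_t guest_ste[2], uint32_t *out_nested)
{
	/* Stage-2 nesting parent, shareable behind the vIOMMU */
	struct iommu_hwpt_alloc parent = {
		.size = sizeof(parent),
		.flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
		.dev_id = dev_id,
		.pt_id = ioas_id,
	};
	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &parent))
		return -1;

	/* vIOMMU of the new ARM SMMUv3 driver type, on top of the parent */
	struct iommu_viommu_alloc viommu = {
		.size = sizeof(viommu),
		.type = IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
		.dev_id = dev_id,
		.hwpt_id = parent.out_hwpt_id,
	};
	if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &viommu))
		return -1;

	/* Nested HWPT allocated through the vIOMMU, carrying the guest vSTE */
	struct iommu_hwpt_arm_smmuv3 vste;
	memcpy(vste.ste, guest_ste, sizeof(vste.ste));

	struct iommu_hwpt_alloc nested = {
		.size = sizeof(nested),
		.dev_id = dev_id,
		.pt_id = viommu.out_viommu_id,
		.data_type = IOMMU_HWPT_DATA_ARM_SMMUV3,
		.data_len = sizeof(vste),
		.data_uptr = (uintptr_t)&vste,
	};
	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &nested))
		return -1;

	*out_nested = nested.out_hwpt_id;
	return 0;
}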
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>