This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including a nested IO page table support. Yet, there're limitations for an HWPT-based structure to support some advanced HW-accelerated features, such as CMDQV on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU environment, it is not straightforward for nested HWPTs to share the same parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a parent HWPT typically hold one stage-2 IO pagetable and tag it with only one ID in the cache entries. When sharing one large stage-2 IO pagetable across physical IOMMU instances, that one ID may not always be available across all the IOMMU instances. In other word, it's ideal for SW to have a different container for the stage-2 IO pagetable so it can hold another ID that's available.
For this "different container", add vIOMMU, an additional layer to hold extra virtualization information: _______________________________________________________________________ | iommufd (with vIOMMU) | | | | [5] | | _____________ | | | | | | [1] | vIOMMU | [4] [2] | | ________________ | | _____________ ________ | | | | | [3] | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | |________________| |_____________| |_____________| |________| | | | | | | | |_________|____________________|__________________|_______________|_____| | | | | | ______v_____ ______v_____ ___v__ | PFN storage | (paging) | | (nested) | |struct| |------------>|iommu_domain|<----|iommu_domain|<----|device| |____________| |____________| |______|
The vIOMMU object should be seen as a slice of a physical IOMMU instance that is passed to or shared with a VM. That can be some HW/SW resources: - Security namespace for guest owned ID, e.g. guest-controlled cache tags - Access to a sharable nesting parent pagetable across physical IOMMUs - Virtualization of various platforms IDs, e.g. RIDs and others - Delivery of paravirtualized invalidation - Direct assigned invalidation queues - Direct assigned interrupts - Non-affiliated event reporting
On a multi-IOMMU system, the vIOMMU object must be instanced to the number of the physical IOMMUs that are passed to (via devices) a guest VM, while being able to hold the shareable parent HWPT. Each vIOMMU then just needs to allocate its own individual ID to tag its own cache: ---------------------------- ---------------- | | paging_hwpt0 | | hwpt_nested0 |--->| viommu0 ------------------ ---------------- | | IDx | ---------------------------- ---------------------------- ---------------- | | paging_hwpt0 | | hwpt_nested1 |--->| viommu1 ------------------ ---------------- | | IDy | ----------------------------
As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation only. Later series will add more data structures and their ioctls.
As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT type for a core-allocated-core-managed vIOMMU object, allowing drivers to simply hook a default viommu ops for viommu-based invalidation alone. And add support for driver-specific type of vIOMMU allocation, and implement that in the ARM SMMUv3 driver for a real world use case.
More vIOMMU-based structs and ioctls will be introduced in the follow-up series to support vDEVICE, vIRQ (vEVENT) and VQUEUE objects. Although we repurposed the vIOMMU object from an earlier RFC, just for a referece: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v3 (paring QEMU branch for testing will be provided with the part2 series)
Changelog v3 * Rebased on top of Jason's nesting v3 series https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvidi... * Split the series into smaller parts * Added Jason's Reviewed-by * Added back viommu->iommu_dev * Added support for driver-allocated vIOMMU v.s. core-allocated * Dropped arm_smmu_cache_invalidate_user * Added an iommufd_test_wait_for_users() in selftest * Reworked test code to make viommu an individual FIXTURE * Added missing TEST_LENGTH case for the new ioctl command v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/ * Limited vdev_id to one per idev * Added a rw_sem to protect the vdev_id list * Reworked driver-level APIs with proper lockings * Added a new viommu_api file for IOMMUFD_DRIVER config * Dropped useless iommu_dev point from the viommu structure * Added missing index numnbers to new types in the uAPI header * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one * Reworked mock_viommu_cache_invalidate() using the new iommu helper * Reordered details of set/unset_vdev_id handlers for proper lockings v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks! Nicolin
Nicolin Chen (11): iommufd: Move struct iommufd_object to public iommufd header iommufd: Rename _iommufd_object_alloc to iommufd_object_alloc_elm iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl iommu: Pass in a viommu pointer to domain_alloc_user op iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC iommufd/selftest: Add refcount to mock_iommu_device iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage Documentation: userspace-api: iommufd: Update vIOMMU iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
drivers/iommu/iommufd/Makefile | 5 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 18 ++++ drivers/iommu/iommufd/iommufd_private.h | 23 ++--- drivers/iommu/iommufd/iommufd_test.h | 2 + include/linux/iommu.h | 15 +++ include/linux/iommufd.h | 52 +++++++++++ include/uapi/linux/iommufd.h | 54 +++++++++-- tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++ drivers/iommu/amd/iommu.c | 1 + .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 24 +++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 + drivers/iommu/intel/iommu.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 27 +++++- drivers/iommu/iommufd/main.c | 38 ++------ drivers/iommu/iommufd/selftest.c | 79 ++++++++++++++-- drivers/iommu/iommufd/viommu.c | 91 +++++++++++++++++++ drivers/iommu/iommufd/viommu_api.c | 57 ++++++++++++ tools/testing/selftests/iommu/iommufd.c | 84 +++++++++++++++++ Documentation/userspace-api/iommufd.rst | 66 +++++++++++++- 19 files changed, 602 insertions(+), 65 deletions(-) create mode 100644 drivers/iommu/iommufd/viommu.c create mode 100644 drivers/iommu/iommufd/viommu_api.c
Prepare for an embedded structure design for driver-level iommufd_viommu objects: // include/linux/iommufd.h struct iommufd_viommu { struct iommufd_object obj; .... };
// Some IOMMU driver struct iommu_driver_viommu { struct iommufd_viommu core; .... };
It has to expose struct iommufd_object from core-level private header to the public iommufd header.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 10 +--------- include/linux/iommufd.h | 9 +++++++++ 2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index f1d865e6fab6..7221f81ae211 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -5,8 +5,8 @@ #define __IOMMUFD_PRIVATE_H
#include <linux/iommu.h> +#include <linux/iommufd.h> #include <linux/iova_bitmap.h> -#include <linux/refcount.h> #include <linux/rwsem.h> #include <linux/uaccess.h> #include <linux/xarray.h> @@ -137,14 +137,6 @@ enum iommufd_object_type { IOMMUFD_OBJ_MAX, };
-/* Base struct for all objects with a userspace ID handle. */ -struct iommufd_object { - refcount_t shortterm_users; - refcount_t users; - enum iommufd_object_type type; - unsigned int id; -}; - static inline bool iommufd_lock_obj(struct iommufd_object *obj) { if (!refcount_inc_not_zero(&obj->users)) diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 30f832a60ccb..6b9d46981870 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -8,6 +8,7 @@
#include <linux/err.h> #include <linux/errno.h> +#include <linux/refcount.h> #include <linux/types.h>
struct device; @@ -18,6 +19,14 @@ struct iommufd_ctx; struct iommufd_device; struct page;
+/* Base struct for all objects with a userspace ID handle. */ +struct iommufd_object { + refcount_t shortterm_users; + refcount_t users; + unsigned int type; /* enum iommufd_object_type in iommufd_private.h */ + unsigned int id; +}; + struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, struct device *dev, u32 *id); void iommufd_device_unbind(struct iommufd_device *idev);
Currently, the object allocation function calls: level-0: iommufd_object_alloc() level-1: __iommufd_object_alloc() level-2: _iommufd_object_alloc()
So the level-1 and level-2 look inverted.
The level-1 allocator is a container_of converter with a pointer sanity, backing the level-0 allocator. But the level-2 allocator does the actual object element allocations. Thus, rename the level-2 allocator, so those two inverted namings would be:
level-0: iommufd_object_alloc() level-1: __iommufd_object_alloc() level-2: iommufd_object_alloc_elm()
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 8 ++++---- drivers/iommu/iommufd/main.c | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 7221f81ae211..f2f3a906eac9 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -217,12 +217,12 @@ iommufd_object_put_and_try_destroy(struct iommufd_ctx *ictx, iommufd_object_remove(ictx, obj, obj->id, 0); }
-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, - size_t size, - enum iommufd_object_type type); +struct iommufd_object *iommufd_object_alloc_elm(struct iommufd_ctx *ictx, + size_t size, + enum iommufd_object_type type);
#define __iommufd_object_alloc(ictx, ptr, type, obj) \ - container_of(_iommufd_object_alloc( \ + container_of(iommufd_object_alloc_elm( \ ictx, \ sizeof(*(ptr)) + BUILD_BUG_ON_ZERO( \ offsetof(typeof(*(ptr)), \ diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index b5f5d27ee963..28e1ef5666e9 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -29,9 +29,9 @@ struct iommufd_object_ops { static const struct iommufd_object_ops iommufd_object_ops[]; static struct miscdevice vfio_misc_dev;
-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, - size_t size, - enum iommufd_object_type type) +struct iommufd_object *iommufd_object_alloc_elm(struct iommufd_ctx *ictx, + size_t size, + enum iommufd_object_type type) { struct iommufd_object *obj; int rc;
On Wed, Oct 09, 2024 at 09:38:02AM -0700, Nicolin Chen wrote:
Maybe call it raw instead of elm? elm suggests it is an item in an array or likewise
Naming aside
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
On Thu, Oct 17, 2024 at 11:14:16AM -0300, Jason Gunthorpe wrote:
Or keep this as the __ and rename
#define __iommufd_object_alloc(ictx, ptr, type, obj) \
That one to _elm like this:
#define iommufd_object_alloc_elm(ictx, ptr, type, elm) \ container_of(_iommufd_object_alloc( \ ictx, \ sizeof(*(ptr)) + BUILD_BUG_ON_ZERO( \ offsetof(typeof(*(ptr)), \ obj) != 0), \ type), \ typeof(*(ptr)), elm)
#define iommufd_object_alloc(ictx, ptr, type) \ iommufd_object_alloc_elm(ictx, ptr, type, obj)
Then you can keep the pattern of _ being the allocation function of the macro
Jason
On Thu, Oct 17, 2024 at 12:37:49PM -0300, Jason Gunthorpe wrote:
You mean "_" v.s. "__"?
If I get it correctly, the change would be [From] level-0: iommufd_object_alloc() level-1: __iommufd_object_alloc() level-2: _iommufd_object_alloc() [To] level-0: iommufd_object_alloc() level-1: iommufd_object_alloc_elm() level-2: _iommufd_object_alloc()
i.e. change the level-1 only.
Thanks Nicolin
On Thu, Oct 17, 2024 at 09:12:03AM -0700, Nicolin Chen wrote:
You could also call it _iommufd_object_alloc_elm() to keep the pattern
Maymbe "member" is a better word here than elm
Jason
On Thu, Oct 17, 2024 at 01:35:07PM -0300, Jason Gunthorpe wrote:
Ack.
level-0: iommufd_object_alloc() level-1: _iommufd_object_alloc_member() level-2: _iommufd_object_alloc()
Thanks Nicolin
On Thu, Oct 17, 2024 at 09:48:23AM -0700, Nicolin Chen wrote:
Keep the pattern:
level-0: iommufd_object_alloc() level-1: iommufd_object_alloc_member() level-2: _iommufd_object_alloc_member()
Jason
On 18/10/24 02:37, Jason Gunthorpe wrote:
Bikeshedding, yay :)
After starring at it for 10min - honestly - ditch iommufd_object_alloc_elm() and just pass "obj" (or "common.obj" in that single other occasion) to iommufd_object_alloc().
__iommufd_object_alloc() - a function - will the actual alloc, iommufd_object_alloc() - a macro - will do the types + call the __ variant, simple and no naming issues.
And it would be real nice if it was "iobj" not this "obj" which is way too generic. Thanks,
On Mon, Oct 21, 2024 at 12:26:54PM +1100, Alexey Kardashevskiy wrote:
All three-level helpers have callers. So that would be a bigger patch than I expected to include in this series. Maybe I should just drop this patch, since it's not functionally necessary. If we want to clean the whole thing, can do with a separate series.
And it would be real nice if it was "iobj" not this "obj" which is way too generic. Thanks,
Again, the renaming would be across the whole folder, not only here. So, I think it could be a separate cleanup series later.
Thanks Nicolin
Add a new IOMMUFD_OBJ_VIOMMU with an iommufd_viommu structure to represent a slice of physical IOMMU device passed to or shared with a user space VM. This slice, now a vIOMMU object, is a group of virtualization resources of a physical IOMMU's, such as: - Security namespace for guest owned ID, e.g. guest-controlled cache tags - Access to a sharable nesting parent pagetable across physical IOMMUs - Virtualization of various platforms IDs, e.g. RIDs and others - Delivery of paravirtualized invalidation - Direct assigned invalidation queues - Direct assigned interrupts - Non-affiliated event reporting
Add a new viommu_alloc op in iommu_ops, for drivers to allocate their own vIOMMU structures. And this allocation also needs a free(), so add struct iommufd_viommu_ops.
To simplify a vIOMMU allocation, provide a iommufd_viommu_alloc() helper. It's suggested that a driver should embed a core-level viommu structure in its driver-level viommu struct and call the iommufd_viommu_alloc() helper, meanwhile the driver can also implement a viommu ops: struct my_driver_viommu { struct iommufd_viommu core; /* driver-owned properties/features */ .... };
static const struct iommufd_viommu_ops my_driver_viommu_ops = { .free = my_driver_viommu_free, /* future ops for virtualization features */ .... };
static struct iommufd_viommu my_driver_viommu_alloc(...) { struct my_driver_viommu *my_viommu = iommufd_viommu_alloc(ictx, my_driver_viommu, core, my_driver_viommu_ops); /* Init my_viommu and related HW feature */ .... return &my_viommu->core; }
static struct iommu_domain_ops my_driver_domain_ops = { .... .viommu_alloc = my_driver_viommu_alloc, };
To make the Kernel config work between a driver and the iommufd core, put the for-driver allocation helpers into a new viommu_api file building with CONFIG_IOMMUFD_DRIVER.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/Makefile | 2 +- drivers/iommu/iommufd/iommufd_private.h | 1 + include/linux/iommu.h | 14 ++++++ include/linux/iommufd.h | 43 +++++++++++++++++++ drivers/iommu/iommufd/main.c | 32 -------------- drivers/iommu/iommufd/viommu_api.c | 57 +++++++++++++++++++++++++ 6 files changed, 116 insertions(+), 33 deletions(-) create mode 100644 drivers/iommu/iommufd/viommu_api.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index cf4605962bea..93daedd7e5c8 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -12,4 +12,4 @@ iommufd-y := \ iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o -obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o +obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o viommu_api.o diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index f2f3a906eac9..6a364073f699 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -131,6 +131,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, IOMMUFD_OBJ_FAULT, + IOMMUFD_OBJ_VIOMMU, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif diff --git a/include/linux/iommu.h b/include/linux/iommu.h index c8d18f5f644e..3a50f57b0861 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -42,6 +42,8 @@ struct notifier_block; struct iommu_sva; struct iommu_dma_cookie; struct iommu_fault_param; +struct iommufd_ctx; +struct iommufd_viommu;
#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */ #define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */ @@ -542,6 +544,13 @@ static inline int __iommu_copy_struct_from_user_array( * @remove_dev_pasid: Remove any translation configurations of a specific * pasid, so that any DMA transactions with this pasid * will be blocked by the hardware. + * @viommu_alloc: Allocate an iommufd_viommu on an @iommu_dev as the group of + * virtualization resources shared/passed to user space IOMMU + * instance. Associate it with a nesting parent @domain. The + * @viommu_type must be defined in include/uapi/linux/iommufd.h + * It is suggested to call iommufd_viommu_alloc() helper for + * a bundled allocation of the core and the driver structures, + * using the given @ictx pointer. * @pgsize_bitmap: bitmap of all possible supported page sizes * @owner: Driver module providing these ops * @identity_domain: An always available, always attachable identity @@ -591,6 +600,11 @@ struct iommu_ops { void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid, struct iommu_domain *domain);
+ struct iommufd_viommu *(*viommu_alloc)(struct iommu_device *iommu_dev, + struct iommu_domain *domain, + struct iommufd_ctx *ictx, + unsigned int viommu_type); + const struct iommu_domain_ops *default_domain_ops; unsigned long pgsize_bitmap; struct module *owner; diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 6b9d46981870..069a38999cdd 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -17,6 +17,7 @@ struct iommu_group; struct iommufd_access; struct iommufd_ctx; struct iommufd_device; +struct iommufd_viommu_ops; struct page;
/* Base struct for all objects with a userspace ID handle. */ @@ -63,6 +64,26 @@ void iommufd_access_detach(struct iommufd_access *access);
void iommufd_ctx_get(struct iommufd_ctx *ictx);
+struct iommufd_viommu { + struct iommufd_object obj; + struct iommufd_ctx *ictx; + struct iommu_device *iommu_dev; + struct iommufd_hwpt_paging *hwpt; + + const struct iommufd_viommu_ops *ops; + + unsigned int type; +}; + +/** + * struct iommufd_viommu_ops - vIOMMU specific operations + * @free: Free all driver-specific parts of an iommufd_viommu. The memory of the + * vIOMMU will be free-ed by iommufd core after calling this free op. + */ +struct iommufd_viommu_ops { + void (*free)(struct iommufd_viommu *viommu); +}; + #if IS_ENABLED(CONFIG_IOMMUFD) struct iommufd_ctx *iommufd_ctx_from_file(struct file *file); struct iommufd_ctx *iommufd_ctx_from_fd(int fd); @@ -79,6 +100,9 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova, int iommufd_vfio_compat_ioas_get_id(struct iommufd_ctx *ictx, u32 *out_ioas_id); int iommufd_vfio_compat_ioas_create(struct iommufd_ctx *ictx); int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx); +struct iommufd_viommu * +__iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, + const struct iommufd_viommu_ops *ops); #else /* !CONFIG_IOMMUFD */ static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file) { @@ -119,5 +143,24 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx) { return -EOPNOTSUPP; } + +static inline struct iommufd_viommu * +__iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, + const struct iommufd_viommu_ops *ops) +{ + return ERR_PTR(-EOPNOTSUPP); +} #endif /* CONFIG_IOMMUFD */ + +/* + * Helpers for IOMMU driver to allocate driver structures that will be freed by + * the iommufd core. Yet, a driver is responsible for its own struct cleanup. + */ +#define iommufd_viommu_alloc(ictx, drv_struct, member, ops) \ + container_of(__iommufd_viommu_alloc(ictx, \ + sizeof(struct drv_struct) + \ + BUILD_BUG_ON_ZERO(offsetof( \ + struct drv_struct, member)), \ + ops), \ + struct drv_struct, member) #endif diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 28e1ef5666e9..92bd075108e5 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -29,38 +29,6 @@ struct iommufd_object_ops { static const struct iommufd_object_ops iommufd_object_ops[]; static struct miscdevice vfio_misc_dev;
-struct iommufd_object *iommufd_object_alloc_elm(struct iommufd_ctx *ictx, - size_t size, - enum iommufd_object_type type) -{ - struct iommufd_object *obj; - int rc; - - obj = kzalloc(size, GFP_KERNEL_ACCOUNT); - if (!obj) - return ERR_PTR(-ENOMEM); - obj->type = type; - /* Starts out bias'd by 1 until it is removed from the xarray */ - refcount_set(&obj->shortterm_users, 1); - refcount_set(&obj->users, 1); - - /* - * Reserve an ID in the xarray but do not publish the pointer yet since - * the caller hasn't initialized it yet. Once the pointer is published - * in the xarray and visible to other threads we can't reliably destroy - * it anymore, so the caller must complete all errorable operations - * before calling iommufd_object_finalize(). - */ - rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY, - xa_limit_31b, GFP_KERNEL_ACCOUNT); - if (rc) - goto out_free; - return obj; -out_free: - kfree(obj); - return ERR_PTR(rc); -} - /* * Allow concurrent access to the object. * diff --git a/drivers/iommu/iommufd/viommu_api.c b/drivers/iommu/iommufd/viommu_api.c new file mode 100644 index 000000000000..c1731f080d6b --- /dev/null +++ b/drivers/iommu/iommufd/viommu_api.c @@ -0,0 +1,57 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ + +#include "iommufd_private.h" + +struct iommufd_object *iommufd_object_alloc_elm(struct iommufd_ctx *ictx, + size_t size, + enum iommufd_object_type type) +{ + struct iommufd_object *obj; + int rc; + + obj = kzalloc(size, GFP_KERNEL_ACCOUNT); + if (!obj) + return ERR_PTR(-ENOMEM); + obj->type = type; + /* Starts out bias'd by 1 until it is removed from the xarray */ + refcount_set(&obj->shortterm_users, 1); + refcount_set(&obj->users, 1); + + /* + * Reserve an ID in the xarray but do not publish the pointer yet since + * the caller hasn't initialized it yet. Once the pointer is published + * in the xarray and visible to other threads we can't reliably destroy + * it anymore, so the caller must complete all errorable operations + * before calling iommufd_object_finalize(). + */ + rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY, + xa_limit_31b, GFP_KERNEL_ACCOUNT); + if (rc) + goto out_free; + return obj; +out_free: + kfree(obj); + return ERR_PTR(rc); +} +EXPORT_SYMBOL_NS_GPL(iommufd_object_alloc_elm, IOMMUFD); + +struct iommufd_viommu * +__iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, + const struct iommufd_viommu_ops *ops) +{ + struct iommufd_viommu *viommu; + struct iommufd_object *obj; + + if (WARN_ON(size < sizeof(*viommu))) + return ERR_PTR(-EINVAL); + obj = iommufd_object_alloc_elm(ictx, size, IOMMUFD_OBJ_VIOMMU); + if (IS_ERR(obj)) + return ERR_CAST(obj); + viommu = container_of(obj, struct iommufd_viommu, obj); + if (ops) + viommu->ops = ops; + return viommu; +} +EXPORT_SYMBOL_NS_GPL(__iommufd_viommu_alloc, IOMMUFD);
On Thu, 10 Oct 2024 at 00:40, Nicolin Chen nicolinc@nvidia.com wrote:
here set refcont 1
iommufd_device_bind -> iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE): refcont -> 1 refcount_inc(&idev->obj.users); refcount -> 2 will cause iommufd_device_unbind fail.
May remove refcount_inc(&idev->obj.users) in iommufd_device_bind
Thanks
On Sat, Oct 12, 2024 at 11:23:07AM +0800, Zhangfei Gao wrote:
Hmm, why would it fail? Or is it failing on your system?
This patch doesn't change the function but only moved it..
Thanks Nicolin
On Sat, 12 Oct 2024 at 12:49, Nicolin Chen nicolinc@nvidia.com wrote:
Not sure, still in check, it may only be on my platform.
it hit iommufd_object_remove: if (WARN_ON(obj != to_destroy))
iommufd_device_bind refcount=2 iommufd_device_attach refcount=3 //still not sure which operation inc the count? iommufd_device_detach refcount=4
Thanks
Hi, Nico
On Sat, 12 Oct 2024 at 18:18, Zhangfei Gao zhangfei.gao@linaro.org wrote:
Have a question, when should iommufd_vdevice_destroy be called, before or after iommufd_device_unbind.
Now iommufd_vdevice_destroy (ref--) is after unbind, hits the if (!refcount_dec_if_one(&obj->users)) check.
iommufd_device_bind iommufd_device_attach iommufd_vdevice_alloc_ioctl
iommufd_device_detach iommufd_device_unbind // refcount check fail iommufd_vdevice_destroy ref--
Thanks
On Mon, Oct 14, 2024 at 03:58:55PM +0800, Zhangfei Gao wrote:
Before.
Now iommufd_vdevice_destroy (ref--) is after unbind, hits the if (!refcount_dec_if_one(&obj->users)) check.
Hmm, where do we have an iommufd_vdevice_destroy after unbind?
Things should be symmetric. As you suspected, vdevice should be destroyed before iommufd_device_detach.
A vdev is an object on top of a vIOMMU obj and an idev obj, so it takes a refcount from each of them. That's why idev couldn't unbind.
Thanks Nicolin
On Mon, 14 Oct 2024 at 23:46, Nicolin Chen nicolinc@nvidia.com wrote:
Looks like it is called by close fd? [ 2480.909319] iommufd_vdevice_destroy+0xdc/0x168 [ 2480.909324] iommufd_fops_release+0xa4/0x140 [ 2480.909328] __fput+0xd0/0x2e0 [ 2480.909331] ____fput+0x1c/0x30 [ 2480.909333] task_work_run+0x78/0xe0 [ 2480.909337] do_exit+0x2fc/0xa50 [ 2480.909340] do_group_exit+0x3c/0xa0 [ 2480.909344] get_signal+0x96c/0x978 [ 2480.909346] do_signal+0x94/0x3a8 [ 2480.909348] do_notify_resume+0x100/0x190
I am trying based on your for_iommufd_viommu_p2-v3 branch, do you have this issue? In checking whether close fd before unbind?
Thanks
Thanks Nicolin
On Mon, Oct 14, 2024 at 07:01:40PM -0700, Nicolin Chen wrote:
This should fix the problem:
--------------------------------------------------------------------- diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 5fd3dd420290..13100cfea29d 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -277,6 +277,11 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD); */ void iommufd_device_unbind(struct iommufd_device *idev) { + mutex_lock(&idev->igroup->lock); + /* idev->vdev object should be destroyed prior, yet just in case.. */ + if (idev->vdev) + iommufd_object_remove(idev->ictx, NULL, idev->vdev->obj.id, 0); + mutex_unlock(&idev->igroup->lock); iommufd_object_destroy_user(idev->ictx, &idev->obj); } EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD); ---------------------------------------------------------------------
Thanks Nicolin
On Wed, 16 Oct 2024 at 02:44, Nicolin Chen nicolinc@nvidia.com wrote:
Not yet [ 574.162112] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 [ 574.261102] pc : iommufd_object_remove+0x7c/0x278 [ 574.265795] lr : iommufd_device_unbind+0x44/0x98 in check
Thanks
Thanks Nicolin
On Wed, Oct 16, 2024 at 09:56:51AM +0800, Zhangfei Gao wrote:
Hmm, it's kinda odd it crashes inside iommufd_object_remove(). Did you happen to change something there?
The added iommufd_object_remove() is equivalent to userspace calling the destroy ioctl on the vDEVICE object.
Nicolin
On Wed, 16 Oct 2024 at 14:52, Nicolin Chen nicolinc@nvidia.com wrote:
Yes, double confirmed, it can solve the issue. The guest can stop and run again
The Null pointer may be caused by the added debug.
Thanks Nico.
Nicolin
On Wed, Oct 09, 2024 at 09:38:03AM -0700, Nicolin Chen wrote:
Let's call this parent_domain ie nesting parent
'own struct cleanup via the free callback'
Let's call this file driver.c to match CONFIG_IOMMUFD_DRIVER
The macro does this already
And this too..
No need for an if...
It could just be in the macro which is a bit appealing given we want this linked into the drivers..
/* * Helpers for IOMMU driver to allocate driver structures that will be freed by * the iommufd core. The free op will be called prior to freeing the memory. */ #define iommufd_viommu_alloc(ictx, drv_struct, member, viommu_ops) \ ({ \ struct drv_struct *ret; \ \ static_assert( \ __same_type(struct iommufd_viommu, \ ((struct drv_struct *)NULL)->member)); \ static_assert(offsetof(struct drv_struct, member.obj) == 0); \ ret = (struct drv_struct *)iommufd_object_alloc_elm( \ ictx, sizeof(struct drv_struct), IOMMUFD_OBJ_VIOMMU); \ ret->member.ops = viommu_ops; \ ret; \ })
Jason
On Thu, Oct 17, 2024 at 01:33:59PM -0300, Jason Gunthorpe wrote:
Would this make its scope too large that it might feel odd to have the iova_bitmap.c in a separate file?
OK. Will try with this.
Thanks Nicolin
Add a new ioctl for user space to do a vIOMMU allocation. It must be based on a nesting parent HWPT, so take its refcount.
As an initial version, define an IOMMU_VIOMMU_TYPE_DEFAULT type. Using it, the object will be allocated by the iommufd core.
If an IOMMU driver supports a driver-managed vIOMMU object, it must define its own IOMMU_VIOMMU_TYPE_ in the uAPI header and implement a viommu_alloc op in its iommu_ops.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/Makefile | 3 +- drivers/iommu/iommufd/iommufd_private.h | 3 + include/uapi/linux/iommufd.h | 40 +++++++++++ drivers/iommu/iommufd/main.c | 6 ++ drivers/iommu/iommufd/viommu.c | 91 +++++++++++++++++++++++++ 5 files changed, 142 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/iommufd/viommu.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index 93daedd7e5c8..288ef3e895e3 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -7,7 +7,8 @@ iommufd-y := \ ioas.o \ main.o \ pages.o \ - vfio_compat.o + vfio_compat.o \ + viommu.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 6a364073f699..4aefac6af23f 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -521,6 +521,9 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); }
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); +void iommufd_viommu_destroy(struct iommufd_object *obj); + #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); void iommufd_selftest_destroy(struct iommufd_object *obj); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index cd4920886ad0..db9008a4eeef 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -51,6 +51,7 @@ enum { IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c, IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, + IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, };
/** @@ -852,4 +853,43 @@ struct iommu_fault_alloc { __u32 out_fault_fd; }; #define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC) + +/** + * enum iommu_viommu_type - Virtual IOMMU Type + * @IOMMU_VIOMMU_TYPE_DEFAULT: Core-managed virtual IOMMU type + */ +enum iommu_viommu_type { + IOMMU_VIOMMU_TYPE_DEFAULT = 0, +}; + +/** + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC) + * @size: sizeof(struct iommu_viommu_alloc) + * @flags: Must be 0 + * @type: Type of the virtual IOMMU. Must be defined in enum iommu_viommu_type + * @dev_id: The device's physical IOMMU will be used to back the virtual IOMMU + * @hwpt_id: ID of a nesting parent HWPT to associate to + * @out_viommu_id: Output virtual IOMMU ID for the allocated object + * + * Allocate a virtual IOMMU object that represents the underlying physical + * IOMMU's virtualization support. The vIOMMU object is a security-isolated + * slice of the physical IOMMU HW that is unique to a specific VM. Operations + * global to the IOMMU are connected to the vIOMMU, such as: + * - Security namespace for guest owned ID, e.g. guest-controlled cache tags + * - Access to a sharable nesting parent pagetable across physical IOMMUs + * - Virtualization of various platforms IDs, e.g. RIDs and others + * - Delivery of paravirtualized invalidation + * - Direct assigned invalidation queues + * - Direct assigned interrupts + * - Non-affiliated event reporting + */ +struct iommu_viommu_alloc { + __u32 size; + __u32 flags; + __u32 type; + __u32 dev_id; + __u32 hwpt_id; + __u32 out_viommu_id; +}; +#define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) #endif diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 92bd075108e5..cbd0a80b2d67 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -301,6 +301,7 @@ union ucmd_buffer { struct iommu_ioas_unmap unmap; struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; + struct iommu_viommu_alloc viommu; #ifdef CONFIG_IOMMUFD_TEST struct iommu_test_cmd test; #endif @@ -352,6 +353,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { val64), IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas, __reserved), + IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, + struct iommu_viommu_alloc, out_viommu_id), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif @@ -487,6 +490,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_FAULT] = { .destroy = iommufd_fault_destroy, }, + [IOMMUFD_OBJ_VIOMMU] = { + .destroy = iommufd_viommu_destroy, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c new file mode 100644 index 000000000000..3a903baeee6a --- /dev/null +++ b/drivers/iommu/iommufd/viommu.c @@ -0,0 +1,91 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ + +#include "iommufd_private.h" + +void iommufd_viommu_destroy(struct iommufd_object *obj) +{ + struct iommufd_viommu *viommu = + container_of(obj, struct iommufd_viommu, obj); + + if (viommu->ops && viommu->ops->free) + viommu->ops->free(viommu); + refcount_dec(&viommu->hwpt->common.obj.users); +} + +int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) +{ + struct iommu_viommu_alloc *cmd = ucmd->cmd; + struct iommufd_hwpt_paging *hwpt_paging; + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + const struct iommu_ops *ops; + int rc; + + if (cmd->flags) + return -EOPNOTSUPP; + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + ops = dev_iommu_ops(idev->dev); + + hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id); + if (IS_ERR(hwpt_paging)) { + rc = PTR_ERR(hwpt_paging); + goto out_put_idev; + } + + if (!hwpt_paging->nest_parent) { + rc = -EINVAL; + goto out_put_hwpt; + } + + if (cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT) { + viommu = __iommufd_viommu_alloc(ucmd->ictx, sizeof(*viommu), + NULL); + } else { + if (!ops->viommu_alloc) { + rc = -EOPNOTSUPP; + goto out_put_hwpt; + } + + viommu = ops->viommu_alloc(idev->dev->iommu->iommu_dev, + hwpt_paging->common.domain, + ucmd->ictx, cmd->type); + } + if (IS_ERR(viommu)) { + rc = PTR_ERR(viommu); + goto out_put_hwpt; + } + + viommu->type = cmd->type; + viommu->ictx = ucmd->ictx; + viommu->hwpt = hwpt_paging; + + /* + * A real physical IOMMU instance would unlikely get unplugged, so the + * life cycle of this iommu_dev is guaranteed to stay alive, mostly. A + * pluggable IOMMU instance (if exists) is responsible for refcounting + * on its own. + */ + viommu->iommu_dev = idev->dev->iommu->iommu_dev; + + refcount_inc(&viommu->hwpt->common.obj.users); + + cmd->out_viommu_id = viommu->obj.id; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_abort; + iommufd_object_finalize(ucmd->ictx, &viommu->obj); + goto out_put_hwpt; + +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &viommu->obj); +out_put_hwpt: + iommufd_put_object(ucmd->ictx, &hwpt_paging->common.obj); +out_put_idev: + iommufd_put_object(ucmd->ictx, &idev->obj); + return rc; +}
On Wed, Oct 09, 2024 at 09:38:04AM -0700, Nicolin Chen wrote:
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
[...]
While reworking the vIRQ series, I found here we should verify this driver-allocated viommu object if it is allocated properly via the suggested API (or if it is properly init-ed as a legit ictx object).
Likely it needs a helper doing a comparison between viommu->obj and the returned obj of xa_load(&ictx->objects, viommu->obj.id). And the following vDEVICE alloc needs it too.
Nicolin
On Wed, Oct 09, 2024 at 09:38:04AM -0700, Nicolin Chen wrote:
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
On 10/10/24 03:38, Nicolin Chen wrote:
iommufd_get_hwpt_paging() is container_of() which does not test for the value and just does a simple math so the actual error value from iommufd_get_object() is ... lost?
Oh well, not really lost, as "obj" and "common.obj" seem to be forced to be at the beginning of iommufd object structs so container_of() is no-op or is not it?
"Assume IOMMUs are unpluggable (the most likely case)" would do imho, all these "unlikely" and "mostly" and "if exists" confuse.
If something needs to be guaranteed to stay alive, may be just call device_get(viommu->dev), with the comment above? Or it is some different refcounting which is missing? Thanks,
On Mon, Oct 21, 2024 at 07:11:47PM +1100, Alexey Kardashevskiy wrote:
Yes, it is just a type check, the revised code I suggested doesn't rely on this. Jonathon remarked on the same for fwctl and we came up with something different.
device_get is different, it just makes sure the struct device memory is around, it doesn't prevent iommu unplug
Jason
On Mon, Oct 21, 2024 at 07:11:47PM +1100, Alexey Kardashevskiy wrote:
Done.
----------------------------------------------------------------- @@ -63,13 +63,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) viommu->type = cmd->type; viommu->ictx = ucmd->ictx; viommu->hwpt = hwpt_paging; - - /* - * A real physical IOMMU instance would unlikely get unplugged, so the - * life cycle of this iommu_dev is guaranteed to stay alive, mostly. A - * pluggable IOMMU instance (if exists) is responsible for refcounting - * on its own. - */ + /* Assume physical IOMMUs are unpluggable (the most likely case) */ viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
refcount_inc(&viommu->hwpt->common.obj.users); -----------------------------------------------------------------
Thanks Nicolin
With a viommu object wrapping a potentially shareable S2 domain, a nested domain should be allocated by associating to a viommu instead.
For drivers without a viommu support, keep the parent domain input, which should be just viommu->hwpt->common.domain otherwise.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- include/linux/iommu.h | 1 + drivers/iommu/amd/iommu.c | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + drivers/iommu/intel/iommu.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 5 +++-- drivers/iommu/iommufd/selftest.c | 1 + 6 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 3a50f57b0861..9105478bdbcd 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -573,6 +573,7 @@ struct iommu_ops { struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); struct iommu_domain *(*domain_alloc_user)( struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data); struct iommu_domain *(*domain_alloc_paging)(struct device *dev); struct iommu_domain *(*domain_alloc_sva)(struct device *dev, diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 8364cd6fa47d..3100ddcaf62e 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2394,6 +2394,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type) static struct iommu_domain * amd_iommu_domain_alloc_user(struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data)
{ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4e559e025149..4b836a5e9fde 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3117,6 +3117,7 @@ static struct iommu_domain arm_smmu_blocked_domain = { static struct iommu_domain * arm_smmu_domain_alloc_user(struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9f6b0780f2ef..a8a66a954c27 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -3522,6 +3522,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) static struct iommu_domain * intel_iommu_domain_alloc_user(struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data) { struct device_domain_info *info = dev_iommu_priv_get(dev); diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index d06bf6e6c19f..77a1d30031d2 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -137,7 +137,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
if (ops->domain_alloc_user) { hwpt->domain = ops->domain_alloc_user(idev->dev, flags, NULL, - user_data); + NULL, user_data); if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); hwpt->domain = NULL; @@ -240,7 +240,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
hwpt->domain = ops->domain_alloc_user(idev->dev, flags & ~IOMMU_HWPT_FAULT_ID_VALID, - parent->common.domain, user_data); + parent->common.domain, + NULL, user_data); if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); hwpt->domain = NULL; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 540437be168a..f4be87b49447 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -319,6 +319,7 @@ __mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent, static struct iommu_domain * mock_domain_alloc_user(struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data) { struct mock_iommu_domain *mock_parent;
On Wed, Oct 09, 2024 at 09:38:05AM -0700, Nicolin Chen wrote:
This re-enforces my feeling we should have made a domain_alloc_nested()..
Do these all need to check for NULL viommu now too? Something is needed at least, only valid viommus should be accepted here. Like you can't pass an ARM viommu to AMD or something silly.
Should drivers accept the default viommu?
Jason
On Thu, Oct 17, 2024 at 01:51:51PM -0300, Jason Gunthorpe wrote:
That could make these changes slightly cleaner. Maybe adding a small series prior to your initial nesting, and get it merged quickly?
Assuming these nesting series can make it to this cycle, that would be in the late rc stage anyway.
These drivers don't have iommu_ops->viommu_alloc, so a viommu (!TYPE_DEFAULT) object won't reach to here.
Perhaps we should block IOMMU_VIOMMU_TYPE_DEFAULT allocations if !iommu_ops->default_viommu_ops, in the core.
Should drivers accept the default viommu?
Yes, driver can use default_viommu_ops.
Thanks Nicolin
On Thu, Oct 17, 2024 at 10:21:31AM -0700, Nicolin Chen wrote:
Maybe we should put an op on the viommu_ops to allocate a nested domain?
It make some sense and resolves my worries about checking for driver ownership.
Jason
Now a vIOMMU holds a shareable nesting parent HWPT. So, it can act like that nesting parent HWPT to allocate a nested HWPT.
Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.
Also, associate a vIOMMU to an allocating nested HWPT.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 1 + include/uapi/linux/iommufd.h | 12 ++++++------ drivers/iommu/iommufd/hw_pagetable.c | 24 ++++++++++++++++++++++-- 3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 4aefac6af23f..c80d880f8b6a 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -305,6 +305,7 @@ struct iommufd_hwpt_paging { struct iommufd_hwpt_nested { struct iommufd_hw_pagetable common; struct iommufd_hwpt_paging *parent; + struct iommufd_viommu *viommu; };
static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index db9008a4eeef..ff8aece8212f 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -430,7 +430,7 @@ enum iommu_hwpt_data_type { * @size: sizeof(struct iommu_hwpt_alloc) * @flags: Combination of enum iommufd_hwpt_alloc_flags * @dev_id: The device to allocate this HWPT for - * @pt_id: The IOAS or HWPT to connect this HWPT to + * @pt_id: The IOAS or HWPT or vIOMMU to connect this HWPT to * @out_hwpt_id: The ID of the new HWPT * @__reserved: Must be 0 * @data_type: One of enum iommu_hwpt_data_type @@ -449,11 +449,11 @@ enum iommu_hwpt_data_type { * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags. * - * A user-managed nested HWPT will be created from a given parent HWPT via - * @pt_id, in which the parent HWPT must be allocated previously via the - * same ioctl from a given IOAS (@pt_id). In this case, the @data_type - * must be set to a pre-defined type corresponding to an I/O page table - * type supported by the underlying IOMMU hardware. + * A user-managed nested HWPT will be created from a given vIOMMU (wrapping a + * parent HWPT) or a parent HWPT via @pt_id, in which the parent HWPT must be + * allocated previously via the same ioctl from a given IOAS (@pt_id). In this + * case, the @data_type must be set to a pre-defined type corresponding to an + * I/O page table type supported by the underlying IOMMU hardware. * * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 77a1d30031d2..b88a638d07da 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -57,6 +57,9 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj) container_of(obj, struct iommufd_hwpt_nested, common.obj);
__iommufd_hwpt_destroy(&hwpt_nested->common); + + if (hwpt_nested->viommu) + refcount_dec(&hwpt_nested->viommu->obj.users); refcount_dec(&hwpt_nested->parent->common.obj.users); }
@@ -213,6 +216,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, */ static struct iommufd_hwpt_nested * iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, + struct iommufd_viommu *viommu, struct iommufd_hwpt_paging *parent, struct iommufd_device *idev, u32 flags, const struct iommu_user_data *user_data) @@ -235,13 +239,16 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, return ERR_CAST(hwpt_nested); hwpt = &hwpt_nested->common;
+ if (viommu) + refcount_inc(&viommu->obj.users); + hwpt_nested->viommu = viommu; refcount_inc(&parent->common.obj.users); hwpt_nested->parent = parent;
hwpt->domain = ops->domain_alloc_user(idev->dev, flags & ~IOMMU_HWPT_FAULT_ID_VALID, parent->common.domain, - NULL, user_data); + viommu, user_data); if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); hwpt->domain = NULL; @@ -308,7 +315,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) struct iommufd_hwpt_nested *hwpt_nested;
hwpt_nested = iommufd_hwpt_nested_alloc( - ucmd->ictx, + ucmd->ictx, NULL, container_of(pt_obj, struct iommufd_hwpt_paging, common.obj), idev, cmd->flags, &user_data); @@ -317,6 +324,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_unlock; } hwpt = &hwpt_nested->common; + } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) { + struct iommufd_hwpt_nested *hwpt_nested; + struct iommufd_viommu *viommu; + + viommu = container_of(pt_obj, struct iommufd_viommu, obj); + hwpt_nested = iommufd_hwpt_nested_alloc( + ucmd->ictx, viommu, viommu->hwpt, idev, + cmd->flags, &user_data); + if (IS_ERR(hwpt_nested)) { + rc = PTR_ERR(hwpt_nested); + goto out_unlock; + } + hwpt = &hwpt_nested->common; } else { rc = -EINVAL; goto out_put_pt;
On Wed, Oct 09, 2024 at 09:38:06AM -0700, Nicolin Chen wrote:
One thought is to continue like we do with the domain and check that the idev's iommu ops match the viommu's iommu ops (or is null for default) before allowing the callback.
Jason
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev.
Track the user count of the iommu_dev. Delay the exit routine if refcount is unbalanced. The refcount inc/dec will be added in the following patch.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/selftest.c | 33 ++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index f4be87b49447..a89a865617db 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -508,14 +508,17 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
static struct iopf_queue *mock_iommu_iopf_queue;
-static struct iommu_device mock_iommu_device = { -}; +static struct mock_iommu_device { + struct iommu_device iommu_dev; + wait_queue_head_t wait; + refcount_t users; +} mock_iommu;
static struct iommu_device *mock_probe_device(struct device *dev) { if (dev->bus != &iommufd_mock_bus_type.bus) return ERR_PTR(-ENODEV); - return &mock_iommu_device; + return &mock_iommu.iommu_dev; }
static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt, @@ -1537,24 +1540,27 @@ int __init iommufd_test_init(void) if (rc) goto err_platform;
- rc = iommu_device_sysfs_add(&mock_iommu_device, + rc = iommu_device_sysfs_add(&mock_iommu.iommu_dev, &selftest_iommu_dev->dev, NULL, "%s", dev_name(&selftest_iommu_dev->dev)); if (rc) goto err_bus;
- rc = iommu_device_register_bus(&mock_iommu_device, &mock_ops, + rc = iommu_device_register_bus(&mock_iommu.iommu_dev, &mock_ops, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); if (rc) goto err_sysfs;
+ refcount_set(&mock_iommu.users, 1); + init_waitqueue_head(&mock_iommu.wait); + mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
return 0;
err_sysfs: - iommu_device_sysfs_remove(&mock_iommu_device); + iommu_device_sysfs_remove(&mock_iommu.iommu_dev); err_bus: bus_unregister(&iommufd_mock_bus_type.bus); err_platform: @@ -1564,6 +1570,16 @@ int __init iommufd_test_init(void) return rc; }
+static void iommufd_test_wait_for_users(void) +{ + if (refcount_dec_and_test(&mock_iommu.users)) + return; + /* Time out waiting for iommu device user count to become 0 */ + WARN_ON(!wait_event_timeout(mock_iommu.wait, + refcount_read(&mock_iommu.users) == 0, + msecs_to_jiffies(10000))); +} + void iommufd_test_exit(void) { if (mock_iommu_iopf_queue) { @@ -1571,8 +1587,9 @@ void iommufd_test_exit(void) mock_iommu_iopf_queue = NULL; }
- iommu_device_sysfs_remove(&mock_iommu_device); - iommu_device_unregister_bus(&mock_iommu_device, + iommufd_test_wait_for_users(); + iommu_device_sysfs_remove(&mock_iommu.iommu_dev); + iommu_device_unregister_bus(&mock_iommu.iommu_dev, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); bus_unregister(&iommufd_mock_bus_type.bus);
On Wed, Oct 09, 2024 at 09:38:07AM -0700, Nicolin Chen wrote:
Just use a completion instead of a wait_queue, a few more bytes but it is easier to code. This has some subtle issue where the device memory could be freed while a concurrent thread is going to trigger the wait.
Jason
Implement the viommu alloc/free functions to increase/reduce refcount of its dependent mock iommu device. User space can verify this loop via the IOMMU_VIOMMU_TYPE_SELFTEST.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 2 ++ drivers/iommu/iommufd/selftest.c | 45 ++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index f4bc23a92f9a..edced4ac7cd3 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -180,4 +180,6 @@ struct iommu_hwpt_invalidate_selftest { __u32 iotlb_id; };
+#define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index a89a865617db..4fcf475facb1 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -132,6 +132,10 @@ struct mock_iommu_domain_nested { u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; };
+struct mock_viommu { + struct iommufd_viommu core; +}; + enum selftest_obj_type { TYPE_IDEV, }; @@ -544,6 +548,46 @@ static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features fea return 0; }
+static void mock_viommu_free(struct iommufd_viommu *viommu) +{ + struct mock_iommu_device *mock_iommu = + container_of(viommu->iommu_dev, struct mock_iommu_device, + iommu_dev); + + if (refcount_dec_and_test(&mock_iommu->users)) + wake_up_interruptible_all(&mock_iommu->wait); + + /* iommufd core frees mock_viommu and viommu */ +} + +static struct iommufd_viommu_ops mock_viommu_ops = { + .free = mock_viommu_free, +}; + +static struct iommufd_viommu * +mock_viommu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *domain, + struct iommufd_ctx *ictx, unsigned int viommu_type) +{ + struct mock_iommu_device *mock_iommu = + container_of(iommu_dev, struct mock_iommu_device, iommu_dev); + struct mock_viommu *mock_viommu; + + if (viommu_type != IOMMU_VIOMMU_TYPE_SELFTEST) + return ERR_PTR(-EOPNOTSUPP); + + mock_viommu = iommufd_viommu_alloc(ictx, mock_viommu, core, + &mock_viommu_ops); + if (IS_ERR(mock_viommu)) + return ERR_CAST(mock_viommu); + + if (!refcount_inc_not_zero(&mock_iommu->users)) { + kfree(mock_viommu); + return ERR_PTR(-ENXIO); + } + + return &mock_viommu->core; +} + static const struct iommu_ops mock_ops = { /* * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type() @@ -563,6 +607,7 @@ static const struct iommu_ops mock_ops = { .dev_enable_feat = mock_dev_enable_feat, .dev_disable_feat = mock_dev_disable_feat, .user_pasid_table = true, + .viommu_alloc = mock_viommu_alloc, .default_domain_ops = &(struct iommu_domain_ops){ .free = mock_domain_free,
On Wed, Oct 09, 2024 at 09:38:08AM -0700, Nicolin Chen wrote:
What about the default viommu? What happens then?
It would be a bug if the iommu_dev being passed in was somehow released while iommufd had hold of it through vfio. So just use refcount_inc()
Jason
Add a new iommufd_viommu FIXTURE and setup it up with a vIOMMU object.
Any new vIOMMU feature will be added as a TEST_F under that.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- tools/testing/selftests/iommu/iommufd_utils.h | 28 +++++++ tools/testing/selftests/iommu/iommufd.c | 84 +++++++++++++++++++ 2 files changed, 112 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 40f6f14ce136..307d097db9dd 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -762,3 +762,31 @@ static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
#define test_cmd_trigger_iopf(device_id, fault_fd) \ ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd)) + +static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, + __u32 type, __u32 flags, __u32 *viommu_id) +{ + struct iommu_viommu_alloc cmd = { + .size = sizeof(cmd), + .flags = flags, + .type = type, + .dev_id = device_id, + .hwpt_id = hwpt_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &cmd); + if (ret) + return ret; + if (viommu_id) + *viommu_id = cmd.out_viommu_id; + return 0; +} + +#define test_cmd_viommu_alloc(device_id, hwpt_id, type, viommu_id) \ + ASSERT_EQ(0, _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ + type, 0, viommu_id)) +#define test_err_viommu_alloc(_errno, device_id, hwpt_id, type, viommu_id) \ + EXPECT_ERRNO(_errno, _test_cmd_viommu_alloc(self->fd, device_id, \ + hwpt_id, type, 0, \ + viommu_id)) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 4927b9add5ad..c03705825576 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -128,6 +128,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP, length); TEST_LENGTH(iommu_option, IOMMU_OPTION, val64); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved); + TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id); #undef TEST_LENGTH }
@@ -2386,4 +2387,87 @@ TEST_F(vfio_compat_mock_domain, huge_map) } }
+FIXTURE(iommufd_viommu) +{ + int fd; + uint32_t ioas_id; + uint32_t stdev_id; + uint32_t hwpt_id; + uint32_t device_id; + uint32_t viommu_id; +}; + +FIXTURE_VARIANT(iommufd_viommu) +{ + unsigned int viommu; + unsigned int viommu_type; +}; + +FIXTURE_SETUP(iommufd_viommu) +{ + self->fd = open("/dev/iommu", O_RDWR); + ASSERT_NE(-1, self->fd); + test_ioctl_ioas_alloc(&self->ioas_id); + test_ioctl_set_default_memory_limit(); + + if (variant->viommu) { + test_cmd_mock_domain(self->ioas_id, &self->stdev_id, NULL, + &self->device_id); + + /* Negative test -- invalid hwpt */ + test_err_viommu_alloc(ENOENT, self->device_id, self->hwpt_id, + variant->viommu_type, &self->viommu_id); + + /* Negative test -- not a nesting parent hwpt */ + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, 0, + &self->hwpt_id); + test_err_viommu_alloc(EINVAL, self->device_id, self->hwpt_id, + variant->viommu_type, &self->viommu_id); + test_ioctl_destroy(self->hwpt_id); + + /* Allocate a nesting parent HWP */ + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &self->hwpt_id); + /* Negative test -- unsupported viommu type */ + test_err_viommu_alloc(EOPNOTSUPP, self->device_id, + self->hwpt_id, 0xdead, &self->viommu_id); + /* Allocate a default type of viommu */ + test_cmd_viommu_alloc(self->device_id, self->hwpt_id, + variant->viommu_type, &self->viommu_id); + } else { + test_err_viommu_alloc(ENOENT, self->device_id, self->hwpt_id, + variant->viommu_type, &self->viommu_id); + } +} + +FIXTURE_TEARDOWN(iommufd_viommu) +{ + if (variant->viommu) { + test_ioctl_destroy(self->viommu_id); + test_ioctl_destroy(self->hwpt_id); + } + teardown_iommufd(self->fd, _metadata); +} + +FIXTURE_VARIANT_ADD(iommufd_viommu, no_viommu) +{ +}; + +FIXTURE_VARIANT_ADD(iommufd_viommu, viommu_default) +{ + .viommu = 1, + .viommu_type = IOMMU_VIOMMU_TYPE_DEFAULT, +}; + +FIXTURE_VARIANT_ADD(iommufd_viommu, mock_viommu) +{ + .viommu = 1, + .viommu_type = IOMMU_VIOMMU_TYPE_SELFTEST, +}; + +TEST_F(iommufd_viommu, viommu_auto_destroy) +{ +} + TEST_HARNESS_MAIN
With the introduction of the new object and its infrastructure, update the doc to reflect that and add a new graph.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- Documentation/userspace-api/iommufd.rst | 66 ++++++++++++++++++++++++- 1 file changed, 65 insertions(+), 1 deletion(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index 2deba93bf159..37eb1adda57b 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -63,6 +63,37 @@ Following IOMMUFD objects are exposed to userspace: space usually has mappings from guest-level I/O virtual addresses to guest- level physical addresses.
+ - IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance, + passed to or shared with a VM. It may be some HW-accelerated virtualization + features and some SW resources used by the VM. For examples: + * Security namespace for guest owned ID, e.g. guest-controlled cache tags + * Access to a sharable nesting parent pagetable across physical IOMMUs + * Virtualization of various platforms IDs, e.g. RIDs and others + * Delivery of paravirtualized invalidation + * Direct assigned invalidation queues + * Direct assigned interrupts + * Non-affiliated event reporting + Such a vIOMMU object generally has the access to a nesting parent pagetable + to support some HW-accelerated virtualization features. So, a vIOMMU object + must be created given a nesting parent HWPT_PAGING object, and then it would + encapsulate that HWPT_PAGING object. Therefore, a vIOMMU object can be used + to allocate an HWPT_NESTED object in place of the encapsulated HWPT_PAGING. + + .. note:: + + The name "vIOMMU" isn't necessarily identical to a virtualized IOMMU in a + VM. A VM can have one giant virtualized IOMMU running on a machine having + multiple physical IOMMUs, in which case the VMM will dispatch the requests + or configurations from this single virtualized IOMMU instance to multiple + vIOMMU objects created for individual slices of different physical IOMMUs. + In other words, a vIOMMU object is always a representation of one physical + IOMMU, not necessarily of a virtualized IOMMU. For VMMs that want the full + virtualization features from physical IOMMUs, it is suggested to build the + same number of virtualized IOMMUs as the number of physical IOMMUs, so the + passed-through devices would be connected to their own virtualized IOMMUs + backed by corresponding vIOMMU objects, in which case a guest OS would do + the "dispatch" naturally instead of VMM trappings. + All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel @@ -101,6 +132,25 @@ creating the objects and links:: |------------>|iommu_domain|<----|iommu_domain|<----|device| |____________| |____________| |______|
+ _______________________________________________________________________ + | iommufd (with vIOMMU) | + | | + | [5] | + | _____________ | + | | | | + | [1] | vIOMMU | [4] [2] | + | ________________ | | _____________ ________ | + | | | | [3] | | | | | | + | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | + | |________________| |_____________| |_____________| |________| | + | | | | | | + |_________|____________________|__________________|_______________|_____| + | | | | + | ______v_____ ______v_____ ___v__ + | PFN storage | (paging) | | (nested) | |struct| + |------------>|iommu_domain|<----|iommu_domain|<----|device| + |____________| |____________| |______| + 1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can hold multiple IOAS objects. IOAS is the most generic object and does not expose interfaces that are specific to single IOMMU drivers. All operations @@ -132,7 +182,8 @@ creating the objects and links:: flag is set.
4. IOMMUFD_OBJ_HWPT_NESTED can be only manually created via the IOMMU_HWPT_ALLOC - uAPI, provided an hwpt_id via @pt_id to associate the new HWPT_NESTED object + uAPI, provided an hwpt_id or a viommu_id of a vIOMMU object encapsulating a + nesting parent HWPT_PAGING via @pt_id to associate the new HWPT_NESTED object to the corresponding HWPT_PAGING object. The associating HWPT_PAGING object must be a nesting parent manually allocated via the same uAPI previously with an IOMMU_HWPT_ALLOC_NEST_PARENT flag, otherwise the allocation will fail. The @@ -149,6 +200,18 @@ creating the objects and links:: created via the same IOMMU_HWPT_ALLOC uAPI. The difference is at the type of the object passed in via the @pt_id field of struct iommufd_hwpt_alloc.
+5. IOMMUFD_OBJ_VIOMMU can be only manually created via the IOMMU_VIOMMU_ALLOC + uAPI, provided a dev_id (for the device's physical IOMMU to back the vIOMMU) + and an hwpt_id (to associate the vIOMMU to a nesting parent HWPT_PAGING). The + iommufd core will link the vIOMMU object to the struct iommu_device that the + struct device is behind. And an IOMMU driver can implement a viommu_alloc op + to allocate its own vIOMMU data structure embedding the core-level structure + iommufd_viommu and some driver-specific data. If necessary, the driver can + also configure its HW virtualization feature for that vIOMMU (and thus for + the VM). Successful completion of this operation sets up the linkages between + the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used + as a nesting parent object to allocate an HWPT_NESTED object described above. + A device can only bind to an iommufd due to DMA ownership claim and attach to at most one IOAS object (no support of PASID yet).
@@ -161,6 +224,7 @@ User visible objects are backed by following datastructures: - iommufd_device for IOMMUFD_OBJ_DEVICE. - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED. +- iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
Several terminologies when looking at these datastructures:
On Wed, Oct 09, 2024 at 09:38:10AM -0700, Nicolin Chen wrote:
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. Implement the viommu_alloc op with an arm_vsmmu_alloc function. As an initial step, copy the VMID from s2_parent. A later cleanup series is required to move the VMID allocation out of the stage-2 domain allocation routine to this.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 18 ++++++++++++++ include/uapi/linux/iommufd.h | 2 ++ .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 24 +++++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + 4 files changed, 45 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index e394943c0b4b..844d1dfdea55 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -10,6 +10,7 @@
#include <linux/bitfield.h> #include <linux/iommu.h> +#include <linux/iommufd.h> #include <linux/kernel.h> #include <linux/mmzone.h> #include <linux/sizes.h> @@ -1005,12 +1006,22 @@ tegra241_cmdqv_probe(struct arm_smmu_device *smmu) } #endif /* CONFIG_TEGRA241_CMDQV */
+struct arm_vsmmu { + struct iommufd_viommu core; + struct arm_smmu_device *smmu; + struct arm_smmu_domain *s2_parent; + u16 vmid; +}; + #if IS_ENABLED(CONFIG_ARM_SMMU_V3_IOMMUFD) void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type); struct iommu_domain * arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, struct iommu_domain *parent, const struct iommu_user_data *user_data); +struct iommufd_viommu * +arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent, + struct iommufd_ctx *ictx, unsigned int viommu_type); #else #define arm_smmu_hw_info NULL static inline struct iommu_domain * @@ -1020,6 +1031,13 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, { return ERR_PTR(-EOPNOTSUPP); } + +static inline struct iommufd_viommu * +arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent, + struct iommufd_ctx *ictx, unsigned int viommu_type) +{ + return ERR_PTR(-EOPNOTSUPP); +} #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
#endif /* _ARM_SMMU_V3_H */ diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index ff8aece8212f..6ee841a8c79b 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -857,9 +857,11 @@ struct iommu_fault_alloc { /** * enum iommu_viommu_type - Virtual IOMMU Type * @IOMMU_VIOMMU_TYPE_DEFAULT: Core-managed virtual IOMMU type + * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type */ enum iommu_viommu_type { IOMMU_VIOMMU_TYPE_DEFAULT = 0, + IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1, };
/** diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 51260f63be94..5e235fca8f13 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -212,3 +212,27 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
return &nested_domain->domain; } + +struct iommufd_viommu * +arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent, + struct iommufd_ctx *ictx, unsigned int viommu_type) +{ + struct arm_smmu_device *smmu = + container_of(iommu_dev, struct arm_smmu_device, iommu); + struct arm_smmu_domain *s2_parent = to_smmu_domain(parent); + struct arm_vsmmu *vsmmu; + + if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3) + return ERR_PTR(-EOPNOTSUPP); + + vsmmu = iommufd_viommu_alloc(ictx, arm_vsmmu, core, NULL); + if (IS_ERR(vsmmu)) + return ERR_CAST(vsmmu); + + vsmmu->smmu = smmu; + vsmmu->s2_parent = s2_parent; + /* FIXME Move VMID allocation from the S2 domain allocation to here */ + vsmmu->vmid = s2_parent->s2_cfg.vmid; + + return &vsmmu->core; +} diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4b836a5e9fde..6a23e6dcd5cf 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3541,6 +3541,7 @@ static struct iommu_ops arm_smmu_ops = { .dev_disable_feat = arm_smmu_dev_disable_feature, .page_response = arm_smmu_page_response, .def_domain_type = arm_smmu_def_domain_type, + .viommu_alloc = arm_vsmmu_alloc, .pgsize_bitmap = -1UL, /* Restricted during device attach */ .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) {
On Wed, Oct 09, 2024 at 09:38:11AM -0700, Nicolin Chen wrote:
I squashed the following changes to this commit (will be in v4). It replaces nested_domain->s2_parent with nested_domain->vsmmu:
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 5e235fca8f13..3cd8ad0a8980 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -36,14 +36,15 @@ arm_smmu_get_msi_mapping_domain(struct iommu_domain *domain) struct arm_smmu_nested_domain *nested_domain = container_of(domain, struct arm_smmu_nested_domain, domain);
- return &nested_domain->s2_parent->domain; + return &nested_domain->vsmmu->s2_parent->domain; }
static void arm_smmu_make_nested_cd_table_ste( struct arm_smmu_ste *target, struct arm_smmu_master *master, struct arm_smmu_nested_domain *nested_domain, bool ats_enabled) { - arm_smmu_make_s2_domain_ste(target, master, nested_domain->s2_parent, + arm_smmu_make_s2_domain_ste(target, master, + nested_domain->vsmmu->s2_parent, ats_enabled);
target->data[0] = cpu_to_le64(STRTAB_STE_0_V | @@ -84,7 +85,8 @@ static void arm_smmu_make_nested_domain_ste( break; case STRTAB_STE_0_CFG_BYPASS: arm_smmu_make_s2_domain_ste( - target, master, nested_domain->s2_parent, ats_enabled); + target, master, nested_domain->vsmmu->s2_parent, + ats_enabled); break; case STRTAB_STE_0_CFG_ABORT: default: @@ -109,7 +111,7 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, struct arm_smmu_ste ste; int ret;
- if (nested_domain->s2_parent->smmu != master->smmu) + if (nested_domain->vsmmu->smmu != master->smmu) return -EINVAL; if (arm_smmu_ssids_in_use(&master->cd_table)) return -EBUSY; @@ -164,8 +166,10 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg) struct iommu_domain * arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data) { + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct arm_smmu_nested_domain *nested_domain; struct arm_smmu_domain *smmu_parent; @@ -183,6 +187,10 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, !(master->smmu->features & ARM_SMMU_FEAT_S2FWB)) return ERR_PTR(-EOPNOTSUPP);
+ /* Driver only supports nesting with the vIOMMU infrastructure */ + if (!viommu) + return ERR_PTR(-EOPNOTSUPP); + /* * The core code checks that parent was created with * IOMMU_HWPT_ALLOC_NEST_PARENT @@ -206,7 +214,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
nested_domain->domain.type = IOMMU_DOMAIN_NESTED; nested_domain->domain.ops = &arm_smmu_nested_ops; - nested_domain->s2_parent = smmu_parent; + nested_domain->vsmmu = vsmmu; nested_domain->ste[0] = arg.ste[0]; nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6a23e6dcd5cf..460d2fe5dd49 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2660,7 +2661,7 @@ to_smmu_domain_devices(struct iommu_domain *domain) domain->type == IOMMU_DOMAIN_SVA) return to_smmu_domain(domain); if (domain->type == IOMMU_DOMAIN_NESTED) - return to_smmu_nested_domain(domain)->s2_parent; + return to_smmu_nested_domain(domain)->vsmmu->s2_parent; return NULL; }
@@ -3128,7 +3129,7 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
if (parent) return arm_smmu_domain_alloc_nesting(dev, flags, parent, - user_data); + viommu, user_data);
if (flags & ~PAGING_FLAGS) return ERR_PTR(-EOPNOTSUPP); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 844d1dfdea55..0ea1afe1bb7a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -836,7 +836,7 @@ struct arm_smmu_domain {
struct arm_smmu_nested_domain { struct iommu_domain domain; - struct arm_smmu_domain *s2_parent; + struct arm_vsmmu *vsmmu;
__le64 ste[2]; }; @@ -1018,6 +1018,7 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type); struct iommu_domain * arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data); struct iommufd_viommu * arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent, @@ -1027,6 +1028,7 @@ arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent, static inline struct iommu_domain * arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, struct iommu_domain *parent, + struct iommufd_viommu *viommu, const struct iommu_user_data *user_data) { return ERR_PTR(-EOPNOTSUPP);
On Thu, Oct 17, 2024 at 01:41:23PM -0300, Jason Gunthorpe wrote:
For SMMUv3 driver, we have to make it a hard requirement since the invalidation can be only done with a vIOMMU, right?
Nicolin
On Wed, Oct 09, 2024 at 09:38:11AM -0700, Nicolin Chen wrote:
Let's do #define NULL here instead so we don't get an op at all.
So what happens if the user tries to create a default domain?
It skips this and just creates an normal viommu object
But then what? The driver needs to make sure it never casts that to a arm_vsmmu ? How?
Jason
On Thu, Oct 17, 2024 at 03:40:15PM -0300, Jason Gunthorpe wrote:
Ack.
So long as a driver doesn't provide iommu_ops->default_viommu_ops, it should be fine. We may also block DEFAULT viommu allocations in the core if the driver doesn't provide that default_viommu_ops.
Nicolin
Following the previous vIOMMU series, this adds another vDEVICE structure, representing the association from an iommufd_device to an iommufd_viommu. This gives the whole architecture a new "v" layer: _______________________________________________________________________ | iommufd (with vIOMMU/vDEVICE) | | _____________ _____________ | | | | | | | | |----------------| vIOMMU |<---| vDEVICE |<------| | | | | | |_____________| | | | | ______ | | _____________ ___|____ | | | | | | | | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | | |______| |_____________| |_____________| |________| | |______|________|______________|__________________|_______________|_____| | | | | | ______v_____ | ______v_____ ______v_____ ___v__ | struct | | PFN | (paging) | | (nested) | |struct| |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| |____________| storage|____________| |____________| |______|
This vDEVICE object is used to collect and store all vIOMMU-related device information/attributes in a VM. As an initial series for vDEVICE, add only the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM: e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d to a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID of the device against the physical IOMMU instance. This is essential for a vIOMMU-based invalidation, where the request contains a device's vID for a device cache flush, e.g. ATC invalidation.
Therefore, with this vDEVICE object, support a vIOMMU-based invalidation, by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache with a given driver data.
As for the implementation of the series, add driver support in ARM SMMUv3 for a real world use case.
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v3 Paring QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p2-v3
Changelog v3 * Added Jason's Reviewed-by * Split this invalidation part out of the part-1 series * Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl * Reduced viommu_api functions by allowing drivers to access viommu and vdevice structure directly * Dropped vdevs_rwsem by using xa_lock instead * Dropped arm_smmu_cache_invalidate_user v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/ * Limited vdev_id to one per idev * Added a rw_sem to protect the vdev_id list * Reworked driver-level APIs with proper lockings * Added a new viommu_api file for IOMMUFD_DRIVER config * Dropped useless iommu_dev point from the viommu structure * Added missing index numnbers to new types in the uAPI header * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one * Reworked mock_viommu_cache_invalidate() using the new iommu helper * Reordered details of set/unset_vdev_id handlers for proper lockings v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks! Nicolin
Jason Gunthorpe (3): iommu: Add iommu_copy_struct_from_full_user_array helper iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Update comments about ATS and bypass
Nicolin Chen (13): iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct iommufd/viommu: Add a default_viommu_ops for IOMMU_VIOMMU_TYPE_DEFAULT iommufd/viommu: Add IOMMU_VDEVICE_ALLOC ioctl iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage iommu/viommu: Add cache_invalidate for IOMMU_VIOMMU_TYPE_DEFAULT iommufd/hw_pagetable: Allow viommu->ops->cache_invalidate for hwpt_nested iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE iommufd/viommu: Add vdev_to_dev helper iommufd/selftest: Add mock_viommu_cache_invalidate iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl Documentation: userspace-api: iommufd: Update vDEVICE iommu/arm-smmu-v3: Add arm_vsmmu_cache_invalidate
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +- drivers/iommu/iommufd/iommufd_private.h | 12 ++ drivers/iommu/iommufd/iommufd_test.h | 30 +++ include/linux/iommu.h | 53 ++++- include/linux/iommufd.h | 50 +++++ include/uapi/linux/iommufd.h | 61 +++++- tools/testing/selftests/iommu/iommufd_utils.h | 83 +++++++ .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 162 +++++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +++-- drivers/iommu/iommufd/hw_pagetable.c | 45 +++- drivers/iommu/iommufd/main.c | 6 + drivers/iommu/iommufd/selftest.c | 122 ++++++++++- drivers/iommu/iommufd/viommu.c | 93 +++++++- drivers/iommu/iommufd/viommu_api.c | 21 ++ tools/testing/selftests/iommu/iommufd.c | 204 +++++++++++++++++- Documentation/userspace-api/iommufd.rst | 58 +++-- 16 files changed, 1007 insertions(+), 52 deletions(-)
Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device, i.e. iommufd_device (idev) object, against an iommufd_viommu (vIOMMU) object in the VM. This vDEVICE object (and its structure) holds all the information and attributes in a VM, regarding the device related to the vIOMMU.
As an initial patch, add a per-vIOMMU virtual ID. This can be: - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table - Virtual ID on a nested Intel VT-D IOMMU, an index to a Context Table Potentially, this vDEVICE structure can hold some vData for Confidential Compute Architecture (CCA).
Add a pair of vdevice_alloc and vdevice_free in struct iommufd_viommu_ops to allow driver-level vDEVICE structure allocations.
Similar to iommufd_vdevice_alloc, add an iommufd_vdevice_alloc helper so IOMMU drivers can allocate core-embedded style structures.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 1 + include/linux/iommufd.h | 31 +++++++++++++++++++++++++ drivers/iommu/iommufd/viommu_api.c | 14 +++++++++++ 3 files changed, 46 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index c80d880f8b6a..0c56a467e440 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -132,6 +132,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_ACCESS, IOMMUFD_OBJ_FAULT, IOMMUFD_OBJ_VIOMMU, + IOMMUFD_OBJ_VDEVICE, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 069a38999cdd..510fc961a9ad 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -75,13 +75,30 @@ struct iommufd_viommu { unsigned int type; };
+struct iommufd_vdevice { + struct iommufd_object obj; + struct iommufd_ctx *ictx; + struct iommufd_device *idev; + struct iommufd_viommu *viommu; + u64 id; /* per-vIOMMU virtual ID */ +}; + /** * struct iommufd_viommu_ops - vIOMMU specific operations * @free: Free all driver-specific parts of an iommufd_viommu. The memory of the * vIOMMU will be free-ed by iommufd core after calling this free op. + * @vdevice_alloc: Allocate a driver-managed iommufd_vdevice to init some driver + * specific structure or HW procedure. Note that the core-level + * structure is filled by the iommufd core after calling this op. + * @vdevice_free: Free a driver-managed iommufd_vdevice to de-init its structure + * or HW procedure. The memory of the vdevice will be free-ed by + * iommufd core. */ struct iommufd_viommu_ops { void (*free)(struct iommufd_viommu *viommu); + struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu, + struct device *dev, u64 id); + void (*vdevice_free)(struct iommufd_vdevice *vdev); };
#if IS_ENABLED(CONFIG_IOMMUFD) @@ -103,6 +120,8 @@ int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx); struct iommufd_viommu * __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, const struct iommufd_viommu_ops *ops); +struct iommufd_vdevice * +__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size); #else /* !CONFIG_IOMMUFD */ static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file) { @@ -150,6 +169,12 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, { return ERR_PTR(-EOPNOTSUPP); } + +static inline struct iommufd_vdevice * +__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size) +{ + return ERR_PTR(-EOPNOTSUPP); +} #endif /* CONFIG_IOMMUFD */
/* @@ -163,4 +188,10 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, struct drv_struct, member)), \ ops), \ struct drv_struct, member) +#define iommufd_vdevice_alloc(ictx, drv_struct, member) \ + container_of(__iommufd_vdevice_alloc(ictx, \ + sizeof(struct drv_struct) + \ + BUILD_BUG_ON_ZERO(offsetof( \ + struct drv_struct, member))), \ + struct drv_struct, member) #endif diff --git a/drivers/iommu/iommufd/viommu_api.c b/drivers/iommu/iommufd/viommu_api.c index c1731f080d6b..8419df3b658c 100644 --- a/drivers/iommu/iommufd/viommu_api.c +++ b/drivers/iommu/iommufd/viommu_api.c @@ -55,3 +55,17 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, return viommu; } EXPORT_SYMBOL_NS_GPL(__iommufd_viommu_alloc, IOMMUFD); + +struct iommufd_vdevice * +__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size) +{ + struct iommufd_object *obj; + + if (WARN_ON(size < sizeof(struct iommufd_vdevice))) + return ERR_PTR(-EINVAL); + obj = iommufd_object_alloc_elm(ictx, size, IOMMUFD_OBJ_VDEVICE); + if (IS_ERR(obj)) + return ERR_CAST(obj); + return container_of(obj, struct iommufd_vdevice, obj); +} +EXPORT_SYMBOL_NS_GPL(__iommufd_vdevice_alloc, IOMMUFD);
On Thu, Oct 17, 2024 at 03:45:56PM -0300, Jason Gunthorpe wrote:
It seems that we have to keep two small functions as followings, unless we want to expose the enum iommufd_object_type.
--------------------------------------------------------------- diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c index 4495f1aaccca..fdd203487bac 100644 --- a/drivers/iommu/iommufd/driver.c +++ b/drivers/iommu/iommufd/driver.c @@ -38,10 +38,19 @@ _iommufd_object_alloc_member(struct iommufd_ctx *ictx, size_t size, EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc_member, IOMMUFD);
struct iommufd_viommu *_iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size) { return container_of(_iommufd_object_alloc_member(ictx, size, IOMMUFD_OBJ_VIOMMU), struct iommufd_viommu, obj); } EXPORT_SYMBOL_NS_GPL(_iommufd_viommu_alloc, IOMMUFD); + +struct iommufd_vdevice *_iommufd_vdevice_alloc(struct iommufd_ctx *ictx, + size_t size) +{ + return container_of(_iommufd_object_alloc_member(ictx, size, + IOMMUFD_OBJ_VDEVICE), + struct iommufd_vdevice, obj); +} +EXPORT_SYMBOL_NS_GPL(_iommufd_vdevice_alloc, IOMMUFD); ---------------------------------------------------------------
Thanks Nicolin
An IOMMU_VIOMMU_TYPE_DEFAULT doesn't need a free() op since the core can free everything in the destroy(). Now with the new vDEVICE structure, it might want to allocate its own vDEVICEs.
Add a default_viommu_ops for driver to hook ops for default vIOMMUs.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- include/linux/iommu.h | 4 ++++ drivers/iommu/iommufd/viommu.c | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9105478bdbcd..1de2aebc4d92 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -44,6 +44,7 @@ struct iommu_dma_cookie; struct iommu_fault_param; struct iommufd_ctx; struct iommufd_viommu; +struct iommufd_viommu_ops;
#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */ #define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */ @@ -551,6 +552,8 @@ static inline int __iommu_copy_struct_from_user_array( * It is suggested to call iommufd_viommu_alloc() helper for * a bundled allocation of the core and the driver structures, * using the given @ictx pointer. + * @default_viommu_ops: Driver can choose to use a default core-allocated vIOMMU + * object by providing a default_viommu_ops. * @pgsize_bitmap: bitmap of all possible supported page sizes * @owner: Driver module providing these ops * @identity_domain: An always available, always attachable identity @@ -605,6 +608,7 @@ struct iommu_ops { struct iommu_domain *domain, struct iommufd_ctx *ictx, unsigned int viommu_type); + const struct iommufd_viommu_ops *default_viommu_ops;
const struct iommu_domain_ops *default_domain_ops; unsigned long pgsize_bitmap; diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 3a903baeee6a..f512dfb535fd 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -44,7 +44,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
if (cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT) { viommu = __iommufd_viommu_alloc(ucmd->ictx, sizeof(*viommu), - NULL); + ops->default_viommu_ops); } else { if (!ops->viommu_alloc) { rc = -EOPNOTSUPP;
On Wed, Oct 09, 2024 at 09:38:14AM -0700, Nicolin Chen wrote:
Why? arm_smmu is now creating its own viommu object, so who will use this?
Do we have any use for the default mode? It is already a bit confusing, can we just drop it?
Jason
On Thu, Oct 17, 2024 at 03:47:29PM -0300, Jason Gunthorpe wrote:
Hmm, that would make the default model completely useless..
Should we unsupport a default viommu allocation?
Nicolin
Introduce a new ioctl to allocate a vDEVICE object. Since a vDEVICE object is a connection of an iommufd_iommufd object and an iommufd_viommu object, require both as the ioctl inputs and take refcounts in the ioctl handler.
Add to the vIOMMU object a "vdevs" xarray, indexed by a per-vIOMMU virtual device ID, which can be: - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table - Virtual ID on a nested Intel VT-D IOMMU, an index to a Context Table
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_private.h | 11 +++ include/linux/iommufd.h | 2 + include/uapi/linux/iommufd.h | 26 +++++++ drivers/iommu/iommufd/main.c | 6 ++ drivers/iommu/iommufd/viommu.c | 91 +++++++++++++++++++++++++ 5 files changed, 136 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 0c56a467e440..e160edb22b67 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -409,6 +409,7 @@ struct iommufd_device { struct iommufd_object obj; struct iommufd_ctx *ictx; struct iommufd_group *igroup; + struct iommufd_vdevice *vdev; struct list_head group_item; /* always the physical device */ struct device *dev; @@ -523,8 +524,18 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); }
+static inline struct iommufd_viommu * +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_VIOMMU), + struct iommufd_viommu, obj); +} + int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd); +void iommufd_vdevice_destroy(struct iommufd_object *obj);
#ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 510fc961a9ad..5a630952150d 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -72,6 +72,8 @@ struct iommufd_viommu {
const struct iommufd_viommu_ops *ops;
+ struct xarray vdevs; + unsigned int type; };
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 6ee841a8c79b..3039519d71b7 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -52,6 +52,7 @@ enum { IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, + IOMMUFD_CMD_VDEVICE_ALLOC = 0x90, };
/** @@ -894,4 +895,29 @@ struct iommu_viommu_alloc { __u32 out_viommu_id; }; #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) + +/** + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC) + * @size: sizeof(struct iommu_vdevice_alloc) + * @viommu_id: vIOMMU ID to associate with the virtual device + * @dev_id: The pyhsical device to allocate a virtual instance on the vIOMMU + * @__reserved: Must be 0 + * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID + * of AMD IOMMU, and vID of a nested Intel VT-d to a Context Table. + * @out_vdevice_id: Output virtual instance ID for the allocated object + * @__reserved2: Must be 0 + * + * Allocate a virtual device instance (for a physical device) against a vIOMMU. + * This instance holds the device's information (related to its vIOMMU) in a VM. + */ +struct iommu_vdevice_alloc { + __u32 size; + __u32 viommu_id; + __u32 dev_id; + __u32 __reserved; + __aligned_u64 virt_id; + __u32 out_vdevice_id; + __u32 __reserved2; +}; +#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC) #endif diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index cbd0a80b2d67..9a8c3cbecc11 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -302,6 +302,7 @@ union ucmd_buffer { struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; struct iommu_viommu_alloc viommu; + struct iommu_vdevice_alloc vdev; #ifdef CONFIG_IOMMUFD_TEST struct iommu_test_cmd test; #endif @@ -355,6 +356,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), + IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl, + struct iommu_vdevice_alloc, __reserved2), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif @@ -493,6 +496,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_VIOMMU] = { .destroy = iommufd_viommu_destroy, }, + [IOMMUFD_OBJ_VDEVICE] = { + .destroy = iommufd_vdevice_destroy, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index f512dfb535fd..eb78c928c60f 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -9,6 +9,8 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) struct iommufd_viommu *viommu = container_of(obj, struct iommufd_viommu, obj);
+ xa_destroy(&viommu->vdevs); + if (viommu->ops && viommu->ops->free) viommu->ops->free(viommu); refcount_dec(&viommu->hwpt->common.obj.users); @@ -72,6 +74,8 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) */ viommu->iommu_dev = idev->dev->iommu->iommu_dev;
+ xa_init(&viommu->vdevs); + refcount_inc(&viommu->hwpt->common.obj.users);
cmd->out_viommu_id = viommu->obj.id; @@ -89,3 +93,90 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &idev->obj); return rc; } + +void iommufd_vdevice_destroy(struct iommufd_object *obj) +{ + struct iommufd_vdevice *old, *vdev = + container_of(obj, struct iommufd_vdevice, obj); + struct iommufd_viommu *viommu = vdev->viommu; + struct iommufd_device *idev = vdev->idev; + + if (viommu->ops && viommu->ops->vdevice_free) + viommu->ops->vdevice_free(vdev); + + old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); + if (old) + WARN_ON(old != vdev); + + refcount_dec(&viommu->obj.users); + refcount_dec(&idev->obj.users); + idev->vdev = NULL; +} + +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) +{ + struct iommu_vdevice_alloc *cmd = ucmd->cmd; + struct iommufd_vdevice *vdev, *curr; + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + u64 virt_id = cmd->virt_id; + int rc = 0; + + if (virt_id > ULONG_MAX) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_put_viommu; + } + + mutex_lock(&idev->igroup->lock); + if (idev->vdev) { + rc = -EEXIST; + goto out_unlock_igroup; + } + + if (viommu->ops && viommu->ops->vdevice_alloc) + vdev = viommu->ops->vdevice_alloc(viommu, idev->dev, virt_id); + else + vdev = __iommufd_vdevice_alloc(ucmd->ictx, sizeof(*vdev)); + if (IS_ERR(vdev)) { + rc = PTR_ERR(vdev); + goto out_unlock_igroup; + } + + vdev->idev = idev; + vdev->id = virt_id; + vdev->viommu = viommu; + + idev->vdev = vdev; + refcount_inc(&idev->obj.users); + refcount_inc(&viommu->obj.users); + + curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL); + if (curr) { + rc = xa_err(curr) ? : -EBUSY; + goto out_abort; + } + + cmd->out_vdevice_id = vdev->obj.id; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_abort; + iommufd_object_finalize(ucmd->ictx, &vdev->obj); + goto out_unlock_igroup; + +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj); +out_unlock_igroup: + mutex_unlock(&idev->igroup->lock); + iommufd_put_object(ucmd->ictx, &idev->obj); +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +}
On Wed, Oct 09, 2024 at 09:38:15AM -0700, Nicolin Chen wrote:
This should hold the igroup lock when touching vdev?
Otherwise this won't work right
But be mindful of this abort, it doesn't want to be inside the lock if it also gets the lock.. fail_nth should be updated to cover these new ioctls to look for tricky things like that
But the design looks OK
Jason
On Thu, Oct 17, 2024 at 03:52:30PM -0300, Jason Gunthorpe wrote:
I added an abort() beside destroy():
+void iommufd_vdevice_abort(struct iommufd_object *obj) +{ + struct iommufd_vdevice *old, *vdev = + container_of(obj, struct iommufd_vdevice, obj); + struct iommufd_viommu *viommu = vdev->viommu; + struct iommufd_device *idev = vdev->idev; + + lockdep_assert_not_held(&idev->igroup->lock); + + if (viommu->ops && viommu->ops->vdevice_free) + viommu->ops->vdevice_free(vdev); + + old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); + if (old) + WARN_ON(old != vdev); + + refcount_dec(&viommu->obj.users); + refcount_dec(&idev->obj.users); + idev->vdev = NULL; +} + +void iommufd_vdevice_destroy(struct iommufd_object *obj) +{ + struct iommufd_vdevice *vdev = + container_of(obj, struct iommufd_vdevice, obj); + + mutex_lock(&vdev->idev->igroup->lock); + iommufd_vdevice_abort(obj); + mutex_unlock(&vdev->idev->igroup->lock); +} ----------------------------------------------------------
And I added fail_nth coverage for IOMMU_VIOMMU/VDEVICE_ALLOC cmds.
Thanks Nicolin
Add a vdevice_alloc TEST_F to cover the new IOMMU_VDEVICE_ALLOC ioctls.
Also add a vdevice_alloc op to the viommu mock_viommu_ops for a coverage of IOMMU_VIOMMU_TYPE_SELFTEST. The DEFAULT coverage is done via a core- allocated vDEVICE object.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- tools/testing/selftests/iommu/iommufd_utils.h | 27 +++++++++++++++++++ drivers/iommu/iommufd/selftest.c | 19 +++++++++++++ tools/testing/selftests/iommu/iommufd.c | 20 ++++++++++++++ 3 files changed, 66 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 307d097db9dd..ccd1f65df0b0 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -790,3 +790,30 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, EXPECT_ERRNO(_errno, _test_cmd_viommu_alloc(self->fd, device_id, \ hwpt_id, type, 0, \ viommu_id)) + +static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id, + __u64 virt_id, __u32 *vdev_id) +{ + struct iommu_vdevice_alloc cmd = { + .size = sizeof(cmd), + .dev_id = idev_id, + .viommu_id = viommu_id, + .virt_id = virt_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &cmd); + if (ret) + return ret; + if (vdev_id) + *vdev_id = cmd.out_vdevice_id; + return 0; +} + +#define test_cmd_vdevice_alloc(viommu_id, idev_id, virt_id, vdev_id) \ + ASSERT_EQ(0, _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ + virt_id, vdev_id)) +#define test_err_vdevice_alloc(_errno, viommu_id, idev_id, virt_id, vdev_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ + virt_id, vdev_id)) diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 4fcf475facb1..87bc45b86f9e 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -136,6 +136,11 @@ struct mock_viommu { struct iommufd_viommu core; };
+struct mock_vdevice { + struct iommufd_vdevice core; + u64 rid; +}; + enum selftest_obj_type { TYPE_IDEV, }; @@ -560,8 +565,22 @@ static void mock_viommu_free(struct iommufd_viommu *viommu) /* iommufd core frees mock_viommu and viommu */ }
+static struct iommufd_vdevice *mock_vdevice_alloc(struct iommufd_viommu *viommu, + struct device *dev, u64 id) +{ + struct mock_vdevice *mock_vdev; + + mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core); + if (IS_ERR(mock_vdev)) + return ERR_CAST(mock_vdev); + + mock_vdev->rid = id; + return &mock_vdev->core; +} + static struct iommufd_viommu_ops mock_viommu_ops = { .free = mock_viommu_free, + .vdevice_alloc = mock_vdevice_alloc, };
static struct iommufd_viommu * diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index c03705825576..af00b082656e 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -129,6 +129,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_option, IOMMU_OPTION, val64); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved); TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id); + TEST_LENGTH(iommu_vdevice_alloc, IOMMU_VDEVICE_ALLOC, __reserved2); #undef TEST_LENGTH }
@@ -2470,4 +2471,23 @@ TEST_F(iommufd_viommu, viommu_auto_destroy) { }
+TEST_F(iommufd_viommu, vdevice_alloc) +{ + uint32_t viommu_id = self->viommu_id; + uint32_t dev_id = self->device_id; + uint32_t vdev_id = 0; + + if (dev_id) { + /* Set vdev_id to 0x99, unset it, and set to 0x88 */ + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + test_err_vdevice_alloc(EEXIST, + viommu_id, dev_id, 0x99, &vdev_id); + test_ioctl_destroy(vdev_id); + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id); + test_ioctl_destroy(vdev_id); + } else { + test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL); + } +} + TEST_HARNESS_MAIN
This per-vIOMMU cache_invalidate op is like the cache_invalidate_user op in struct iommu_domain_ops, but wider, supporting device cache (e.g. PCI ATC invaldiations).
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- include/linux/iommufd.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 5a630952150d..e58b3e43aa7b 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -14,6 +14,7 @@ struct device; struct file; struct iommu_group; +struct iommu_user_data_array; struct iommufd_access; struct iommufd_ctx; struct iommufd_device; @@ -95,12 +96,21 @@ struct iommufd_vdevice { * @vdevice_free: Free a driver-managed iommufd_vdevice to de-init its structure * or HW procedure. The memory of the vdevice will be free-ed by * iommufd core. + * @cache_invalidate: Flush hardware cache used by a vIOMMU. It can be used for + * any IOMMU hardware specific cache: TLB and device cache. + * The @array passes in the cache invalidation requests, in + * form of a driver data structure. A driver must update the + * array->entry_num to report the number of handled requests. + * The data structure of the array entry must be defined in + * include/uapi/linux/iommufd.h */ struct iommufd_viommu_ops { void (*free)(struct iommufd_viommu *viommu); struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu, struct device *dev, u64 id); void (*vdevice_free)(struct iommufd_vdevice *vdev); + int (*cache_invalidate)(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array); };
#if IS_ENABLED(CONFIG_IOMMUFD)
Now cache entries for hwpt_nested can be invalidated via the vIOMMU's cache_invalidate op alternatively. Allow iommufd_hwpt_nested_alloc to support such a case.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/hw_pagetable.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index b88a638d07da..ccaaf801955c 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -202,6 +202,17 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, return ERR_PTR(rc); }
+static inline bool +iommufd_hwpt_nested_has_invalidate_op(struct iommufd_hwpt_nested *hwpt_nested) +{ + struct iommufd_viommu *viommu = hwpt_nested->viommu; + + if (viommu) + return viommu->ops && viommu->ops->cache_invalidate; + else + return hwpt_nested->common.domain->ops->cache_invalidate_user; +} + /** * iommufd_hwpt_nested_alloc() - Get a NESTED iommu_domain for a device * @ictx: iommufd context @@ -257,7 +268,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, hwpt->domain->owner = ops;
if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED || - !hwpt->domain->ops->cache_invalidate_user)) { + !iommufd_hwpt_nested_has_invalidate_op(hwpt_nested))) { rc = -EINVAL; goto out_abort; }
On Wed, Oct 09, 2024 at 09:38:18AM -0700, Nicolin Chen wrote:
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
With a vIOMMU object, use space can flush any IOMMU related cache that can be directed using the vIOMMU object. It is similar to IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB, e.g. device/desciprtor cache.
Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id, and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers can define different structures for vIOMMU invalidations v.s. HWPT ones.
Update the uAPI, kdoc, and selftest case accordingly.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- include/uapi/linux/iommufd.h | 9 ++++--- drivers/iommu/iommufd/hw_pagetable.c | 32 +++++++++++++++++++------ tools/testing/selftests/iommu/iommufd.c | 4 ++-- 3 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 3039519d71b7..cd9e135ef563 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -728,7 +728,7 @@ struct iommu_hwpt_vtd_s1_invalidate { /** * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) * @size: sizeof(struct iommu_hwpt_invalidate) - * @hwpt_id: ID of a nested HWPT for cache invalidation + * @hwpt_id: ID of a nested HWPT or a vIOMMU, for cache invalidation * @data_uptr: User pointer to an array of driver-specific cache invalidation * data. * @data_type: One of enum iommu_hwpt_invalidate_data_type, defining the data @@ -739,8 +739,11 @@ struct iommu_hwpt_vtd_s1_invalidate { * Output the number of requests successfully handled by kernel. * @__reserved: Must be 0. * - * Invalidate the iommu cache for user-managed page table. Modifications on a - * user-managed page table should be followed by this operation to sync cache. + * Invalidate iommu cache for user-managed page table or vIOMMU. Modifications + * on a user-managed page table should be followed by this operation, if a HWPT + * is passed in via @hwpt_id. Other caches, such as device cache or descriptor + * cache can be flushed if a vIOMMU is passed in via the @hwpt_id field. + * * Each ioctl can support one or more cache invalidation requests in the array * that has a total size of @entry_len * @entry_num. * diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index ccaaf801955c..211b3f8b4366 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -444,7 +444,7 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) .entry_len = cmd->entry_len, .entry_num = cmd->entry_num, }; - struct iommufd_hw_pagetable *hwpt; + struct iommufd_object *pt_obj; u32 done_num = 0; int rc;
@@ -458,17 +458,35 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) goto out; }
- hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id); - if (IS_ERR(hwpt)) { - rc = PTR_ERR(hwpt); + pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY); + if (IS_ERR(pt_obj)) { + rc = PTR_ERR(pt_obj); goto out; } + if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) { + struct iommufd_hw_pagetable *hwpt = + container_of(pt_obj, struct iommufd_hw_pagetable, obj); + + rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, + &data_array); + } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) { + struct iommufd_viommu *viommu = + container_of(pt_obj, struct iommufd_viommu, obj); + + if (!viommu->ops || !viommu->ops->cache_invalidate) { + rc = -EOPNOTSUPP; + goto out_put_pt; + } + rc = viommu->ops->cache_invalidate(viommu, &data_array); + } else { + rc = -EINVAL; + goto out_put_pt; + }
- rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, - &data_array); done_num = data_array.entry_num;
- iommufd_put_object(ucmd->ictx, &hwpt->obj); +out_put_pt: + iommufd_put_object(ucmd->ictx, pt_obj); out: cmd->entry_num = done_num; if (iommufd_ucmd_respond(ucmd, sizeof(*cmd))) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index af00b082656e..2651e2b58423 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -362,9 +362,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, parent_hwpt_id));
- /* hwpt_invalidate only supports a user-managed hwpt (nested) */ + /* hwpt_invalidate does not support a parent hwpt */ num_inv = 1; - test_err_hwpt_invalidate(ENOENT, parent_hwpt_id, inv_reqs, + test_err_hwpt_invalidate(EINVAL, parent_hwpt_id, inv_reqs, IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, sizeof(*inv_reqs), &num_inv); assert(!num_inv);
On Wed, Oct 09, 2024 at 09:38:19AM -0700, Nicolin Chen wrote:
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
From: Jason Gunthorpe jgg@nvidia.com
The iommu_copy_struct_from_user_array helper can be used to copy a single entry from a user array which might not be efficient if the array is big.
Add a new iommu_copy_struct_from_full_user_array to copy the entire user array at once. Update the existing iommu_copy_struct_from_user_array kdoc accordingly.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- include/linux/iommu.h | 49 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 1de2aebc4d92..c980fd0e2174 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -494,7 +494,9 @@ static inline int __iommu_copy_struct_from_user_array( * @index: Index to the location in the array to copy user data from * @min_last: The last member of the data structure @kdst points in the * initial version. - * Return 0 for success, otherwise -error. + * + * Copy a single entry from a user array. Return 0 for success, otherwise + * -error. */ #define iommu_copy_struct_from_user_array(kdst, user_array, data_type, index, \ min_last) \ @@ -502,6 +504,51 @@ static inline int __iommu_copy_struct_from_user_array( kdst, user_array, data_type, index, sizeof(*(kdst)), \ offsetofend(typeof(*(kdst)), min_last))
+ +/** + * iommu_copy_struct_from_full_user_array - Copy iommu driver specific user + * space data from an iommu_user_data_array + * @kdst: Pointer to an iommu driver specific user data that is defined in + * include/uapi/linux/iommufd.h + * @kdst_entry_size: sizeof(*kdst) + * @user_array: Pointer to a struct iommu_user_data_array for a user space + * array + * @data_type: The data type of the @kdst. Must match with @user_array->type + * + * Copy the entire user array. kdst must have room for kdst_entry_size * + * user_array->entry_num bytes. Return 0 for success, otherwise -error. + */ +static inline int +iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size, + struct iommu_user_data_array *user_array, + unsigned int data_type) +{ + unsigned int i; + int ret; + + if (user_array->type != data_type) + return -EINVAL; + if (!user_array->entry_num) + return -EINVAL; + if (likely(user_array->entry_len == kdst_entry_size)) { + if (copy_from_user(kdst, user_array->uptr, + user_array->entry_num * + user_array->entry_len)) + return -EFAULT; + } + + /* Copy item by item */ + for (i = 0; i != user_array->entry_num; i++) { + ret = copy_struct_from_user( + kdst + kdst_entry_size * i, kdst_entry_size, + user_array->uptr + user_array->entry_len * i, + user_array->entry_len); + if (ret) + return ret; + } + return 0; +} + /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability
This avoids a bigger trouble of moving the struct iommufd_device to the public header.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- include/linux/iommufd.h | 7 +++++++ drivers/iommu/iommufd/viommu_api.c | 7 +++++++ 2 files changed, 14 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index e58b3e43aa7b..18d9b95f1cbc 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -85,6 +85,7 @@ struct iommufd_vdevice { struct iommufd_viommu *viommu; u64 id; /* per-vIOMMU virtual ID */ }; +struct device *vdev_to_dev(struct iommufd_vdevice *vdev);
/** * struct iommufd_viommu_ops - vIOMMU specific operations @@ -134,6 +135,7 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size, const struct iommufd_viommu_ops *ops); struct iommufd_vdevice * __iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size); +struct device *vdev_to_dev(struct iommufd_vdevice *vdev); #else /* !CONFIG_IOMMUFD */ static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file) { @@ -187,6 +189,11 @@ __iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size) { return ERR_PTR(-EOPNOTSUPP); } + +static inline struct device *vdev_to_dev(struct iommufd_vdevice *vdev) +{ + return NULL; +} #endif /* CONFIG_IOMMUFD */
/* diff --git a/drivers/iommu/iommufd/viommu_api.c b/drivers/iommu/iommufd/viommu_api.c index 8419df3b658c..281e85be520d 100644 --- a/drivers/iommu/iommufd/viommu_api.c +++ b/drivers/iommu/iommufd/viommu_api.c @@ -69,3 +69,10 @@ __iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size) return container_of(obj, struct iommufd_vdevice, obj); } EXPORT_SYMBOL_NS_GPL(__iommufd_vdevice_alloc, IOMMUFD); + +/* Caller should xa_lock(&viommu->vdevs) to protect the return value */ +struct device *vdev_to_dev(struct iommufd_vdevice *vdev) +{ + return vdev ? vdev->idev->dev : NULL; +} +EXPORT_SYMBOL_NS_GPL(vdev_to_dev, IOMMUFD);
On Wed, Oct 09, 2024 at 09:38:21AM -0700, Nicolin Chen wrote:
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Similar to the coverage of cache_invalidate_user for iotlb invalidation, add a device cache and a viommu_cache_invalidate function to test it out.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 25 +++++++++ drivers/iommu/iommufd/selftest.c | 79 +++++++++++++++++++++++++++- 2 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index edced4ac7cd3..05f57a968d25 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -54,6 +54,11 @@ enum { MOCK_NESTED_DOMAIN_IOTLB_NUM = 4, };
+enum { + MOCK_DEV_CACHE_ID_MAX = 3, + MOCK_DEV_CACHE_NUM = 4, +}; + struct iommu_test_cmd { __u32 size; __u32 op; @@ -152,6 +157,7 @@ struct iommu_test_hw_info { /* Should not be equal to any defined value in enum iommu_hwpt_data_type */ #define IOMMU_HWPT_DATA_SELFTEST 0xdead #define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef +#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad
/** * struct iommu_hwpt_selftest @@ -182,4 +188,23 @@ struct iommu_hwpt_invalidate_selftest {
#define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef
+/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */ +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef + +/** + * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU + * (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) + * @flags: Invalidate flags + * @cache_id: Invalidate cache entry index + * + * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @cache_id will be ignored + */ +struct iommu_viommu_invalidate_selftest { +#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0) + __u32 flags; + __u32 vdev_id; + __u32 cache_id; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 87bc45b86f9e..8a1aef857922 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -149,6 +149,7 @@ struct mock_dev { struct device dev; unsigned long flags; int id; + u32 cache[MOCK_DEV_CACHE_NUM]; };
struct selftest_obj { @@ -578,9 +579,80 @@ static struct iommufd_vdevice *mock_vdevice_alloc(struct iommufd_viommu *viommu, return &mock_vdev->core; }
+static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array) +{ + struct iommu_viommu_invalidate_selftest *cmds; + struct iommu_viommu_invalidate_selftest *cur; + struct iommu_viommu_invalidate_selftest *end; + int rc; + + /* A zero-length array is allowed to validate the array type */ + if (array->entry_num == 0 && + array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) { + array->entry_num = 0; + return 0; + } + + cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL); + if (!cmds) + return -ENOMEM; + cur = cmds; + end = cmds + array->entry_num; + + static_assert(sizeof(*cmds) == 3 * sizeof(u32)); + rc = iommu_copy_struct_from_full_user_array( + cmds, sizeof(*cmds), array, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST); + if (rc) + goto out; + + while (cur != end) { + XA_STATE(xas, &viommu->vdevs, (unsigned long)cur->vdev_id); + struct mock_dev *mdev; + struct device *dev; + int i; + + if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) { + rc = -EOPNOTSUPP; + goto out; + } + + if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) { + rc = -EINVAL; + goto out; + } + + xa_lock(&viommu->vdevs); + dev = vdev_to_dev(xas_load(&xas)); + if (!dev) { + xa_unlock(&viommu->vdevs); + rc = -EINVAL; + goto out; + } + mdev = container_of(dev, struct mock_dev, dev); + + if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) { + /* Invalidate all cache entries and ignore cache_id */ + for (i = 0; i < MOCK_DEV_CACHE_NUM; i++) + mdev->cache[i] = 0; + } else { + mdev->cache[cur->cache_id] = 0; + } + xa_unlock(&viommu->vdevs); + + cur++; + } +out: + array->entry_num = cur - cmds; + kfree(cmds); + return rc; +} + static struct iommufd_viommu_ops mock_viommu_ops = { .free = mock_viommu_free, .vdevice_alloc = mock_vdevice_alloc, + .cache_invalidate = mock_viommu_cache_invalidate, };
static struct iommufd_viommu * @@ -627,6 +699,9 @@ static const struct iommu_ops mock_ops = { .dev_disable_feat = mock_dev_disable_feat, .user_pasid_table = true, .viommu_alloc = mock_viommu_alloc, + .default_viommu_ops = &(struct iommufd_viommu_ops){ + .cache_invalidate = mock_viommu_cache_invalidate, + }, .default_domain_ops = &(struct iommu_domain_ops){ .free = mock_domain_free, @@ -759,7 +834,7 @@ static void mock_dev_release(struct device *dev) static struct mock_dev *mock_dev_create(unsigned long dev_flags) { struct mock_dev *mdev; - int rc; + int rc, i;
if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA)) @@ -773,6 +848,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags) mdev->flags = dev_flags; mdev->dev.release = mock_dev_release; mdev->dev.bus = &iommufd_mock_bus_type.bus; + for (i = 0; i < MOCK_DEV_CACHE_NUM; i++) + mdev->cache[i] = IOMMU_TEST_DEV_CACHE_DEFAULT;
rc = ida_alloc(&mock_dev_ida, GFP_KERNEL); if (rc < 0)
Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb, IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/iommufd/iommufd_test.h | 5 ++++ tools/testing/selftests/iommu/iommufd_utils.h | 24 +++++++++++++++++++ drivers/iommu/iommufd/selftest.c | 24 +++++++++++++++++++ tools/testing/selftests/iommu/iommufd.c | 7 +++++- 4 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 05f57a968d25..b226636aa07a 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -23,6 +23,7 @@ enum { IOMMU_TEST_OP_DIRTY, IOMMU_TEST_OP_MD_CHECK_IOTLB, IOMMU_TEST_OP_TRIGGER_IOPF, + IOMMU_TEST_OP_DEV_CHECK_CACHE, };
enum { @@ -140,6 +141,10 @@ struct iommu_test_cmd { __u32 perm; __u64 addr; } trigger_iopf; + struct { + __u32 id; + __u32 cache; + } check_dev_cache; }; __u32 last; }; diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index ccd1f65df0b0..b3da8c96a828 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -234,6 +234,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_i test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ })
+#define test_cmd_dev_check_cache(device_id, cache_id, expected) \ + ({ \ + struct iommu_test_cmd test_cmd = { \ + .size = sizeof(test_cmd), \ + .op = IOMMU_TEST_OP_DEV_CHECK_CACHE, \ + .id = device_id, \ + .check_dev_cache = { \ + .id = cache_id, \ + .cache = expected, \ + }, \ + }; \ + ASSERT_EQ(0, \ + ioctl(self->fd, \ + _IOMMU_TEST_CMD(IOMMU_TEST_OP_DEV_CHECK_CACHE),\ + &test_cmd)); \ + }) + +#define test_cmd_dev_check_cache_all(device_id, expected) \ + ({ \ + int c; \ + for (c = 0; c < MOCK_DEV_CACHE_NUM; c++) \ + test_cmd_dev_check_cache(device_id, c, expected); \ + }) + static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, uint32_t data_type, uint32_t lreq, uint32_t *nreqs) diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 8a1aef857922..ac3836c1dbcd 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -1106,6 +1106,26 @@ static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd, return rc; }
+static int iommufd_test_dev_check_cache(struct iommufd_ucmd *ucmd, + u32 idev_id, unsigned int cache_id, + u32 cache) +{ + struct iommufd_device *idev; + struct mock_dev *mdev; + int rc = 0; + + idev = iommufd_get_device(ucmd, idev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + mdev = container_of(idev->dev, struct mock_dev, dev); + + if (cache_id > MOCK_DEV_CACHE_ID_MAX || + mdev->cache[cache_id] != cache) + rc = -EINVAL; + iommufd_put_object(ucmd->ictx, &idev->obj); + return rc; +} + struct selftest_access { struct iommufd_access *access; struct file *file; @@ -1615,6 +1635,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd) return iommufd_test_md_check_iotlb(ucmd, cmd->id, cmd->check_iotlb.id, cmd->check_iotlb.iotlb); + case IOMMU_TEST_OP_DEV_CHECK_CACHE: + return iommufd_test_dev_check_cache(ucmd, cmd->id, + cmd->check_dev_cache.id, + cmd->check_dev_cache.cache); case IOMMU_TEST_OP_CREATE_ACCESS: return iommufd_test_create_access(ucmd, cmd->id, cmd->create_access.flags); diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 2651e2b58423..d5c5e3389182 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -222,6 +222,8 @@ FIXTURE_SETUP(iommufd_ioas) for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_id, &self->hwpt_id, &self->device_id); + test_cmd_dev_check_cache_all(self->device_id, + IOMMU_TEST_DEV_CACHE_DEFAULT); self->base_iova = MOCK_APERTURE_START; } } @@ -1386,9 +1388,12 @@ FIXTURE_SETUP(iommufd_mock_domain)
ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);
- for (i = 0; i != variant->mock_domains; i++) + for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i], &self->hwpt_ids[i], &self->idev_ids[i]); + test_cmd_dev_check_cache_all(self->idev_ids[0], + IOMMU_TEST_DEV_CACHE_DEFAULT); + } self->hwpt_id = self->hwpt_ids[0];
self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
Add a viommu_cache test function to cover vIOMMU invalidations using the updated IOMMU_HWPT_INVALIDATE ioctl, which now allows passing in a vIOMMU via its hwpt_id field.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- tools/testing/selftests/iommu/iommufd_utils.h | 32 ++++ tools/testing/selftests/iommu/iommufd.c | 173 ++++++++++++++++++ 2 files changed, 205 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index b3da8c96a828..29e9cd1fdd82 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -289,6 +289,38 @@ static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, data_type, lreq, nreqs)); \ })
+static int _test_cmd_viommu_invalidate(int fd, __u32 viommu_id, void *reqs, + uint32_t data_type, uint32_t lreq, + uint32_t *nreqs) +{ + struct iommu_hwpt_invalidate cmd = { + .size = sizeof(cmd), + .hwpt_id = viommu_id, + .data_type = data_type, + .data_uptr = (uint64_t)reqs, + .entry_len = lreq, + .entry_num = *nreqs, + }; + int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd); + *nreqs = cmd.entry_num; + return rc; +} + +#define test_cmd_viommu_invalidate(viommu, reqs, lreq, nreqs) \ + ({ \ + ASSERT_EQ(0, \ + _test_cmd_viommu_invalidate(self->fd, viommu, reqs, \ + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, \ + lreq, nreqs)); \ + }) +#define test_err_viommu_invalidate(_errno, viommu_id, reqs, data_type, lreq, \ + nreqs) \ + ({ \ + EXPECT_ERRNO(_errno, _test_cmd_viommu_invalidate( \ + self->fd, viommu_id, reqs, \ + data_type, lreq, nreqs)); \ + }) + static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id) { diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index d5c5e3389182..5a41bac3c1ab 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2495,4 +2495,177 @@ TEST_F(iommufd_viommu, vdevice_alloc) } }
+TEST_F(iommufd_viommu, vdevice_cache) +{ + struct iommu_viommu_invalidate_selftest inv_reqs[2] = {}; + uint32_t viommu_id = self->viommu_id; + uint32_t dev_id = self->device_id; + uint32_t vdev_id = 0; + uint32_t num_inv; + + if (dev_id) { + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + + test_cmd_dev_check_cache_all(dev_id, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Check data_type by passing zero-length array */ + num_inv = 0; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: Invalid data_type */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: structure size sanity */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs) + 1, &num_inv); + assert(!num_inv); + + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 1, &num_inv); + assert(!num_inv); + + /* Negative test: invalid flag is passed */ + num_inv = 1; + inv_reqs[0].flags = 0xffffffff; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid data_uptr when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, NULL, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid entry_len when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 0, &num_inv); + assert(!num_inv); + + /* Negative test: invalid cache_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid vdev_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x9; + inv_reqs[0].cache_id = 0; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid flags configuration in the 2nd request. + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0xffffffff; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 1; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid cache_id configuration in the 2nd request. + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 2nd cache entry and verify */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 1; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, 0); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 3rd and 4th cache entries and verify */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 2; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 3; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 2); + test_cmd_dev_check_cache_all(dev_id, 0); + + /* Invalidate all cache entries for nested_dev_id[1] and verify */ + num_inv = 1; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache_all(dev_id, 0); + test_ioctl_destroy(vdev_id); + } +} + TEST_HARNESS_MAIN
With the introduction of the new object and its infrastructure, update the doc and the vIOMMU graph to reflect that.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- Documentation/userspace-api/iommufd.rst | 58 ++++++++++++++++++------- 1 file changed, 42 insertions(+), 16 deletions(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index 37eb1adda57b..3c27cc92c2cb 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -94,6 +94,19 @@ Following IOMMUFD objects are exposed to userspace: backed by corresponding vIOMMU objects, in which case a guest OS would do the "dispatch" naturally instead of VMM trappings.
+ - IOMMUFD_OBJ_VDEVICE, representing a virtual device for an IOMMUFD_OBJ_DEVICE + against an IOMMUFD_OBJ_VIOMMU. This virtual device holds the device's virtual + information or attributes (related to the vIOMMU) in a VM. An immediate vDATA + example can be the virtual ID of the device on a vIOMMU, which is a unique ID + that VMM assigns to the device for a translation channel/port of the vIOMMU, + e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d to a + Context Table. Potential use cases of some advanced security information can + be forwarded via this object too, such as security level or realm information + in a Confidential Compute Architecture. A VMM should create a vDEVICE object + to forward all the device information in a VM, when it connects a device to a + vIOMMU, which is a separate ioctl call from attaching the same device to an + HWPT_PAGING that the vIOMMU holds. + All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel @@ -133,23 +146,26 @@ creating the objects and links:: |____________| |____________| |______|
_______________________________________________________________________ - | iommufd (with vIOMMU) | + | iommufd (with vIOMMU/vDEVICE) | | | - | [5] | - | _____________ | - | | | | - | [1] | vIOMMU | [4] [2] | - | ________________ | | _____________ ________ | - | | | | [3] | | | | | | - | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | - | |________________| |_____________| |_____________| |________| | - | | | | | | - |_________|____________________|__________________|_______________|_____| - | | | | - | ______v_____ ______v_____ ___v__ - | PFN storage | (paging) | | (nested) | |struct| - |------------>|iommu_domain|<----|iommu_domain|<----|device| - |____________| |____________| |______| + | [5] [6] | + | _____________ _____________ | + | | | | | | + | |----------------| vIOMMU |<---| vDEVICE |<----| | + | | | | |_____________| | | + | | | | | | + | | [1] | | [4] | [2] | + | | ______ | | _____________ _|______ | + | | | | | [3] | | | | | | + | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | + | | |______| |_____________| |_____________| |________| | + | | | | | | | + |______|________|______________|__________________|_______________|_____| + | | | | | + ______v_____ | ______v_____ ______v_____ ___v__ + | struct | | PFN | (paging) | | (nested) | |struct| + |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| + |____________| storage|____________| |____________| |______|
1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can hold multiple IOAS objects. IOAS is the most generic object and does not @@ -212,6 +228,15 @@ creating the objects and links:: the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used as a nesting parent object to allocate an HWPT_NESTED object described above.
+6. IOMMUFD_OBJ_VDEVICE can be only manually created via the IOMMU_VDEVICE_ALLOC + uAPI, provided a viommu_id for an iommufd_viommu object and a dev_id for an + iommufd_device object. The vDEVICE object will be the binding between these + two parent objects. Another @virt_id will be also set via the uAPI providing + the iommufd core an index to store the vDEVICE object to a vDEVICE array per + vIOMMU. If necessary, the IOMMU driver may choose to implement a vdevce_alloc + op to init its HW for virtualization feature related to a vDEVICE. Successful + completion of this operation sets up the linkages between vIOMMU and device. + A device can only bind to an iommufd due to DMA ownership claim and attach to at most one IOAS object (no support of PASID yet).
@@ -225,6 +250,7 @@ User visible objects are backed by following datastructures: - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED. - iommufd_viommu for IOMMUFD_OBJ_VIOMMU. +- iommufd_vdevice for IOMMUFD_OBJ_VDEVICE
Several terminologies when looking at these datastructures:
On Wed, Oct 09, 2024 at 09:38:25AM -0700, Nicolin Chen wrote:
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Implement the vIOMMU's cache_invalidate op for user space to invalidate the IOTLB entries, Device ATS and CD entries that are still cached by hardware.
Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation entries that are simply in the native format of a 128-bit TLBI command. Scan those commands against the permitted command list and fix their VMID/SID fields.
Co-developed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Eric Auger eric.auger@redhat.com Co-developed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 + include/uapi/linux/iommufd.h | 24 ++++ .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 131 +++++++++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +- 4 files changed, 162 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 844d1dfdea55..000af931a30c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -529,6 +529,7 @@ struct arm_smmu_cmdq_ent { #define CMDQ_OP_TLBI_NH_ALL 0x10 #define CMDQ_OP_TLBI_NH_ASID 0x11 #define CMDQ_OP_TLBI_NH_VA 0x12 + #define CMDQ_OP_TLBI_NH_VAA 0x13 #define CMDQ_OP_TLBI_EL2_ALL 0x20 #define CMDQ_OP_TLBI_EL2_ASID 0x21 #define CMDQ_OP_TLBI_EL2_VA 0x22 @@ -949,6 +950,10 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state); void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, const struct arm_smmu_ste *target);
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + struct arm_smmu_cmdq *cmdq, u64 *cmds, int n, + bool sync); + #ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu); bool arm_smmu_master_sva_supported(struct arm_smmu_master *master); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index cd9e135ef563..d9e510ce67cf 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -684,9 +684,11 @@ struct iommu_hwpt_get_dirty_bitmap { * enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation * Data Type * @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1 + * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3 */ enum iommu_hwpt_invalidate_data_type { IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 = 0, + IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 = 1, };
/** @@ -725,6 +727,28 @@ struct iommu_hwpt_vtd_s1_invalidate { __u32 __reserved; };
+/** + * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cahce invalidation + * (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3) + * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ. + * Must be little-endian. + * + * Supported command list only when passing in a vIOMMU via @hwpt_id: + * CMDQ_OP_TLBI_NSNH_ALL + * CMDQ_OP_TLBI_NH_VA + * CMDQ_OP_TLBI_NH_VAA + * CMDQ_OP_TLBI_NH_ALL + * CMDQ_OP_TLBI_NH_ASID + * CMDQ_OP_ATC_INV + * CMDQ_OP_CFGI_CD + * CMDQ_OP_CFGI_CD_ALL + * + * -EIO will be returned if the command is not supported. + */ +struct iommu_viommu_arm_smmuv3_invalidate { + __aligned_le64 cmd[2]; +}; + /** * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) * @size: sizeof(struct iommu_hwpt_invalidate) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 5e235fca8f13..1b82579eb252 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -191,6 +191,14 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, if (smmu_parent->smmu != master->smmu) return ERR_PTR(-EINVAL);
+ /* + * FORCE_SYNC is not set with FEAT_NESTING. Some study of the exact HW + * defect is needed to determine if arm_vsmmu_cache_invalidate() needs + * any change to remove this. + */ + if (WARN_ON(master->smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) + return ERR_PTR(-EOPNOTSUPP); + ret = iommu_copy_struct_from_user(&arg, user_data, IOMMU_HWPT_DATA_ARM_SMMUV3, ste); if (ret) @@ -213,6 +221,127 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, return &nested_domain->domain; }
+static int arm_vsmmu_vsid_to_sid(struct arm_vsmmu *vsmmu, u32 vsid, u32 *sid) +{ + XA_STATE(xas, &vsmmu->core.vdevs, (unsigned long)vsid); + struct arm_smmu_master *master; + struct device *dev; + int ret = 0; + + xa_lock(&vsmmu->core.vdevs); + dev = vdev_to_dev(xas_load(&xas)); + if (!dev) { + ret = -EIO; + goto unlock; + } + master = dev_iommu_priv_get(dev); + + /* At this moment, iommufd only supports PCI device that has one SID */ + if (sid) + *sid = master->streams[0].id; +unlock: + xa_unlock(&vsmmu->core.vdevs); + return ret; +} + +/* + * Convert, in place, the raw invalidation command into an internal format that + * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are + * stored in CPU endian. + * + * Enforce the VMID or SID on the command. + */ +static int +arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu, + struct iommu_viommu_arm_smmuv3_invalidate *cmd) +{ + cmd->cmd[0] = le64_to_cpu(cmd->cmd[0]); + cmd->cmd[1] = le64_to_cpu(cmd->cmd[1]); + + switch (cmd->cmd[0] & CMDQ_0_OP) { + case CMDQ_OP_TLBI_NSNH_ALL: + /* Convert to NH_ALL */ + cmd->cmd[0] = CMDQ_OP_TLBI_NH_ALL | + FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid); + cmd->cmd[1] = 0; + break; + case CMDQ_OP_TLBI_NH_VA: + case CMDQ_OP_TLBI_NH_VAA: + case CMDQ_OP_TLBI_NH_ALL: + case CMDQ_OP_TLBI_NH_ASID: + cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID; + cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid); + break; + case CMDQ_OP_ATC_INV: + case CMDQ_OP_CFGI_CD: + case CMDQ_OP_CFGI_CD_ALL: + u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); + + if (arm_vsmmu_vsid_to_sid(vsmmu, vsid, &sid)) + return -EIO; + cmd->cmd[0] &= ~CMDQ_CFGI_0_SID; + cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid); + break; + default: + return -EIO; + } + return 0; +} + +static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array) +{ + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); + struct iommu_viommu_arm_smmuv3_invalidate *last; + struct iommu_viommu_arm_smmuv3_invalidate *cmds; + struct iommu_viommu_arm_smmuv3_invalidate *cur; + struct iommu_viommu_arm_smmuv3_invalidate *end; + struct arm_smmu_device *smmu = vsmmu->smmu; + int ret; + + cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL); + if (!cmds) + return -ENOMEM; + cur = cmds; + end = cmds + array->entry_num; + + static_assert(sizeof(*cmds) == 2 * sizeof(u64)); + ret = iommu_copy_struct_from_full_user_array( + cmds, sizeof(*cmds), array, + IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3); + if (ret) + goto out; + + last = cmds; + while (cur != end) { + ret = arm_vsmmu_convert_user_cmd(vsmmu, cur); + if (ret) + goto out; + + /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */ + cur++; + if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1) + continue; + + /* FIXME always uses the main cmdq rather than trying to group by type */ + ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, last->cmd, + cur - last, true); + if (ret) { + cur--; + goto out; + } + last = cur; + } +out: + array->entry_num = cur - cmds; + kfree(cmds); + return ret; +} + +const struct iommufd_viommu_ops arm_vsmmu_ops = { + .cache_invalidate = arm_vsmmu_cache_invalidate, +}; + struct iommufd_viommu * arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent, struct iommufd_ctx *ictx, unsigned int viommu_type) @@ -225,7 +354,7 @@ arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent, if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3) return ERR_PTR(-EOPNOTSUPP);
- vsmmu = iommufd_viommu_alloc(ictx, arm_vsmmu, core, NULL); + vsmmu = iommufd_viommu_alloc(ictx, arm_vsmmu, core, &arm_vsmmu_ops); if (IS_ERR(vsmmu)) return ERR_CAST(vsmmu);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6a23e6dcd5cf..a2bbd140e232 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -766,9 +766,9 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds, * insert their own list of commands then all of the commands from one * CPU will appear before any of the commands from the other CPU. */ -static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, - struct arm_smmu_cmdq *cmdq, - u64 *cmds, int n, bool sync) +int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + struct arm_smmu_cmdq *cmdq, u64 *cmds, int n, + bool sync) { u64 cmd_sync[CMDQ_ENT_DWORDS]; u32 prod;
On Thu, 10 Oct 2024 at 00:44, Nicolin Chen nicolinc@nvidia.com wrote:
Here got build error
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c:302:3: error: a label can only be part of a statement and a declaration is not a statement 302 | u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); | ^~~
Need {} to include. case CMDQ_OP_CFGI_CD_ALL: { ... }
Thanks
From: Jason Gunthorpe jgg@nvidia.com
Now, ATC invalidation can be done with the vIOMMU invalidation op. A guest owned IOMMU_DOMAIN_NESTED can do an ATS too. Allow it to pass in the EATS field via the vSTE words.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++- include/uapi/linux/iommufd.h | 2 +- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 31 ++++++++++++++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++++++++++++--- 4 files changed, 54 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 000af931a30c..470bc3ee25ef 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -305,7 +305,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) #define STRTAB_STE_1_NESTING_ALLOWED \ cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \ STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \ - STRTAB_STE_1_S1STALLD) + STRTAB_STE_1_S1STALLD | STRTAB_STE_1_EATS)
/* * Context descriptors. @@ -838,6 +838,7 @@ struct arm_smmu_domain { struct arm_smmu_nested_domain { struct iommu_domain domain; struct arm_smmu_domain *s2_parent; + bool enable_ats : 1;
__le64 ste[2]; }; @@ -879,6 +880,7 @@ struct arm_smmu_master_domain { struct list_head devices_elm; struct arm_smmu_master *master; ioasid_t ssid; + bool nested_ats_flush : 1; };
static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index d9e510ce67cf..9527a4ecfd56 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -404,7 +404,7 @@ struct iommu_hwpt_vtd_s1 { * a user stage-1 Context Descriptor Table. Must be little-endian. * Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec) * - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax - * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD + * - word-1: EATS, S1DSS, S1CIR, S1COR, S1CSH, S1STALLD * * -EIO will be returned if @ste is not legal or contains any non-allowed field. * Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 1b82579eb252..b491017921df 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -103,8 +103,6 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, .master = master, .old_domain = iommu_get_domain_for_dev(dev), .ssid = IOMMU_NO_PASID, - /* Currently invalidation of ATC is not supported */ - .disable_ats = true, }; struct arm_smmu_ste ste; int ret; @@ -115,6 +113,15 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, return -EBUSY;
mutex_lock(&arm_smmu_asid_lock); + /* + * The VM has to control the actual ATS state at the PCI device because + * we forward the invalidations directly from the VM. If the VM doesn't + * think ATS is on it will not generate ATC flushes and the ATC will + * become incoherent. Since we can't access the actual virtual PCI ATS + * config bit here base this off the EATS value in the STE. If the EATS + * is set then the VM must generate ATC flushes. + */ + state.disable_ats = !nested_domain->enable_ats; ret = arm_smmu_attach_prepare(&state, domain); if (ret) { mutex_unlock(&arm_smmu_asid_lock); @@ -140,8 +147,10 @@ static const struct iommu_domain_ops arm_smmu_nested_ops = { .free = arm_smmu_domain_nested_free, };
-static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg) +static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg, + bool *enable_ats) { + unsigned int eats; unsigned int cfg;
if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) { @@ -158,6 +167,18 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg) if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS && cfg != STRTAB_STE_0_CFG_S1_TRANS) return -EIO; + + /* + * Only Full ATS or ATS UR is supported + * The EATS field will be set by arm_smmu_make_nested_domain_ste() + */ + eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg->ste[1])); + arg->ste[1] &= ~cpu_to_le64(STRTAB_STE_1_EATS); + if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS) + return -EIO; + + if (cfg == STRTAB_STE_0_CFG_S1_TRANS) + *enable_ats = (eats == STRTAB_STE_1_EATS_TRANS); return 0; }
@@ -170,6 +191,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, struct arm_smmu_nested_domain *nested_domain; struct arm_smmu_domain *smmu_parent; struct iommu_hwpt_arm_smmuv3 arg; + bool enable_ats = false; int ret;
if (flags || !(master->smmu->features & ARM_SMMU_FEAT_NESTING)) @@ -204,7 +226,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, if (ret) return ERR_PTR(ret);
- ret = arm_smmu_validate_vste(&arg); + ret = arm_smmu_validate_vste(&arg, &enable_ats); if (ret) return ERR_PTR(ret);
@@ -215,6 +237,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, nested_domain->domain.type = IOMMU_DOMAIN_NESTED; nested_domain->domain.ops = &arm_smmu_nested_ops; nested_domain->s2_parent = smmu_parent; + nested_domain->enable_ats = enable_ats; nested_domain->ste[0] = arg.ste[0]; nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a2bbd140e232..1cb4afe7a90a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2107,7 +2107,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, if (!master->ats_enabled) continue;
- arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd); + if (master_domain->nested_ats_flush) { + /* + * If a S2 used as a nesting parent is changed we have + * no option but to completely flush the ATC. + */ + arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd); + } else { + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, + &cmd); + }
for (i = 0; i < master->num_streams; i++) { cmd.atc.sid = master->streams[i].id; @@ -2630,7 +2639,8 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
static struct arm_smmu_master_domain * arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, - struct arm_smmu_master *master, ioasid_t ssid) + struct arm_smmu_master *master, ioasid_t ssid, + bool nested_ats_flush) { struct arm_smmu_master_domain *master_domain;
@@ -2639,7 +2649,8 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm) { if (master_domain->master == master && - master_domain->ssid == ssid) + master_domain->ssid == ssid && + master_domain->nested_ats_flush == nested_ats_flush) return master_domain; } return NULL; @@ -2670,13 +2681,18 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, { struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain); struct arm_smmu_master_domain *master_domain; + bool nested_ats_flush = false; unsigned long flags;
if (!smmu_domain) return;
+ if (domain->type == IOMMU_DOMAIN_NESTED) + nested_ats_flush = to_smmu_nested_domain(domain)->enable_ats; + spin_lock_irqsave(&smmu_domain->devices_lock, flags); - master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid); + master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid, + nested_ats_flush); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); @@ -2743,6 +2759,9 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, return -ENOMEM; master_domain->master = master; master_domain->ssid = state->ssid; + if (new_domain->type == IOMMU_DOMAIN_NESTED) + master_domain->nested_ats_flush = + to_smmu_nested_domain(new_domain)->enable_ats;
/* * During prepare we want the current smmu_domain and new
From: Jason Gunthorpe jgg@nvidia.com
The SMMUv3 spec has a note that BYPASS and ATS don't work together under the STE EATS field definition. However there is another section "13.6.4 Full ATS skipping stage 1" that explains under certain conditions BYPASS and ATS do work together if the STE is using S1DSS to select BYPASS and the CD table has the possibility for a substream.
When these comments were written the understanding was that all forms of BYPASS just didn't work and this was to be a future problem to solve.
It turns out that ATS and IDENTITY will always work just fine:
- If STE.Config = BYPASS then the PCI ATS is disabled
- If a PASID domain is attached then S1DSS = BYPASS and ATS will be enabled. This meets the requirements of 13.6.4 to automatically generate 1:1 ATS replies on the RID.
Update the comments to reflect this.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 1cb4afe7a90a..236f930f9a97 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2745,9 +2745,14 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, * Translation Requests and Translated transactions are denied * as though ATS is disabled for the stream (STE.EATS == 0b00), * causing F_BAD_ATS_TREQ and F_TRANSL_FORBIDDEN events - * (IHI0070Ea 5.2 Stream Table Entry). Thus ATS can only be - * enabled if we have arm_smmu_domain, those always have page - * tables. + * (IHI0070Ea 5.2 Stream Table Entry). + * + * However, if we have installed a CD table and are using S1DSS + * then ATS will work in S1DSS bypass. See "13.6.4 Full ATS + * skipping stage 1". + * + * Disable ATS if we are going to create a normal 0b100 bypass + * STE. */ state->ats_enabled = !state->disable_ats && arm_smmu_ats_supported(master); @@ -3072,8 +3077,10 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain, if (arm_smmu_ssids_in_use(&master->cd_table)) { /* * If a CD table has to be present then we need to run with ATS - * on even though the RID will fail ATS queries with UR. This is - * because we have no idea what the PASID's need. + * on because we have to assume a PASID is using ATS. For + * IDENTITY this will setup things so that S1DSS=bypass which + * follows the explanation in "13.6.4 Full ATS skipping stage 1" + * and allows for ATS on the RID to work. */ state.cd_needs_ats = true; arm_smmu_attach_prepare(&state, domain);
On Wed, Oct 09, 2024 at 09:38:12AM -0700, Nicolin Chen wrote:
Don't thread series together like this with reply-to, it breaks b4 and other tools ability to tell them apart.. Just post them separately normally.
Jason
linux-kselftest-mirror@lists.linaro.org