This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including a nested IO page table support. Yet, there're limitations for an HWPT-based structure to support some advanced HW-accelerated features, such as CMDQV on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU environment, it is not straightforward for nested HWPTs to share the same parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a parent HWPT typically hold one stage-2 IO pagetable and tag it with only one ID in the cache entries. When sharing one large stage-2 IO pagetable across physical IOMMU instances, that one ID may not always be available across all the IOMMU instances. In other word, it's ideal for SW to have a different container for the stage-2 IO pagetable so it can hold another ID that's available. And this container will be able to hold some advanced feature too.
For this "different container", add vIOMMU, an additional layer to hold extra virtualization information: _______________________________________________________________________ | iommufd (with vIOMMU) | | _____________ | | | | | | |----------------| vIOMMU | | | | ______ | | _____________ ________ | | | | | | | | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | | |______| |_____________| |_____________| |________| | | | | | | | | |______|________|______________|__________________|_______________|_____| | | | | | ______v_____ | ______v_____ ______v_____ ___v__ | struct | | PFN | (paging) | | (nested) | |struct| |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| |____________| storage|____________| |____________| |______|
The vIOMMU object should be seen as a slice of a physical IOMMU instance that is passed to or shared with a VM. That can be some HW/SW resources: - Security namespace for guest owned ID, e.g. guest-controlled cache tags - Access to a sharable nesting parent pagetable across physical IOMMUs - Non-affiliated event reporting (e.g. an invalidation queue error) - Virtualization of various platforms IDs, e.g. RIDs and others - Delivery of paravirtualized invalidation - Direct assigned invalidation queues - Direct assigned interrupts
On a multi-IOMMU system, the vIOMMU object must be instanced to the number of the physical IOMMUs that have a slice passed to (via device) a guest VM, while being able to hold the shareable parent HWPT. Each vIOMMU then just needs to allocate its own individual ID to tag its own cache: ---------------------------- ---------------- | | paging_hwpt0 | | hwpt_nested0 |--->| viommu0 ------------------ ---------------- | | IDx | ---------------------------- ---------------------------- ---------------- | | paging_hwpt0 | | hwpt_nested1 |--->| viommu1 ------------------ ---------------- | | IDy | ----------------------------
As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation only. And implement it in arm-smmu-v3 driver as a real world use case.
More vIOMMU-based structs and ioctls will be introduced in the follow-up series to support vDEVICE, vIRQ (vEVENT) and vQUEUE objects. Although we repurposed the vIOMMU object from an earlier RFC, just for a referece: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v5 (paring QEMU branch for testing will be provided with the part2 series)
Changelog v5 * Added "Reviewed-by" from Kevin * Reworked iommufd_viommu_alloc helper * Revised the uAPI kdoc for vIOMMU object * Revised comments for pluggable iommu_dev * Added a couple of cleanup patches for selftest * Renamed domain_alloc_nested op to alloc_domain_nested * Updated a few commit messages to reflect the latest series * Renamed iommufd_hwpt_nested_alloc_for_viommu to iommufd_viommu_alloc_hwpt_nested, and added flag validation v4 https://lore.kernel.org/all/cover.1729553811.git.nicolinc@nvidia.com/ * Added "Reviewed-by" from Jason * Dropped IOMMU_VIOMMU_TYPE_DEFAULT support * Dropped iommufd_object_alloc_elm renamings * Renamed iommufd's viommu_api.c to driver.c * Reworked iommufd_viommu_alloc helper * Added a separate iommufd_hwpt_nested_alloc_for_viommu function for hwpt_nested allocations on a vIOMMU, and added comparison between viommu->iommu_dev->ops and dev_iommu_ops(idev->dev) * Replaced s2_parent with vsmmu in arm_smmu_nested_domain * Replaced domain_alloc_user in iommu_ops with domain_alloc_nested in viommu_ops * Replaced wait_queue_head_t with a completion, to delay the unplug of mock_iommu_dev * Corrected documentation graph that was missing struct iommu_device * Added an iommufd_verify_unfinalized_object helper to verify driver- allocated vIOMMU/vDEVICE objects * Added missing test cases for TEST_LENGTH and fail_nth v3 https://lore.kernel.org/all/cover.1728491453.git.nicolinc@nvidia.com/ * Rebased on top of Jason's nesting v3 series https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvidi... * Split the series into smaller parts * Added Jason's Reviewed-by * Added back viommu->iommu_dev * Added support for driver-allocated vIOMMU v.s. core-allocated * Dropped arm_smmu_cache_invalidate_user * Added an iommufd_test_wait_for_users() in selftest * Reworked test code to make viommu an individual FIXTURE * Added missing TEST_LENGTH case for the new ioctl command v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/ * Limited vdev_id to one per idev * Added a rw_sem to protect the vdev_id list * Reworked driver-level APIs with proper lockings * Added a new viommu_api file for IOMMUFD_DRIVER config * Dropped useless iommu_dev point from the viommu structure * Added missing index numnbers to new types in the uAPI header * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one * Reworked mock_viommu_cache_invalidate() using the new iommu helper * Reordered details of set/unset_vdev_id handlers for proper lockings v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks! Nicolin
Nicolin Chen (13): iommufd: Move struct iommufd_object to public iommufd header iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct iommufd: Add iommufd_verify_unfinalized_object iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl iommufd: Add alloc_domain_nested op to iommufd_viommu_ops iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC iommufd/selftest: Add container_of helpers iommufd/selftest: Prepare for mock_viommu_alloc_domain_nested() iommufd/selftest: Add refcount to mock_iommu_device iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage Documentation: userspace-api: iommufd: Update vIOMMU iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
drivers/iommu/iommufd/Makefile | 5 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +- drivers/iommu/iommufd/iommufd_private.h | 36 +-- drivers/iommu/iommufd/iommufd_test.h | 2 + include/linux/iommu.h | 14 + include/linux/iommufd.h | 86 ++++++ include/uapi/linux/iommufd.h | 56 +++- tools/testing/selftests/iommu/iommufd_utils.h | 28 ++ .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 80 ++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +- drivers/iommu/iommufd/driver.c | 38 +++ drivers/iommu/iommufd/hw_pagetable.c | 71 ++++- drivers/iommu/iommufd/main.c | 58 ++-- drivers/iommu/iommufd/selftest.c | 256 ++++++++++++------ drivers/iommu/iommufd/viommu.c | 89 ++++++ tools/testing/selftests/iommu/iommufd.c | 87 ++++++ .../selftests/iommu/iommufd_fail_nth.c | 11 + Documentation/userspace-api/iommufd.rst | 69 ++++- 18 files changed, 827 insertions(+), 194 deletions(-) create mode 100644 drivers/iommu/iommufd/driver.c create mode 100644 drivers/iommu/iommufd/viommu.c