The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature.
A vCMDQ introduced by this series as a part of the vIOMMU infrastructure represents a HW supported queue/buffer for VM to use exclusively, e.g. - NVIDIA's virtual command queue - AMD vIOMMU's command buffer either of which is an IOMMU HW feature to directly load and execute cache invalidation commands issued by a guest kernel, to shoot down TLB entries that HW cached for guest-owned stage-1 page table entries. This is a big improvement since there is no VM Exit during an invalidation, compared to the traditional invalidation pathway by trapping a guest-own invalidation queue and forwarding those commands/requests to the host kernel that will eventually fill a HW-owned queue to execute those commands.
Thus, a vCMDQ object, as an initial use case, is all about a guest-owned HW command queue that VMM can allocate/configure depending on the request from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its allocator IOMMUFD_CMD_VCMDQ_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, and etc.
Meanwhile, a guest-owned command queue needs the kernel (a command queue driver) to control the queue by reading/writing its consumer and producer indexes, which means the command queue HW allows the guest kernel to get a direct R/W access to those registers. Introduce an mmap infrastructure to the iommufd core so as to support pass through a piece of MMIO region from the host physical address space to the guest physical address space. The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during the IOMMUFD_CMD_VCMDQ_ALLOC and given those info to the user space as an output driver-data by the IOMMUFD_CMD_VCMDQ_ALLOC. So, this requires a driver-specific user data support by a vIOMMU object.
As a real-world use case, this series implements a vCMDQ support to the tegra241-cmdqv driver for the vCMDQ on NVIDIA Grace CPU. In another word, this is also the Tegra CMDQV series Part-2 (user-space support), reworked from Previous RFCv1: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/ This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS.
This is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_vcmdq-v2
Paring QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vcmdq-v2
Changelog v2 * Add Reviewed-by from Jason * [smmu] Fix vsmmu initial value * [smmu] Support impl for hw_info * [tegra] Rename "slot" to "vsid" * [tegra] Update kdocs and commit logs * [tegra] Map/unmap LVCMDQ dynamically * [tegra] Refcount the previous LVCMDQ * [tegra] Return -EEXIST if LVCMDQ exists * [tegra] Simplify VINTF cleanup routine * [tegra] Use vmid and s2_domain in vsmmu * [tegra] Rename "mmap_pgoff" to "immap_id" * [tegra] Add more addr and length validation * [iommufd] Add more narrative to mmap's kdoc * [iommufd] Add iommufd_struct_depend/undepend() * [iommufd] Rename vcmdq_free op to vcmdq_destroy * [iommufd] Fix bug in iommu_copy_struct_to_user() * [iommufd] Drop is_io from iommufd_ctx_alloc_mmap() * [iommufd] Test the queue memory for its contiguity * [iommufd] Return -ENXIO if address or length fails * [iommufd] Do not change @min_last in mock_viommu_alloc() * [iommufd] Generalize TEGRA241_VCMDQ data in core structure * [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC * [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping v1 https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/
Thanks Nicolin
Nicolin Chen (22): iommufd/viommu: Add driver-allocated vDEVICE support iommu: Pass in a driver-level user data structure to viommu_alloc op iommufd/viommu: Allow driver-specific user data for a vIOMMU object iommu: Add iommu_copy_struct_to_user helper iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc iommufd/selftest: Support user_data in mock_viommu_alloc iommufd/selftest: Add covearge for viommu data iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl iommufd: Add for-driver helpers iommufd_vcmdq_depend/undepend() iommufd/selftest: Add coverage for IOMMUFD_CMD_VCMDQ_ALLOC iommufd: Add mmap interface iommufd/selftest: Add coverage for the new mmap interface Documentation: userspace-api: iommufd: Update vCMDQ iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info iommu/tegra241-cmdqv: Use request_threaded_irq iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +- drivers/iommu/iommufd/io_pagetable.h | 8 + drivers/iommu/iommufd/iommufd_private.h | 25 +- drivers/iommu/iommufd/iommufd_test.h | 20 + include/linux/iommu.h | 43 +- include/linux/iommufd.h | 146 ++++++ include/uapi/linux/iommufd.h | 113 ++++- tools/testing/selftests/iommu/iommufd_utils.h | 51 +- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 42 +- .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 451 +++++++++++++++++- drivers/iommu/iommufd/device.c | 117 +---- drivers/iommu/iommufd/driver.c | 81 ++++ drivers/iommu/iommufd/io_pagetable.c | 95 ++++ drivers/iommu/iommufd/main.c | 58 ++- drivers/iommu/iommufd/selftest.c | 123 ++++- drivers/iommu/iommufd/viommu.c | 111 ++++- tools/testing/selftests/iommu/iommufd.c | 93 +++- .../selftests/iommu/iommufd_fail_nth.c | 11 +- Documentation/userspace-api/iommufd.rst | 14 + 19 files changed, 1436 insertions(+), 191 deletions(-)