Hi folks,
This series implements delivery of IO page faults to user space through the IOMMUFD framework. The use case is nested translation, where modern IOMMU hardware supports two-stage translation tables. The second-stage translation table is managed by the host VMM, while the first-stage translation table is owned by user space. Hence, any IO page fault that occurs on the first-stage page table should be delivered to user space and handled there. User space then sends the result of the page fault handling back down to the device through the IOMMUFD response uAPI.
User space indicates its capability of handling IO page faults by setting the user HWPT allocation flag IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD will then set up its infrastructure for page fault delivery. Together with the iopf-capable flag, user space should also provide an eventfd on which it will listen for any bottom-up page fault messages.
On successful allocation of an iopf-capable HWPT, a fault fd is returned. User space can open and read fault messages from it once the eventfd is signaled.
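The eventfd plus fault fd flow described above can be sketched from the user-space side roughly as follows. This is only an illustration under stated assumptions: struct demo_fault_msg and demo_wait_and_read_faults() are hypothetical names, and the record layout is a placeholder rather than the fault message format defined by this series. Only poll(), read() and the eventfd counter semantics are standard.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fault record; the real layout is defined by the series' uAPI. */
struct demo_fault_msg {
	uint32_t dev_id;
	uint32_t pasid;
	uint64_t addr;
	uint32_t perm;
	uint32_t grpid;
};

/*
 * Wait for the eventfd to be signaled, then read whatever fault records
 * are available from the fault fd.
 *
 * Returns the number of complete records read, 0 on timeout, -1 on error.
 */
int demo_wait_and_read_faults(int efd, int fault_fd,
			      struct demo_fault_msg *msgs, int max,
			      int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	ssize_t n;
	int ret;

	assert(msgs && max > 0);

	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0)
		return ret;	/* 0: timeout, -1: poll error */

	/* Reading the eventfd consumes the signal and resets its counter. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	n = read(fault_fd, msgs, (size_t)max * sizeof(*msgs));
	if (n < 0)
		return -1;

	return (int)((size_t)n / sizeof(*msgs));
}
```

A VMM would typically call such a helper from its event loop, handle each record against the guest's first-stage page table, and then send the result back through the response uAPI.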
Besides the overall design, I'd like to hear comments on the following design points:
- The IOMMUFD fault message format. It is very similar to that in uapi/linux/iommu.h, which has been discussed before and is partially used by the IOMMU SVA implementation. I'd like to get more comments on the format when it comes to IOMMUFD.
- The timeout value for the pending page fault messages. Ideally we should determine the timeout value from the device configuration, but I failed to find any statement in the PCI specification (version 6.x). The implementation uses a default of 100 milliseconds, but it leaves room to extend the code with a per-device setting.
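To make the timeout question concrete, here is a minimal sketch of how a per-device value could fall back to the 100 ms default. The names demo_device, iopf_timeout_ms and DEMO_IOPF_DEFAULT_TIMEOUT_MS are made up for illustration and do not appear in the series.

```c
#include <stdint.h>

/* Default taken from the cover letter: 100 milliseconds. */
#define DEMO_IOPF_DEFAULT_TIMEOUT_MS 100u

/* Hypothetical per-device state; 0 means "no device-specific timeout". */
struct demo_device {
	uint32_t iopf_timeout_ms;
};

/* Pick the device-specific timeout if one is set, else the default. */
static inline uint32_t demo_iopf_timeout_ms(const struct demo_device *dev)
{
	if (dev && dev->iopf_timeout_ms)
		return dev->iopf_timeout_ms;

	return DEMO_IOPF_DEFAULT_TIMEOUT_MS;
}
```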
This series is for review comments only. I used the IOMMUFD selftests to verify hwpt allocation, attach/detach and replace, but I haven't had a chance to run it on real hardware yet. I will do more testing in subsequent versions once I am confident that I am heading in the right direction.
This series is based on the latest implementation of the nested translation under discussion. The whole series and related patches are available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v...
Best regards,
baolu
Lu Baolu (17):
  iommu: Move iommu fault data to linux/iommu.h
  iommu: Support asynchronous I/O page fault response
  iommu: Add helper to set iopf handler for domain
  iommu: Pass device parameter to iopf handler
  iommu: Split IO page fault handling from SVA
  iommu: Add iommu page fault cookie helpers
  iommufd: Add iommu page fault data
  iommufd: IO page fault delivery initialization and release
  iommufd: Add iommufd hwpt iopf handler
  iommufd: Add IOMMU_HWPT_ALLOC_FLAGS_USER_PASID_TABLE for hwpt_alloc
  iommufd: Deliver fault messages to user space
  iommufd: Add io page fault response support
  iommufd: Add a timer for each iommufd fault data
  iommufd: Drain all pending faults when destroying hwpt
  iommufd: Allow new hwpt_alloc flags
  iommufd/selftest: Add IOPF feature for mock devices
  iommufd/selftest: Cover iopf-capable nested hwpt
 include/linux/iommu.h                         | 175 +++++++++-
 drivers/iommu/{iommu-sva.h => io-pgfault.h}   |  25 +-
 drivers/iommu/iommu-priv.h                    |   3 +
 drivers/iommu/iommufd/iommufd_private.h       |  32 ++
 include/uapi/linux/iommu.h                    | 161 ---------
 include/uapi/linux/iommufd.h                  |  73 +++-
 tools/testing/selftests/iommu/iommufd_utils.h |  20 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   2 +-
 drivers/iommu/intel/iommu.c                   |   2 +-
 drivers/iommu/intel/svm.c                     |   2 +-
 drivers/iommu/io-pgfault.c                    |   7 +-
 drivers/iommu/iommu-sva.c                     |   4 +-
 drivers/iommu/iommu.c                         |  50 ++-
 drivers/iommu/iommufd/device.c                |  64 +++-
 drivers/iommu/iommufd/hw_pagetable.c          | 318 +++++++++++++++++-
 drivers/iommu/iommufd/main.c                  |   3 +
 drivers/iommu/iommufd/selftest.c              |  71 ++++
 tools/testing/selftests/iommu/iommufd.c       |  17 +-
 MAINTAINERS                                   |   1 -
 drivers/iommu/Kconfig                         |   4 +
 drivers/iommu/Makefile                        |   3 +-
 drivers/iommu/intel/Kconfig                   |   1 +
 23 files changed, 837 insertions(+), 203 deletions(-)
 rename drivers/iommu/{iommu-sva.h => io-pgfault.h} (71%)
 delete mode 100644 include/uapi/linux/iommu.h