Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

31 May 2023

On 5/31/23 8:33 AM, Jason Gunthorpe wrote:
...
On Tue, May 30, 2023 at 01:37:07PM +0800, Lu Baolu wrote:
...
Hi folks,
This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. The use case is nested
translation, where modern IOMMU hardware supports two-stage translation
tables. The second-stage translation table is managed by the host VMM
while the first-stage translation table is owned by the user space.
Hence, any IO page fault that occurs on the first-stage page table
should be delivered to the user space and handled there. The user space
should return the page fault handling result to the device, top-down,
through the IOMMUFD response uAPI.
User space indicates its capability of handling IO page faults by setting
a user HWPT allocation flag IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD
will then set up its infrastructure for page fault delivery. Together
with the iopf-capable flag, user space should also provide an eventfd
on which it will listen for any bottom-up page fault messages.
On a successful return of the allocation of iopf-capable HWPT, a fault
fd will be returned. User space can open and read fault messages from it
once the eventfd is signaled.
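
To make the flow described above concrete, here is a minimal user-space sketch of the two-fd pattern: wait on an eventfd, drain its counter, then read a fault record from a separate fd. It only illustrates the pattern and is not the uAPI from this series. The struct demo_fault_msg layout and the demo_post_fault()/demo_wait_fault() helpers are hypothetical, and an ordinary pipe stands in for the fault fd; only eventfd(), poll(), read() and write() are real interfaces.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical fixed-size fault record; NOT the format proposed in the series. */
struct demo_fault_msg {
	uint32_t dev_id;
	uint32_t pasid;
	uint64_t addr;
};

/* Producer (stand-in for the kernel side): queue one record, then signal. */
static int demo_post_fault(int fault_wr_fd, int efd,
			   const struct demo_fault_msg *msg)
{
	uint64_t one = 1;

	assert(msg != NULL);
	if (write(fault_wr_fd, msg, sizeof(*msg)) != (ssize_t)sizeof(*msg))
		return -1;
	/* An eventfd write adds the 8-byte value to the eventfd counter. */
	if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		return -1;
	return 0;
}

/* Consumer: poll the eventfd, drain its counter, then read the fault fd. */
static int demo_wait_fault(int efd, int fault_rd_fd,
			   struct demo_fault_msg *out, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;

	assert(out != NULL);
	if (poll(&pfd, 1, timeout_ms) != 1 || !(pfd.revents & POLLIN))
		return -1;
	/* First system call: consume the notification. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	/* Second system call, on a different fd: fetch the fault itself. */
	if (read(fault_rd_fd, out, sizeof(*out)) != (ssize_t)sizeof(*out))
		return -1;
	return 0;
}
```

Note that the consumer needs a poll() plus two read() calls on two different fds before it has a single fault in hand.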
This is a performance path, so we really need to think about this more.
Polling on an eventfd and then reading a different fd is not a good
design.
What I would like is to have a design from the start that fits into
io_uring, so we can have pre-posted 'recvs' in io_uring that just get
completed at high speed when PRIs come in.
This suggests that the PRI should be delivered via read() on a single
FD, with that same FD being pollable, and without any eventfd.
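
As a rough sketch of the single-fd shape suggested here, the consumer polls one fd and reads fault records from that same fd, so one read() can return several records at once. Everything named demo_* is hypothetical, the record layout is made up for illustration, and in a test an ordinary pipe can stand in for the fault fd. This is not the IOMMUFD uAPI, and it does not use io_uring itself; it only shows the plain read() path that such a design would build on.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size fault record; NOT a format from the series. */
struct demo_fault_rec {
	uint32_t dev_id;
	uint32_t pasid;
	uint64_t addr;
};

/*
 * Wait until fault_fd is readable, then read up to 'max' records with a
 * single read(). Returns the number of whole records read, or -1 on
 * timeout, error, or a partial record.
 */
static int demo_read_faults(int fault_fd, struct demo_fault_rec *buf,
			    size_t max, int timeout_ms)
{
	struct pollfd pfd = { .fd = fault_fd, .events = POLLIN };
	ssize_t n;

	assert(buf != NULL && max > 0);
	if (poll(&pfd, 1, timeout_ms) != 1 || !(pfd.revents & POLLIN))
		return -1;
	n = read(fault_fd, buf, max * sizeof(*buf));
	if (n <= 0 || (size_t)n % sizeof(*buf) != 0)
		return -1;
	return (int)((size_t)n / sizeof(*buf));
}
```

Compared with the eventfd variant, there is no counter to drain and no second fd to keep in sync with the notification.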
Good suggestion. I will head in this direction.
...
...
Besides the overall design, I'd like to hear comments on the designs
below:

- The IOMMUFD fault message format. It is very similar to that in
   uapi/linux/iommu which has been discussed before and partially used by
   the IOMMU SVA implementation. I'd like to get more comments on the
   format when it comes to IOMMUFD.

We have to have the same discussion as always: does a generic fault
message format make any sense here?
For PRI it seems more likely that it would, but it needs a big, careful
cross-vendor check.
Yeah, good point.
As far as I can see, there are at least three types of IOPF hardware
implementations.
- PCI/PRI: Vendors might have their own additions. For example, VT-d 3.0
   allows root-complex integrated endpoints to carry device specific
   private data in their page requests. This has been removed from the
   spec since v4.0.
- DMA stalls.
- Device-specific (non-PRI, not through IOMMU).
Does IOMMUFD want to support the last case?
Best regards,
baolu
