Hi Kevin,
On 12/06/2026 09:42, Tian, Kevin wrote:
From: Matt Evans matt@ozlabs.org Sent: Wednesday, June 10, 2026 11:43 PM
+int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *priv,
struct vm_area_struct *vma,unsigned long address,unsigned int order,unsigned long *out_pfn)+{
- /*
* Given a VMA (start, end, pgoffs) and a fault address,* search the corresponding DMABUF's phys_vec[] to find the* range representing the address's offset into the VMA, and* its PFN.** The phys_vec[] ranges represent contiguous spans of VAs* upwards from the buffer offset 0; the actual PFNs might be* in any order, overlap/alias, etc. Calculate an offset of* the desired page given VMA start/pgoff and address, then* search upwards from 0 to find which span contains it.** On success, a valid PFN for a page sized by 'order' is* returned into out_pfn.** Failure occurs if:* - The page would cross the edge of the VMA* - The page isn't entirely contained within a range* - We find a range, but the final PFN isn't aligned to the* requested order.** (Upon failure, the caller is expected to try again with a* smaller order; the tests above will always succeed for* order=0 as the limit case.)** It's suboptimal if DMABUFs are created with neigbourings/neigbouring/neighboring/
Ah, not a typo. :) That is en_GB and AFAIK is permitted.
* ranges that are physically contiguous, since hugepages* can't straddle range boundaries. (The construction of the* ranges vector should merge such ranges.)though the field is called 'phys_vec', removing 'vector' in description is clearer here.
Fair, reworded.
** Finally, vma_pgoff_adjust is used for a DMABUF representing* a VFIO BAR mmap, which is created from the start of the* offset region.Elaborate it a little bit that the vm_pgoff is already counted in paddr of phys_vec so it should be skipped when finding the pfn.
OK! Expanded this paragraph slightly to explain that vma_pgoff_adjust avoids double-accounting, and that a BAR mmap() DMABUF is created such that the start of the VMA (even with an offset) equals the start of the DMABUF and equals the start of the physical range.
*/- const unsigned long pagesize = PAGE_SIZE << order;
- unsigned long vma_off = ((vma->vm_pgoff - priv->vma_pgoff_adjust)
<<
PAGE_SHIFT) & VFIO_PCI_OFFSET_MASK;- unsigned long rounded_page_addr = ALIGN_DOWN(address,
pagesize);
- unsigned long rounded_page_end = rounded_page_addr + pagesize;
- unsigned long page_buf_offset;
- unsigned long page_buf_offset_end;
what about "fault_offset[_end]"? page_buf is a bit confusing.
I went round several times with these names, thanks for the input. Just tried it out and your suggestion is clearer.
- unsigned long range_buf_offset = 0;
could this be called 'range_start' then the 'range_start' in latter loop is renamed to 'phys_start'?
Not strong... just feel such naming helps me understand the logic easier
Anything that helps helps, thanks. I ended up renaming this to range_start_offset (as offset is IMHO important).
I'm a fan of diagrams but this is too large to include in a comment. But for posterity on the list, and using the new names, an illustration of a DMABUF with 3 ranges in phys_vec, where a mapping's faulting page offset lies in range [1]:
fault_addr--+ v VMA +-----------------+----------+-----------------+ | | Faulting | | | | (hg)page | | | | | | |---- vma_off ---->+-----------------+----------+-----------------+ | . . | . . |--------- fault_offset ------------>. . DMABUF +-------------------------+---------------------------+--------------+ | phys_vec[0] | phys_vec[1] . | phys_vec[2] | | .paddr | . . | | | .len | . . | | +-------------------------+---------------------------+--------------+ 0 : . . : L |-- range_start_offset -->: . . -->: range_len : . . : V . . : +----------+----------+-----+ |.paddr | PFN | | | | | | | | | | +----------+----------+-----+ P
P = paddr + (fault_offset - range_start_offset) L = sum(phys_vec[0...2].len)
- unsigned int i;
- if (rounded_page_addr < vma->vm_start || rounded_page_end >
vma->vm_end) {
if (order > 0)return -EAGAIN;/* A fault address outside of the VMA is absurd. */WARN(1, "Fault addr 0x%lx outside VMA 0x%lx-0x%lx\n",address, vma->vm_start, vma->vm_end);return -EFAULT;- }
- /*
* page_buff_offset[_end] is the span of DMABUF offsets* corresponding to the faulting page:*/if the naming is kept then s/page_buff_offset/page_buf_offset/
otherwise,
Reviewed-by: Kevin Tian kevin.tian@intel.com
Thank you,
Matt