From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, November 15, 2022 11:05 PM
On Tue, Nov 15, 2022 at 03:13:56AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, November 15, 2022 2:44 AM
On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, November 8, 2022 8:49 AM
+/*
- Automatically find a block of IOVA that is not being used and not
reserved.
- Does not return a 0 IOVA even if it is valid.
what is the problem with 0? should this be documented in uAPI?
0 is commonly used as an errant value for uninitialized things. We don't automatically map it into a process mm because it can cause security problems if we don't trap a bogus 0/NULL pointer reference.
The same logic applies here too, the allocator should not return 0 to reserve it as an unmapped IOVA page to catch bugs.
CPU doesn't reference IOVA. Where do such bugs exist?
SW is always buggy and SW programs the DMA address, so it could leave a 0 behind or something during the programming.
address 0 is never a bug in DMA to IOVA. if it is, it will be out of the aperture or in the reserved IOVA list.
DMA API is also a auto-iova scheme from driver p.o.v while it doesn't impose any restriction on address 0.
if (!__alloc_iova_check_used(&allowed_span, length,
iova_alignment,
page_offset))
continue;
interval_tree_for_each_span(&area_span, &iopt-
area_itree,
allowed_span.start_used,
allowed_span.last_used) {
if (!__alloc_iova_check_hole(&area_span,
length,
iova_alignment,
page_offset))
continue;
interval_tree_for_each_span(&reserved_span,
&iopt-
reserved_itree,
area_span.start_used,
area_span.last_used) {
if (!__alloc_iova_check_hole(
&reserved_span, length,
iova_alignment,
page_offset))
continue;
this could be simplified by double span.
It is subtly not compatible, the double span looks for used areas. This is looking for a used area in the allowed_itree, a hole in the area_itree, and a hole in the reserved_itree.
the inner two loops can be replaced by double span, since both are skipping used areas.
The 2nd loop is looking for a used on allowed and the 3rd loop is looking for a hole in reserved. To fix it we'd have to invert allowed to work like reserved - which complicates the uAPI code.
The 1st loop finds an allowed range which can hold requested length
The 2nd loop finds an *unused* hole in the allowed range
The 3rd loop further looks for a hole in reserved.
last two both try to find a hole.
- if (iopt->disable_large_pages)
new_iova_alignment = PAGE_SIZE;
- else
new_iova_alignment = 1;
I didn't understand why we start searching alignment from a smaller value when large pages is enabled. what is the connection here?
'disable_large_pages' is a tiny bit misnamed, what it really does is ensure that every iommu_map call is exactly PAGE_SIZE, not more (large pages) and not less (what this is protecting against).
So if a domain has less than PAGE_SIZE we upgrade to PAGE_SIZE. Otherwise we allow using the lowest possible alignment.
This allows userspace to always work in PAGE_SIZE units without fear of problems, eg with sub-page-size units becoming weird or something.
above are good stuff in a comment.
This is the comment:
/*
- This is part of the VFIO compatibility support for VFIO_TYPE1_IOMMU.
That
- mode permits splitting a mapped area up, and then one of the splits is
- unmapped. Doing this normally would cause us to violate our invariant of
- pairing map/unmap. Thus, to support old VFIO compatibility disable
support
- for batching consecutive PFNs. All PFNs mapped into the iommu are done
in
- PAGE_SIZE units, not larger or smaller.
*/ static int batch_iommu_map_small(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot)
I meant a comment in iopt_calculate_iova_alignment().