On Tue, Nov 15, 2022 at 03:13:56AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe <jgg@nvidia.com> Sent: Tuesday, November 15, 2022 2:44 AM
On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe <jgg@nvidia.com> Sent: Tuesday, November 8, 2022 8:49 AM
+/*
+ * Automatically find a block of IOVA that is not being used and not
+ * reserved.
+ * Does not return a 0 IOVA even if it is valid.
what is the problem with 0? should this be documented in uAPI?
0 is commonly used as an errant value for uninitialized things. We don't automatically map it into a process mm because it can cause security problems if we don't trap a bogus 0/NULL pointer dereference.
The same logic applies here too, the allocator should not return 0 to reserve it as an unmapped IOVA page to catch bugs.
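As a hedged illustration of this point (the names and the 4 KiB constant below are invented for the sketch, not taken from the iommufd code), an allocator that never hands out IOVA 0 just clamps its search floor so page 0 stays permanently unmapped and a bogus 0/NULL DMA address faults in the IOMMU:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page size for the sketch; real code would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ull

/*
 * Sketch: given the lowest free IOVA a search found, never return 0.
 * Keeping page 0 unmapped means a buggy 0/NULL DMA address is caught
 * by the IOMMU instead of silently hitting real memory.
 */
static uint64_t alloc_iova_floor(uint64_t lowest_free)
{
	if (lowest_free == 0)
		return SKETCH_PAGE_SIZE;
	return lowest_free;
}
```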
CPU doesn't reference IOVA. Where do such bugs exist?
SW is always buggy and SW programs the DMA address, so it could leave a 0 behind or something during the programming.
I don't think it needs to be documented
this again causes a subtle difference between automatic allocation and fixed iova. If we really think address 0 is something related to a bug, then why is it allowed with fixed iova?
Because fixed can do anything up to the limits of the HW. It is like mmap, where MAP_FIXED can allocate 0s as well, but automatic allocation will not.
		if (!__alloc_iova_check_used(&allowed_span, length,
					     iova_alignment, page_offset))
			continue;
		interval_tree_for_each_span(&area_span, &iopt->area_itree,
					    allowed_span.start_used,
					    allowed_span.last_used) {
			if (!__alloc_iova_check_hole(&area_span, length,
						     iova_alignment,
						     page_offset))
				continue;
			interval_tree_for_each_span(&reserved_span,
						    &iopt->reserved_itree,
						    area_span.start_used,
						    area_span.last_used) {
				if (!__alloc_iova_check_hole(
					    &reserved_span, length,
					    iova_alignment, page_offset))
					continue;
this could be simplified by double span.
It is subtly not compatible; the double span looks for used areas. This is looking for a used area in the allowed_itree, a hole in the area_itree, and a hole in the reserved_itree.
the inner two loops can be replaced by double span, since both are skipping used areas.
The 2nd loop is looking for a used span in allowed and the 3rd loop is looking for a hole in reserved. To fix it we'd have to invert allowed to work like reserved - which complicates the uAPI code.
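To make the three constraints concrete, here is a hedged, self-contained sketch (plain arrays instead of interval trees; every name is invented, not iommufd code): a candidate window is usable only if it sits inside some allowed range, overlaps no existing area, and overlaps no reserved range - which is why the real search nests three span walks with the "used/hole" sense differing between them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, last; };	/* inclusive [start, last] */

/* True if [start, last] lies entirely inside one of the n ranges. */
static bool contained_in_any(const struct range *r, size_t n,
			     uint64_t start, uint64_t last)
{
	for (size_t i = 0; i < n; i++)
		if (start >= r[i].start && last <= r[i].last)
			return true;
	return false;
}

/* True if [start, last] intersects any of the n ranges. */
static bool overlaps_any(const struct range *r, size_t n,
			 uint64_t start, uint64_t last)
{
	for (size_t i = 0; i < n; i++)
		if (start <= r[i].last && last >= r[i].start)
			return true;
	return false;
}

/*
 * A window is allocatable iff it is inside an allowed range (a "used"
 * span of allowed), a hole in the existing areas, and a hole in the
 * reserved ranges.
 */
static bool candidate_ok(uint64_t start, uint64_t last,
			 const struct range *allowed, size_t na,
			 const struct range *areas, size_t nr,
			 const struct range *reserved, size_t ns)
{
	return contained_in_any(allowed, na, start, last) &&
	       !overlaps_any(areas, nr, start, last) &&
	       !overlaps_any(reserved, ns, start, last);
}
```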
+	if (iopt->disable_large_pages)
+		new_iova_alignment = PAGE_SIZE;
+	else
+		new_iova_alignment = 1;
I didn't understand why we start searching alignment from a smaller value when large pages are enabled. What is the connection here?
'disable_large_pages' is a tiny bit misnamed, what it really does is ensure that every iommu_map call is exactly PAGE_SIZE, not more (large pages) and not less (what this is protecting against).
So if a domain has less than PAGE_SIZE we upgrade to PAGE_SIZE. Otherwise we allow using the lowest possible alignment.
This allows userspace to always work in PAGE_SIZE units without fear of problems, eg with sub-page-size units becoming weird or something.
above are good stuff in a comment.
This is the comment:
/*
 * This is part of the VFIO compatibility support for VFIO_TYPE1_IOMMU. That
 * mode permits splitting a mapped area up, and then one of the splits is
 * unmapped. Doing this normally would cause us to violate our invariant of
 * pairing map/unmap. Thus, to support old VFIO compatibility disable support
 * for batching consecutive PFNs. All PFNs mapped into the iommu are done in
 * PAGE_SIZE units, not larger or smaller.
 */
static int batch_iommu_map_small(struct iommu_domain *domain,
				 unsigned long iova, phys_addr_t paddr,
				 size_t size, int prot)
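The alignment rule Jason describes (start from PAGE_SIZE when disable_large_pages is set, otherwise from 1, then grow to what the attached domains require) can be sketched as a standalone function. Everything here is hypothetical - invented names and a 4 KiB stand-in for PAGE_SIZE - not the iommufd implementation:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096ul	/* stand-in for PAGE_SIZE */

/*
 * Sketch: compute the IOVA alignment for an IO page table.
 * disable_large_pages forces every iommu_map call to be exactly
 * PAGE_SIZE, so the alignment starts there (upgrading any domain with
 * a smaller minimum page size); otherwise it starts at 1 and only
 * grows to the minimum page size each attached domain requires.
 */
static unsigned long calc_iova_alignment(bool disable_large_pages,
					 const unsigned long *domain_min_pgsz,
					 size_t ndomains)
{
	unsigned long align = disable_large_pages ? SKETCH_PAGE_SIZE : 1;

	for (size_t i = 0; i < ndomains; i++)
		if (domain_min_pgsz[i] > align)
			align = domain_min_pgsz[i];
	return align;
}
```

With this shape, a domain whose minimum page size is below PAGE_SIZE still yields PAGE_SIZE alignment in compat mode, while in normal mode userspace gets the lowest alignment the hardware allows.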
Thanks, Jason