From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, November 15, 2022 2:44 AM
On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Tuesday, November 8, 2022 8:49 AM
+/*
- Automatically find a block of IOVA that is not being used and not
reserved.
- Does not return a 0 IOVA even if it is valid.
what is the problem with 0? should this be documented in uAPI?
0 is commonly used as an errant value for uninitialized things. We don't automatically map it into a process mm because it can cause security problems if we don't trap a bogus 0/NULL pointer reference.
The same logic applies here too, the allocator should not return 0 to reserve it as an unmapped IOVA page to catch bugs.
CPU doesn't reference IOVA. Where do such bugs exist?
I don't think it needs to be documented
this again causes a subtle difference between automatic allocation and fixed iova. If we really think address 0 is something related to bug, then why is it allowed with fixed iova?
if (!__alloc_iova_check_used(&allowed_span, length,
iova_alignment, page_offset))
continue;
interval_tree_for_each_span(&area_span, &iopt->area_itree,
allowed_span.start_used,
allowed_span.last_used) {
if (!__alloc_iova_check_hole(&area_span, length,
iova_alignment,
page_offset))
continue;
interval_tree_for_each_span(&reserved_span,
&iopt->reserved_itree,
area_span.start_used,
area_span.last_used) {
if (!__alloc_iova_check_hole(
&reserved_span, length,
iova_alignment, page_offset))
continue;
this could be simplified by double span.
It is subtly not compatible, the double span looks for used areas. This is looking for a used area in the allowed_itree, a hole in the area_itree, and a hole in the reserved_itree.
the inner two loops can be replaced by double span, since both are skipping used areas.
I don't think IOVA allocation should be a fast path so it is not worth alot of effort to micro-optimize this.
but I'm not insisting on changing them now. It's trivial.
- if (iopt->disable_large_pages)
new_iova_alignment = PAGE_SIZE;
- else
new_iova_alignment = 1;
I didn't understand why we start searching alignment from a smaller value when large pages is enabled. what is the connection here?
'disable_large_pages' is a tiny bit misnamed, what it really does is ensure that every iommu_map call is exactly PAGE_SIZE, not more (large pages) and not less (what this is protecting against).
So if a domain has less than PAGE_SIZE we upgrade to PAGE_SIZE. Otherwise we allow using the lowest possible alignment.
This allows userspace to always work in PAGE_SIZE units without fear of problems, eg with sub-page-size units becoming weird or something.
above are good stuff in a comment.
- interval_tree_remove(&area->node, &iopt->area_itree);
- rc = iopt_insert_area(iopt, lhs, area->pages, start_iova,
iopt_area_start_byte(area, start_iova),
(new_start - 1) - start_iova + 1,
area->iommu_prot);
- if (WARN_ON(rc))
goto err_insert;
- rc = iopt_insert_area(iopt, rhs, area->pages, new_start,
iopt_area_start_byte(area, new_start),
last_iova - new_start + 1, area->iommu_prot);
- if (WARN_ON(rc))
goto err_remove_lhs;
- lhs->storage_domain = area->storage_domain;
- lhs->num_accesses = area->num_accesses;
- lhs->pages = area->pages;
- rhs->storage_domain = area->storage_domain;
- rhs->num_accesses = area->num_accesses;
if an access only spans one side, is it correct to have both split sides keep the access number?
Er, this is acatually completely broken, woops. A removal of an access will trigger a WARN_ON since the access_itree element is very likely no longer correct.
Ah.. So the only use case here is unmapping and you can't unmap something that has an access established, except in some pathalogical case where the access does not intersect with what is being mapped.
There is no way to tell which iopt_pages_access are connected to which areas, so without spending some memory this can't be fixed up. I think it is not a real issue as mdev plus this ancient VFIO interface is probably not something that exists in the real world..
/*
* Splitting is not permitted if an access exists, we don't track enough
* information to split existing accesses.
*/
if (area->num_accesses) {
rc = -EINVAL;
goto err_unlock;
}
@@ -1041,10 +1050,8 @@ static int iopt_area_split(struct iopt_area *area, unsigned long iova) goto err_remove_lhs;
lhs->storage_domain = area->storage_domain;
lhs->num_accesses = area->num_accesses; lhs->pages = area->pages; rhs->storage_domain = area->storage_domain;
rhs->num_accesses = area->num_accesses; rhs->pages = area->pages; kref_get(&rhs->pages->kref); kfree(area);
this change makes sense to me