On Thu, Oct 23, 2025 at 2:20 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> unmap_pages removes mappings and any fully contained interior tables
> from the given range. This follows the now-standard iommu_domain API
> definition, where it does not split larger page sizes into smaller
> ones. The caller must perform unmap only on ranges created by map, or
> it must have otherwise determined safe cut points (e.g. iommufd/vfio
> use iova_to_phys to scan for them).
> Future work will provide 'cut', which explicitly performs the page
> size split if the HW can support it.
Are there plans to add a "free" operation for when a table becomes empty on unmap? I am not sure what an efficient implementation for that would look like; maybe a refcount of the number of entries present in each table?
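Something like this toy sketch is what I had in mind (all names are hypothetical, not from the patch): each table keeps a count of present entries, decremented when an entry is cleared, and the table is freed when the count reaches zero:

```c
#include <assert.h>
#include <stdlib.h>

#define ENTRIES_PER_TABLE 512

/* Hypothetical table header: tracks how many entries are present so an
 * unmap can free the table when the last one is cleared. */
struct pt_table {
	unsigned int present;			/* non-zero entries */
	unsigned long entry[ENTRIES_PER_TABLE];
};

static struct pt_table *pt_table_alloc(void)
{
	return calloc(1, sizeof(struct pt_table));
}

static void pt_set_entry(struct pt_table *t, unsigned int idx,
			 unsigned long pte)
{
	if (!t->entry[idx])
		t->present++;
	t->entry[idx] = pte;
}

/* Clear one entry; returns 1 when the table became empty and was freed. */
static int pt_clear_entry(struct pt_table **slot, unsigned int idx)
{
	struct pt_table *t = *slot;

	if (t->entry[idx]) {
		t->entry[idx] = 0;
		if (--t->present == 0) {
			free(t);
			*slot = NULL;	/* parent entry cleared as well */
			return 1;
		}
	}
	return 0;
}
```

The cost is one counter per table page, but it avoids rescanning all 512 entries on every unmap to decide whether the table is empty.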
> unmap is implemented with a recursive descent of the tree. If the
> caller provides a VA range that spans an entire table item, then the
> table memory can be freed as well.
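For illustration, here is a toy two-level model of that descent (the names and geometry are made up, not the actual generic-pt code): a leaf table whose whole span falls inside the unmap range is freed outright, otherwise we descend and clear individual entries:

```c
#include <assert.h>
#include <stdlib.h>

#define BITS_PER_LEVEL 9
#define ENTRIES (1u << BITS_PER_LEVEL)
#define LEAF_SIZE 4096ul
#define TABLE_SPAN (ENTRIES * LEAF_SIZE)	/* VA covered by one leaf table */

struct leaf_table { unsigned long pte[ENTRIES]; };
struct top_table { struct leaf_table *child[ENTRIES]; };

/* Unmap [va, va + len) from a two-level toy table. */
static void toy_unmap(struct top_table *top, unsigned long va,
		      unsigned long len)
{
	unsigned long end = va + len;

	while (va < end) {
		unsigned int ti = va / TABLE_SPAN;
		unsigned long span_start = (unsigned long)ti * TABLE_SPAN;
		unsigned long span_end = span_start + TABLE_SPAN;
		struct leaf_table *lt = top->child[ti];

		if (lt && va <= span_start && end >= span_end) {
			/* Range fully covers this table: free it whole. */
			free(lt);
			top->child[ti] = NULL;
		} else if (lt) {
			/* Partial coverage: clear individual PTEs. */
			unsigned long stop = span_end < end ? span_end : end;
			for (unsigned long p = va; p < stop; p += LEAF_SIZE)
				lt->pte[(p / LEAF_SIZE) % ENTRIES] = 0;
		}
		va = span_end;
	}
}
```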
> If an entire table item can be freed, then this version will also
> check the leaf-only level of the tree to ensure that all entries are
> actually present, returning -EINVAL otherwise. Many of the existing
> drivers don't do this extra check.
> This version sits under the iommu_domain_ops as unmap_pages() but
> does not require the external page size calculation. The
> implementation is actually unmap_range() and can do arbitrary ranges,
> internally handling all the validation and supporting any arrangement
> of page sizes. A future series can optimize __iommu_unmap() to take
> advantage of this.
> Freed page table memory is batched up in the gather and will be freed
> in the driver's iotlb_sync() callback after the IOTLB flush
> completes.
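Roughly this shape, as a standalone toy (the toy_* names are mine, not the kernel API): freed table pages are queued on the gather during unmap and only released after the flush step runs, so the walker memory stays valid until no DMA can still reference it:

```c
#include <assert.h>
#include <stdlib.h>

#define GATHER_MAX 16

/* Toy gather: table pages queued during unmap, freed after the flush. */
struct toy_gather {
	void *pages[GATHER_MAX];
	unsigned int nr;
};

static int tlb_flushed;

static void toy_gather_add(struct toy_gather *g, void *page)
{
	g->pages[g->nr++] = page;	/* real code flushes when full */
}

static void toy_iotlb_sync(struct toy_gather *g)
{
	tlb_flushed = 1;		/* stand-in for the HW IOTLB flush */
	for (unsigned int i = 0; i < g->nr; i++)
		free(g->pages[i]);
	g->nr = 0;
}
```

The ordering is the important part: free only after the flush, never before.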
> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>