On Fri, Sep 26, 2025 at 07:47:31AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Thursday, September 4, 2025 1:47 AM
map is slightly complicated because it has to handle a number of special edge cases:
- Overmapping a previously shared table with an OA - requires validating and freeing the possibly empty tables
- Doing the above across an entire to-be-created contiguous entry
- Installing a new shared table level concurrently with another thread
- Expanding the table by adding more top levels
What is a 'shared table'? It looks like this term doesn't appear in previous patches.
"shared table level". It is the actual 4k page. Shared means more than one iommu_map() call is using indexes in it to make its mappings work.
Like if you map 4k twice then the PGD/PMD/etc table would be "shared"
also it's unclear to me why overmapping a previously shared table can succeed while overmapping leaf entries cannot (w/ -EADDRINUSE)
It has to be empty, let me clarify
- Overmapping a previously shared, but now empty, table level with an OA. Requires validating and freeing the possibly empty tables
	/* Calculate target page size and level for the leaves */
	if (pt_has_system_page(common) && pgsize == PAGE_SIZE &&
	    pgcount == 1) {
		PT_WARN_ON(!(pgsize_bitmap & PAGE_SIZE));
		if (log2_mod(iova | paddr, PAGE_SHIFT))
			return -ENXIO;
		map.leaf_pgsize_lg2 = PAGE_SHIFT;
		map.leaf_level = 0;
		single_page = true;
	} else {
		map.leaf_pgsize_lg2 = pt_compute_best_pgsize(
			pgsize_bitmap, range.va, range.last_va, paddr);
		if (!map.leaf_pgsize_lg2)
			return -ENXIO;
		map.leaf_level =
			pt_pgsz_lg2_to_level(common, map.leaf_pgsize_lg2);
Existing driver checks alignment on pgsize, e.g. intel-iommu:
	if (!IS_ALIGNED(iova | paddr, pgsize))
		return -EINVAL;
Yes
But pt_compute_best_pgsize() doesn't use 'pgsize' and only has checks on the calculated pgsz_lg2:
pgsz_lg2 is the same as 'pgsize' in the intel driver..
pt_compute_best_pgsize() takes in a bitmap of all supported page sizes at all levels and returns a single page size that should be used for this mapping.
The single page size satisfies the same alignment checks vtd had:
	PT_WARN_ON(log2_mod(va, pgsz_lg2) != 0);
	PT_WARN_ON(oalog2_mod(oa, pgsz_lg2) != 0);
The above are equivalent to IS_ALIGNED(iova | paddr, pgsize).
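To make the equivalence concrete, here is a minimal standalone sketch, not the kernel code. sketch_log2_mod() is a hypothetical stand-in for log2_mod()/oalog2_mod(), assumed here to mean "value modulo 2^lg2", and pgsize is assumed to be a power of two equal to 1 << pgsz_lg2:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for log2_mod()/oalog2_mod(): a modulo 2^lg2. */
static uint64_t sketch_log2_mod(uint64_t a, unsigned int lg2)
{
	assert(lg2 < 64);
	return a & ((UINT64_C(1) << lg2) - 1);
}

/* The two separate checks, in the style of the PT_WARN_ON() pair. */
static bool sketch_aligned_split(uint64_t va, uint64_t oa, unsigned int lg2)
{
	return sketch_log2_mod(va, lg2) == 0 && sketch_log2_mod(oa, lg2) == 0;
}

/* The combined check, in the style of IS_ALIGNED(iova | paddr, pgsize). */
static bool sketch_aligned_combined(uint64_t iova, uint64_t paddr,
				    uint64_t pgsize)
{
	return ((iova | paddr) & (pgsize - 1)) == 0;
}
```

ORing the two addresses keeps every low bit that is set in either of them, so masking the OR is zero exactly when both individual masks are zero.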
If no page sizes match the alignment of va and oa then it returns 0 and we fail:
+		if (!map.leaf_pgsize_lg2)
+			return -ENXIO;
If it doesn't fail then it returns the single pgsize that should be used for this mapping and then we seek to that table level:
+		map.leaf_level =
+			pt_pgsz_lg2_to_level(common, map.leaf_pgsize_lg2);
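As a rough illustration of the "pick one page size from the bitmap or return 0" behaviour described above, here is a standalone sketch. It is not the real pt_compute_best_pgsize(); the function name, the largest-first search and the "must fit inside [va, last_va]" rule are assumptions made for the example:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: return the log2 of the largest page size in
 * pgsize_bitmap that va and oa are both aligned to and that fits within
 * [va, last_va]. Returns 0 when no supported size qualifies.
 */
static unsigned int sketch_best_pgsize_lg2(uint64_t pgsize_bitmap,
					   uint64_t va, uint64_t last_va,
					   uint64_t oa)
{
	assert(last_va >= va);

	/* Bit 0 is skipped so that 0 can unambiguously mean "no match". */
	for (int lg2 = 63; lg2 >= 1; lg2--) {
		uint64_t sz = UINT64_C(1) << lg2;

		if (!(pgsize_bitmap & sz))
			continue;	/* size not supported */
		if ((va | oa) & (sz - 1))
			continue;	/* va or oa not aligned to sz */
		if (last_va - va < sz - 1)
			continue;	/* range shorter than one page */
		return (unsigned int)lg2;
	}
	return 0;
}
```

With a 4K/2M/1G bitmap, a 2M-aligned va and oa covering at least 2M gives 21, while a va that is not even 4K aligned gives 0, which is the case the caller turns into -ENXIO.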
Then there is another safety check during install leaf through pt_check_install_leaf_args()
	if (PT_WARN_ON(oalog2_mod(oa, oasz_lg2)))
		return false;
By the time we get here oasz_lg2 is also pgsize.
Looks not identical.
It rejects unaligned inputs the same way though.
Further, this is all dead code right now, even the vtd code. Things were switched over to map_pages() and so the core code has this:
	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
		return -EINVAL;
then iommu_pgsize() is guaranteed to work similarly to pt_compute_best_pgsize().
Meaning the drivers can't see unaligned inputs anyhow.
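
For illustration only, the gate in the core code can be sketched as below. This is a standalone approximation, not the actual iommu core function; the name sketch_core_map_check() is hypothetical, and min_pagesz is assumed to be the smallest page size set in the domain's pgsize_bitmap:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the core-level check: anything not aligned to
 * the smallest supported page size is rejected before a driver sees it.
 */
static int sketch_core_map_check(uint64_t pgsize_bitmap, uint64_t iova,
				 uint64_t paddr, uint64_t size)
{
	uint64_t min_pagesz;

	assert(pgsize_bitmap != 0);

	/* Isolate the lowest set bit: the smallest supported page size. */
	min_pagesz = pgsize_bitmap & (~pgsize_bitmap + 1);

	if ((iova | paddr | size) & (min_pagesz - 1))
		return -EINVAL;
	return 0;
}
```

Since every larger page size is a multiple of min_pagesz, inputs that pass this check are always at least mappable with the smallest size.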
Jason