[PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page()

List overview All Threads
Download

newer

older

[PATCH v2] mm: memory-tiering: fix...

[PATCH] coresight: trbe: Fix...

Eugene Koira

26 Aug 2025 26 Aug '25

2:38 p.m.

switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.

Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:

iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages

Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.

Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.

Additionally add range alignment checks to switch_to_super_page.

Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org --- drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;

+ if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) || + !IS_ALIGNED(end_pfn + 1, lvl_pages))) + return; + while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove;

pteval |= DMA_PTE_LARGE_PAGE; - pages_to_remove = min_t(unsigned long, nr_pages, + pages_to_remove = min_t(unsigned long, + round_down(nr_pages, lvl_pages), nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);

-- 2.47.3 Amazon Development Center (Netherlands) B.V., Johanna Westerdijkplein 1, NL-2521 EN The Hague, Registration No. Chamber of Commerce 56869649, VAT: NL 852339859B01

Show replies by date

Nicolas Saenz Julienne

26 Aug 26 Aug

2:55 p.m.

On Tue Aug 26, 2025 at 2:38 PM UTC, Eugene Koira wrote:

...

switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.

Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:

iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages

Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.

Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.

Additionally add range alignment checks to switch_to_super_page.

Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org

...

drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
    !IS_ALIGNED(end_pfn + 1, lvl_pages)))

It might make sense to downgrade the warning to WARN_ON_ONCE().

Other than that:

Reviewed-by: Nicolas Saenz Julienne nsaenz@amazon.com

Regards, Nicolas

David Woodhouse

3:48 p.m.

On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:

...

switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.

Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:

The range is [0x3fe400, 0x4c0600) ?

...

iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages

Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.

Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.

Additionally add range alignment checks to switch_to_super_page.

Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org

drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
    !IS_ALIGNED(end_pfn + 1, lvl_pages)))
return;
while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove; pteval |= DMA_PTE_LARGE_PAGE;
		pages_to_remove = min_t(unsigned long, nr_pages,
		pages_to_remove = min_t(unsigned long,
					round_down(nr_pages, lvl_pages),
nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);

I'm mildly entertained by the fact that the *only* comment in this block of code is a lie. Would you care to address that while you're here? Maybe the comment could look something like...

/* If the new mapping is eligible for a large page, then * remove all smaller pages that the existing pte at this * level references. * XX: do we even need to bother calling switch_to_super_page() * if this PTE wasn't *present* before? */

I bet it would benefit from one or two other one-line comments to make it clearer what's going on, too...

But in general, I think this looks sane even though this code makes my brain hurt. Could do with a test case, in an ideal world. Maybe we can work on that as part of the generic pagetable support which is coming?

Baolu Lu

28 Aug 28 Aug

6:33 a.m.

On 8/26/25 23:48, David Woodhouse wrote:

...

On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:

...
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.

Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:

The range is [0x3fe400, 0x4c0600) ?

...
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages

Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.

Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.

Additionally add range alignment checks to switch_to_super_page.

Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org

drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
    !IS_ALIGNED(end_pfn + 1, lvl_pages)))
return;
while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove; pteval |= DMA_PTE_LARGE_PAGE;
		pages_to_remove = min_t(unsigned long, nr_pages,
		pages_to_remove = min_t(unsigned long,
					round_down(nr_pages, lvl_pages),
nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
I'm mildly entertained by the fact that the *only* comment in this block of code is a lie. Would you care to address that while you're here? Maybe the comment could look something like...
	/* If the new mapping is eligible for a large page, then
	 * remove all smaller pages that the existing pte at this
	 * level references.
	 * XX: do we even need to bother calling switch_to_super_page()
	 * if this PTE wasn't *present* before?
	 */
I bet it would benefit from one or two other one-line comments to make it clearer what's going on, too...

But in general, I think this looks sane even though this code makes my brain hurt. Could do with a test case, in an ideal world. Maybe we can work on that as part of the generic pagetable support which is coming?

Agreed. The generic pagetable work will replace this code, so this will be removed. Therefore, we need a fix patch that can be backported before the generic pagetable for vt-d lands.

Thanks, baolu

David Woodhouse

7:29 a.m.

On Thu, 2025-08-28 at 14:33 +0800, Baolu Lu wrote:

...

On 8/26/25 23:48, David Woodhouse wrote:

...
On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:

...
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.

Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:

The range is [0x3fe400, 0x4c0600) ?

...
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages     iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages     iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages

Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.

Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.

Additionally add range alignment checks to switch_to_super_page.

Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org

drivers/iommu/intel/iommu.c | 7 ++++++-   1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,    unsigned long lvl_pages = lvl_to_nr_pages(level);    struct dma_pte *pte = NULL;
if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
    !IS_ALIGNED(end_pfn + 1, lvl_pages)))
return;
while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove; pteval |= DMA_PTE_LARGE_PAGE;
		pages_to_remove = min_t(unsigned long, nr_pages,
		pages_to_remove = min_t(unsigned long,
					round_down(nr_pages, lvl_pages),
nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
I'm mildly entertained by the fact that the *only* comment in this block of code is a lie. Would you care to address that while you're here? Maybe the comment could look something like...
	/* If the new mapping is eligible for a large page, then
	 * remove all smaller pages that the existing pte at this
	 * level references.
	 * XX: do we even need to bother calling switch_to_super_page()
	 * if this PTE wasn't *present* before?
	 */
I bet it would benefit from one or two other one-line comments to make it clearer what's going on, too...

But in general, I think this looks sane even though this code makes my brain hurt. Could do with a test case, in an ideal world. Maybe we can work on that as part of the generic pagetable support which is coming?
Agreed. The generic pagetable work will replace this code, so this will be removed. Therefore, we need a fix patch that can be backported before the generic pagetable for vt-d lands.

Yep. And since Jason has in fact already posted the patches which *delete* all this code, I don't think I even care about fixing the comments. Eugene's patch is fine as-is.

Reviewed-by: David Woodhouse dwmw@amazon.co.uk

Baolu Lu

1 Sep 1 Sep

5:55 a.m.

On 8/26/25 22:38, Eugene Koira wrote:

...

switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.

Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.

Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.

Additionally add range alignment checks to switch_to_super_page.

Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koiraeugkoira@amazon.com Cc:stable@vger.kernel.org

drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

Queued for v6.17-rc. Thank you!

Thanks, baolu

days inactive

102

days old

linux-stable-mirror@lists.linaro.org

5 comments

participants

tags (0)

participants (4)

Baolu Lu
David Woodhouse
Eugene Koira
Nicolas Saenz Julienne