switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org --- drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
+ if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) || + !IS_ALIGNED(end_pfn + 1, lvl_pages))) + return; + while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove;
pteval |= DMA_PTE_LARGE_PAGE; - pages_to_remove = min_t(unsigned long, nr_pages, + pages_to_remove = min_t(unsigned long, + round_down(nr_pages, lvl_pages), nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
On Tue Aug 26, 2025 at 2:38 PM UTC, Eugene Koira wrote:
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org
drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
- if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
!IS_ALIGNED(end_pfn + 1, lvl_pages)))
It might make sense to downgrade the warning to WARN_ON_ONCE().
Other than that:
Reviewed-by: Nicolas Saenz Julienne nsaenz@amazon.com
Regards, Nicolas
On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:
The range is [0x3fe400, 0x4c0600) ?
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org
drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
- if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
!IS_ALIGNED(end_pfn + 1, lvl_pages)))
return;
while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove; pteval |= DMA_PTE_LARGE_PAGE;
pages_to_remove = min_t(unsigned long, nr_pages,
pages_to_remove = min_t(unsigned long,
round_down(nr_pages, lvl_pages),
nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
I'm mildly entertained by the fact that the *only* comment in this block of code is a lie. Would you care to address that while you're here? Maybe the comment could look something like...
/* If the new mapping is eligible for a large page, then * remove all smaller pages that the existing pte at this * level references. * XX: do we even need to bother calling switch_to_super_page() * if this PTE wasn't *present* before? */
I bet it would benefit from one or two other one-line comments to make it clearer what's going on, too...
But in general, I think this looks sane even though this code makes my brain hurt. Could do with a test case, in an ideal world. Maybe we can work on that as part of the generic pagetable support which is coming?
On 8/26/25 23:48, David Woodhouse wrote:
On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:
The range is [0x3fe400, 0x4c0600) ?
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org
drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
- if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
!IS_ALIGNED(end_pfn + 1, lvl_pages)))
return;
while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove; pteval |= DMA_PTE_LARGE_PAGE;
pages_to_remove = min_t(unsigned long, nr_pages,
pages_to_remove = min_t(unsigned long,
round_down(nr_pages, lvl_pages),
nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
I'm mildly entertained by the fact that the *only* comment in this block of code is a lie. Would you care to address that while you're here? Maybe the comment could look something like...
/* If the new mapping is eligible for a large page, then * remove all smaller pages that the existing pte at this * level references. * XX: do we even need to bother calling switch_to_super_page() * if this PTE wasn't *present* before? */
I bet it would benefit from one or two other one-line comments to make it clearer what's going on, too...
But in general, I think this looks sane even though this code makes my brain hurt. Could do with a test case, in an ideal world. Maybe we can work on that as part of the generic pagetable support which is coming?
Agreed. The generic pagetable work will replace this code, so this will be removed. Therefore, we need a fix patch that can be backported before the generic pagetable for vt-d lands.
Thanks, baolu
On Thu, 2025-08-28 at 14:33 +0800, Baolu Lu wrote:
On 8/26/25 23:48, David Woodhouse wrote:
On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:
The range is [0x3fe400, 0x4c0600) ?
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira eugkoira@amazon.com Cc: stable@vger.kernel.org
drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9c3ab9d9f69a..dff2d895b8ab 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain, unsigned long lvl_pages = lvl_to_nr_pages(level); struct dma_pte *pte = NULL;
- if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
!IS_ALIGNED(end_pfn + 1, lvl_pages)))
return;
while (start_pfn <= end_pfn) { if (!pte) pte = pfn_to_dma_pte(domain, start_pfn, &level, @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long pages_to_remove; pteval |= DMA_PTE_LARGE_PAGE;
pages_to_remove = min_t(unsigned long, nr_pages,
pages_to_remove = min_t(unsigned long,
round_down(nr_pages, lvl_pages),
nr_pte_to_next_page(pte) * lvl_pages); end_pfn = iov_pfn + pages_to_remove - 1; switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
I'm mildly entertained by the fact that the *only* comment in this block of code is a lie. Would you care to address that while you're here? Maybe the comment could look something like...
/* If the new mapping is eligible for a large page, then * remove all smaller pages that the existing pte at this * level references. * XX: do we even need to bother calling switch_to_super_page() * if this PTE wasn't *present* before? */
I bet it would benefit from one or two other one-line comments to make it clearer what's going on, too...
But in general, I think this looks sane even though this code makes my brain hurt. Could do with a test case, in an ideal world. Maybe we can work on that as part of the generic pagetable support which is coming?
Agreed. The generic pagetable work will replace this code, so this will be removed. Therefore, we need a fix patch that can be backported before the generic pagetable for vt-d lands.
Yep. And since Jason has in fact already posted the patches which *delete* all this code, I don't think I even care about fixing the comments. Eugene's patch is fine as-is.
Reviewed-by: David Woodhouse dwmw@amazon.co.uk
On 8/26/25 22:38, Eugene Koira wrote:
switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600], which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koiraeugkoira@amazon.com Cc:stable@vger.kernel.org
drivers/iommu/intel/iommu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
Queued for v6.17-rc. Thank you!
Thanks, baolu
linux-stable-mirror@lists.linaro.org