On 8/14/2025 12:48 PM, Ethan Zhao wrote:
On 8/6/2025 1:25 PM, Lu Baolu wrote:
In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware shares and walks the CPU's page tables. The Linux x86 architecture maps the kernel address space into the upper portion of every process’s page table. Consequently, in an SVA context, the IOMMU hardware can walk and cache kernel space mappings. However, the Linux kernel currently lacks a notification mechanism for kernel space mapping changes. This means the IOMMU driver is not aware of such changes, leading to a break in IOMMU cache coherence.
Modern IOMMUs often cache intermediate-level page table entries whenever an entry is valid, regardless of its permissions, to optimize walk performance. The IOMMU driver is currently notified only of changes to user VA mappings, so the IOMMU's internal caches may retain stale entries for kernel VA. When kernel page table mappings are changed (e.g., by vfree()) while the IOMMU still holds such stale entries, a use-after-free (UAF) condition arises.
If these freed page table pages are reallocated for a different purpose, potentially by an attacker, the IOMMU could misinterpret the new data as valid page table entries. This allows the IOMMU to walk into attacker-controlled memory, leading to arbitrary physical memory DMA access or privilege escalation.
To mitigate this, introduce a new IOMMU interface to flush IOMMU caches. This interface should be invoked from architecture-specific code that manages combined user and kernel page tables, whenever a kernel page table update requires a CPU TLB flush.
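As a rough sketch of what the iommu-sva.c side of such an interface could look like (illustrative only, not the posted hunk, which is omitted from the quote below; the wrapper struct, list, and lock names follow the v3 change log or are invented here, while mmu_notifier_arch_invalidate_secondary_tlbs() is the kernel's existing secondary-TLB invalidation helper):

/*
 * Sketch only. Assumes iommu-sva.c tracks every SVA-bound mm on a global
 * list protected by a mutex, as described in the v3 change log; the
 * struct iommu_sva_mm wrapper below is invented for this illustration.
 */
struct iommu_sva_mm {
	struct list_head node;
	struct mm_struct *mm;
};

static bool iommu_sva_present;
static LIST_HEAD(iommu_sva_mms);
static DEFINE_MUTEX(iommu_sva_lock);

void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end)
{
	struct iommu_sva_mm *entry;

	/* Fast path: nothing to do if no SVA domain has ever been bound. */
	if (!READ_ONCE(iommu_sva_present))
		return;

	mutex_lock(&iommu_sva_lock);
	list_for_each_entry(entry, &iommu_sva_mms, node)
		mmu_notifier_arch_invalidate_secondary_tlbs(entry->mm,
							    start, end);
	mutex_unlock(&iommu_sva_lock);
}

With the cheap boolean check up front, the unconditional call made from flush_tlb_all() (see the hunk below) degenerates to a no-op on systems where no SVA domain is in use.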
Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices")
Cc: stable@vger.kernel.org
Suggested-by: Jann Horn <jannh@google.com>
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
 arch/x86/mm/tlb.c         |  4 +++
 drivers/iommu/iommu-sva.c | 60 ++++++++++++++++++++++++++++++++++++++-
 include/linux/iommu.h     |  4 +++
 3 files changed, 67 insertions(+), 1 deletion(-)
Change log:
v3:
 - iommu_sva_mms is an unbound list; iterating it in an atomic context could introduce significant latency issues. Schedule it in a kernel thread and replace the spinlock with a mutex.
 - Replace the static key with a normal bool; it can be brought back if data shows the benefit.
 - Invalidate KVA range in the flush_tlb_all() paths.
 - All previous reviewed-bys are preserved. Please let me know if there are any objections.
v2:
 - https://lore.kernel.org/linux-iommu/20250709062800.651521-1-baolu.lu@linux.intel.com/
 - Remove EXPORT_SYMBOL_GPL(iommu_sva_invalidate_kva_range);
 - Replace the mutex with a spinlock to make the interface usable in the critical regions.
v1: https://lore.kernel.org/linux-iommu/20250704133056.4023816-1-baolu.lu@linux.intel.com/
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 39f80111e6f1..3b85e7d3ba44 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -12,6 +12,7 @@
 #include <linux/task_work.h>
 #include <linux/mmu_notifier.h>
 #include <linux/mmu_context.h>
+#include <linux/iommu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -1478,6 +1479,8 @@ void flush_tlb_all(void)
 	else
 		/* Fall back to the IPI-based invalidation. */
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
+
+	iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);
Establishing such a simple one-to-one connection between CPU TLB flushes and IOMMU TLB flushes is debatable. At the very least, not every process is attached to an IOMMU SVA domain; today, devices and IOMMUs operating in scalable mode are not commonly used by every process.
You're right. As discussed, I'll defer the IOTLB invalidation to scheduled kernel work in pte_free_kernel(). The iommu_sva_invalidate_kva_range() function on the IOMMU side is actually a no-op if there's no SVA domain in use.
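For illustration, one possible shape of that deferral (a sketch under assumptions: kernel_pte_free_list, kernel_pte_free_work, and overriding pte_free_kernel() this way are inventions of this example, not the eventual patch; ptdesc->pt_list and pagetable_dtor_free() are existing kernel facilities):

/*
 * Sketch only: defer both the IOTLB invalidation and the freeing of kernel
 * PTE pages to process context, so pte_free_kernel() never sleeps and the
 * pages cannot be reallocated before the IOMMU caches are flushed.
 */
static LIST_HEAD(kernel_pte_free_list);
static DEFINE_SPINLOCK(kernel_pte_free_lock);

static void kernel_pte_free_fn(struct work_struct *work)
{
	struct ptdesc *ptdesc, *next;
	LIST_HEAD(to_free);

	spin_lock(&kernel_pte_free_lock);
	list_splice_init(&kernel_pte_free_list, &to_free);
	spin_unlock(&kernel_pte_free_lock);

	/* Flush IOMMU caches before the page table pages can be reused. */
	iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);

	list_for_each_entry_safe(ptdesc, next, &to_free, pt_list)
		pagetable_dtor_free(ptdesc);
}

static DECLARE_WORK(kernel_pte_free_work, kernel_pte_free_fn);

void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
	struct ptdesc *ptdesc = virt_to_ptdesc(pte);

	spin_lock(&kernel_pte_free_lock);
	list_add(&ptdesc->pt_list, &kernel_pte_free_list);
	spin_unlock(&kernel_pte_free_lock);

	/* Scheduling an already-queued work item is a no-op. */
	schedule_work(&kernel_pte_free_work);
}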
Thanks,
baolu