On 7/15/25 20:25, Jason Gunthorpe wrote:
On Tue, Jul 15, 2025 at 01:55:01PM +0800, Baolu Lu wrote:
Yes, the mm (struct mm of processes that are bound to devices) list is an unbounded list and can theoretically grow indefinitely. This results in an unpredictable critical region.
Every MM has a unique PASID so I don't see how you can avoid this.
@@ -654,6 +656,9 @@ struct iommu_ops {
int (*def_domain_type)(struct device *dev);
+	void (*paging_cache_invalidate)(struct iommu_device *dev,
+					unsigned long start, unsigned long end);
How would you even implement this in a driver?
You either flush the whole iommu, in which case who needs a range, or the driver has to iterate over the PASID list, in which case it doesn't really improve the situation.
The Intel iommu driver supports flushing all SVA PASIDs with a single request in the invalidation queue. I am not sure if other IOMMU implementations also support this, so you are right, it doesn't generally improve the situation.
If this is a concern, I think the better answer is to do a deferred free like the mm can sometimes do, where we thread the page tables onto a linked list, flush the CPU cache, and push it all into a work which will do the iommu flush before actually freeing the memory.
Is it a workable solution to use schedule_work() to queue the KVA cache invalidation as a work item in the system workqueue? By doing so, we wouldn't need the spinlock to protect the list anymore.
We might need another interface, perhaps named iommu_sva_flush_kva_inv_wq(), to guarantee that all queued flush work has completed before the pages are actually freed, roughly along the lines of the sketch below.
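Something like this is what I have in mind -- only a sketch; kva_inv_req, kva_inv_work and the iommu_sva_invalidate_kva_range() entry point are illustrative names, not taken from the patch:

#include <linux/workqueue.h>
#include <linux/llist.h>
#include <linux/slab.h>

/* One deferred invalidation request; name made up for this sketch. */
struct kva_inv_req {
	struct llist_node node;
	unsigned long start;
	unsigned long end;
};

static LLIST_HEAD(kva_inv_reqs);

static void kva_inv_work_fn(struct work_struct *work)
{
	struct llist_node *reqs = llist_del_all(&kva_inv_reqs);
	struct kva_inv_req *req, *tmp;

	/*
	 * Process context: the walk over the SVA mm/PASID list can take
	 * a mutex here instead of a spinlock on the TLB-flush path.
	 * The actual walk is elided in this sketch.
	 */
	llist_for_each_entry_safe(req, tmp, reqs, node) {
		/* ... invalidate [req->start, req->end) for each PASID ... */
		kfree(req);
	}
}
static DECLARE_WORK(kva_inv_work, kva_inv_work_fn);

/* Assumed entry point called from the kernel TLB flush path. */
void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end)
{
	struct kva_inv_req *req = kmalloc(sizeof(*req), GFP_ATOMIC);

	if (!req)
		return;	/* a real version needs a fallback, e.g. a sync flush */

	req->start = start;
	req->end = end;
	llist_add(&req->node, &kva_inv_reqs);
	schedule_work(&kva_inv_work);
}

/* Barrier: callers invoke this before actually freeing the pages. */
void iommu_sva_flush_kva_inv_wq(void)
{
	flush_work(&kva_inv_work);
}

With that, the flush path itself only does an llist_add() and schedule_work(), and whoever frees the backing pages calls iommu_sva_flush_kva_inv_wq() first.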
One of the KPTI options might be easier at that point..
Jason
Thanks, baolu