On 8/26/2025 10:22 PM, Dave Hansen wrote:
On 8/25/25 19:49, Baolu Lu wrote:
The three separate lists are needed because we're handling three distinct types of page deallocation. Grouping the pages this way allows the workqueue handler to free each type using the correct function.
Please allow me to add more details.
Right, I know why it got added this way: it was the quickest way to hack together a patch that fixes the IOMMU issue without refactoring anything.
I agree that you have three cases:
- A full-on 'struct ptdesc' that needs its destructor run
- An order-0 'struct page'
- A higher-order 'struct page'
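Roughly, and with 'ptdesc', 'page', 'addr' and 'order' as placeholder names, those three cases map onto three different free calls today:

	pagetable_dtor_free(ptdesc);			/* case 1: run the dtor, then free */
	__free_page(page);				/* case 2: order-0 page */
	free_pages((unsigned long)addr, order);		/* case 3: higher-order, non-compound */

which is what the three separate lists were mirroring.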
Long-term, #2 and #3 probably need to get converted over to 'struct ptdesc'. They don't look _that_ hard to convert to me. Willy, Vishal, any other mm folks: do you agree?
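Converting #2, for instance, would be roughly this kind of change (just a sketch using the existing pagetable_alloc()/pagetable_free() helpers, not a worked-out patch):

	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
	...
	pagetable_free(ptdesc);

instead of the bare alloc_page()/__free_page() pair.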
Short-term, I'd just consolidate your issue down to a single list.
#1: For 'struct ptdesc', modify pte_free_kernel() to pass information in to pagetable_dtor_free() to tell it to use the deferred page table free list. Do this with a bit in the ptdesc or a new argument to pagetable_dtor_free().

#2: Just append these to the deferred page table free list. Easy.

#3: The hackiest way to do this is to split the higher-order, non-compound page up and put its pages on the deferred page table free list one at a time. The other way to do it is to track down how this thing got allocated in the first place and make sure it's got __GFP_COMP metadata. If so, you can just use __free_pages() for everything.
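As an illustrative sketch of the __GFP_COMP option in #3 (with 'order' standing in for however that allocation is sized; not actual patch code):

	/* allocation side: ask for compound metadata on the higher-order table */
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, order);

	/* free side: the order is now recoverable from the page itself */
	__free_pages(page, compound_order(page));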
Yeah, it'll take a couple of patches up front to refactor some things. But that refactoring will make things more consistent instead of adding complexity to deal with the inconsistency.
Following the insights above, I wrote the code as follows. Does it look good?
mm/pgtable-generic.c:
#ifdef CONFIG_ASYNC_PGTABLE_FREE
/* a 'struct ptdesc' that needs its destructor run */
#define ASYNC_PGTABLE_FREE_DTOR	BIT(NR_PAGEFLAGS)
static void kernel_pgtable_work_func(struct work_struct *work);
static struct {
	struct list_head list;
	/* protect the above ptdesc list */
	spinlock_t lock;
	struct work_struct work;
} kernel_pgtable_work = {
	.list = LIST_HEAD_INIT(kernel_pgtable_work.list),
	.lock = __SPIN_LOCK_UNLOCKED(kernel_pgtable_work.lock),
	.work = __WORK_INITIALIZER(kernel_pgtable_work.work, kernel_pgtable_work_func),
};
static void kernel_pgtable_work_func(struct work_struct *work)
{
	struct ptdesc *ptdesc, *next;
	LIST_HEAD(page_list);

	spin_lock(&kernel_pgtable_work.lock);
	list_splice_tail_init(&kernel_pgtable_work.list, &page_list);
	spin_unlock(&kernel_pgtable_work.lock);

	/* Flush the IOTLB for kernel VA before the page tables are freed */
	iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);

	list_for_each_entry_safe(ptdesc, next, &page_list, pt_list) {
		list_del(&ptdesc->pt_list);
		if (ptdesc->__page_flags & ASYNC_PGTABLE_FREE_DTOR)
			pagetable_dtor_free(ptdesc);
		else
			__free_page(ptdesc_page(ptdesc));
	}
}
void kernel_pgtable_async_free_dtor(struct ptdesc *ptdesc)
{
	spin_lock(&kernel_pgtable_work.lock);
	ptdesc->__page_flags |= ASYNC_PGTABLE_FREE_DTOR;
	list_add(&ptdesc->pt_list, &kernel_pgtable_work.list);
	spin_unlock(&kernel_pgtable_work.lock);

	schedule_work(&kernel_pgtable_work.work);
}
void kernel_pgtable_async_free_page_list(struct list_head *list)
{
	spin_lock(&kernel_pgtable_work.lock);
	list_splice_tail(list, &kernel_pgtable_work.list);
	spin_unlock(&kernel_pgtable_work.lock);

	schedule_work(&kernel_pgtable_work.work);
}
void kernel_pgtable_async_free_page(struct page *page)
{
	spin_lock(&kernel_pgtable_work.lock);
	list_add(&page_ptdesc(page)->pt_list, &kernel_pgtable_work.list);
	spin_unlock(&kernel_pgtable_work.lock);

	schedule_work(&kernel_pgtable_work.work);
}
#endif
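With that in place, the generic pte_free_kernel() would become something roughly like the below (a sketch only; it assumes a stub declaration of kernel_pgtable_async_free_dtor() for the !CONFIG_ASYNC_PGTABLE_FREE case):

static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
	struct ptdesc *ptdesc = virt_to_ptdesc(pte);

	/* defer the dtor+free so the KVA IOTLB flush happens first */
	if (IS_ENABLED(CONFIG_ASYNC_PGTABLE_FREE))
		kernel_pgtable_async_free_dtor(ptdesc);
	else
		pagetable_dtor_free(ptdesc);
}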
Thanks,
baolu