Re: [PATCH] mm/shmem, swap: fix race of truncate and swap entry split

12 Jan 2026

On Mon, Jan 12, 2026 at 12:00 PM Baolin Wang
baolin.wang@linux.alibaba.com wrote:
...
On 1/12/26 1:53 AM, Kairui Song wrote:
...
From: Kairui Song kasong@tencent.com
The helper for shmem swap freeing is not handling the order of swap
entries correctly. It uses xa_cmpxchg_irq to erase the swap entry,
but it gets the entry order before that using xa_get_order
without lock protection. As a result the order could be a stalled value
if the entry is split after the xa_get_order and before the
xa_cmpxchg_irq. In fact that are more way for other races to occur
during the time window.
To fix that, open code the Xarray cmpxchg and put the order retrivial and
value checking in the same critical section. Also ensure the order won't
exceed the truncate border.
I observed random swapoff hangs and swap entry leaks when stress
testing ZSWAP with shmem. After applying this patch, the problem is resolved.
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Cc: stable@vger.kernel.org
Signed-off-by: Kairui Song kasong@tencent.com

mm/shmem.c | 35 +++++++++++++++++++++++------------
  1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 0b4c8c70d017..e160da0cd30f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -961,18 +961,28 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)

the number of pages being freed. 0 means entry not found in XArray (0 pages
being freed).

*/
-static long shmem_free_swap(struct address_space *mapping,

                    pgoff_t index, void *radswap)



+static long shmem_free_swap(struct address_space *mapping, pgoff_t index,

                    unsigned int max_nr, void *radswap)

{


int order = xa_get_order(&mapping->i_pages, index);


void *old;




XA_STATE(xas, &mapping->i_pages, index);


unsigned int nr_pages = 0;


void *entry;





old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);


if (old != radswap)


        return 0;


swap_put_entries_direct(radix_to_swp_entry(radswap), 1 << order);




xas_lock_irq(&xas);


entry = xas_load(&xas);


if (entry == radswap) {


        nr_pages = 1 << xas_get_order(&xas);


        if (index == round_down(xas.xa_index, nr_pages) && nr_pages < max_nr)


                xas_store(&xas, NULL);


        else


                nr_pages = 0;


}


xas_unlock_irq(&xas);



if (nr_pages)


        swap_put_entries_direct(radix_to_swp_entry(radswap), nr_pages);





return 1 << order;




return nr_pages;

}

Thanks for the analysis, and it makes sense to me. Would the following
implementation be simpler and also address your issue (we will not
release the lock in __xa_cmpxchg() since gfp = 0)?
Hi Baolin,
...
static long shmem_free_swap(struct address_space *mapping,
                             pgoff_t index, void *radswap)
{
         XA_STATE(xas, &mapping->i_pages, index);
         int order;
         void *old;
     xas_lock_irq(&xas);
     order = xas_get_order(&xas);

Thanks for the suggestion. I did consider implementing it this way,
but I was worried that the order could grow upwards. For example
shmem_undo_range is trying to free 0-95 and there is an entry at 64
with order 5 (64 - 95). Before shmem_free_swap is called, the entry
was swapped in, then the folio was freed, then an order 6 folio was
allocated there and swapped out again using the same entry.
Then here it will free the whole order 6 entry (64 - 127), while
shmem_undo_range is only supposed to erase (0-96).
That's why I added a max_nr argument to the helper. The GFP == 0 below
looks not very clean either, that's trivial though.
...
     old = __xa_cmpxchg(xas.xa, index, radswap, NULL, 0);

Am I overthinking it?

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] mm/shmem, swap: fix race of truncate and swap entry split