Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table

14 May 2025

On Tue, 13 May 2025 17:34:48 +0800 Gavin Guo <gavinguo@igalia.com> wrote:
...
The patch fixes a deadlock which can be triggered by an internal
syzkaller [1] reproducer and captured by bpftrace script [2] and its log
[3] in this scenario:
Process 1                              Process 2

hugetlb_fault
  mutex_lock(B) // take B
  filemap_lock_hugetlb_folio
    filemap_lock_folio
      __filemap_get_folio
        folio_lock(A) // take A
  hugetlb_wp
    mutex_unlock(B) // release B
    ...                                hugetlb_fault
    ...                                  mutex_lock(B) // take B
                                         filemap_lock_hugetlb_folio
                                           filemap_lock_folio
                                             __filemap_get_folio
                                               folio_lock(A) // blocked
    unmap_ref_private
    ...
    mutex_lock(B) // retake and blocked
This is an ABBA deadlock involving two locks:

Lock A: pagecache_folio lock
Lock B: hugetlb_fault_mutex_table lock
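
The interleaving in the trace above can be replayed in userspace as a rough analogy. This is only a sketch: the two pthread mutexes are stand-ins for the folio lock (A) and the fault mutex (B), the function name abba_replay is hypothetical, and nothing here is kernel code. Both "processes" are stepped through in a single thread, with pthread_mutex_trylock() used at the two points where the trace blocks, so a would-block shows up as EBUSY instead of hanging the program.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-ins for the two kernel locks; plain pthread mutexes, not kernel APIs. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* A: pagecache_folio lock */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* B: hugetlb_fault_mutex_table lock */

/*
 * Replay the interleaving from the trace in one thread. trylock reports
 * EBUSY where the real code path would sleep forever.
 * Returns 1 if each side ends up waiting on a lock the other holds, else 0.
 */
int abba_replay(void)
{
    int p1_blocked, p2_blocked, rc;

    /* Process 1: hugetlb_fault takes B, then A. */
    rc = pthread_mutex_lock(&lock_b);
    assert(rc == 0);
    rc = pthread_mutex_lock(&lock_a);
    assert(rc == 0);

    /* Process 1: hugetlb_wp releases B while still holding A. */
    rc = pthread_mutex_unlock(&lock_b);
    assert(rc == 0);

    /* Process 2: hugetlb_fault takes B ... */
    rc = pthread_mutex_lock(&lock_b);
    assert(rc == 0);
    /* ... then wants A, which process 1 still holds. */
    p2_blocked = (pthread_mutex_trylock(&lock_a) == EBUSY);

    /* Process 1: wants B back, which process 2 now holds. */
    p1_blocked = (pthread_mutex_trylock(&lock_b) == EBUSY);

    /* Release both locks so the function can be called again. */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);

    return p1_blocked && p2_blocked;
}
```

Calling abba_replay() from a small main() linked with -pthread should report that both sides are stuck. The point of the sketch is that neither lock order is wrong on its own; the cycle only appears because B is dropped and retaken while A stays held.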

Nostalgia.  A decade or three ago many of us spent much of our lives
staring at ABBA deadlocks.  Then came lockdep and after a few more
years, it all stopped.  I've long hoped that lockdep would gain a
solution to custom locks such as folio_wait_bit_common(), but not yet.
Byungchul, please take a look.  Would DEPT
(https://lkml.kernel.org/r/20250513100730.12664-1-byungchul@sk.com)
have warned us about this?
...
The deadlock occurs between two processes as follows:
...
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Cc: stable@vger.kernel.org
It's been there for three years so I assume we aren't in a hurry.
The fix looks a bit nasty, sorry.  Perhaps designed for a minimal patch
footprint?  That's good for a backportable fixup, but a more broadly
architected solution may be needed going forward.
I'll queue it for 6.16-rc1 with a cc:stable, so this should be
presented to the -stable trees 3-4 weeks from now.
