Ackerley Tng <ackerleytng@google.com> writes:
<snip>
> I'll go over the rest of your patches and dig into the meaning of `avoid_reserve`.
Yes, after looking into this more deeply, I agree that avoid_reserve means avoiding the reservations in the resv_map rather than reservations in the subpool or hstate.
Here's more detail of what's going on in the reproducer that I wrote as I reviewed Peter's patch:
1. On fallocate(), allocate page A
2. On mmap(), set up a vma without VM_MAYSHARE since MAP_PRIVATE was requested
3. On faulting *buf = 1, allocate a new page B, copy A to B because the mmap request was MAP_PRIVATE
4. On fork(), prep for COW by marking the page read-only. Both parent and child share B.
5. On faulting *buf = 2 (write fault), allocate page C, copy B to C
   + B belongs to the child, C belongs to the parent
6. Child exits, B is freed
7. On munmap(), C is freed
8. On unlink(), A is freed
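For concreteness, here's a compilable sketch of that flow. The hugetlbfs mount point and the 2M huge page size are assumptions for illustration, and a pipe is used to order the parent's COW write before the child's exit; the actual reproducer may differ in its details.

#define _GNU_SOURCE             /* for fallocate() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)  /* assumed 2M huge page size */

int main(void)
{
        int pipefd[2];
        pid_t pid;
        char *buf;
        /* Assumed hugetlbfs mount point. */
        int fd = open("/mnt/hugetlbfs/repro", O_CREAT | O_RDWR, 0600);

        if (fd < 0 || pipe(pipefd))
                return 1;

        /* 1. fallocate() allocates page A. */
        if (fallocate(fd, 0, 0, HPAGE_SIZE))
                return 1;

        /* 2. MAP_PRIVATE: the vma is set up without VM_MAYSHARE. */
        buf = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
        if (buf == MAP_FAILED)
                return 1;

        /* 3. Write fault: allocate page B, copy A to B. */
        *buf = 1;

        /* 4. fork() write-protects B for COW; parent and child share B. */
        pid = fork();
        if (pid == 0) {
                char c;

                /* 6. Wait for the parent's COW write, then exit,
                 *    freeing B (which now belongs to the child). */
                read(pipefd[0], &c, 1);
                exit(0);
        }

        /* 5. Write fault in the parent: allocate page C, copy B to C. */
        *buf = 2;
        write(pipefd[1], "x", 1);
        wait(NULL);

        /* 7. munmap() frees C. */
        munmap(buf, HPAGE_SIZE);
        close(fd);

        /* 8. unlink() frees A. */
        unlink("/mnt/hugetlbfs/repro");
        return 0;
}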
When C was allocated in the parent (which owns the MAP_PRIVATE page and is doing the copy-on-write), spool->rsv_hpages was decremented but h->resv_huge_pages was not. This is the root of the bug.
We should decrement h->resv_huge_pages if a reserved page from the subpool was used, rather than deciding based on whether avoid_reserve or vma_has_reserves() is set. If avoid_reserve is set, the subpool shouldn't be checked for a reservation, so we won't be decrementing h->resv_huge_pages anyway.
I agree with Peter's fix as a whole (the entire patch series).
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
---
Some definitions which might be helpful:
+ h->resv_huge_pages indicates the number of reserved pages globally.
  + This number increases when pages are reserved.
  + This number decreases when reserved pages are allocated, or when pages are unreserved.
+ spool->rsv_hpages indicates the number of reserved pages in this subpool.
  + This number increases when pages are reserved.
  + This number decreases when reserved pages are allocated, or when pages are unreserved.
+ h->resv_huge_pages should be the sum of all subpools' spool->rsv_hpages.
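To make the lockstep bookkeeping concrete, here's a runnable toy model (plain C, not kernel code; reserve()/unreserve() are made-up names):

#include <stdio.h>

struct hstate { long resv_huge_pages; };
struct subpool { long rsv_hpages; };

static void reserve(struct hstate *h, struct subpool *sp, long n)
{
        sp->rsv_hpages += n;
        h->resv_huge_pages += n;
}

static void unreserve(struct hstate *h, struct subpool *sp, long n)
{
        sp->rsv_hpages -= n;
        h->resv_huge_pages -= n;
}

int main(void)
{
        struct hstate h = { 0 };
        struct subpool a = { 0 }, b = { 0 };

        reserve(&h, &a, 2);
        reserve(&h, &b, 3);
        unreserve(&h, &a, 1);

        /* Prints "4 == 1 + 3": the global count stays the sum of the
         * subpool counts. */
        printf("%ld == %ld + %ld\n", h.resv_huge_pages,
               a.rsv_hpages, b.rsv_hpages);
        return 0;
}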
More details on the flow in alloc_hugetlb_folio() which might be helpful:
hugepage_subpool_get_pages() returns "the number of pages by which the global pools must be adjusted (upward)". Other than on error, this return value is never negative, since hugepage_subpool_get_pages() is always called with a positive delta.
Specifically in alloc_hugetlb_folio(), the return value is either 0 or 1 (other than errors).
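Here's a simplified, runnable restatement (plain C, not the kernel code) of the minimum-size accounting branch that produces this return value; locking, the max-size accounting and the error paths are dropped:

#include <stdio.h>

static long rsv_hpages = 3;     /* toy spool->rsv_hpages */

/* Returns the portion of the request the subpool's reserves could not
 * cover, i.e. the amount by which the global pools must be adjusted
 * upward. */
static long subpool_get_pages(long delta)
{
        if (delta > rsv_hpages) {
                long uncovered = delta - rsv_hpages;

                rsv_hpages = 0;
                return uncovered;
        }
        rsv_hpages -= delta;
        return 0;               /* reserves already sufficient */
}

int main(void)
{
        printf("%ld\n", subpool_get_pages(2));  /* 0: fully covered */
        printf("%ld\n", subpool_get_pages(2));  /* 1: one page uncovered */
        return 0;
}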
If the return value is 0, the subpool had enough reservations and so we should decrement h->resv_huge_pages.
If the return value is 1, it means that this subpool did not have any more reserved hugepages, and we need to get a page from the global hstate. dequeue_hugetlb_folio_vma() will get us a page that was already allocated.
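Putting the two cases together, here's a runnable toy model of the caller side (plain C; all names are stand-ins, and the dequeue/fresh allocation of the page itself is elided). The point is that the subpool's 0-or-1 answer alone decides whether h->resv_huge_pages is decremented:

#include <stdio.h>

static long resv_huge_pages = 1;        /* toy h->resv_huge_pages */
static long rsv_hpages = 1;             /* toy spool->rsv_hpages */

/* Toy hugepage_subpool_get_pages() for delta == 1. */
static long subpool_get_one(void)
{
        if (rsv_hpages > 0) {
                rsv_hpages--;
                return 0;       /* a subpool reservation covers the page */
        }
        return 1;               /* the global pools must provide the page */
}

static void alloc_one(void)
{
        long gbl_chg = subpool_get_one();

        /* ... dequeue an existing page or allocate a fresh one ... */

        if (gbl_chg == 0)
                resv_huge_pages--;      /* a reserved page was consumed */
}

int main(void)
{
        alloc_one();    /* reserved: both counters drop to 0 */
        alloc_one();    /* unreserved: neither counter changes */
        printf("resv_huge_pages=%ld rsv_hpages=%ld\n",
               resv_huge_pages, rsv_hpages);
        return 0;
}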
In dequeue_hugetlb_folio_vma(), if the vma doesn't have enough reserves for 1 page and there are no available_huge_pages() left, we quit dequeueing, since we will need to allocate a new page. If avoid_reserve is set, meaning we don't want to use the vma's reserves in the resv_map, we also check available_huge_pages(); if there are available_huge_pages(), we go on to dequeue a page.
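Those two checks, restated as a plain-C predicate (the booleans stand in for vma_has_reserves(), avoid_reserve and available_huge_pages()):

#include <stdbool.h>

bool may_dequeue(bool vma_has_reserves, bool avoid_reserve,
                 bool available_huge_pages)
{
        /* No resv_map reservation and no free, unreserved pages left:
         * give up dequeueing; a new page must be allocated. */
        if (!vma_has_reserves && !available_huge_pages)
                return false;

        /* avoid_reserve: don't consume the vma's resv_map reservation,
         * so a free, unreserved page must be available. */
        if (avoid_reserve && !available_huge_pages)
                return false;

        return true;
}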
Then, we determine whether to decrement h->resv_huge_pages. As argued above, we should decrement if a reserved page from the subpool was used, not based on whether avoid_reserve or vma_has_reserves() is set.
In the case where a surplus page needs to be allocated, the surplus page isn't, and doesn't need to be, associated with a subpool, so no subpool page-count updates are required. h->resv_huge_pages still has to be updated... is this where h->resv_huge_pages can go negative?