Peter Xu <peterx@redhat.com> writes:
> Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"), avoid_reserve has existed for a special case of CoW on hugetlb private mappings. It applies only when the owner VMA is trying to allocate yet another hugetlb folio that is not reserved in the private VMA's reserve map.
> Later, commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate") made alloc_huge_page() refuse to consume any global reservation whenever avoid_reserve=true. This doesn't look correct: even though the allocation is forced not to use the global reservation, it still takes one reservation from the spool (if a subpool exists). Since the spool's reserved pages are themselves taken from the global reservation, it will also take one reservation globally.
> As a result, the global reservation count can go wrong.
> I wrote a reproducer below.
Thank you so much for looking into this!
<snip>
I was able to reproduce this with your reproducer: /sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages is not decremented even after the reproducer exits.
# sysctl vm.nr_hugepages=16
vm.nr_hugepages = 16
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
# for i in $(seq 16); do ./a.out hugetlb-pool/test; cat /sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages; done
5
6
7
8
9
10
11
12
13
14
15
16
16
16
16
16
#
I'll go over the rest of your patches and dig into the meaning of `avoid_reserve`.