In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") the use of PagePrivate to indicate a reservation count should be restored at free time was changed to the hugetlb specific flag HPageRestoreReserve. Changes to a userfaultfd error path as well as a VM_BUG_ON() in remove_inode_hugepages() were overlooked.
Users could see incorrect hugetlb reserve counts if they experience an error with a UFFDIO_COPY operation. Specifically, this would be the result of an unlikely copy_huge_page_from_user error. There is not an increased chance of hitting the VM_BUG_ON.
Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") Cc: stable@vger.kernel.org Signed-off-by: Mike Kravetz mike.kravetz@oracle.com --- fs/hugetlbfs/inode.c | 2 +- mm/userfaultfd.c | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 9d9e0097c1d3..55efd3dd04f6 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -529,7 +529,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, * the subpool and global reserve usage count can need * to be adjusted. */ - VM_BUG_ON(PagePrivate(page)); + VM_BUG_ON(HPageRestoreReserve(page)); remove_huge_page(page); freed++; if (!truncate_op) { diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 5508f7d9e2dc..9ce5a3793ad4 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -432,38 +432,38 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, * If a reservation for the page existed in the reservation * map of a private mapping, the map was modified to indicate * the reservation was consumed when the page was allocated. - * We clear the PagePrivate flag now so that the global + * We clear the HPageRestoreReserve flag now so that the global * reserve count will not be incremented in free_huge_page. * The reservation map will still indicate the reservation * was consumed and possibly prevent later page allocation. * This is better than leaking a global reservation. If no - * reservation existed, it is still safe to clear PagePrivate - * as no adjustments to reservation counts were made during - * allocation. + * reservation existed, it is still safe to clear + * HPageRestoreReserve as no adjustments to reservation counts + * were made during allocation. * * The reservation map for shared mappings indicates which * pages have reservations. When a huge page is allocated * for an address with a reservation, no change is made to - * the reserve map. In this case PagePrivate will be set - * to indicate that the global reservation count should be + * the reserve map. In this case HPageRestoreReserve will be + * set to indicate that the global reservation count should be * incremented when the page is freed. This is the desired * behavior. However, when a huge page is allocated for an * address without a reservation a reservation entry is added - * to the reservation map, and PagePrivate will not be set. - * When the page is freed, the global reserve count will NOT - * be incremented and it will appear as though we have leaked - * reserved page. In this case, set PagePrivate so that the - * global reserve count will be incremented to match the - * reservation map entry which was created. + * to the reservation map, and HPageRestoreReserve will not be + * set. When the page is freed, the global reserve count will + * NOT be incremented and it will appear as though we have + * leaked reserved page. In this case, set HPageRestoreReserve + * so that the global reserve count will be incremented to + * match the reservation map entry which was created. * * Note that vm_alloc_shared is based on the flags of the vma * for which the page was originally allocated. dst_vma could * be different or NULL on error. */ if (vm_alloc_shared) - SetPagePrivate(page); + SetHPageRestoreReserve(page); else - ClearPagePrivate(page); + ClearHPageRestoreReserve(page); put_page(page); } BUG_ON(copied < 0);
On Fri, May 21, 2021 at 4:40 PM Mike Kravetz mike.kravetz@oracle.com wrote:
In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") the use of PagePrivate to indicate a reservation count should be restored at free time was changed to the hugetlb specific flag HPageRestoreReserve. Changes to a userfaultfd error path as well as a VM_BUG_ON() in remove_inode_hugepages() were overlooked.
Users could see incorrect hugetlb reserve counts if they experience an error with a UFFDIO_COPY operation. Specifically, this would be the result of an unlikely copy_huge_page_from_user error. There is not an increased chance of hitting the VM_BUG_ON.
Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") Cc: stable@vger.kernel.org Signed-off-by: Mike Kravetz mike.kravetz@oracle.com
fs/hugetlbfs/inode.c | 2 +- mm/userfaultfd.c | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-)
Reviewed-by: Mina Almasry almasry.mina@google.com
I'm guessing this may interact with my patch in review "[PATCH v3] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY". I'll rebase my patch and re-test.
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 9d9e0097c1d3..55efd3dd04f6 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -529,7 +529,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, * the subpool and global reserve usage count can need * to be adjusted. */
VM_BUG_ON(PagePrivate(page));
VM_BUG_ON(HPageRestoreReserve(page)); remove_huge_page(page); freed++; if (!truncate_op) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 5508f7d9e2dc..9ce5a3793ad4 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -432,38 +432,38 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, * If a reservation for the page existed in the reservation * map of a private mapping, the map was modified to indicate * the reservation was consumed when the page was allocated.
* We clear the PagePrivate flag now so that the global
* We clear the HPageRestoreReserve flag now so that the global * reserve count will not be incremented in free_huge_page. * The reservation map will still indicate the reservation * was consumed and possibly prevent later page allocation. * This is better than leaking a global reservation. If no
* reservation existed, it is still safe to clear PagePrivate
* as no adjustments to reservation counts were made during
* allocation.
* reservation existed, it is still safe to clear
* HPageRestoreReserve as no adjustments to reservation counts
* were made during allocation. * * The reservation map for shared mappings indicates which * pages have reservations. When a huge page is allocated * for an address with a reservation, no change is made to
* the reserve map. In this case PagePrivate will be set
* to indicate that the global reservation count should be
* the reserve map. In this case HPageRestoreReserve will be
* set to indicate that the global reservation count should be * incremented when the page is freed. This is the desired * behavior. However, when a huge page is allocated for an * address without a reservation a reservation entry is added
* to the reservation map, and PagePrivate will not be set.
* When the page is freed, the global reserve count will NOT
* be incremented and it will appear as though we have leaked
* reserved page. In this case, set PagePrivate so that the
* global reserve count will be incremented to match the
* reservation map entry which was created.
* to the reservation map, and HPageRestoreReserve will not be
* set. When the page is freed, the global reserve count will
* NOT be incremented and it will appear as though we have
* leaked reserved page. In this case, set HPageRestoreReserve
* so that the global reserve count will be incremented to
* match the reservation map entry which was created. * * Note that vm_alloc_shared is based on the flags of the vma * for which the page was originally allocated. dst_vma could * be different or NULL on error. */ if (vm_alloc_shared)
SetPagePrivate(page);
SetHPageRestoreReserve(page); else
ClearPagePrivate(page);
ClearHPageRestoreReserve(page); put_page(page); } BUG_ON(copied < 0);
-- 2.31.1
On Sat, May 22, 2021 at 7:40 AM Mike Kravetz mike.kravetz@oracle.com wrote:
In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") the use of PagePrivate to indicate a reservation count should be restored at free time was changed to the hugetlb specific flag HPageRestoreReserve. Changes to a userfaultfd error path as well as a VM_BUG_ON() in remove_inode_hugepages() were overlooked.
Users could see incorrect hugetlb reserve counts if they experience an error with a UFFDIO_COPY operation. Specifically, this would be the result of an unlikely copy_huge_page_from_user error. There is not an increased chance of hitting the VM_BUG_ON.
Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") Cc: stable@vger.kernel.org Signed-off-by: Mike Kravetz mike.kravetz@oracle.com
Reviewed-by: Muchun Song songmuchun@bytedance.com
Thanks Mike.
linux-stable-mirror@lists.linaro.org