-----Original Message-----
From: Haiyang Zhang haiyangz@microsoft.com
Sent: Saturday, March 1, 2025 12:00 PM
To: Yunsheng Lin yunshenglin0825@gmail.com; linux-hyperv@vger.kernel.org; akpm@linux-foundation.org; linux-mm@kvack.org
Cc: Dexuan Cui decui@microsoft.com; KY Srinivasan kys@microsoft.com; Paul Rosswurm paulros@microsoft.com; olaf@aepfle.de; vkuznets vkuznets@redhat.com; davem@davemloft.net; wei.liu@kernel.org; Long Li longli@microsoft.com; linux-kernel@vger.kernel.org; linyunsheng@huawei.com; stable@vger.kernel.org; netdev@vger.kernel.org; Alexander Duyck alexander.duyck@gmail.com
Subject: RE: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align()
-----Original Message-----
From: Yunsheng Lin yunshenglin0825@gmail.com
Sent: Saturday, March 1, 2025 8:50 AM
To: Haiyang Zhang haiyangz@microsoft.com; linux-hyperv@vger.kernel.org; akpm@linux-foundation.org; linux-mm@kvack.org
Cc: Dexuan Cui decui@microsoft.com; KY Srinivasan kys@microsoft.com; Paul Rosswurm paulros@microsoft.com; olaf@aepfle.de; vkuznets vkuznets@redhat.com; davem@davemloft.net; wei.liu@kernel.org; Long Li longli@microsoft.com; linux-kernel@vger.kernel.org; linyunsheng@huawei.com; stable@vger.kernel.org; netdev@vger.kernel.org; Alexander Duyck alexander.duyck@gmail.com
Subject: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align()
+cc netdev ML & Alexander
On 3/1/2025 10:03 AM, Haiyang Zhang wrote:
In commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for page_frag_alloc_align()"), the check for fragsz was moved earlier. So when the cache is used up and fragsz > PAGE_SIZE, it no longer tries to refill and just returns NULL. I tested it with fragsz:8192, cache-size:32768. After the initial four successful allocations, it failed, even though there is plenty of free memory in the system.
Hi Haiyang,
It seems PAGE_SIZE is 4K on the tested system?
Yes.
Which drivers or subsystems pass a fragsz bigger than PAGE_SIZE to the page_frag_alloc_align() family of APIs?
For example, our MANA driver when using jumbo frames:
https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/ethernet/microsoft/mana
To fix, revert the refill logic like before: the refill is attempted before the check & return NULL.
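To make the ordering issue described above concrete, here is a minimal userspace model. It is only a sketch, not the kernel code: the `model_*` names, the always-successful refill, and the fixed 32768-byte cache are assumptions taken from the test setup in this thread (4K PAGE_SIZE, fragsz 8192).

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE  4096u   /* assumed 4K pages, as on the tested system */
#define MODEL_CACHE_SIZE 32768u  /* assumed order-3 cache, as in the test */

/* Hypothetical, simplified stand-in for the page frag cache state. */
struct model_cache {
	unsigned int size;    /* bytes in the current backing page, 0 if none */
	unsigned int offset;  /* next free byte within the backing page */
	unsigned int refills; /* how many times the cache was (re)filled */
};

/* Pretend refill that always succeeds with a full-size backing page. */
static void model_refill(struct model_cache *nc)
{
	nc->size = MODEL_CACHE_SIZE;
	nc->offset = 0;
	nc->refills++;
}

/*
 * Returns the offset of the new fragment, or -1 on failure.
 * check_first == true models the ordering described for commit
 * 8218f62c9c9b; check_first == false models refilling before the check.
 */
static long model_alloc(struct model_cache *nc, unsigned int fragsz,
			bool check_first)
{
	if (nc->size == 0)
		model_refill(nc);

	if (nc->offset + fragsz > nc->size) {
		/* The cache is used up. */
		if (check_first && fragsz > MODEL_PAGE_SIZE)
			return -1; /* gives up without trying a refill */

		model_refill(nc);

		if (fragsz > nc->size)
			return -1; /* still does not fit after the refill */
	}

	long off = nc->offset;
	nc->offset += fragsz;
	return off;
}

/* Count how many of 'attempts' allocations of 'fragsz' bytes succeed. */
static unsigned int model_count_ok(unsigned int fragsz, unsigned int attempts,
				   bool check_first)
{
	struct model_cache nc = { 0 };
	unsigned int ok = 0;

	for (unsigned int i = 0; i < attempts; i++)
		if (model_alloc(&nc, fragsz, check_first) >= 0)
			ok++;
	return ok;
}

/* Self-check mirroring the scenario reported in this thread. */
static void model_self_check(void)
{
	/* Check-before-refill: four 8192-byte frags fit, then every call fails. */
	assert(model_count_ok(8192, 8, true) == 4);
	/* Refill-before-check: the fifth request triggers a refill and succeeds. */
	assert(model_count_ok(8192, 8, false) == 8);
}
```

In this model, requests of at most PAGE_SIZE are unaffected by the ordering; only the fragsz > PAGE_SIZE case changes behavior once the first backing page is exhausted.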
The page_frag API is not really meant for allocating memory bigger than PAGE_SIZE, because __page_frag_cache_refill() will not try hard enough to allocate an order-3 compound page when calling __alloc_pages() and will fall back to allocating a base page, as discussed below:
https://lore.kernel.org/all/ead00fb7-8538-45b3-8322-8a41386e7381@huawei.com/
We are already aware of this, and have error checking in place for the fallback to a base page.
From the discussion thread above, there are other drivers using page_frag_alloc_align() for sizes over PAGE_SIZE too. If making the page_frag API support only fragsz <= PAGE_SIZE is desired, can we create another API? One keeps the existing API semantics (allowing > PAGE_SIZE), the other uses your new code.

By the way, the new code should add an explicit check and fail ALL requests for fragsz > PAGE_SIZE. Currently your code successfully allocates big frags a few times, then fails. This is not desired behavior. It's also a breaking change for our MANA driver, which can no longer run jumbo frames.
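As a rough illustration of the two-API idea, here is a userspace sketch. Everything in it is hypothetical: the `sketch_*` names, the 4K page size, the 32768-byte backing page, and the always-successful refill are assumptions for illustration only, not existing kernel interfaces.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE  4096u   /* assumed 4K pages */
#define SKETCH_CACHE_SIZE 32768u  /* assumed order-3 backing page */

/* Hypothetical, simplified cache state; not the kernel structure. */
struct sketch_cache {
	unsigned int size;   /* bytes in the backing page, 0 if none yet */
	unsigned int offset; /* next free byte */
};

/* Shared carve-out path: refill when the remaining space is too small. */
static long sketch_carve(struct sketch_cache *nc, unsigned int fragsz)
{
	if (nc->size == 0 || nc->offset + fragsz > nc->size) {
		nc->size = SKETCH_CACHE_SIZE; /* pretend refill always succeeds */
		nc->offset = 0;
	}
	if (fragsz > nc->size)
		return -1; /* request can never fit in one backing page */

	long off = nc->offset;
	nc->offset += fragsz;
	return off;
}

/* Hypothetical small-fragment API: rejects every fragsz > PAGE_SIZE. */
static long sketch_alloc_small(struct sketch_cache *nc, unsigned int fragsz)
{
	if (fragsz > SKETCH_PAGE_SIZE)
		return -1; /* fails the first request too, not only the fifth */
	return sketch_carve(nc, fragsz);
}

/* Hypothetical large-fragment API: keeps the arbitrary-length semantics. */
static long sketch_alloc_large(struct sketch_cache *nc, unsigned int fragsz)
{
	return sketch_carve(nc, fragsz);
}

static void sketch_self_check(void)
{
	struct sketch_cache small = { 0 }, large = { 0 };

	/* The small API is consistent: an 8192-byte request never succeeds. */
	for (int i = 0; i < 8; i++)
		assert(sketch_alloc_small(&small, 8192) == -1);

	/* The large API keeps working across refills (four frags per page). */
	for (int i = 0; i < 8; i++)
		assert(sketch_alloc_large(&large, 8192) == (long)(i % 4) * 8192);
}
```

With a split like this, a driver that needs fragments larger than PAGE_SIZE would call the large variant, and a caller of the small variant would see a failure on the very first oversized request instead of after a few successes.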
@Andrew Morton akpm@linux-foundation.org and other maintainers, could you please also evaluate the idea above?
Also, to quote the current documentation (6.14.0-rc4):
"A page fragment is an arbitrary-length arbitrary-offset area of memory which resides within a 0 or higher order compound page."
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree...
So, it is designed to be *arbitrary-length* within a 0 or higher order compound page.
If commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for page_frag_alloc_align()") was intended to change the existing API semantics to fragment length <= PAGE_SIZE, the documentation and all affected drivers need to be updated.
Thanks, - Haiyang