On Fri, Dec 8, 2023 at 1:30 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
As mentioned before, it seems we need the above check every time we do some per-page handling in the page_pool core. Is there a plan in your mind for how to remove this kind of checking in the future?
I see 2 ways to remove the checking, both infeasible:
1. Allocate a wrapper struct that pulls out all the fields the page pool needs:
struct netmem {
	/* common fields */
	refcount_t refcount;
	bool is_pfmemalloc;
	int nid;
	...
	union {
		struct dmabuf_genpool_chunk_owner *owner;
		struct page *page;
	};
};
The page pool then need not care whether the underlying memory is iov or page. However, this introduces significant memory bloat, as this struct needs to be allocated for each page or ppiov, which I imagine is not acceptable for the upside of removing a few static_branch'd if statements with no performance cost.
2. Create a unified struct for page and dmabuf memory, which the mm folks have repeatedly nacked, and I imagine will repeatedly nack in the future.
So I imagine the special handling of ppiov in some form is critical and the checking may not be removable.
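To make option 1 concrete, here is a rough userspace sketch of what such a wrapper might look like. Everything in it is hypothetical: struct page and struct chunk_owner are simplified stand-ins for the kernel types, refcount is a plain int rather than refcount_t, the is_iov flag and the netmem_*() helpers are invented for illustration, and treating device memory as never pfmemalloc is an assumption. It is not proposed code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; every name here is hypothetical. */
struct page { int nid; bool pfmemalloc; };
struct chunk_owner { int nid; };	/* stands in for dmabuf_genpool_chunk_owner */

struct netmem {
	/* common fields, copied out once when the buffer is wrapped */
	int refcount;		/* refcount_t in the kernel */
	bool is_pfmemalloc;
	int nid;
	bool is_iov;		/* selects the live union member */
	union {
		struct chunk_owner *owner;
		struct page *page;
	};
};

/* The extra allocation per buffer is the memory bloat described above. */
static struct netmem *netmem_wrap_page(struct page *page)
{
	struct netmem *nm = malloc(sizeof(*nm));

	if (!nm)
		return NULL;
	nm->refcount = 1;
	nm->is_pfmemalloc = page->pfmemalloc;
	nm->nid = page->nid;
	nm->is_iov = false;
	nm->page = page;
	return nm;
}

static struct netmem *netmem_wrap_iov(struct chunk_owner *owner)
{
	struct netmem *nm = malloc(sizeof(*nm));

	if (!nm)
		return NULL;
	nm->refcount = 1;
	nm->is_pfmemalloc = false;	/* assumption: device memory is never pfmemalloc */
	nm->nid = owner->nid;
	nm->is_iov = true;
	nm->owner = owner;
	return nm;
}

/* Common-field accessors need no page-vs-iov check. */
static int netmem_nid(const struct netmem *nm) { return nm->nid; }
static bool netmem_is_pfmemalloc(const struct netmem *nm) { return nm->is_pfmemalloc; }
static void netmem_get(struct netmem *nm) { nm->refcount++; }

/* Drops one reference; frees the wrapper and returns true on the last put. */
static bool netmem_put(struct netmem *nm)
{
	if (--nm->refcount > 0)
		return false;
	free(nm);
	return true;
}
```

The accessors never branch on the memory type, which is the upside. The cost is one struct netmem allocation per page or ppiov, on top of the memory being described.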
Even though a static_branch check is added in page_is_page_pool_iov(), it does not make much sense that a core has two different 'struct's for its most basic data.
IMHO, the ppiov for dmabuf is forced fitting into page_pool without much design consideration at this point.
...
For now, the above may work for the rx part as it seems that you are only enabling rx for dmabuf for now.
What is the plan to enable tx for dmabuf? Will it also be integrated into page_pool? There was an attempt to enable page_pool for tx, and Eric seemed to have some comments about this: https://lkml.kernel.org/netdev/2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.c...
If tx is not integrated into page_pool, do we need to create a new layer for the tx dmabuf?
I imagine the TX path will reuse page_pool_iov, page_pool_iov_*() helpers, and page_pool_page_*() helpers, but will not need any core page_pool changes. This is because the TX path will have to piggyback on MSG_ZEROCOPY (devmem is not copyable), so no memory allocation from the page_pool (or otherwise) is needed or possible. RFCv1 had a TX implementation based on dmabuf pages without page_pool involvement; I imagine I'll do something similar.