On Sun, Dec 10, 2023 at 6:04 PM Yunsheng Lin linyunsheng@huawei.com wrote:
On 2023/12/9 0:05, Mina Almasry wrote:
On Fri, Dec 8, 2023 at 1:30 AM Yunsheng Lin linyunsheng@huawei.com wrote:
As mentioned before, it seems we need to have the above check every time we do some per-page handling in page_pool core. Is there a plan in your mind for how to remove that kind of checking in the future?
I see 2 ways to remove the checking, both infeasible:
- Allocate a wrapper struct that pulls out all the fields the page pool needs:
struct netmem {
        /* common fields */
        refcount_t refcount;
        bool is_pfmemalloc;
        int nid;
        ...
        union {
                struct dmabuf_genpool_chunk_owner *owner;
                struct page *page;
        };
};
The page pool can then not care whether the underlying memory is an iov or a page. However, this introduces significant memory bloat, as this struct needs to be allocated for each page or ppiov, which I imagine is not acceptable for the upside of removing a few static_branch'd if statements that have no performance cost anyway.
- Create a unified struct for page and dmabuf memory, which the mm
folks have repeatedly nacked, and I imagine will repeatedly nack in the future.
So I imagine the special handling of ppiov in some form is critical and the checking may not be removable.
If the above is true, perhaps devmem is not really supposed to be integrated into page_pool.
Adding a check for every per-page handling in page_pool core is just too hacky to really be considered a long-term solution.
The only other option is to implement another page_pool for ppiov and have the driver create a page_pool or ppiov_pool depending on the state of the netdev_rx_queue (or have some helper in the net stack do that for the driver). This introduces some code duplication. The ppiov_pool & page_pool would look similar in implementation.
But this was all discussed in detail in RFC v2, and the last response I heard from Jesper was in favor of this approach, if I understand correctly:
https://lore.kernel.org/netdev/7aedc5d5-0daf-63be-21bc-3b724cc1cab9@redhat.c...
Would love to have the maintainer weigh in here.
It is somewhat ironic that devmem is using static_branch to alleviate the performance impact for normal memory at the possible cost of performance degradation for devmem. Does that not defeat some of the purpose of integrating devmem into page_pool?
I don't see the issue. The static branch sets the non-ppiov path as default if no memory providers are in use, and flips it when they are, making the default branch prediction ideal in both cases.
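To make that concrete, here is a rough sketch of the kind of static_branch-gated check being discussed (the PP_IOV tag bit and the exact key name are illustrative assumptions on my part, not necessarily what the series does):

/* Sketch only; needs <linux/jump_label.h>. The key would be enabled when a
 * memory provider is installed, so the common page-backed path costs a
 * single NOP when no provider is in use. */
DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);

#define PP_IOV 0x01UL   /* hypothetical low tag bit marking a ppiov */

static inline bool page_is_page_pool_iov(const struct page *page)
{
        return static_branch_unlikely(&page_pool_mem_providers) &&
               ((unsigned long)page & PP_IOV);
}

With the key disabled, branch prediction on the page-only path is unaffected; with it enabled, the ppiov path becomes the taken branch, which is the point being made above.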
Even though a static_branch check is added in page_is_page_pool_iov(), it does not make much sense that the page_pool core has two different 'struct's for its most basic data.
IMHO, the ppiov for dmabuf is being force-fitted into page_pool without much design consideration at this point.
...
For now, the above may work for the rx part, as it seems that you are only enabling rx for dmabuf for now.
What is the plan to enable tx for dmabuf? Is it also to be integrated into page_pool? There was an attempt to enable page_pool for tx, and Eric seemed to have some comments about this: https://lkml.kernel.org/netdev/2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.c...
If tx is not integrated into page_pool, do we need to create a new layer for the tx dmabuf?
I imagine the TX path will reuse page_pool_iov, page_pool_iov_*() helpers, and page_pool_page_*() helpers, but will not need any core page_pool changes. This is because the TX path will have to piggyback
We may need another bit/flag check to demux between page_pool-owned devmem and non-page_pool-owned devmem.
The way I'm imagining the support, I don't see the need for such flags. We'd be re-using generic helpers like page_pool_iov_get_dma_address() and what not that don't need that checking.
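For illustration, such a helper could look roughly like the below; the owner field names and layout are assumptions on my part for the sketch, not the actual patch:

/* Sketch only: assumes the dmabuf_genpool_chunk_owner records the base DMA
 * address of its chunk and that its ppiovs are laid out contiguously, so the
 * DMA address follows from the ppiov's index within the owner. Nothing here
 * depends on who currently owns the ppiov. */
static inline dma_addr_t
page_pool_iov_get_dma_address(const struct page_pool_iov *ppiov)
{
        struct dmabuf_genpool_chunk_owner *owner = ppiov->owner;

        return owner->base_dma_addr +
               ((dma_addr_t)(ppiov - owner->ppiovs) << PAGE_SHIFT);
}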
Also, calling page_pool_*() on non-page_pool-owned devmem is confusing enough that we may need a thin layer handling non-page_pool-owned devmem in the end.
The page_pool_page* & page_pool_iov* functions can be renamed if confusing. I would think that's no issue (note that the page_pool_* functions need not be called for TX path).
on MSG_ZEROCOPY (devmem is not copyable), so no memory allocation from the page_pool (or otherwise) is needed or possible. RFCv1 had a TX implementation based on dmabuf pages without page_pool involvement; I imagine I'll do something similar.
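For reference, the MSG_ZEROCOPY flow being piggybacked on is the standard SO_ZEROCOPY API (Documentation/networking/msg_zerocopy.rst); roughly, from userspace, and nothing devmem-specific:

/* Sketch: SO_ZEROCOPY/MSG_ZEROCOPY constants come from recent kernel/libc
 * headers; the fallbacks below are the upstream values. */
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

static ssize_t send_zerocopy(int fd, const void *buf, size_t len)
{
        int one = 1;
        struct msghdr msg = {};

        /* opt in once per socket */
        setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

        /* the payload is pinned, not copied; completion comes later */
        ssize_t ret = send(fd, buf, len, MSG_ZEROCOPY);

        /* completion notifications arrive on the socket error queue */
        recvmsg(fd, &msg, MSG_ERRQUEUE);
        return ret;
}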
It would be good to have a tx implementation for the next version, so that we can have a whole picture of devmem.