On Tue, Dec 12, 2023 at 3:17 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
On 2023/12/12 2:14, Mina Almasry wrote:
On Mon, Dec 11, 2023 at 3:51 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
On 2023/12/11 12:04, Mina Almasry wrote:
On Sun, Dec 10, 2023 at 6:26 PM Mina Almasry <almasrymina@google.com> wrote:
On Sun, Dec 10, 2023 at 6:04 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
On 2023/12/9 0:05, Mina Almasry wrote:
On Fri, Dec 8, 2023 at 1:30 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:

As mentioned before, it seems we need to have the above check every time we do some per-page handling in page_pool core. Is there a plan in your mind for how to remove that kind of check in the future?

I see 2 ways to remove the checking, both infeasible:

1. Allocate a wrapper struct that pulls out all the fields the page pool needs:

struct netmem {
        /* common fields */
        refcount_t refcount;
        bool is_pfmemalloc;
        int nid;
        ...
        union {
                struct dmabuf_genpool_chunk_owner *owner;
                struct page *page;
        };
};

The page pool could then not care whether the underlying memory is an iov or a page. However, this introduces significant memory bloat, as this struct needs to be allocated for each page or ppiov, which I imagine is not acceptable for the upside of removing a few static_branch'd if statements with no performance cost.

2. Create a unified struct for page and dmabuf memory, which the mm folks have repeatedly nacked, and I imagine will repeatedly nack in the future.

So I imagine the special handling of ppiov in some form is critical, and the checking may not be removable.
If the above is true, perhaps devmem is not really supposed to be integrated into page_pool.
Adding a check for every per-page handling path in page_pool core is just too hacky to really be considered a long-term solution.
The only other option is to implement another page_pool for ppiov and have the driver create page_pool or ppiov_pool depending on the state of the netdev_rx_queue (or some helper in the net stack to do that for the driver). This introduces some code duplication. The ppiov_pool & page_pool would look similar in implementation.
I think there is already a design pattern to deal with this kind of problem: refactor the common code used by both page_pool and ppiov into a library to avoid code duplication, if most of the implementation is similar.
Code can be refactored if it's identical, not if it is merely similar. I suspect the page_pools will be only similar, and if you're not willing to take devmem handling into the page pool, then refactoring page_pool code into helpers that do devmem handling may also not be an option.
Similarity indicates an opportunity to refactor out the common code, like the page_frag case below: https://patchwork.kernel.org/project/netdevbpf/cover/20231205113444.63015-1-...
But until we do a proof-of-concept implementation, it is hard to tell whether it is feasible.
But this was all discussed in detail in RFC v2, and the last response I heard from Jesper was in favor of this approach, if I understand correctly:
https://lore.kernel.org/netdev/7aedc5d5-0daf-63be-21bc-3b724cc1cab9@redhat.c...
Would love to have the maintainer weigh in here.
I should note we may be able to remove some of the checking, but maybe not all.
- Checks that disable page fragging for ppiov can be removed once ppiov has frag support (in this series or a follow-up).
- If we use page->pp_frag_count (or page->pp_ref_count) for refcounting ppiov, we can remove the if check in the refcounting.
I'm not sure this is actually possible in the short term. The page_pool uses both page->_refcount and page->pp_frag_count for refcounting, and I will not be able to remove the special handling around page->_refcount, as I'm not allowed to call the page_ref_*() APIs on a non-struct page.
The page_ref_*() APIs may be avoided using the patch below: https://patchwork.kernel.org/project/netdevbpf/patch/20231113130041.58124-7-...
Even after the patch above, you're still calling page_ref_count() in the page_pool to check for recycling, so after that patch you're still using page->_refcount.
But I am not sure how to do that for the TX part if devmem for TX is not integrated into page_pool; that is why I suggest having a TX implementation for the next version, so that we can see the whole picture of devmem.
I strongly prefer to keep the TX implementation in a separate series. This series is complicated to implement and review as it is, and is hitting the 15 patch limit anyway.
- We may be able to store the dma_addr of the ppiov in page->dma_addr,
but I'm unsure if that actually works, because the dma_buf dma addr is a dma_addr_t (u32 or u64), while page->dma_addr is an unsigned long (4 bytes on 32-bit architectures). But if it works for pages, I may be able to make it work for ppiov as well.
- Checks that obtain the page->pp can work with ppiov if we align the
offset of page->pp and ppiov->pp.
- Checks around page->pp_magic can be removed if we also have offset
aligned ppiov->pp_magic.
Sadly I don't see us removing the checking for these other cases:
- page_is_pfmemalloc(): I'm not allowed to pass a non-struct page into
that helper.
We can do a similar trick to the above, as bit 1 of page->pp_magic is used to indicate whether it is a pfmemalloc page.
Likely yes.
- page_to_nid(): I'm not allowed to pass a non-struct page into that helper.
Yes, this one needs a special case.
- page_pool_free_va(): ppiov have no va.
Won't the skb_frags_readable() check protect page_pool_free_va() from being called on devmem?
This function seems to be only called from veth which doesn't support devmem. I can remove the handling there.
- page_pool_sync_for_dev/page_pool_dma_map: ppiov backed by dma-buf
fundamentally can't get mapped again.
Can we just fail page_pool creation with the PP_FLAG_DMA_MAP and DMA_ATTR_SKIP_CPU_SYNC flags for the devmem provider?
Jakub says PP_FLAG_DMA_MAP must be enabled for devmem, such that the page_pool handles the dma mapping of the devmem and the driver doesn't use it on its own.
I am not sure what benefit enabling DMA_MAP brings for devmem: devmem already calls dma_buf_map_attachment() in netdev_bind_dmabuf(), so it does not really need PP_FLAG_DMA_MAP enabled to get the dma addrs for the devmem chunks.