On 12/14/23 20:03, Mina Almasry wrote:
On Mon, Dec 11, 2023 at 12:37 PM Pavel Begunkov asml.silence@gmail.com wrote: ...
If you remove the branch, let it fall into ->release and rely on refcounting there, then the callback could also fix up release_cnt or ask pp to do it, like in the patch I linked above
Sadly I don't think this is possible due to the reasons I mention in the commit message of that patch. Prematurely releasing ppiov and not having them be candidates for recycling shows me a 4-5x degradation in performance.
I don't think I follow. The concept is to only recycle a buffer (i.e. make it available for allocation) when its refs drop to zero, which is IMHO the only way it can work, and IIUC what this patchset is doing.
That's also I suggest to do, but through a slightly different path. Let's say at some moment there are 2 refs (e.g. 1 for an skb and 1 for userspace/xarray).
Say it first puts the skb:
napi_pp_put_page() -> page_pool_return_page() -> mp_ops->release_page() -> need_to_free = put_buf() // not last ref, need_to_free==false, // don't recycle, don't increase release_cnt
Then you put the last ref:
page_pool_iov_put_many() -> page_pool_return_page() -> mp_ops->release_page() -> need_to_free = put_buf() // last ref, need_to_free==true, // recycle and release_cnt++
And that last put can even be recycled right into the pp / ptr_ring, in which case it doesn't need to touch release_cnt. Does it make sense? I don't see where 4-5x degradation would come from
Sorry for the late reply, I have been working on this locally.
What you're saying makes sense, and I'm no longer sure why I was seeing a perf degradation without '[net-next v1 10/16] page_pool: don't release iov on elevanted refcount'. However, even though what you're saying is technically correct, AFAIU it's actually semantically wrong. When a page is released by the page_pool, we should call page_pool_clear_pp_info() and completely disconnect the page from the pool. If we call release_page() on a page and then the page pool sees it again in page_pool_return_page(), I think that is considered a bug.
You're adding a new feature the semantics of which is already different from what is in there, you can extend it any way as long as it makes sense and agreed on. IMHO, it does. But well, if there is a better solution I'm all for it.
In fact I think what you're proposing is as a result of a bug because we don't call a page_pool_clear_pp_info() equivalent on releasing ppiov.
I don't get it, what bug? page_pool_clear_pp_info() is not called for ppiov because it doesn't make sense to call it for ppiov, there is no reason to clear ppiov->pp, nor there is any pp_magic.
However, I'm reasonably confident I figured out the right thing to do here. The page_pool uses page->pp_frag_count for its refcounting. pp_frag_count is a misnomer, it's being renamed to pp_ref_count in Liang's series[1]). In this series I used a get_page/put_page equivalent for refcounting. Once I transitioned to using pp_[frag|ref]_count for refcounting inside the page_pool, the issue went away, and I no longer need the patch 'page_pool: don't release iov on elevanted refcount'.
Lovely, I'll take a look later! (also assuming it's in v5)
There is an additional upside, since pages and ppiovs are both being refcounted using pp_[frag|ref]_count, we get some unified handling for ppiov and we reduce the checks around ppiov. This should be fixed properly in the next series.
I still need to do some work (~1 week) before I upload the next version as there is a new requirement from MM that we transition to a new type and not re-use page*, but I uploaded my changes github with the refcounting issues resolved in case they're useful to you. Sorry for the churn:
https://github.com/mina/linux/commits/tcpdevmem-v1.5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=809049&state...