On Thu, Dec 26, 2024 at 11:07 AM Stanislav Fomichev stfomichev@gmail.com wrote:
On 12/21, Mina Almasry wrote:
Currently net_iovs support only pp ref counts, and do not support a page ref equivalent.
This is fine for the RX path, as net_iovs are used exclusively with the pp there and only pp refcounting is needed. The TX path, however, does not use pp ref counts, so netmem needs a get_page/put_page equivalent.
Support get_netmem/put_netmem. Check the type of the netmem before passing it to page or net_iov specific code to obtain a page ref equivalent.
For dmabuf net_iovs, we obtain a ref on the underlying binding. This ensures the entire binding doesn't disappear until all the net_iovs have been put_netmem'ed. We do not need to track the refcount of individual dmabuf net_iovs as we don't allocate/free them from a pool similar to what the buddy allocator does for pages.
This code is written to be extensible by other net_iov implementers. get_netmem/put_netmem will check the type of the netmem and route it to the correct helper:
pages -> [get|put]_page()
dmabuf net_iovs -> net_devmem_[get|put]_net_iov()
new net_iovs -> new helpers
Signed-off-by: Mina Almasry almasrymina@google.com
 include/linux/skbuff_ref.h |  4 ++--
 include/net/netmem.h       |  3 +++
 net/core/devmem.c          | 10 ++++++++++
 net/core/devmem.h          | 11 +++++++++++
 net/core/skbuff.c          | 30 ++++++++++++++++++++++++++++++
 5 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488..9e49372ef1a0 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,7 @@
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	get_netmem(skb_frag_netmem(frag));
 }

 /**
@@ -40,7 +40,7 @@ static inline void skb_page_unref(netmem_ref netmem, bool recycle)
 	if (recycle && napi_pp_put_page(netmem))
 		return;
 #endif
[..]
-	put_page(netmem_to_page(netmem));
+	put_netmem(netmem);
I moved the release operation onto a workqueue in my series [1] to avoid calling dmabuf detach (which can sleep) from the socket close path (which is called with bh disabled). You should probably do something similar, see the trace attached below.
1: https://github.com/fomichev/linux/commit/3b3ad4f36771a376c204727e5a167c4993d...
(the condition to trigger that is to have an skb in the write queue and call close from the userspace)
Thanks for catching this indeed. I've also changed the unbinding to use schedule_work(), although I arrived at a slightly different implementation that is simpler to my eye. I'll upload what I have shortly as an RFC for review.