Hi Willem, sorry for the late reply, some holiday vacations and other priorities pulled me away from this a bit. I'm getting ready to post RFC V2, so answering some questions ahead of when I do that.
On Thu, Dec 26, 2024 at 1:52 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: ...
+static int zerocopy_fill_skb_from_devmem(struct sk_buff *skb,
+					 struct msghdr *msg,
+					 struct iov_iter *from, int length)
+{
+	int i = skb_shinfo(skb)->nr_frags;
+	int orig_length = length;
+	netmem_ref netmem;
+	size_t size;
+
+	while (length && iov_iter_count(from)) {
+		if (i == MAX_SKB_FRAGS)
+			return -EMSGSIZE;
+
+		size = min_t(size_t, iter_iov_len(from), length);
+		if (!size)
+			return -EFAULT;
On error, should caller skb_zerocopy_iter_stream rewind from, rather than (or as well as) msg->msg_iter?
Ah, so this was confusing because there were 2 iterators to keep track of: (a) binding->tx_iter, which is `from`, and (b) msg->msg_iter.

Stan suggested removing binding->tx_iter entirely, so that we're back to using only 1 iterator, msg->msg_iter. That simplifies the code greatly, and I think it addresses this comment as well: with a single iterator there is no need to keep `from` advanced/reverted in lockstep with msg->msg_iter.
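To make that concrete, here is roughly how the error path looks with one iterator. The caller below is a hypothetical sketch of the v2 design, not the literal patch code:

	/* Minimal sketch, assuming msg->msg_iter is the only iterator in
	 * play; fill_devmem_frags is a hypothetical caller.
	 */
	static int fill_devmem_frags(struct sk_buff *skb, struct msghdr *msg,
				     int length)
	{
		size_t before = iov_iter_count(&msg->msg_iter);
		int err;

		err = zerocopy_fill_skb_from_devmem(skb, msg, &msg->msg_iter,
						    length);
		if (err)
			/* One revert on one iterator; no second cursor to
			 * keep in sync with msg->msg_iter.
			 */
			iov_iter_revert(&msg->msg_iter,
					before - iov_iter_count(&msg->msg_iter));
		return err;
	}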
+		netmem = net_iov_to_netmem(iter_iov(from)->iov_base);
+
+		get_netmem(netmem);
+		skb_add_rx_frag_netmem(skb, i, netmem, from->iov_offset,
+				       size, PAGE_SIZE);
+
+		iov_iter_advance(from, size);
+		length -= size;
+		i++;
+	}
+
+	iov_iter_advance(&msg->msg_iter, orig_length);
What does this do if sendmsg is called with NULL as buffer?
So even if iov_base == NULL, the iterator is created anyway; it will span addresses 0 through iov_len.

In the next iteration I've applied Stan's suggestion to use iov_base as the offset into the dma-buf to send from. I think it ends up being a much cleaner UAPI, but let me know what you think.
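For reference, here is roughly what the userspace usage looks like under that scheme. This is a sketch only; the cmsg name (SCM_DEVMEM_DMABUF) and the dmabuf_id/sock_fd variables are assumptions based on this series, not final UAPI:

	/* Send 8192 bytes starting at offset 4096 of the bound dmabuf.
	 * Per Stan's suggestion, iov_base carries the offset into the
	 * dmabuf rather than a virtual address.
	 */
	char ctrl[CMSG_SPACE(sizeof(__u32))] = {};
	struct iovec iov = {
		.iov_base = (void *)4096,	/* offset into the dmabuf */
		.iov_len = 8192,
	};
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl,
		.msg_controllen = sizeof(ctrl),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_DEVMEM_DMABUF;	/* assumed name */
	cm->cmsg_len = CMSG_LEN(sizeof(__u32));
	*(__u32 *)CMSG_DATA(cm) = dmabuf_id;	/* id from the netlink bind */

	ret = sendmsg(sock_fd, &msg, MSG_ZEROCOPY);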
...
 struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
@@ -109,6 +112,13 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 	unsigned long xa_idx;
 	unsigned int rxq_idx;

 	xa_erase(&net_devmem_dmabuf_bindings, binding->id);

+	/* Ensure no tx net_devmem_lookup_dmabuf() are in flight after the
+	 * erase.
+	 */
+	synchronize_net();
What precisely does this protect?
synchronize_net() ensures no packet is in flight inside an RCU read-side section. But a packet can still be in flight otherwise, such as posted to the device or queued in a qdisc.
The TX data path does a net_devmem_lookup_dmabuf() to look up the dmabuf_id provided by the user.

But that dmabuf_id may be unbound via net_devmem_unbind_dmabuf() by the user at any time, so some synchronization is needed to make sure we don't send from a dmabuf that is being freed in another thread.

The synchronization in this patch is such that the lookup path obtains a reference under the RCU read lock, and the unbind control path waits a full RCU grace period before dropping its reference via net_devmem_dmabuf_binding_put(). net_devmem_dmabuf_binding_put() triggers freeing the binding if the refcount hits zero.
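In other words, the lookup side looks roughly like the sketch below. It assumes a net_devmem_dmabuf_binding_get() counterpart to _put() that does a refcount_inc_not_zero(); illustrative only, not the literal patch code:

	struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 dmabuf_id)
	{
		struct net_devmem_dmabuf_binding *binding;

		rcu_read_lock();
		binding = xa_load(&net_devmem_dmabuf_bindings, dmabuf_id);
		/* unbind does xa_erase() and then waits a grace period
		 * before its final put, so a binding found here cannot be
		 * freed under us before we either take a reference or fail
		 * and return NULL.
		 */
		if (binding && !net_devmem_dmabuf_binding_get(binding))
			binding = NULL;
		rcu_read_unlock();

		return binding;
	}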
...
+struct net_devmem_dmabuf_binding *
+net_devmem_get_sockc_binding(struct sock *sk, struct sockcm_cookie *sockc)
+{
+	struct net_devmem_dmabuf_binding *binding;
+	int err = 0;
+
+	binding = net_devmem_lookup_dmabuf(sockc->dmabuf_id);
This lookup is from the global xarray net_devmem_dmabuf_bindings.
Is there a check that the socket is sending out through the device to which this dmabuf was bound with netlink? Should there be? (e.g., SO_BINDTODEVICE).
Yes, I think it may be an issue if the user triggers a send from a different netdevice, because when we bind a dmabuf we indeed bind it to a specific netdevice.

One option is, as you say, to require TX sockets to be bound and to check that we're bound to the correct netdev. I also wonder if I can make this work without SO_BINDTODEVICE, by querying the netdev the socket is currently trying to send out of and doing a check in tcp_sendmsg(). I'm not sure if this is possible, but I'll give it a look.
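Something along these lines is what I have in mind. It assumes the binding records the net_device it was bound to (binding->dev); the function name and exact placement are placeholders, not settled code:

	static int net_devmem_check_binding_dev(struct sock *sk,
						struct net_devmem_dmabuf_binding *binding)
	{
		struct dst_entry *dst = __sk_dst_get(sk);

		/* Reject the send if the socket's current egress device
		 * does not match the device the dmabuf was bound to via
		 * netlink.
		 */
		if (!dst || !dst->dev || dst->dev != binding->dev)
			return -ENODEV;

		return 0;
	}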
--
Thanks,
Mina