On Thu, Aug 22, 2024 at 12:36 AM Mina Almasry almasrymina@google.com wrote:
On Wed, Aug 21, 2024 at 5:15 AM Taehee Yoo ap420073@gmail.com wrote:
On Tue, Aug 20, 2024 at 1:01 PM Mina Almasry almasrymina@google.com wrote:
On Mon, Aug 19, 2024 at 6:53 PM Jakub Kicinski kuba@kernel.org wrote:
On Mon, 19 Aug 2024 00:44:27 +0900 Taehee Yoo wrote:
@@ -9537,6 +9540,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time"); return -EEXIST; }
if (dev_get_max_mp_channel(dev) != -1) {
NL_SET_ERR_MSG(extack, "XDP can't be installed on a netdev using memory providers");
return -EINVAL;
}
Should we consider virtual interfaces like bonding, bridge, etc? Virtual interfaces as an upper interface of physical interfaces can still install XDP prog.
# ip link add bond0 type bond # ip link set eth0 master bond0 # ip link set bond0 xdp pin /sys/fs/bpf/x/y and # ip link set bond0 xdpgeneric pin /sys/fs/bpf/x/y
All virtual interfaces can install generic XDP prog. The bonding interface can install native XDP prog.
Good point. We may need some common helpers to place the checks for XDP. They are spread all over the place now.
Took a bit of a look here. Forgive me, I'm not that familiar with XDP and virtual interfaces, so I'm a bit unsure what to do here.
For veth, it seems, the device behind the veth is stored in veth_priv->peer, so it seems maybe a dev_get_max_mp_channel() check on veth_priv->peer is the way to go to disable this for veth? I think we need to do this check on creation of the veth and on the ndo_bpf of veth.
For bonding, it seems we need to add mp channel check in bond_xdp_set, and bond_enslave?
There are a few other drivers that define ndo_add_slave, seems a check in br_add_slave is needed as well.
This seems like a potentially deep rabbit hole with a few checks to add all of the place. Is this blocking the series? AFAICT if XDP fails with mp-bound queues with a benign error, that seems fine to me; I don't have a use case for memory providers + xdp yet. This should only be blocking if someone can repro a very serious error (kernel crash) or something with this combination.
I can try to add these checks locally and propose as a follow up series. Let me know if I'm on the right track with figuring out how to implement this, and, if you feel like it's blocking.
-- Thanks, Mina
I agree with the current approach, which uses the dev_get_min_mp_channel_count() in the dev_xdp_attach(). The only problem that I am concerned about is the dev_get_min_mp_channel_count() can't check lower interfaces. So, how about just making the current code to be able to check lower interfaces?
Thank you for the code snippet! It's very useful! I have been wondering how to walk lower/upper devices!
To be honest, I think maybe Jakub's suggestion to refactor all the ->ndo_bpf calls needs to happen anyway. The reason is that there are ->ndo_bpf calls in the core net stack, like net/xdp/xsk_buff_pool.c and kernel/bpf/offload.c. AFAICT we need to add checks in these places as well, so refactoring them into one place is nice?
Note I sent the refactor for review. Sorry, I forgot to CC Taehee: https://patchwork.kernel.org/project/netdevbpf/patch/20240821045629.2856641-...
I agree that it requires refactoring. The dev_xdp_propagate() will be useful.
Additionally I'm wondering if we should disable adding mp-bound devices as slaves completely, regardless of xdp. My concern is that if the lower device is using unreadable memory, then the upper device may see unreadable memory in its code paths, and will not be expecting that, so it may break. From the look at the code, it looks like net/batman-adv calls ndo_add_slave, and a bunch of code that touches skb_frags:
$ ackc -i ndo_add_slave soft-interface.c 889: .ndo_add_slave = batadv_softif_slave_add,
$ ackc -i skb_frag fragmentation.c 403: struct sk_buff *skb_fragment; 407: skb_fragment = dev_alloc_skb(ll_reserved + mtu + tailroom); 408: if (!skb_fragment) 411: skb_fragment->priority = skb->priority; 414: skb_reserve(skb_fragment, ll_reserved + header_size); 415: skb_split(skb, skb_fragment, skb->len - fragment_size); 418: skb_push(skb_fragment, header_size); 419: memcpy(skb_fragment->data, frag_head, header_size); 422: return skb_fragment; 441: struct sk_buff *skb_fragment; 513: skb_fragment = batadv_frag_create(net_dev, skb, &frag_header, 515: if (!skb_fragment) { 522: skb_fragment->len + ETH_HLEN); 523: ret = batadv_send_unicast_skb(skb_fragment, neigh_node);
If we disable ndo_add_slave on mp devices, then we don't need to walk lower or upper devices. What do you think? If we don't disable mp lower devices entirely, then yes, we can make dev_get_min_mp_channel_count() do a recursive check.
Note that we can add support for mp bound devices as slaves in the future if we have a use case for it, and it's well tested to be safe with selftests added.
If we disable adding mp devices as lower devices, then during the mp binding we should also check if the device has upper devices.
I truly agree with this idea! Almost all virtual interfaces as an upper interface of mp_bound devices especially tunneling interfaces will not work. As you already know there are several reasons. 1. HDS wouldn't work due to tunneling header. 2. RSS wouldn't work due to tunneling header. So, I agree that we disable setting up virtual interfaces as an upper interface of mp_bound devices. Then as you said, we can allow only confirmed interface types in the future.
The IPsec is also not working with mp_bound devices due to the same reason. It would be a more complex issue, unfortunately, I don't know how to deal with it.