Hi all,
Thank you for your review comments. Here is an updated patch series with the requested changes.
To add a selftest for the metadata support of the tun driver, I refactored an existing "xdp_context_functional" test which already tested something similar but for the veth driver. I made the testing logic behind it more reusable so that it also works for the tun driver and possibly other drivers in the future.
The last patch ("fix file descriptor assertion in open_tuntap helper") fixes an assertion in an existing helper function that I just moved and reused. Somehow the file descriptor for /dev/net/tun turned out to be 0 when running in the BPF kernel-patches GitHub CI, so the assert condition needed adjustment: https://github.com/kernel-patches/bpf/actions/runs/13339140896
Successful pipeline: https://github.com/kernel-patches/bpf/actions/runs/13372306548
---
v2: - submit against bpf-next subtree - split commits and improved commit messages - remove redundant metasize check and add clarifying comment instead - use max() instead of ternary operator - add selftest for metadata support in the tun driver
v1: https://lore.kernel.org/all/20250130171614.1657224-1-marcus.wichelmann@hetzn...
Marcus Wichelmann (6): net: tun: enable XDP metadata support net: tun: enable transfer of XDP metadata to skb selftests/bpf: move open_tuntap to network helpers selftests/bpf: refactor xdp_context_functional test and bpf program selftests/bpf: add test for XDP metadata support in tun driver selftests/bpf: fix file descriptor assertion in open_tuntap helper
drivers/net/tun.c | 24 ++- tools/testing/selftests/bpf/network_helpers.c | 28 ++++ tools/testing/selftests/bpf/network_helpers.h | 3 + .../selftests/bpf/prog_tests/lwt_helpers.h | 29 ---- .../bpf/prog_tests/xdp_context_test_run.c | 152 +++++++++++++++--- .../selftests/bpf/progs/test_xdp_meta.c | 56 ++++--- 6 files changed, 215 insertions(+), 77 deletions(-)
Enable the support for the bpf_xdp_adjust_meta helper function for XDP buffers initialized by the tun driver. This allows to reserve a metadata area that is useful to pass any information from one XDP program to another one, for example when using tail-calls.
Whether this helper function can be used in an XDP program depends on how the xdp_buff was initialized. Most net drivers initialize the xdp_buff in a way, that allows bpf_xdp_adjust_meta to be used. In case of the tun driver, this is currently not the case.
There are two code paths in the tun driver that lead to a bpf_prog_run_xdp and where metadata support should be enabled:
1. tun_build_skb, which is called by tun_get_user and is used when writing packets from userspace into the device. In this case, the xdp_buff created in tun_build_skb has no support for bpf_xdp_adjust_meta and calls of that helper function result in ENOTSUPP.
For this code path, it's sufficient to set the meta_valid argument of the xdp_prepare_buff call. The reserved headroom is large enough already.
2. tun_xdp_one, which is called by tun_sendmsg which again is called by other drivers (e.g. vhost_net). When the TUN_MSG_PTR mode is used, another driver may pass a batch of xdp_buffs to the tun driver. In this case, that other driver is the one initializing the xdp_buff.
See commit 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") for details.
For now, the vhost_net driver is the only one using TUN_MSG_PTR and it already initializes the xdp_buffs with metadata support and sufficient headroom. But the tun driver disables it again, so the xdp_set_data_meta_invalid call has to be removed.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de --- drivers/net/tun.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 28624cca91f8..c95ab9c46bd2 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1713,7 +1713,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, u32 act;
xdp_init_buff(&xdp, buflen, &tfile->xdp_rxq); - xdp_prepare_buff(&xdp, buf, pad, len, false); + xdp_prepare_buff(&xdp, buf, pad, len, true);
act = bpf_prog_run_xdp(xdp_prog, &xdp); if (act == XDP_REDIRECT || act == XDP_TX) { @@ -2471,7 +2471,6 @@ static int tun_xdp_one(struct tun_struct *tun, }
xdp_init_buff(xdp, buflen, &tfile->xdp_rxq); - xdp_set_data_meta_invalid(xdp);
act = bpf_prog_run_xdp(xdp_prog, xdp); ret = tun_xdp_act(tun, xdp_prog, xdp, act);
On Tue, Feb 18, 2025 at 1:23 AM Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de wrote:
Enable the support for the bpf_xdp_adjust_meta helper function for XDP buffers initialized by the tun driver. This allows to reserve a metadata area that is useful to pass any information from one XDP program to another one, for example when using tail-calls.
Whether this helper function can be used in an XDP program depends on how the xdp_buff was initialized. Most net drivers initialize the xdp_buff in a way, that allows bpf_xdp_adjust_meta to be used. In case of the tun driver, this is currently not the case.
There are two code paths in the tun driver that lead to a bpf_prog_run_xdp and where metadata support should be enabled:
tun_build_skb, which is called by tun_get_user and is used when writing packets from userspace into the device. In this case, the xdp_buff created in tun_build_skb has no support for bpf_xdp_adjust_meta and calls of that helper function result in ENOTSUPP.
For this code path, it's sufficient to set the meta_valid argument of the xdp_prepare_buff call. The reserved headroom is large enough already.
tun_xdp_one, which is called by tun_sendmsg which again is called by other drivers (e.g. vhost_net). When the TUN_MSG_PTR mode is used, another driver may pass a batch of xdp_buffs to the tun driver. In this case, that other driver is the one initializing the xdp_buff.
See commit 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") for details.
For now, the vhost_net driver is the only one using TUN_MSG_PTR and it already initializes the xdp_buffs with metadata support and sufficient headroom. But the tun driver disables it again, so the xdp_set_data_meta_invalid call has to be removed.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de
Acked-by: Jason Wang jasowang@redhat.com
Thanks
When the XDP metadata area was used, it is expected that the same metadata can also be accessed from TC, as can be read in the description of the bpf_xdp_adjust_meta helper function. In the tun driver, this was not yet implemented.
To make this work, the skb that is being built on XDP_PASS should know of the current size of the metadata area. This is ensured by adding calls to skb_metadata_set. For the tun_xdp_one code path, an additional check is necessary to handle the case where the externally initialized xdp_buff has no metadata support (xdp->data_meta == xdp->data + 1).
More information about this feature can be found in the commit message of commit de8f3a83b0a0 ("bpf: add meta pointer for direct access").
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de --- drivers/net/tun.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index c95ab9c46bd2..3dde6cd29a84 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1604,7 +1604,8 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
static struct sk_buff *__tun_build_skb(struct tun_file *tfile, struct page_frag *alloc_frag, char *buf, - int buflen, int len, int pad) + int buflen, int len, int pad, + int metasize) { struct sk_buff *skb = build_skb(buf, buflen);
@@ -1613,6 +1614,8 @@ static struct sk_buff *__tun_build_skb(struct tun_file *tfile,
skb_reserve(skb, pad); skb_put(skb, len); + if (metasize) + skb_metadata_set(skb, metasize); skb_set_owner_w(skb, tfile->socket.sk);
get_page(alloc_frag->page); @@ -1672,6 +1675,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, char *buf; size_t copied; int pad = TUN_RX_PAD; + int metasize = 0; int err = 0;
rcu_read_lock(); @@ -1699,7 +1703,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, if (hdr->gso_type || !xdp_prog) { *skb_xdp = 1; return __tun_build_skb(tfile, alloc_frag, buf, buflen, len, - pad); + pad, metasize); }
*skb_xdp = 0; @@ -1734,12 +1738,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
pad = xdp.data - xdp.data_hard_start; len = xdp.data_end - xdp.data; + + /* It is known that the xdp_buff was prepared with metadata + * support, so no additional check is necessary. + */ + metasize = xdp.data - xdp.data_meta; } bpf_net_ctx_clear(bpf_net_ctx); rcu_read_unlock(); local_bh_enable();
- return __tun_build_skb(tfile, alloc_frag, buf, buflen, len, pad); + return __tun_build_skb(tfile, alloc_frag, buf, buflen, len, pad, + metasize);
out: bpf_net_ctx_clear(bpf_net_ctx); @@ -2456,6 +2466,7 @@ static int tun_xdp_one(struct tun_struct *tun, struct sk_buff_head *queue; u32 rxhash = 0, act; int buflen = hdr->buflen; + int metasize = 0; int ret = 0; bool skb_xdp = false; struct page *page; @@ -2510,6 +2521,10 @@ static int tun_xdp_one(struct tun_struct *tun, skb_reserve(skb, xdp->data - xdp->data_hard_start); skb_put(skb, xdp->data_end - xdp->data);
+ metasize = max(xdp->data - xdp->data_meta, 0); + if (metasize) + skb_metadata_set(skb, metasize); + if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) { atomic_long_inc(&tun->rx_frame_errors); kfree_skb(skb);
Marcus Wichelmann wrote:
When the XDP metadata area was used, it is expected that the same metadata can also be accessed from TC, as can be read in the description of the bpf_xdp_adjust_meta helper function. In the tun driver, this was not yet implemented.
To make this work, the skb that is being built on XDP_PASS should know of the current size of the metadata area. This is ensured by adding calls to skb_metadata_set. For the tun_xdp_one code path, an additional check is necessary to handle the case where the externally initialized xdp_buff has no metadata support (xdp->data_meta == xdp->data + 1).
More information about this feature can be found in the commit message of commit de8f3a83b0a0 ("bpf: add meta pointer for direct access").
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de
drivers/net/tun.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index c95ab9c46bd2..3dde6cd29a84 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1604,7 +1604,8 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, static struct sk_buff *__tun_build_skb(struct tun_file *tfile, struct page_frag *alloc_frag, char *buf,
int buflen, int len, int pad)
int buflen, int len, int pad,
int metasize)
{ struct sk_buff *skb = build_skb(buf, buflen); @@ -1613,6 +1614,8 @@ static struct sk_buff *__tun_build_skb(struct tun_file *tfile, skb_reserve(skb, pad); skb_put(skb, len);
- if (metasize)
skb_set_owner_w(skb, tfile->socket.sk);skb_metadata_set(skb, metasize);
get_page(alloc_frag->page); @@ -1672,6 +1675,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, char *buf; size_t copied; int pad = TUN_RX_PAD;
- int metasize = 0; int err = 0;
rcu_read_lock(); @@ -1699,7 +1703,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, if (hdr->gso_type || !xdp_prog) { *skb_xdp = 1; return __tun_build_skb(tfile, alloc_frag, buf, buflen, len,
pad);
}pad, metasize);
*skb_xdp = 0; @@ -1734,12 +1738,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, pad = xdp.data - xdp.data_hard_start; len = xdp.data_end - xdp.data;
/* It is known that the xdp_buff was prepared with metadata
* support, so no additional check is necessary.
*/
} bpf_net_ctx_clear(bpf_net_ctx); rcu_read_unlock(); local_bh_enable();metasize = xdp.data - xdp.data_meta;
- return __tun_build_skb(tfile, alloc_frag, buf, buflen, len, pad);
- return __tun_build_skb(tfile, alloc_frag, buf, buflen, len, pad,
metasize);
out: bpf_net_ctx_clear(bpf_net_ctx); @@ -2456,6 +2466,7 @@ static int tun_xdp_one(struct tun_struct *tun, struct sk_buff_head *queue; u32 rxhash = 0, act; int buflen = hdr->buflen;
- int metasize = 0; int ret = 0; bool skb_xdp = false; struct page *page;
@@ -2510,6 +2521,10 @@ static int tun_xdp_one(struct tun_struct *tun, skb_reserve(skb, xdp->data - xdp->data_hard_start); skb_put(skb, xdp->data_end - xdp->data);
- metasize = max(xdp->data - xdp->data_meta, 0);
Is it ever possible for xdp->data_meta to be greater than xdp->data?
This is pointer arithmetic, which is a bit complex wrt type. This is likely ptrdiff_t, which is signed. But may want to use max_t(int, to make this more explicit.
- if (metasize)
skb_metadata_set(skb, metasize);
Technically not needed as skb_metadata_clear is just skb_metadata_set(skb, 0). But fine to test and elide.
if (virtio_net_hdr_to_skb(skb, gso, tun_is_lttle_endian(tun))) { atomic_long_inc(&tun->rx_frame_errors); kfree_skb(skb); -- 2.43.0
Marcus Wichelmann wrote:
When the XDP metadata area was used, it is expected that the same metadata can also be accessed from TC, as can be read in the description of the bpf_xdp_adjust_meta helper function. In the tun driver, this was not yet implemented.
To make this work, the skb that is being built on XDP_PASS should know of the current size of the metadata area. This is ensured by adding calls to skb_metadata_set. For the tun_xdp_one code path, an additional check is necessary to handle the case where the externally initialized xdp_buff has no metadata support (xdp->data_meta == xdp->data + 1).
More information about this feature can be found in the commit message of commit de8f3a83b0a0 ("bpf: add meta pointer for direct access").
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de
drivers/net/tun.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index c95ab9c46bd2..3dde6cd29a84 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1604,7 +1604,8 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, static struct sk_buff *__tun_build_skb(struct tun_file *tfile, struct page_frag *alloc_frag, char *buf,
int buflen, int len, int pad)
int buflen, int len, int pad,
int metasize)
{ struct sk_buff *skb = build_skb(buf, buflen); @@ -1613,6 +1614,8 @@ static struct sk_buff *__tun_build_skb(struct tun_file *tfile, skb_reserve(skb, pad); skb_put(skb, len);
- if (metasize)
skb_set_owner_w(skb, tfile->socket.sk);skb_metadata_set(skb, metasize);
get_page(alloc_frag->page); @@ -1672,6 +1675,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, char *buf; size_t copied; int pad = TUN_RX_PAD;
- int metasize = 0; int err = 0;
rcu_read_lock(); @@ -1699,7 +1703,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, if (hdr->gso_type || !xdp_prog) { *skb_xdp = 1; return __tun_build_skb(tfile, alloc_frag, buf, buflen, len,
pad);
}pad, metasize);
*skb_xdp = 0; @@ -1734,12 +1738,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, pad = xdp.data - xdp.data_hard_start; len = xdp.data_end - xdp.data;
/* It is known that the xdp_buff was prepared with metadata
* support, so no additional check is necessary.
*/
} bpf_net_ctx_clear(bpf_net_ctx); rcu_read_unlock(); local_bh_enable();metasize = xdp.data - xdp.data_meta;
- return __tun_build_skb(tfile, alloc_frag, buf, buflen, len, pad);
- return __tun_build_skb(tfile, alloc_frag, buf, buflen, len, pad,
metasize);
out: bpf_net_ctx_clear(bpf_net_ctx); @@ -2456,6 +2466,7 @@ static int tun_xdp_one(struct tun_struct *tun, struct sk_buff_head *queue; u32 rxhash = 0, act; int buflen = hdr->buflen;
- int metasize = 0; int ret = 0; bool skb_xdp = false; struct page *page;
@@ -2510,6 +2521,10 @@ static int tun_xdp_one(struct tun_struct *tun, skb_reserve(skb, xdp->data - xdp->data_hard_start); skb_put(skb, xdp->data_end - xdp->data);
- metasize = max(xdp->data - xdp->data_meta, 0);
Can xdp->data_meta ever be greater than xdp->data?
This is pointer comparison, which is tricky wrt type. It likely is ptrdiff_t and thus signed. But may want to use max_t(long int, ..) to make this explicit.
- if (metasize)
skb_metadata_set(skb, metasize);
Not strictly needed. As skb_metadata_clear is just skb_metadata_set(skb, 0). But also not wrong, so fine to keep.
if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) { atomic_long_inc(&tun->rx_frame_errors); kfree_skb(skb); -- 2.43.0
Am 18.02.25 um 02:47 schrieb Willem de Bruijn:
Marcus Wichelmann wrote:
[...]
- metasize = max(xdp->data - xdp->data_meta, 0);
Can xdp->data_meta ever be greater than xdp->data?
When an xdp_buff has no metadata support, then this is marked by setting xdp->data_meta to xdp->data + 1. See xdp_prepare_buff or xdp_set_data_meta_invalid.
In the case of tun_xdp_one, the xdp_buff is externally created by another driver and passed to the tun driver using sendmsg and TUN_MSG_PTR. For now, the vhost_net driver is the only driver doing that, and xdp->data_meta is set to xdp->data there, marking support for metadata.
So knowing that vhost_net is currently the only driver passing xdp_buffs to tun_sendmsg, the check is not strictly necessary. But other drivers may use this API as well in the future. That's why I'd like to not make the assumption that other drivers always create the xdp_buffs with metadata support, when they pass them to tun_sendmsg.
Or am I just to careful about this? What do you think?
This is pointer comparison, which is tricky wrt type. It likely is ptrdiff_t and thus signed. But may want to use max_t(long int, ..) to make this explicit.
Ah, I see, good point.
So like that?
metasize = max_t(long int, xdp->data - xdp->data_meta, 0); if (metasize) skb_metadata_set(skb, metasize);
Alternatively, there is also xdp_data_meta_unsupported(xdp_buff) which could be used to make this check very explicit, but I don't see it being used in network drivers elsewhere. Not sure why.
- if (metasize)
skb_metadata_set(skb, metasize);
Not strictly needed. As skb_metadata_clear is just skb_metadata_set(skb, 0). But also not wrong, so fine to keep.
Oh, haven't seen that. I'm following a common pattern here that I've seen in many other network drivers (grep for "skb_metadata_set"):
unsigned int metasize = xdp->data - xdp->data_meta; [...] if (metasize) skb_metadata_set(skb, metasize);
Marcus
Marcus Wichelmann wrote:
Am 18.02.25 um 02:47 schrieb Willem de Bruijn:
Marcus Wichelmann wrote:
[...]
- metasize = max(xdp->data - xdp->data_meta, 0);
Can xdp->data_meta ever be greater than xdp->data?
When an xdp_buff has no metadata support, then this is marked by setting xdp->data_meta to xdp->data + 1. See xdp_prepare_buff or xdp_set_data_meta_invalid.
In the case of tun_xdp_one, the xdp_buff is externally created by another driver and passed to the tun driver using sendmsg and TUN_MSG_PTR. For now, the vhost_net driver is the only driver doing that, and xdp->data_meta is set to xdp->data there, marking support for metadata.
So knowing that vhost_net is currently the only driver passing xdp_buffs to tun_sendmsg, the check is not strictly necessary. But other drivers may use this API as well in the future. That's why I'd like to not make the assumption that other drivers always create the xdp_buffs with metadata support, when they pass them to tun_sendmsg.
Or am I just to careful about this? What do you think?
I agree.
This is pointer comparison, which is tricky wrt type. It likely is ptrdiff_t and thus signed. But may want to use max_t(long int, ..) to make this explicit.
Ah, I see, good point.
So like that?
metasize = max_t(long int, xdp->data - xdp->data_meta, 0); if (metasize) skb_metadata_set(skb, metasize);
Or just this? Also ensures the test uses signed int.
int metasize;
...
metasize = xdp->data - xdp->data_meta; if (metasize > 0) skb_metadata_set(skb, metasize);
Alternatively, there is also xdp_data_meta_unsupported(xdp_buff) which could be used to make this check very explicit, but I don't see it being used in network drivers elsewhere. Not sure why.
- if (metasize)
skb_metadata_set(skb, metasize);
Not strictly needed. As skb_metadata_clear is just skb_metadata_set(skb, 0). But also not wrong, so fine to keep.
Oh, haven't seen that. I'm following a common pattern here that I've seen in many other network drivers (grep for "skb_metadata_set"):
unsigned int metasize = xdp->data - xdp->data_meta; [...] if (metasize) skb_metadata_set(skb, metasize);
Thanks for that context. Sounds good.
Am 19.02.25 um 16:06 schrieb Willem de Bruijn:
Marcus Wichelmann wrote:
Am 18.02.25 um 02:47 schrieb Willem de Bruijn:
[...] This is pointer comparison, which is tricky wrt type. It likely is ptrdiff_t and thus signed. But may want to use max_t(long int, ..) to make this explicit.
Ah, I see, good point.
So like that?
metasize = max_t(long int, xdp->data - xdp->data_meta, 0); if (metasize) skb_metadata_set(skb, metasize);
Or just this? Also ensures the test uses signed int.
int metasize; ... metasize = xdp->data - xdp->data_meta; if (metasize > 0) skb_metadata_set(skb, metasize);
Well, yeah, just keep it simple I guess. ;) Will do that.
I'll send a V3 patch series with the change.
Thanks!
Marcus
To test the XDP metadata functionality of the tun driver, it's necessary to create a new tap device first. A helper function for this already exists in lwt_helpers.h. Move it to the common network helpers header, so it can be reused in other tests.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de --- tools/testing/selftests/bpf/network_helpers.c | 28 ++++++++++++++++++ tools/testing/selftests/bpf/network_helpers.h | 3 ++ .../selftests/bpf/prog_tests/lwt_helpers.h | 29 ------------------- 3 files changed, 31 insertions(+), 29 deletions(-)
diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index 737a952dcf80..e1cfa1b37754 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -565,6 +565,34 @@ void close_netns(struct nstoken *token) free(token); }
+int open_tuntap(const char *dev_name, bool need_mac) +{ + int err = 0; + struct ifreq ifr; + int fd = open("/dev/net/tun", O_RDWR); + + if (!ASSERT_GT(fd, 0, "open(/dev/net/tun)")) + return -1; + + ifr.ifr_flags = IFF_NO_PI | (need_mac ? IFF_TAP : IFF_TUN); + strncpy(ifr.ifr_name, dev_name, IFNAMSIZ - 1); + ifr.ifr_name[IFNAMSIZ - 1] = '\0'; + + err = ioctl(fd, TUNSETIFF, &ifr); + if (!ASSERT_OK(err, "ioctl(TUNSETIFF)")) { + close(fd); + return -1; + } + + err = fcntl(fd, F_SETFL, O_NONBLOCK); + if (!ASSERT_OK(err, "fcntl(O_NONBLOCK)")) { + close(fd); + return -1; + } + + return fd; +} + int get_socket_local_port(int sock_fd) { struct sockaddr_storage addr; diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h index 9f6e05d886c5..99d1417c1d8b 100644 --- a/tools/testing/selftests/bpf/network_helpers.h +++ b/tools/testing/selftests/bpf/network_helpers.h @@ -8,6 +8,7 @@ typedef __u16 __sum16; #include <linux/if_ether.h> #include <linux/if_packet.h> +#include <linux/if_tun.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/ethtool.h> @@ -98,6 +99,8 @@ int send_recv_data(int lfd, int fd, uint32_t total_bytes); int make_netns(const char *name); int remove_netns(const char *name);
+int open_tuntap(const char *dev_name, bool need_mac); + /** * append_tid() - Append thread ID to the given string. * diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h b/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h index fb1eb8c67361..ccec0fcdabc1 100644 --- a/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h +++ b/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h @@ -5,7 +5,6 @@
#include <time.h> #include <net/if.h> -#include <linux/if_tun.h> #include <linux/icmp.h>
#include "test_progs.h" @@ -37,34 +36,6 @@ static inline int netns_delete(void) return system("ip netns del " NETNS ">/dev/null 2>&1"); }
-static int open_tuntap(const char *dev_name, bool need_mac) -{ - int err = 0; - struct ifreq ifr; - int fd = open("/dev/net/tun", O_RDWR); - - if (!ASSERT_GT(fd, 0, "open(/dev/net/tun)")) - return -1; - - ifr.ifr_flags = IFF_NO_PI | (need_mac ? IFF_TAP : IFF_TUN); - strncpy(ifr.ifr_name, dev_name, IFNAMSIZ - 1); - ifr.ifr_name[IFNAMSIZ - 1] = '\0'; - - err = ioctl(fd, TUNSETIFF, &ifr); - if (!ASSERT_OK(err, "ioctl(TUNSETIFF)")) { - close(fd); - return -1; - } - - err = fcntl(fd, F_SETFL, O_NONBLOCK); - if (!ASSERT_OK(err, "fcntl(O_NONBLOCK)")) { - close(fd); - return -1; - } - - return fd; -} - #define ICMP_PAYLOAD_SIZE 100
/* Match an ICMP packet with payload len ICMP_PAYLOAD_SIZE */
Marcus Wichelmann wrote:
To test the XDP metadata functionality of the tun driver, it's necessary to create a new tap device first. A helper function for this already exists in lwt_helpers.h. Move it to the common network helpers header, so it can be reused in other tests.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de
Reviewed-by: Willem de Bruijn willemb@google.com
Both tests that include lwt_helpers.h also include network_helpers.h, so the linking with network_helpers.o is already addressed. This was not entirely obvious from just reading the code.
The existing XDP metadata test works by creating a veth pair and attaching XDP & TC programs that drop the packet when the condition of the test isn't fulfilled. The test then pings through the veth pair and succeeds when the ping comes through.
While this test works great for a veth pair, it is hard to replicate for tap devices to test the XDP metadata support of them. A similar test for the tun driver would either involve logic to reply to the ping request, or would have to capture the packet to check if it was dropped or not.
To make the testing of other drivers easier while still maximizing code reuse, this commit refactors the existing xdp_context_functional test to use a test_result map. Instead of conditionally passing or dropping the packet, the TC program is changed to copy the received metadata into the value of that single-entry array map. Tests can then verify that the map value matches the expectation.
This testing logic is easy to adapt to other network drivers as the only remaining requirement is that there is some way to send a custom Ethernet packet through it that triggers the XDP & TC programs.
The payload of the Ethernet packet is used as the test data that is expected to be passed as metadata from the XDP to the TC program and written to the map. It has a fixed size of 32 bytes which is a reasonalbe size that should be supported by both drivers. Additional packet headers are not necessary for the test and were therefore skipped to keep the testing code short.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de --- .../bpf/prog_tests/xdp_context_test_run.c | 93 +++++++++++++++---- .../selftests/bpf/progs/test_xdp_meta.c | 56 ++++++----- 2 files changed, 106 insertions(+), 43 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c index 937da9b7532a..0cd2a0d8ae60 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c @@ -4,13 +4,19 @@ #include "test_xdp_context_test_run.skel.h" #include "test_xdp_meta.skel.h"
-#define TX_ADDR "10.0.0.1" -#define RX_ADDR "10.0.0.2" #define RX_NAME "veth0" #define TX_NAME "veth1" #define TX_NETNS "xdp_context_tx" #define RX_NETNS "xdp_context_rx"
+#define TEST_PAYLOAD_LEN 32 +static const __u8 test_payload[TEST_PAYLOAD_LEN] = { + 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, + 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, + 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, + 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, +}; + void test_xdp_context_error(int prog_fd, struct bpf_test_run_opts opts, __u32 data_meta, __u32 data, __u32 data_end, __u32 ingress_ifindex, __u32 rx_queue_index, @@ -112,15 +118,66 @@ void test_xdp_context_test_run(void) test_xdp_context_test_run__destroy(skel); }
-void test_xdp_context_functional(void) +int send_test_packet(int ifindex) +{ + int n, sock = -1; + __u8 packet[sizeof(struct ethhdr) + TEST_PAYLOAD_LEN]; + + /* The ethernet header is not relevant for this test and doesn't need to + * be meaningful. + */ + struct ethhdr eth = { 0 }; + + memcpy(packet, ð, sizeof(eth)); + memcpy(packet + sizeof(eth), test_payload, TEST_PAYLOAD_LEN); + + sock = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW); + if (!ASSERT_GE(sock, 0, "socket")) + goto err; + + struct sockaddr_ll saddr = { + .sll_family = PF_PACKET, + .sll_ifindex = ifindex, + .sll_halen = ETH_ALEN + }; + n = sendto(sock, packet, sizeof(packet), 0, (struct sockaddr *)&saddr, + sizeof(saddr)); + if (!ASSERT_EQ(n, sizeof(packet), "sendto")) + goto err; + + close(sock); + return 0; + +err: + if (sock >= 0) + close(sock); + return -1; +} + +void assert_test_result(struct test_xdp_meta *skel) +{ + int err; + __u32 map_key = 0; + __u8 map_value[TEST_PAYLOAD_LEN]; + + err = bpf_map__lookup_elem(skel->maps.test_result, &map_key, + sizeof(map_key), &map_value, + TEST_PAYLOAD_LEN, BPF_ANY); + if (!ASSERT_OK(err, "lookup test_result")) + return; + + ASSERT_MEMEQ(&map_value, &test_payload, TEST_PAYLOAD_LEN, + "test_result map contains test payload"); +} + +void test_xdp_context_veth(void) { LIBBPF_OPTS(bpf_tc_hook, tc_hook, .attach_point = BPF_TC_INGRESS); LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1); struct netns_obj *rx_ns = NULL, *tx_ns = NULL; - struct bpf_program *tc_prog, *xdp_prog; struct test_xdp_meta *skel = NULL; struct nstoken *nstoken = NULL; - int rx_ifindex; + int rx_ifindex, tx_ifindex; int ret;
tx_ns = netns_new(TX_NETNS, false); @@ -138,7 +195,6 @@ void test_xdp_context_functional(void) if (!ASSERT_OK_PTR(nstoken, "setns rx_ns")) goto close;
- SYS(close, "ip addr add " RX_ADDR "/24 dev " RX_NAME); SYS(close, "ip link set dev " RX_NAME " up");
skel = test_xdp_meta__open_and_load(); @@ -154,21 +210,12 @@ void test_xdp_context_functional(void) if (!ASSERT_OK(ret, "bpf_tc_hook_create")) goto close;
- tc_prog = bpf_object__find_program_by_name(skel->obj, "ing_cls"); - if (!ASSERT_OK_PTR(tc_prog, "open ing_cls prog")) - goto close; - - tc_opts.prog_fd = bpf_program__fd(tc_prog); + tc_opts.prog_fd = bpf_program__fd(skel->progs.ing_cls); ret = bpf_tc_attach(&tc_hook, &tc_opts); if (!ASSERT_OK(ret, "bpf_tc_attach")) goto close;
- xdp_prog = bpf_object__find_program_by_name(skel->obj, "ing_xdp"); - if (!ASSERT_OK_PTR(xdp_prog, "open ing_xdp prog")) - goto close; - - ret = bpf_xdp_attach(rx_ifindex, - bpf_program__fd(xdp_prog), + ret = bpf_xdp_attach(rx_ifindex, bpf_program__fd(skel->progs.ing_xdp), 0, NULL); if (!ASSERT_GE(ret, 0, "bpf_xdp_attach")) goto close; @@ -179,9 +226,17 @@ void test_xdp_context_functional(void) if (!ASSERT_OK_PTR(nstoken, "setns tx_ns")) goto close;
- SYS(close, "ip addr add " TX_ADDR "/24 dev " TX_NAME); SYS(close, "ip link set dev " TX_NAME " up"); - ASSERT_OK(SYS_NOFAIL("ping -c 1 " RX_ADDR), "ping"); + + tx_ifindex = if_nametoindex(TX_NAME); + if (!ASSERT_GE(tx_ifindex, 0, "if_nametoindex tx")) + goto close; + + ret = send_test_packet(tx_ifindex); + if (!ASSERT_OK(ret, "send_test_packet")) + goto close; + + assert_test_result(skel);
close: close_netns(nstoken); diff --git a/tools/testing/selftests/bpf/progs/test_xdp_meta.c b/tools/testing/selftests/bpf/progs/test_xdp_meta.c index fe2d71ae0e71..56a83307b483 100644 --- a/tools/testing/selftests/bpf/progs/test_xdp_meta.c +++ b/tools/testing/selftests/bpf/progs/test_xdp_meta.c @@ -4,49 +4,57 @@
#include <bpf/bpf_helpers.h>
-#define __round_mask(x, y) ((__typeof__(x))((y) - 1)) -#define round_up(x, y) ((((x) - 1) | __round_mask(x, y)) + 1) -#define ctx_ptr(ctx, mem) (void *)(unsigned long)ctx->mem +#define META_SIZE 32 + +/* Demonstrates how metadata can be passed from an XDP program to a TC program + * using bpf_xdp_adjust_meta. + * For the sake of testing the metadata support in drivers, the XDP program uses + * a fixed-size payload after the Ethernet header as metadata. The TC program + * copies the metadata it receives into a map so it can be checked from + * userspace. + */ + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, __u32); + __uint(value_size, META_SIZE); +} test_result SEC(".maps");
SEC("tc") int ing_cls(struct __sk_buff *ctx) { - __u8 *data, *data_meta, *data_end; - __u32 diff = 0; + void *data_meta = (void *)(unsigned long)ctx->data_meta; + void *data = (void *)(unsigned long)ctx->data;
- data_meta = ctx_ptr(ctx, data_meta); - data_end = ctx_ptr(ctx, data_end); - data = ctx_ptr(ctx, data); - - if (data + ETH_ALEN > data_end || - data_meta + round_up(ETH_ALEN, 4) > data) + if (data_meta + META_SIZE > data) return TC_ACT_SHOT;
- diff |= ((__u32 *)data_meta)[0] ^ ((__u32 *)data)[0]; - diff |= ((__u16 *)data_meta)[2] ^ ((__u16 *)data)[2]; + int key = 0; + + bpf_map_update_elem(&test_result, &key, data_meta, BPF_ANY);
- return diff ? TC_ACT_SHOT : TC_ACT_OK; + return TC_ACT_SHOT; }
SEC("xdp") int ing_xdp(struct xdp_md *ctx) { - __u8 *data, *data_meta, *data_end; - int ret; - - ret = bpf_xdp_adjust_meta(ctx, -round_up(ETH_ALEN, 4)); + int ret = bpf_xdp_adjust_meta(ctx, -META_SIZE); if (ret < 0) return XDP_DROP;
- data_meta = ctx_ptr(ctx, data_meta); - data_end = ctx_ptr(ctx, data_end); - data = ctx_ptr(ctx, data); + void *data_meta = (void *)(unsigned long)ctx->data_meta; + void *data = (void *)(unsigned long)ctx->data; + void *data_end = (void *)(unsigned long)ctx->data_end;
- if (data + ETH_ALEN > data_end || - data_meta + round_up(ETH_ALEN, 4) > data) + void *payload = data + sizeof(struct ethhdr); + + if (data_meta + META_SIZE > data || payload + META_SIZE > data_end) return XDP_DROP;
- __builtin_memcpy(data_meta, data, ETH_ALEN); + __builtin_memcpy(data_meta, payload, META_SIZE); + return XDP_PASS; }
Add a selftest that creates a tap device, attaches XDP and TC programs, writes a packet with a test payload into the tap device and checks the test result. This test ensures that the XDP metadata support in the tun driver is enabled and that the metadata size is correctly passed to the skb.
See the previous commit ("selftests/bpf: refactor xdp_context_functional test and bpf program") for details about the test design.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de --- .../bpf/prog_tests/xdp_context_test_run.c | 59 +++++++++++++++++++ 1 file changed, 59 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c index 0cd2a0d8ae60..5eba4a823184 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c @@ -8,6 +8,7 @@ #define TX_NAME "veth1" #define TX_NETNS "xdp_context_tx" #define RX_NETNS "xdp_context_rx" +#define TAP_NAME "tap0"
#define TEST_PAYLOAD_LEN 32 static const __u8 test_payload[TEST_PAYLOAD_LEN] = { @@ -245,3 +246,61 @@ void test_xdp_context_veth(void) netns_free(tx_ns); }
+void test_xdp_context_tuntap(void) +{ + LIBBPF_OPTS(bpf_tc_hook, tc_hook, .attach_point = BPF_TC_INGRESS); + LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1); + struct test_xdp_meta *skel = NULL; + __u8 packet[sizeof(struct ethhdr) + TEST_PAYLOAD_LEN]; + int tap_fd = -1; + int tap_ifindex; + int ret; + + tap_fd = open_tuntap(TAP_NAME, true); + if (!ASSERT_GE(tap_fd, 0, "open_tuntap")) + goto close; + + SYS(close, "ip link set dev " TAP_NAME " up"); + + skel = test_xdp_meta__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open and load skeleton")) + goto close; + + tap_ifindex = if_nametoindex(TAP_NAME); + if (!ASSERT_GE(tap_ifindex, 0, "if_nametoindex")) + goto close; + + tc_hook.ifindex = tap_ifindex; + ret = bpf_tc_hook_create(&tc_hook); + if (!ASSERT_OK(ret, "bpf_tc_hook_create")) + goto close; + + tc_opts.prog_fd = bpf_program__fd(skel->progs.ing_cls); + ret = bpf_tc_attach(&tc_hook, &tc_opts); + if (!ASSERT_OK(ret, "bpf_tc_attach")) + goto close; + + ret = bpf_xdp_attach(tap_ifindex, bpf_program__fd(skel->progs.ing_xdp), + 0, NULL); + if (!ASSERT_GE(ret, 0, "bpf_xdp_attach")) + goto close; + + /* The ethernet header is not relevant for this test and doesn't need to + * be meaningful. + */ + struct ethhdr eth = { 0 }; + + memcpy(packet, ð, sizeof(eth)); + memcpy(packet + sizeof(eth), test_payload, TEST_PAYLOAD_LEN); + + ret = write(tap_fd, packet, sizeof(packet)); + if (!ASSERT_EQ(ret, sizeof(packet), "write packet")) + goto close; + + assert_test_result(skel); + +close: + if (tap_fd >= 0) + close(tap_fd); + test_xdp_meta__destroy(skel); +}
The open_tuntap helper function uses open() to get a file descriptor for /dev/net/tun.
The open(2) manpage writes this about its return value:
On success, open(), openat(), and creat() return the new file descriptor (a nonnegative integer). On error, -1 is returned and errno is set to indicate the error.
This means that the fd > 0 assertion in the open_tuntap helper is incorrect and should rather check for fd >= 0.
When running the BPF selftests locally, this incorrect assertion was not an issue, but the BPF kernel-patches CI failed because of this:
open_tuntap:FAIL:open(/dev/net/tun) unexpected open(/dev/net/tun): actual 0 <= expected 0
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de --- tools/testing/selftests/bpf/network_helpers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index e1cfa1b37754..9b59bfd5d912 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -571,7 +571,7 @@ int open_tuntap(const char *dev_name, bool need_mac) struct ifreq ifr; int fd = open("/dev/net/tun", O_RDWR);
- if (!ASSERT_GT(fd, 0, "open(/dev/net/tun)")) + if (!ASSERT_GE(fd, 0, "open(/dev/net/tun)")) return -1;
ifr.ifr_flags = IFF_NO_PI | (need_mac ? IFF_TAP : IFF_TUN);
Marcus Wichelmann wrote:
The open_tuntap helper function uses open() to get a file descriptor for /dev/net/tun.
The open(2) manpage writes this about its return value:
On success, open(), openat(), and creat() return the new file descriptor (a nonnegative integer). On error, -1 is returned and errno is set to indicate the error.
This means that the fd > 0 assertion in the open_tuntap helper is incorrect and should rather check for fd >= 0.
When running the BPF selftests locally, this incorrect assertion was not an issue, but the BPF kernel-patches CI failed because of this:
open_tuntap:FAIL:open(/dev/net/tun) unexpected open(/dev/net/tun): actual 0 <= expected 0
Wow. What kind of environment is this that 0 is not assigned stdin.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de
The code makes sense.
I suppose that if this condition can hit, then it can also affect existing lwt_* tests and thus should be a fix to commit 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT"), sent separately to bpf (not bpf-next)?
Since it's a test and no failure was reported so far, maybe fine to just merge as part of this bpf-next series, not my call.
tools/testing/selftests/bpf/network_helpers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index e1cfa1b37754..9b59bfd5d912 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -571,7 +571,7 @@ int open_tuntap(const char *dev_name, bool need_mac) struct ifreq ifr; int fd = open("/dev/net/tun", O_RDWR);
- if (!ASSERT_GT(fd, 0, "open(/dev/net/tun)"))
- if (!ASSERT_GE(fd, 0, "open(/dev/net/tun)")) return -1;
ifr.ifr_flags = IFF_NO_PI | (need_mac ? IFF_TAP : IFF_TUN); -- 2.43.0
Am 18.02.25 um 02:56 schrieb Willem de Bruijn:
Marcus Wichelmann wrote:
[...] When running the BPF selftests locally, this incorrect assertion was not an issue, but the BPF kernel-patches CI failed because of this:
open_tuntap:FAIL:open(/dev/net/tun) unexpected open(/dev/net/tun): actual 0 <= expected 0
Wow. What kind of environment is this that 0 is not assigned stdin.
Signed-off-by: Marcus Wichelmann marcus.wichelmann@hetzner-cloud.de
The code makes sense.
I suppose that if this condition can hit, then it can also affect existing lwt_* tests and thus should be a fix to commit 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT"), sent separately to bpf (not bpf-next)?
Since it's a test and no failure was reported so far, maybe fine to just merge as part of this bpf-next series, not my call.
I'm not sure why this only became an issue after I added the xdp_context_tuntap test and never before. This may have to do with the order of test execution.
If nobody speaks up, I'll leave it in this patch series for now.
Marcus
linux-kselftest-mirror@lists.linaro.org