This patch set improves the documentation and selftests for XDP Rx metadata handling. The first patch clarifies the documentation around XDP metadata layout and the use of bpf_xdp_adjust_meta. The second patch enhances the BPF selftests to make XDP metadata handling more robust and portable across different NICs.
Prior to this patch set, the user application retrieved the xdp_meta by calculating backward from the data pointer, while the XDP program filled in the xdp_meta by calculating backward from data_meta. The two locations no longer match if there is device-reserved metadata between data_meta and data.
                        |<---sizeof(xdp_meta)--|
                        |                      |
                 struct xdp_meta        rx_desc->address
                        ^                      ^
                        |                      |
+----------+----------------------+------------+------+
| headroom |   custom metadata    |  reserved  | data |
+----------+----------------------+------------+------+
           ^                      ^            ^
           |                      |            |
    struct xdp_meta     xdp_buff->data_meta  xdp_buff->data
           |                      |
           |<---sizeof(xdp_meta)--|
Song Yoong Siang (2):
  doc: clarify XDP Rx metadata layout and bpf_xdp_adjust_meta usage
  selftests/bpf: Enhance XDP Rx Metadata Handling
 Documentation/networking/xdp-rx-metadata.rst  | 38 +++++++++++++++++++
 .../selftests/bpf/prog_tests/xdp_metadata.c   |  2 +-
 .../selftests/bpf/progs/xdp_hw_metadata.c     | 10 ++++-
 .../selftests/bpf/progs/xdp_metadata.c        |  8 +++-
 tools/testing/selftests/bpf/xdp_hw_metadata.c |  2 +-
 tools/testing/selftests/bpf/xdp_metadata.h    |  7 ++++
 6 files changed, 63 insertions(+), 4 deletions(-)
Expand the explanation of how METADATA_SIZE should be chosen to accommodate both device-reserved and custom metadata. Additionally, add a diagram to illustrate the calculation of the delta parameter for bpf_xdp_adjust_meta, including alignment and size constraints.
These changes help users correctly allocate and access metadata in AF_XDP use cases.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 Documentation/networking/xdp-rx-metadata.rst | 38 ++++++++++++++++++++
 1 file changed, 38 insertions(+)
diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index a6e0ece18be5..61418f533e0e 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -54,6 +54,19 @@ area in whichever format it chooses. Later consumers of the metadata
 will have to agree on the format by some out of band contract (like for
 the AF_XDP use case, see below).
 
+It is important to note that some devices may utilize the ``data_meta`` area for
+their own purposes. For example, the IGC device utilizes ``IGC_TS_HDR_LEN``
+bytes of the ``data_meta`` area for receiving hardware timestamps. Therefore,
+the XDP program should ensure that it does not overwrite any existing metadata.
+The metadata layout of such device is depicted below::
+
+  +----------+-----------------+--------------------------+------+
+  | headroom | custom metadata | device-reserved metadata | data |
+  +----------+-----------------+--------------------------+------+
+             ^                                            ^
+             |                                            |
+    xdp_buff->data_meta                            xdp_buff->data
+
 AF_XDP
 ======
 
@@ -76,6 +89,31 @@ Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
                                |
                         rx_desc->address
 
+It is crucial that the agreed ``METADATA_SIZE`` between the BPF program and the
+final consumer is sufficient to accommodate both device-reserved metadata and
+the data the BPF program needs to populate. When calling
+``bpf_xdp_adjust_meta``, the input parameter ``delta`` should be calculated as
+``METADATA_SIZE - (xdp_buff->data - xdp_buff->data_meta)``.
+
+The diagram below provides a visual representation of the calculation of
+``delta`` and the overall metadata layout::
+
+             |<-------------------METADATA_SIZE------------------->|
+  +----------+--------------------------+--------------------------+------+
+  | headroom |     custom metadata      | device-reserved metadata | data |
+  +----------+--------------------------+--------------------------+------+
+             ^                          ^                          ^
+             |                          |                          |
+  new xdp_buff->data_meta    old xdp_buff->data_meta        xdp_buff->data
+             |                          |
+             |<----------delta--------->|
+
+``bpf_xdp_adjust_meta`` ensures that ``METADATA_SIZE`` is aligned to 4 bytes,
+does not exceed 252 bytes, and leaves sufficient space for building the
+xdp_frame. If these conditions are not met, it returns a negative error. In this
+case, the BPF program should not proceed to populate data into the ``data_meta``
+area.
+
 XDP_PASS
 ========
 
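The delta calculation described in the documentation hunk above can be sketched in plain user-space C. This is only an illustration under stated assumptions, not kernel code: `metalen_ok` and `meta_growth` are hypothetical helper names, the 4-byte alignment and 252-byte limit are taken from the patch text, and the "sufficient space for building the xdp_frame" condition is not modeled. The returned value is the number of bytes by which `data_meta` has to move toward the headroom; the selftest in this series passes the negated value to `bpf_xdp_adjust_meta`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits quoted from the documentation patch (not read from kernel headers). */
#define META_ALIGN 4u
#define META_MAX   252u

/* Hypothetical helper: is this total metadata length acceptable? */
static bool metalen_ok(uint32_t metalen)
{
	return (metalen % META_ALIGN) == 0 && metalen <= META_MAX;
}

/*
 * Hypothetical helper: how many bytes data_meta must move back so that
 * data - data_meta == metadata_size. Returns -1 if the agreed size is
 * invalid or smaller than what is already in use.
 */
static int meta_growth(uint32_t metadata_size, uintptr_t data,
		       uintptr_t data_meta)
{
	uint32_t used;

	assert(data_meta <= data);	/* data_meta never lies past data */
	used = (uint32_t)(data - data_meta);

	if (!metalen_ok(metadata_size) || used > metadata_size)
		return -1;

	return (int)(metadata_size - used);
}
```

For example, with an agreed size of 64 bytes and 16 bytes already reserved by the device, the helper reports that 48 more bytes are needed.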
Introduce the XDP_METADATA_SIZE macro to ensure that user applications can consistently retrieve the correct location of struct xdp_meta.
Prior to this commit, the XDP program adjusted the data_meta backward by the size of struct xdp_meta, while the user application retrieved the data by calculating backward from the data pointer. This approach only worked if xdp_buff->data_meta was equal to xdp_buff->data before calling bpf_xdp_adjust_meta.
With the introduction of XDP_METADATA_SIZE, both the XDP program and user application now calculate and identify the location of struct xdp_meta from the data pointer. This ensures the implementation remains functional even when there is device-reserved metadata, making the tests more portable across different NICs.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c |  2 +-
 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c   | 10 +++++++++-
 tools/testing/selftests/bpf/progs/xdp_metadata.c      |  8 +++++++-
 tools/testing/selftests/bpf/xdp_hw_metadata.c         |  2 +-
 tools/testing/selftests/bpf/xdp_metadata.h            |  7 +++++++
 5 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
index 19f92affc2da..8d6c2633698b 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -302,7 +302,7 @@ static int verify_xsk_metadata(struct xsk *xsk, bool sent_from_af_xdp)
 
 	/* custom metadata */
 
-	meta = data - sizeof(struct xdp_meta);
+	meta = data - XDP_METADATA_SIZE;
 
 	if (!ASSERT_NEQ(meta->rx_timestamp, 0, "rx_timestamp"))
 		return -1;
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index 330ece2eabdb..72242ac1cdcd 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -27,6 +27,7 @@ extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
 SEC("xdp.frags")
 int rx(struct xdp_md *ctx)
 {
+	int metalen_used, metalen_to_adjust;
 	void *data, *data_meta, *data_end;
 	struct ipv6hdr *ip6h = NULL;
 	struct udphdr *udp = NULL;
@@ -72,7 +73,14 @@ int rx(struct xdp_md *ctx)
 		return XDP_PASS;
 	}
 
-	err = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	metalen_used = ctx->data - ctx->data_meta;
+	metalen_to_adjust = XDP_METADATA_SIZE - metalen_used;
+	if (metalen_to_adjust < (int)sizeof(struct xdp_meta)) {
+		__sync_add_and_fetch(&pkts_skip, 1);
+		return XDP_PASS;
+	}
+
+	err = bpf_xdp_adjust_meta(ctx, -metalen_to_adjust);
 	if (err) {
 		__sync_add_and_fetch(&pkts_fail, 1);
 		return XDP_PASS;
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
index 09bb8a038d52..a0ba4ef4bbd8 100644
--- a/tools/testing/selftests/bpf/progs/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -37,6 +37,7 @@ extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
 SEC("xdp")
 int rx(struct xdp_md *ctx)
 {
+	int metalen_used, metalen_to_adjust;
 	void *data, *data_meta, *data_end;
 	struct ipv6hdr *ip6h = NULL;
 	struct ethhdr *eth = NULL;
@@ -73,7 +74,12 @@ int rx(struct xdp_md *ctx)
 
 	/* Reserve enough for all custom metadata. */
 
-	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	metalen_used = ctx->data - ctx->data_meta;
+	metalen_to_adjust = XDP_METADATA_SIZE - metalen_used;
+	if (metalen_to_adjust < (int)sizeof(struct xdp_meta))
+		return XDP_DROP;
+
+	ret = bpf_xdp_adjust_meta(ctx, -metalen_to_adjust);
 	if (ret != 0)
 		return XDP_DROP;
 
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 3d8de0d4c96a..a529d55d4ff4 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -223,7 +223,7 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
 {
 	struct xdp_meta *meta;
 
-	meta = data - sizeof(*meta);
+	meta = data - XDP_METADATA_SIZE;
 
 	if (meta->hint_valid & XDP_META_FIELD_RSS)
 		printf("rx_hash: 0x%X with RSS type:0x%X\n",
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 87318ad1117a..2dfd3bf5e7bb 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -50,3 +50,10 @@ struct xdp_meta {
 	};
 	enum xdp_meta_field hint_valid;
 };
+
+/* XDP_METADATA_SIZE must be at least the size of struct xdp_meta. An additional
+ * 32 bytes of padding is included as a conservative measure to accommodate any
+ * metadata areas reserved by Ethernet devices. If the device-reserved metadata
+ * exceeds 32 bytes, this value will need adjustment.
+ */
+#define XDP_METADATA_SIZE	(sizeof(struct xdp_meta) + 32)
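The effect of switching the consumer from `data - sizeof(struct xdp_meta)` to `data - XDP_METADATA_SIZE` can be illustrated with a small user-space model. Everything here is an assumption made for the sketch: `struct demo_meta` is a simplified stand-in for the selftests' `struct xdp_meta`, `DEMO_RESERVED` mimics a device that uses the 32 bytes directly in front of the packet data, and the frame is an ordinary stack buffer rather than a UMEM chunk.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct xdp_meta (assumption for this sketch). */
struct demo_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint32_t hint_valid;
};

#define DEMO_RESERVED      32	/* pretend device-reserved bytes */
#define DEMO_METADATA_SIZE (sizeof(struct demo_meta) + DEMO_RESERVED)

/* Consumer side: read the metadata found 'back' bytes before data. */
static struct demo_meta demo_read(const uint8_t *data, size_t back)
{
	struct demo_meta m;

	memcpy(&m, data - back, sizeof(m));
	return m;
}

/*
 * Build a mock frame, let the "producer" store its metadata at the start
 * of the agreed area, and return the timestamp the consumer sees when it
 * looks 'consumer_offset' bytes before data.
 */
static uint64_t demo_timestamp_seen(uint64_t ts, size_t consumer_offset)
{
	uint8_t frame[256];
	uint8_t *data = frame + 128;
	struct demo_meta in = { .rx_timestamp = ts };

	memset(frame, 0, sizeof(frame));
	/* Device-reserved bytes sit directly in front of the packet data. */
	memset(data - DEMO_RESERVED, 0xff, DEMO_RESERVED);
	/* Producer writes at data - DEMO_METADATA_SIZE. */
	memcpy(data - DEMO_METADATA_SIZE, &in, sizeof(in));

	return demo_read(data, consumer_offset).rx_timestamp;
}
```

When the consumer uses the agreed `DEMO_METADATA_SIZE` it reads back the timestamp the producer stored; when it uses `sizeof(struct demo_meta)` it lands inside the reserved bytes instead.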
On 07/01, Song Yoong Siang wrote:
> -	err = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
[..]
> +	metalen_used = ctx->data - ctx->data_meta;
Is the intent here to query how much metadata has been consumed/reserved by the driver? Looking at IGC it has the following code/comment:
	bi->xdp->data += IGC_TS_HDR_LEN;

	/* HW timestamp has been copied into local variable. Metadata
	 * length when XDP program is called should be 0.
	 */
	bi->xdp->data_meta += IGC_TS_HDR_LEN;
Are you sure that metadata size is correctly exposed to the bpf program?
My assumption was that we should just unconditionally do bpf_xdp_adjust_meta with -XDP_METADATA_SIZE and that should be good enough.
On Wednesday, July 2, 2025 12:31 AM, Stanislav Fomichev <stfomichev@gmail.com> wrote:
> Is the intent here to query how much metadata has been consumed/reserved by the driver?
Yes.
> Looking at IGC it has the following code/comment:
>
> 	bi->xdp->data += IGC_TS_HDR_LEN;
>
> 	/* HW timestamp has been copied into local variable. Metadata
> 	 * length when XDP program is called should be 0.
> 	 */
> 	bi->xdp->data_meta += IGC_TS_HDR_LEN;
>
> Are you sure that metadata size is correctly exposed to the bpf program?
You are right, the current igc driver does not expose the metadata size correctly. I submitted [1] to fix it.
[1] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20250701080955.3273137-1-yoong.siang.song@intel.com/
> My assumption was that we should just unconditionally do bpf_xdp_adjust_meta with -XDP_METADATA_SIZE and that should be good enough.
The check is only a precaution. There is no problem with adjusting the meta unconditionally, and doing so also saves per-packet processing time. I will remove the check and submit v2.
Thanks & Regards
Siang
On Wednesday, July 2, 2025 10:23 AM, Song, Yoong Siang <yoong.siang.song@intel.com> wrote:
Hi Stanislav Fomichev,
I submitted v2, but afterwards I had second thoughts. IMHO,
	err = bpf_xdp_adjust_meta(ctx, (int)(ctx->data - ctx->data_meta - XDP_METADATA_SIZE));
is better than
	err = bpf_xdp_adjust_meta(ctx, -(int)XDP_METADATA_SIZE);
because it is more robust.
Any thoughts?
Thanks & Regards Siang
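The difference between the two call variants discussed in this message can be made concrete with a little arithmetic. The sketch below is a user-space model only: `DEMO_METADATA_SIZE` is an arbitrary illustrative value, the function names are hypothetical, and `metalen_after` simply assumes the helper moves `data_meta` by the given delta, so that the new metadata length equals the old length minus the delta.

```c
#include <stdint.h>

#define DEMO_METADATA_SIZE 64	/* illustrative value, not the selftest's */

/* Variant 1: delta relative to the metadata length already in use. */
static int delta_relative(int used)
{
	return used - DEMO_METADATA_SIZE;
}

/* Variant 2: fixed delta, independent of what is already in use. */
static int delta_fixed(void)
{
	return -DEMO_METADATA_SIZE;
}

/* Metadata length (data - data_meta) after moving data_meta by delta. */
static int metalen_after(int used, int delta)
{
	return used - delta;
}
```

With no metadata in use when the program runs, both variants give a 64-byte area. If 16 bytes were already exposed, the relative variant would still give 64 bytes, while the fixed variant would give 80, so a consumer looking `DEMO_METADATA_SIZE` bytes before the data would no longer find the start of the area.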
On 07/02, Song, Yoong Siang wrote:
> Hi Stanislav Fomichev,
>
> I submitted v2. But after that, I think twice. IMHO,
> 	err = bpf_xdp_adjust_meta(ctx, (int)(ctx->data - ctx->data_meta - XDP_METADATA_SIZE));
> is better than
> 	err = bpf_xdp_adjust_meta(ctx, -(int)XDP_METADATA_SIZE);
> because it is more robust.
>
> Any thoughts?
My preference is to keep everything as is and convert to -(int)XDP_METADATA_SIZE. Making IGC properly expose the (temporary) metadata length is a user-visible change, and I am not sure we have a good justification for it.
On Wednesday, July 2, 2025 11:19 PM, Stanislav Fomichev <stfomichev@gmail.com> wrote:
> My preference is on keeping everything as is and converting to -(int)XDP_METADATA_SIZE. Making IGC properly expose (temporary) metadata len is a user visible change, not sure we have a good justification?
Thank you for your feedback. I agree that we don't have a strong justification for making the metadata length user-visible at this time. I concur with your preference to keep everything as is and proceed with -(int)XDP_METADATA_SIZE.
By the way, do you think my first patch, which changes the documentation, is still needed?
On 07/02, Song, Yoong Siang wrote:
On Wednesday, July 2, 2025 11:19 PM, Stanislav Fomichev stfomichev@gmail.com wrote:
On 07/02, Song, Yoong Siang wrote:
On Wednesday, July 2, 2025 10:23 AM, Song, Yoong Siang
yoong.siang.song@intel.com wrote:
On Wednesday, July 2, 2025 12:31 AM, Stanislav Fomichev
wrote:
On 07/01, Song Yoong Siang wrote:
Introduce the XDP_METADATA_SIZE macro to ensure that user applications can consistently retrieve the correct location of struct xdp_meta.
Prior to this commit, the XDP program adjusted the data_meta backward by the size of struct xdp_meta, while the user application retrieved the data by calculating backward from the data pointer. This approach only worked if xdp_buff->data_meta was equal to xdp_buff->data before calling bpf_xdp_adjust_meta.
With the introduction of XDP_METADATA_SIZE, both the XDP program and user application now calculate and identify the location of struct xdp_meta from the data pointer. This ensures the implementation remains functional even when there is device-reserved metadata, making the tests more portable across different NICs.
Signed-off-by: Song Yoong Siang yoong.siang.song@intel.com
tools/testing/selftests/bpf/prog_tests/xdp_metadata.c | 2 +- tools/testing/selftests/bpf/progs/xdp_hw_metadata.c | 10 +++++++++- tools/testing/selftests/bpf/progs/xdp_metadata.c | 8 +++++++- tools/testing/selftests/bpf/xdp_hw_metadata.c | 2 +- tools/testing/selftests/bpf/xdp_metadata.h | 7 +++++++ 5 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
index 19f92affc2da..8d6c2633698b 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c @@ -302,7 +302,7 @@ static int verify_xsk_metadata(struct xsk *xsk, bool
sent_from_af_xdp)
/* custom metadata */
- meta = data - sizeof(struct xdp_meta);
meta = data - XDP_METADATA_SIZE;
if (!ASSERT_NEQ(meta->rx_timestamp, 0, "rx_timestamp")) return -1;
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index 330ece2eabdb..72242ac1cdcd 100644 --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c @@ -27,6 +27,7 @@ extern int bpf_xdp_metadata_rx_vlan_tag(const struct
xdp_md *ctx,
SEC("xdp.frags") int rx(struct xdp_md *ctx) {
- int metalen_used, metalen_to_adjust; void *data, *data_meta, *data_end; struct ipv6hdr *ip6h = NULL; struct udphdr *udp = NULL;
@@ -72,7 +73,14 @@ int rx(struct xdp_md *ctx) return XDP_PASS; }
- err = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
[..]
- metalen_used = ctx->data - ctx->data_meta;
Is the intent here to query how much metadata has been consumed/reserved by the driver?
Yes.
Looking at IGC it has the following code/comment:
bi->xdp->data += IGC_TS_HDR_LEN;
/* HW timestamp has been copied into local variable. Metadata * length when XDP program is called should be 0. */ bi->xdp->data_meta += IGC_TS_HDR_LEN;
Are you sure that metadata size is correctly exposed to the bpf program?
You are right, the current igc driver didn't expose the metadata size correctly. I submitted [1] to fix it.
[1] https://patchwork.ozlabs.org/project/intel-wired- lan/patch/20250701080955.3273137-1-yoong.siang.song@intel.com/
My assumptions was that we should just unconditionally do
bpf_xdp_adjust_meta
with -XDP_METADATA_SIZE and that should be good enough.
The checking is just for precautions. No problem if directly adjust the meta unconditionally. That will save processing time for each packet as well. I will remove the checking and submit v2.
Thanks & Regards Siang
Hi Stanislav Fomichev,
I submitted v2, but afterwards I had second thoughts. IMHO,

	err = bpf_xdp_adjust_meta(ctx, (int)(ctx->data - ctx->data_meta - XDP_METADATA_SIZE));

is better than

	err = bpf_xdp_adjust_meta(ctx, -(int)XDP_METADATA_SIZE);

because it is more robust.
Any thoughts?
My preference is to keep everything as is and convert to -(int)XDP_METADATA_SIZE. Making IGC properly expose the (temporary) metadata length is a user-visible change, and I'm not sure we have a good justification for it.
Thank you for your feedback. I agree that we don't have a strong justification for making the metadata length user-visible at this time. I concur with your preference to keep everything as is and proceed with -(int)XDP_METADATA_SIZE.
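The difference between the two delta computations discussed above can be modelled in plain user-space C. This is only a sketch, not code from the selftests: the value of XDP_METADATA_SIZE, the struct meta_view type and the helper names are assumptions made for illustration, and apply_delta() merely mimics how bpf_xdp_adjust_meta() shifts data_meta by the given delta.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value for illustration; the selftests define their own. */
#define XDP_METADATA_SIZE 32

/* Simplified stand-in for the data/data_meta fields of struct xdp_md. */
struct meta_view {
	long data;      /* offset of the packet data */
	long data_meta; /* offset of the metadata start, <= data */
};

/* Delta that always grows the metadata area by XDP_METADATA_SIZE. */
static int delta_unconditional(void)
{
	return -(int)XDP_METADATA_SIZE;
}

/* Delta that leaves exactly XDP_METADATA_SIZE bytes before data. */
static int delta_relative(const struct meta_view *v)
{
	assert(v->data_meta <= v->data);
	return (int)(v->data - v->data_meta - XDP_METADATA_SIZE);
}

/* Models the pointer move: the new data_meta is the old one plus delta. */
static long apply_delta(const struct meta_view *v, int delta)
{
	return v->data_meta + delta;
}
```

When data_meta equals data on entry, both variants place data_meta at data - XDP_METADATA_SIZE. They diverge only if the driver hands the program a non-empty metadata area, which, per the discussion above, igc currently does not.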
Btw, do you think my first patch, which changes the documentation, is still needed?
Yes, the documentation is super useful, let's keep it!
On Thursday, July 3, 2025 12:04 AM, Stanislav Fomichev stfomichev@gmail.com wrote:
> Yes, the documentation is super useful, let's keep it!
Sure. I will keep the documentation but submit v3 to remove the portion that suggests users call bpf_xdp_adjust_meta with METADATA_SIZE - (xdp_buff->data - xdp_buff->data_meta).
On Tue, 1 Jul 2025 12:29:38 +0800 Song Yoong Siang wrote:
>                         |<---sizeof(xdp_meta)--|
>                         |                      |
>                  struct xdp_meta        rx_desc->address
>                         ^                      ^
>                         |                      |
> +----------+----------------------+------------+------+
> | headroom | custom metadata      | reserved   | data |
> +----------+----------------------+------------+------+
>            ^                      ^            ^
>            |                      |            |
>     struct xdp_meta      xdp_buff->data_meta  xdp_buff->data
>            |                      |
>            |<---sizeof(xdp_meta)--|
Huh. Did AF_XDP maintainers explicitly sign off on this or it's just how IGC implementation works and nobody noticed?
For normal XDP, my understanding is that it's the driver's responsibility to move the "reserved" stuff out of place before presenting the frame to the program.
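The mismatch shown in the quoted diagram can be reproduced with a small user-space model. This is a sketch under stated assumptions: struct demo_meta is a hypothetical, cut-down stand-in for the selftests' struct xdp_meta, and the frame size, the data offset and all helper names are invented for illustration. The writer models the pre-patch XDP program, which stores the struct just below data_meta, while the reader models the user application, which looks just below the data pointer (rx_desc->address).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down metadata struct used only in this sketch. */
struct demo_meta {
	uint64_t rx_timestamp;
};

/* XDP-program side before the patch: write just below data_meta. */
static size_t writer_offset(size_t data, size_t reserved)
{
	size_t data_meta = data - reserved; /* driver keeps `reserved` bytes */

	return data_meta - sizeof(struct demo_meta);
}

/* User-space side: read just below the data pointer. */
static size_t reader_offset(size_t data)
{
	return data - sizeof(struct demo_meta);
}

/* Returns the timestamp the reader sees after the writer stored `ts`. */
static uint64_t roundtrip(size_t reserved, uint64_t ts)
{
	unsigned char frame[256] = {0};
	const size_t data = 128; /* arbitrary packet start inside the frame */
	struct demo_meta in = { .rx_timestamp = ts }, out;

	assert(reserved + sizeof(in) <= data);
	memcpy(frame + writer_offset(data, reserved), &in, sizeof(in));
	memcpy(&out, frame + reader_offset(data), sizeof(out));
	return out.rx_timestamp;
}
```

With no reserved bytes the reader gets back what the writer stored. With, say, 16 reserved bytes the reader ends up inside the reserved region and never sees the stored timestamp, which is the failure mode the diagram illustrates.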
On Tuesday, July 8, 2025 4:55 AM, Jakub Kicinski kuba@kernel.org wrote:
> Huh. Did AF_XDP maintainers explicitly sign off on this or it's just how IGC implementation works and nobody noticed?
Previously, IGC copied the Rx hardware timestamp out of the metadata area, so there was no problem when XDP Rx metadata was implemented.
After that, net_device_ops.ndo_get_tstamp() was added to IGC to support timestamping from both the free-running clock and the adjustable clock. The timestamps from the two timers are stored in the metadata area, which causes the issue.
> For normal XDP, my understanding is that it's the driver's responsibility to move the "reserved" stuff out of place before presenting the frame to the program.
Does that mean the driver needs to move the "reserved" stuff out before the XDP program runs and then move it back afterwards in certain situations, such as XDP_PASS?
IMHO, if the driver is allowed to use some portion of the metadata area, packet processing will be more efficient and will also align with the "zero-copy" idea.
Any thoughts?
On Tue, 8 Jul 2025 01:34:13 +0000 Song, Yoong Siang wrote:
>> For normal XDP, my understanding is that it's the driver's responsibility to move the "reserved" stuff out of place before presenting the frame to the program.
> Does that mean the driver needs to move the "reserved" stuff out before the XDP program runs and then move it back afterwards in certain situations, such as XDP_PASS?
Why would the driver need to move it back? On XDP_PASS an skb is constructed, so the metadata should be transferred to the skb. There is no need to copy it back as a prepend.
On Tuesday, July 8, 2025 9:45 AM, Jakub Kicinski kuba@kernel.org wrote:
> On Tue, 8 Jul 2025 01:34:13 +0000 Song, Yoong Siang wrote:
>>> For normal XDP, my understanding is that it's the driver's responsibility to move the "reserved" stuff out of place before presenting the frame to the program.
>> Does that mean the driver needs to move the "reserved" stuff out before the XDP program runs and then move it back afterwards in certain situations, such as XDP_PASS?
> Why would the driver need to move it back? On XDP_PASS an skb is constructed, so the metadata should be transferred to the skb. There is no need to copy it back as a prepend.
I said so because I thought the timestamp needed to be put back as a prepend, with skb_shared_hwtstamps.netdev_data pointing to it, to support ndo_get_tstamp().
I haven't studied the code flow in detail, so I might be missing something.
On Tue, 8 Jul 2025 02:06:11 +0000 Song, Yoong Siang wrote:
>> Why would the driver need to move it back? On XDP_PASS an skb is constructed, so the metadata should be transferred to the skb. There is no need to copy it back as a prepend.
> I said so because I thought the timestamp needed to be put back as a prepend, with skb_shared_hwtstamps.netdev_data pointing to it, to support ndo_get_tstamp().
No need, the timestamps are set in shared info directly. There are multiple drivers which use the metadata prepend method, so I'm pretty sure it should work.
On Tuesday, July 8, 2025 10:18 AM, Jakub Kicinski kuba@kernel.org wrote:
> On Tue, 8 Jul 2025 02:06:11 +0000 Song, Yoong Siang wrote:
>>> Why would the driver need to move it back? On XDP_PASS an skb is constructed, so the metadata should be transferred to the skb. There is no need to copy it back as a prepend.
>> I said so because I thought the timestamp needed to be put back as a prepend, with skb_shared_hwtstamps.netdev_data pointing to it, to support ndo_get_tstamp().
> No need, the timestamps are set in shared info directly. There are multiple drivers which use the metadata prepend method, so I'm pretty sure it should work.
Thanks for pointing me in the right direction. I'll proceed with updating the IGC driver and conduct tests to ensure everything works as expected.