This patchset originates from my attempt to resolve a KMSAN warning that has existed for over 3 years: https://syzkaller.appspot.com/bug?extid=0e6ddb1ef80986bdfe64
Previously, we had a brief discussion in this thread about whether we can simply perform memset in adjust_{head,meta}: https://lore.kernel.org/netdev/20250328043941.085de23b@kernel.org/T/#t
Unfortunately, I couldn't find a similar topic on the mailing list, but I did find a similar security-related commit: commit 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
So I am starting a new thread here with a clearer subject, so that we can discuss it here.
Meanwhile, I also discovered a related issue that led to a CVE, specifically the Facebook Katran vulnerability (https://vuldb.com/?id.246309).
Currently, even with unprivileged functionality disabled, a user can load a BPF program with only CAP_BPF and CAP_NET_ADMIN, so I believe we should stop exposing kernel memory directly to such users.
Regarding performance, I added a perf test to the selftests that covers common MAC and IP header sizes.
Compared to not using memset, the execution time increased by about 2 ns for those sizes (and by 10 ns for a 200-byte adjustment), which I think is negligible considering the entire net stack.
Jiayuan Chen (2):
  bpf, xdp: clean head/meta when expanding it
  selftests/bpf: add perf test for adjust_{head,meta}

 include/uapi/linux/bpf.h                      |  8 +--
 net/core/filter.c                             |  5 +-
 tools/include/uapi/linux/bpf.h                |  6 ++-
 .../selftests/bpf/prog_tests/xdp_perf.c       | 52 ++++++++++++++++---
 tools/testing/selftests/bpf/progs/xdp_dummy.c | 14 +++++
 5 files changed, 72 insertions(+), 13 deletions(-)
The device allocates an skb, it additionally allocates a prepad size (usually equal to NET_SKB_PAD or XDP_PACKET_HEADROOM) but leaves it uninitialized.
The bpf_xdp_adjust_head function moves skb->data forward, which allows users to access data belonging to other programs, posing a security risk.
Reported-by: syzbot+0e6ddb1ef80986bdfe64@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/00000000000067f65105edbd295d@google.com/T/
Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev
---
 include/uapi/linux/bpf.h       | 8 +++++---
 net/core/filter.c              | 5 ++++-
 tools/include/uapi/linux/bpf.h | 6 ++++--
 3 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index defa5bb881f4..be01a848cbbf 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2760,8 +2760,9 @@ union bpf_attr {
  *
  * long bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
  * 	Description
- * 		Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that
- * 		it is possible to use a negative value for *delta*. This helper
+ * 		Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that
+ * 		it is possible to use a negative value for *delta*. If *delta*
+ * 		is negative, the new header will be memset to zero. This helper
  * 		can be used to prepare the packet for pushing or popping
  * 		headers.
  *
@@ -2989,7 +2990,8 @@ union bpf_attr {
  * long bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust the address pointed by *xdp_md*\ **->data_meta** by
- * 		*delta* (which can be positive or negative). Note that this
+ * 		*delta* (which can be positive or negative). If *delta* is
+ * 		negative, the new meta will be memset to zero. Note that this
  * 		operation modifies the address stored in *xdp_md*\ **->data**,
  * 		so the latter must be loaded only after the helper has been
  * 		called.
diff --git a/net/core/filter.c b/net/core/filter.c
index 46ae8eb7a03c..5f01d373b719 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3947,6 +3947,8 @@ BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)
 	if (metalen)
 		memmove(xdp->data_meta + offset,
 			xdp->data_meta, metalen);
+	if (offset < 0)
+		memset(data, 0, -offset);
 	xdp->data_meta += offset;
 	xdp->data = data;
 
@@ -4239,7 +4241,8 @@ BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
 		return -EINVAL;
 	if (unlikely(xdp_metalen_invalid(metalen)))
 		return -EACCES;
-
+	if (offset < 0)
+		memset(meta, 0, -offset);
 	xdp->data_meta = meta;
 
 	return 0;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index defa5bb881f4..7b1871f2eccf 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2761,7 +2761,8 @@ union bpf_attr {
  * long bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that
- * 		it is possible to use a negative value for *delta*. This helper
+ * 		it is possible to use a negative value for *delta*. If *delta*
+ * 		is negative, the new header will be memset to zero. This helper
  * 		can be used to prepare the packet for pushing or popping
  * 		headers.
  *
@@ -2989,7 +2990,8 @@ union bpf_attr {
  * long bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust the address pointed by *xdp_md*\ **->data_meta** by
- * 		*delta* (which can be positive or negative). If *delta* is
+ * 		*delta* (which can be positive or negative). If *delta* is
+ * 		negative, the new meta will be memset to zero. Note that this
  * 		operation modifies the address stored in *xdp_md*\ **->data**,
  * 		so the latter must be loaded only after the helper has been
  * 		called.
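For readers who want to see the effect of the two added memset() calls outside the kernel, here is a minimal user-space sketch. It is not the kernel implementation: `struct fake_xdp_buff` and `fake_adjust_head()` are hypothetical stand-ins invented for illustration, and the bounds check is simplified compared to the real helper (the metadata area and minimum frame length are ignored).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct xdp_buff; only what this sketch needs. */
struct fake_xdp_buff {
	unsigned char *data_hard_start; /* start of the headroom */
	unsigned char *data;            /* start of packet data */
	unsigned char *data_end;        /* end of packet data */
};

/*
 * Move buf->data by offset bytes. A negative offset grows the packet into
 * the headroom, and the newly exposed bytes are zeroed first, mirroring the
 * "if (offset < 0) memset(...)" lines in the patch above.
 * Returns 0 on success, -1 if the move would leave the buffer.
 */
static int fake_adjust_head(struct fake_xdp_buff *buf, int offset)
{
	ptrdiff_t headroom, len, off = offset;

	assert(buf != NULL);
	headroom = buf->data - buf->data_hard_start;
	len = buf->data_end - buf->data;

	if (off < 0 && -off > headroom)
		return -1;
	if (off > 0 && off > len)
		return -1;
	if (off < 0)
		memset(buf->data + off, 0, (size_t)-off);
	buf->data += off;
	return 0;
}
```

Growing the head by 14 bytes on a buffer whose headroom still holds stale bytes leaves exactly those 14 bytes zeroed; the rest of the headroom and the payload are untouched.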
On 31/03/2025 05.23, Jiayuan Chen wrote:
> The device allocates an skb, it additionally allocates a prepad size (usually equal to NET_SKB_PAD or XDP_PACKET_HEADROOM) but leaves it uninitialized.
> The bpf_xdp_adjust_head function moves skb->data forward, which allows users to access data belonging to other programs, posing a security risk.

I find your description confusing, and it needs to be improved so that people do not misinterpret it when reading the commit log in the future.
It is part of the UAPI that BPF programs access data belonging to other BPF programs. That is the concept behind data_meta, e.g. XDP sets information in this memory and TC-BPF reads it again (chained XDP progs also have R/W access). I hope/assume this is not the desired interpretation of your text.
I guess you want to say that the intention is to avoid BPF programs accessing uninitialized data? (... which is also what the code does at a glance)
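The data_meta handoff mentioned above can be pictured with a small user-space sketch. This is only a model of the idea (one stage stores a value in front of the packet, a later stage reads it back). `struct meta_view`, `meta_store_mark()` and `meta_load_mark()` are hypothetical names made up for this illustration, not kernel or libbpf APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view: metadata lives in [data_meta, data), right before the packet. */
struct meta_view {
	unsigned char *hard_start; /* start of the headroom */
	unsigned char *data_meta;  /* start of the metadata area */
	unsigned char *data;       /* start of packet data */
};

/* Producer side (the XDP role): reserve 4 bytes of metadata and store a mark. */
static int meta_store_mark(struct meta_view *v, uint32_t mark)
{
	assert(v != NULL);
	if ((size_t)(v->data_meta - v->hard_start) < sizeof(mark))
		return -1;
	v->data_meta -= sizeof(mark);
	memcpy(v->data_meta, &mark, sizeof(mark));
	return 0;
}

/* Consumer side (the TC role): read the mark back if metadata is present. */
static int meta_load_mark(const struct meta_view *v, uint32_t *mark)
{
	assert(v != NULL && mark != NULL);
	if ((size_t)(v->data - v->data_meta) < sizeof(*mark))
		return -1;
	memcpy(mark, v->data_meta, sizeof(*mark));
	return 0;
}
```

In this model the consumer reading the producer's bytes is the intended behaviour; the concern in this thread is only about bytes that nobody has written yet.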
On Sun, Mar 30, 2025 at 8:27 PM Jiayuan Chen jiayuan.chen@linux.dev wrote:
> The device allocates an skb, it additionally allocates a prepad size (usually equal to NET_SKB_PAD or XDP_PACKET_HEADROOM) but leaves it uninitialized.
> The bpf_xdp_adjust_head function moves skb->data forward, which allows users to access data belonging to other programs, posing a security risk.
Let's make everyone pay a performance penalty to silence KMSAN warning?
I don't think it's a good trade off.
Soft nack.
Alexei Starovoitov wrote:
> On Sun, Mar 30, 2025 at 8:27 PM Jiayuan Chen jiayuan.chen@linux.dev wrote:
> > The device allocates an skb, it additionally allocates a prepad size (usually equal to NET_SKB_PAD or XDP_PACKET_HEADROOM) but leaves it uninitialized.
> > The bpf_xdp_adjust_head function moves skb->data forward, which allows users to access data belonging to other programs, posing a security risk.
> Let's make everyone pay a performance penalty to silence KMSAN warning?
> I don't think it's a good trade off.
> Soft nack.
I also assumed that this was known when the feature was originally introduced and left as is for performance reasons.
Might be good to have that explicit. And that it is deemed safe by virtue of XDP requiring superuser privileges anyway. Or at least I guess that was the thought process?
On Thu, Apr 3, 2025 at 7:32 AM Willem de Bruijn willemdebruijn.kernel@gmail.com wrote:
> Alexei Starovoitov wrote:
> > On Sun, Mar 30, 2025 at 8:27 PM Jiayuan Chen jiayuan.chen@linux.dev wrote:
> > > The device allocates an skb, it additionally allocates a prepad size (usually equal to NET_SKB_PAD or XDP_PACKET_HEADROOM) but leaves it uninitialized.
> > > The bpf_xdp_adjust_head function moves skb->data forward, which allows users to access data belonging to other programs, posing a security risk.
> > Let's make everyone pay a performance penalty to silence KMSAN warning?
> > I don't think it's a good trade off.
> > Soft nack.
> I also assumed that this was known when the feature was originally introduced and left as is for performance reasons.
> Might be good to have that explicit. And that it is deemed safe by virtue of XDP requiring superuser privileges anyway. Or at least I guess that was the thought process?

Correct. When a prog extends the headroom, it is supposed to write something in there. Extending the packet just to capture some garbage bytes from the previous packet is dumb, but doesn't compromise the safety of the kernel. There were proposals to ask the verifier to track that the headroom is actually initialized by the program, but it's pointless. A dumb prog can write garbage in there just as well: bpf_probe_read_kernel(from_random_addr) and store into the headroom.
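The expectation that a program writes what it extends can be sketched in user space as follows. This is an illustrative model only: `struct pkt_view` and `push_header()` are hypothetical names, and the sketch stands in for an encapsulating program that grows the head and immediately fills every new byte with its own header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view: [hard_start, data) is headroom, [data, data_end) is payload. */
struct pkt_view {
	unsigned char *hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

/*
 * Prepend hdr_len bytes in front of the packet. Every newly exposed byte is
 * overwritten by the header, so nothing stale from the headroom survives
 * even though the headroom itself is never cleared.
 * Returns 0 on success, -1 if the headroom is too small.
 */
static int push_header(struct pkt_view *pkt, const void *hdr, size_t hdr_len)
{
	assert(pkt != NULL && hdr != NULL);

	if (hdr_len > (size_t)(pkt->data - pkt->hard_start))
		return -1;
	pkt->data -= hdr_len;
	memcpy(pkt->data, hdr, hdr_len);
	return 0;
}
```

A program following this pattern gains nothing from a zeroing helper, since it overwrites the same bytes right away; the memset only matters for programs that extend the head and then read before writing.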
April 3, 2025 at 22:24, "Alexei Starovoitov" alexei.starovoitov@gmail.com wrote:
> On Sun, Mar 30, 2025 at 8:27 PM Jiayuan Chen jiayuan.chen@linux.dev wrote:
> > The device allocates an skb, it additionally allocates a prepad size
> > (usually equal to NET_SKB_PAD or XDP_PACKET_HEADROOM) but leaves it
> > uninitialized.
> > The bpf_xdp_adjust_head function moves skb->data forward, which allows
> > users to access data belonging to other programs, posing a security risk.
> Let's make everyone pay a performance penalty to silence KMSAN warning? I don't think it's a good trade off. Soft nack.

This is not just about suppressing KMSAN warnings; it is also a security consideration.
So I'd like to confirm: currently, loading an XDP program only requires the CAP_NET_ADMIN and CAP_BPF capabilities. If we consider this a super privilege, then I think it's okay even if uninitialized memory is exposed, as it is the developer's responsibility, for example, like the CVE at Meta: https://vuldb.com/?id.246309.
Alternatively, could we rely on the verifier to force initialization of the newly added packet area, specifically for this special case (although it won't be easy to implement)?
On Thu, Apr 3, 2025 at 5:27 PM Jiayuan Chen jiayuan.chen@linux.dev wrote:
> April 3, 2025 at 22:24, "Alexei Starovoitov" alexei.starovoitov@gmail.com wrote:
> > On Sun, Mar 30, 2025 at 8:27 PM Jiayuan Chen jiayuan.chen@linux.dev wrote:
> > > The device allocates an skb, it additionally allocates a prepad size
> > > (usually equal to NET_SKB_PAD or XDP_PACKET_HEADROOM) but leaves it
> > > uninitialized.
> > > The bpf_xdp_adjust_head function moves skb->data forward, which allows
> > > users to access data belonging to other programs, posing a security risk.
> > Let's make everyone pay a performance penalty to silence KMSAN warning? I don't think it's a good trade off. Soft nack.
> It's not just about simply suppressing KMSAN warnings, but for security considerations.
> So I'd like to confirm: currently, loading an XDP program only requires CAP_NET_ADMIN and CAP_BPF permissions. If we consider this as a super privilege, then even if uninitialized memory is exposed, I think it's okay, as it's the developer's responsibility, for example, like the CVE in meta https://vuldb.com/?id.246309.

And we fixed Katran, not the kernel.

> Or I'm thinking, can we rely on the verifier to force the initialization of the newly added packet boundary behavior, specifically for this special case (although it won't be easy to implement).
The previous patch adds a memset to the adjust helpers, which may affect performance.
Therefore, this patch adds a perf test. It shows that for common header lengths the memset() adds about 2 ns per run, which is negligible for the net stack.
Before memset
./test_progs -a xdp_adjust_head_perf -v
run adjust head with size 6 cost 56 ns
run adjust head with size 20 cost 56 ns
run adjust head with size 40 cost 56 ns
run adjust head with size 200 cost 56 ns

After memset
./test_progs -a xdp_adjust_head_perf -v
run adjust head with size 6 cost 58 ns
run adjust head with size 20 cost 58 ns
run adjust head with size 40 cost 58 ns
run adjust head with size 200 cost 66 ns
Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev
---
 .../selftests/bpf/prog_tests/xdp_perf.c       | 52 ++++++++++++++++---
 tools/testing/selftests/bpf/progs/xdp_dummy.c | 14 +++++
 2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_perf.c b/tools/testing/selftests/bpf/prog_tests/xdp_perf.c
index ec5369f247cb..1b4260c6e5d7 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_perf.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_perf.c
@@ -1,10 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include <network_helpers.h>
+#include "xdp_dummy.skel.h"
 
 void test_xdp_perf(void)
 {
-	const char *file = "./xdp_dummy.bpf.o";
-	struct bpf_object *obj;
+	struct xdp_dummy *skel;
 	char in[128], out[128];
 	int err, prog_fd;
 	LIBBPF_OPTS(bpf_test_run_opts, topts,
@@ -15,14 +16,51 @@ void test_xdp_perf(void)
 		.repeat = 1000000,
 	);
 
-	err = bpf_prog_test_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
-	if (CHECK_FAIL(err))
-		return;
-
+	skel = xdp_dummy__open_and_load();
+	prog_fd = bpf_program__fd(skel->progs.xdp_dummy_prog);
 	err = bpf_prog_test_run_opts(prog_fd, &topts);
 	ASSERT_OK(err, "test_run");
 	ASSERT_EQ(topts.retval, XDP_PASS, "test_run retval");
 	ASSERT_EQ(topts.data_size_out, 128, "test_run data_size_out");
 
-	bpf_object__close(obj);
+	xdp_dummy__destroy(skel);
+}
+
+void test_xdp_adjust_head_perf(void)
+{
+	struct xdp_dummy *skel;
+	int repeat = 9000000;
+	struct xdp_md ctx_in;
+	char data[100];
+	int err, prog_fd;
+	size_t test_header_size[] = {
+		ETH_ALEN,
+		sizeof(struct iphdr),
+		sizeof(struct ipv6hdr),
+		200,
+	};
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &data,
+		.data_size_in = sizeof(data),
+		.repeat = repeat,
+	);
+
+	topts.ctx_in = &ctx_in;
+	topts.ctx_size_in = sizeof(ctx_in);
+	memset(&ctx_in, 0, sizeof(ctx_in));
+	ctx_in.data_meta = 0;
+	ctx_in.data_end = ctx_in.data + sizeof(data);
+
+	skel = xdp_dummy__open_and_load();
+	prog_fd = bpf_program__fd(skel->progs.xdp_dummy_adjust_head);
+
+	for (int i = 0; i < ARRAY_SIZE(test_header_size); i++) {
+		skel->bss->head_size = test_header_size[i];
+		err = bpf_prog_test_run_opts(prog_fd, &topts);
+		ASSERT_OK(err, "test_run");
+		ASSERT_EQ(topts.retval, XDP_PASS, "test_run retval");
+		fprintf(stdout, "run adjust head with size %zd cost %d ns\n",
+			test_header_size[i], topts.duration);
+	}
+	xdp_dummy__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/xdp_dummy.c b/tools/testing/selftests/bpf/progs/xdp_dummy.c
index d988b2e0cee8..7bebedbbc949 100644
--- a/tools/testing/selftests/bpf/progs/xdp_dummy.c
+++ b/tools/testing/selftests/bpf/progs/xdp_dummy.c
@@ -4,10 +4,24 @@
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 
+int head_size;
+
 SEC("xdp")
 int xdp_dummy_prog(struct xdp_md *ctx)
 {
 	return XDP_PASS;
 }
 
+SEC("xdp")
+int xdp_dummy_adjust_head(struct xdp_md *ctx)
+{
+	if (bpf_xdp_adjust_head(ctx, -head_size))
+		return XDP_DROP;
+
+	if (bpf_xdp_adjust_head(ctx, head_size))
+		return XDP_DROP;
+
+	return XDP_PASS;
+}
+
 char _license[] SEC("license") = "GPL";
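To put the before/after numbers from the commit message side by side, the small sketch below computes the absolute and relative overhead per size. The table values are copied from the results above; `struct perf_row`, `overhead_ns()` and `overhead_pct()` are hypothetical helper names used only for this illustration.

```c
#include <stddef.h>

/* One row per tested size; costs are the per-run ns values reported above. */
struct perf_row {
	int size;      /* bytes passed to bpf_xdp_adjust_head() */
	int before_ns; /* cost without the memset */
	int after_ns;  /* cost with the memset */
};

static const struct perf_row perf_rows[] = {
	{   6, 56, 58 },
	{  20, 56, 58 },
	{  40, 56, 58 },
	{ 200, 56, 66 },
};

/* Extra cost of the memset variant for row i, in nanoseconds. */
static int overhead_ns(size_t i)
{
	return perf_rows[i].after_ns - perf_rows[i].before_ns;
}

/* The same overhead as a percentage of the baseline, rounded down. */
static int overhead_pct(size_t i)
{
	return overhead_ns(i) * 100 / perf_rows[i].before_ns;
}
```

With these inputs the three small sizes cost 2 ns more (about 3% of the 56 ns baseline) and the 200-byte case costs 10 ns more (about 17%). Note that each measured run contains two helper calls, one growing and one shrinking the head, and only the growing call performs the memset.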
On Mon, 31 Mar 2025 11:23:45 +0800 Jiayuan Chen wrote:
> which is negligible for the net stack.
>
> Before memset
> ./test_progs -a xdp_adjust_head_perf -v
> run adjust head with size 6 cost 56 ns
> run adjust head with size 20 cost 56 ns
> run adjust head with size 40 cost 56 ns
> run adjust head with size 200 cost 56 ns
>
> After memset
> ./test_progs -a xdp_adjust_head_perf -v
> run adjust head with size 6 cost 58 ns
> run adjust head with size 20 cost 58 ns
> run adjust head with size 40 cost 58 ns
> run adjust head with size 200 cost 66 ns
FWIW I'm not sure if this is "negligible" for XDP like you say, but I defer to Jesper :)
On 03/04/2025 02.24, Jakub Kicinski wrote:
> On Mon, 31 Mar 2025 11:23:45 +0800 Jiayuan Chen wrote:
> > which is negligible for the net stack.
> >
> > Before memset
> > ./test_progs -a xdp_adjust_head_perf -v
> > run adjust head with size 6 cost 56 ns
> > run adjust head with size 20 cost 56 ns
> > run adjust head with size 40 cost 56 ns
> > run adjust head with size 200 cost 56 ns
> >
> > After memset
> > ./test_progs -a xdp_adjust_head_perf -v
> > run adjust head with size 6 cost 58 ns
> > run adjust head with size 20 cost 58 ns
> > run adjust head with size 40 cost 58 ns
> > run adjust head with size 200 cost 66 ns
>
> FWIW I'm not sure if this is "negligible" for XDP like you say, but I defer to Jesper :)
It would be too much for the XDP_DROP use-case, e.g. DDoS protection and driver hardware eval. But this is changing a BPF-helper, which means it is opt-in by the BPF-programmer. Thus, we can accept larger perf overhead here.
I suspect your 2 nanosec overhead primarily comes from the function call overhead. On my AMD production system with SRSO mitigation enabled I expect to see around 6 ns overhead (5.699 ns), which is sad.
I've done a lot of benchmarking of memset (see [1]). One take-away is that memset with small const sizes will get compiled into very fast code that avoids the function call (basically QWORD MOVs). E.g. memset with const 32 is extremely fast[2], on my system it takes 0.673 ns (and 0.562 ns comes from for-loop overhead). Thus, it is possible to do something faster, as we are clearing very small sizes. I.e. using a Duff's device construct like I did for the remainder in [2].
In this case, as this is a BPF-helper, I am uncertain if it is worth the complexity to add such optimizations... I guess not. This turned into a long way of saying, I'm okay with this change.
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time...
[2] https://github.com/netoptimizer/prototype-kernel/blob/35b1716d0c300e7fa2c8b6...
--Jesper
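As a rough picture of the kind of small-size clearing discussed above, the following user-space sketch zeroes up to 64 bytes with fixed-size 8-byte stores selected by a fall-through switch, and handles the remaining 0 to 7 bytes with a second fall-through switch. It is not the code behind [2] and makes no performance claim; `clear_small()` is a hypothetical name, and the 64-byte cutoff is an arbitrary assumption for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero len bytes at p. Lengths up to 64 bytes are cleared with at most eight
 * constant-size 8-byte stores plus up to seven single-byte stores, both
 * selected by fall-through switches. Larger lengths use plain memset().
 */
static void clear_small(void *p, size_t len)
{
	const uint64_t zero = 0;
	unsigned char *b = p;
	unsigned char *tail;

	if (len > 64) {
		memset(p, 0, len);
		return;
	}

	/* Whole 8-byte words, highest offset first. */
	switch (len / 8) {
	case 8: memcpy(b + 56, &zero, 8); /* fall through */
	case 7: memcpy(b + 48, &zero, 8); /* fall through */
	case 6: memcpy(b + 40, &zero, 8); /* fall through */
	case 5: memcpy(b + 32, &zero, 8); /* fall through */
	case 4: memcpy(b + 24, &zero, 8); /* fall through */
	case 3: memcpy(b + 16, &zero, 8); /* fall through */
	case 2: memcpy(b + 8, &zero, 8);  /* fall through */
	case 1: memcpy(b, &zero, 8);      /* fall through */
	default: break;
	}

	/* Remaining 0..7 bytes after the last whole word. */
	tail = b + (len / 8) * 8;
	switch (len & 7) {
	case 7: tail[6] = 0; /* fall through */
	case 6: tail[5] = 0; /* fall through */
	case 5: tail[4] = 0; /* fall through */
	case 4: tail[3] = 0; /* fall through */
	case 3: tail[2] = 0; /* fall through */
	case 2: tail[1] = 0; /* fall through */
	case 1: tail[0] = 0; /* fall through */
	default: break;
	}
}
```

The constant-size memcpy() calls are used instead of pointer casts so the stores stay valid for unaligned addresses; whether a compiler turns them into single moves depends on the target and optimization level.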
linux-kselftest-mirror@lists.linaro.org