syzkaller reported a bug [1] where a socket that had been using sockmap ended up with an incorrectly calculated copied_seq after it was detached from the map. The selftest I provide can be used to reproduce the issue reported by syzkaller.
TCP recvmsg seq # bug 2: copied E92C873, seq E68D125, rcvnxt E7CEB7C, fl 40
WARNING: CPU: 1 PID: 5997 at net/ipv4/tcp.c:2724 tcp_recvmsg_locked+0xb2f/0x2910 net/ipv4/tcp.c:2724
Call Trace:
 <TASK>
 receive_fallback_to_copy net/ipv4/tcp.c:1968 [inline]
 tcp_zerocopy_receive+0x131a/0x2120 net/ipv4/tcp.c:2200
 do_tcp_getsockopt+0xe28/0x26c0 net/ipv4/tcp.c:4713
 tcp_getsockopt+0xdf/0x100 net/ipv4/tcp.c:4812
 do_sock_getsockopt+0x34d/0x440 net/socket.c:2421
 __sys_getsockopt+0x12f/0x260 net/socket.c:2450
 __do_sys_getsockopt net/socket.c:2457 [inline]
 __se_sys_getsockopt net/socket.c:2454 [inline]
 __x64_sys_getsockopt+0xbd/0x160 net/socket.c:2454
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
A sockmap socket maintains its own receive queue (ingress_msg), which may contain data either from its own protocol stack or forwarded from other sockets.
                                           FD1:read()
                                           -- FD1->copied_seq++
                                                |
                     [read data]                |
                     [enqueue data]             v
     [sockmap]     -> ingress to self  ->  ingress_msg queue
FD1 native stack ------>                        ^
-- FD1->rcv_nxt++  -> redirect to other         |  [enqueue data]
                           |                    |
                           |  ingress to FD1
                           v                    ^
                          ...                   |
                      [sockmap]
                   FD2 native stack
The issue occurs when reading from ingress_msg: we update tp->copied_seq by default, but if the data came from another socket (not the socket's own protocol stack), tp->rcv_nxt was never advanced for it. Later, when the socket is converted back to a native socket, reads may fail because copied_seq can be significantly larger than rcv_nxt.
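To make the failure mode concrete, here is a minimal userspace sketch (not part of this series; map_fd is assumed to be a sockmap whose verdict program redirects ingress traffic to slot 1, and c0/p0, c1/p1 are connected TCP pairs occupying slots 0 and 1, mirroring the selftest in patch 3):

#include <sys/socket.h>
#include <bpf/bpf.h>

static void copied_seq_desync(int map_fd, int c0, int c1, int p1)
{
        char buf[10] = "0123456789";
        int one = 1;

        send(c0, buf, sizeof(buf), 0);     /* redirected into p1's ingress_msg */
        recv(p1, buf, sizeof(buf), 0);     /* p1->copied_seq advances ...      */

        bpf_map_delete_elem(map_fd, &one); /* p1 falls back to native TCP      */

        send(c1, buf, sizeof(buf), 0);     /* ... but p1->rcv_nxt lagged, so   */
        recv(p1, buf, sizeof(buf), 0);     /* native reads can now misbehave   */
}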
Additionally, calculating FIONREAD from copied_seq and rcv_nxt is insufficient for sockmap sockets; the bytes queued on ingress_msg need to be tracked in a separate field.
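The FIONREAD side shows up simply as a wrong byte count (sketch; fd is a placeholder for a sockmap socket with data queued on ingress_msg):

#include <stdio.h>
#include <sys/ioctl.h>

static void check_fionread(int fd)
{
        int avail = 0;

        /* Before this series the result is derived from copied_seq/rcv_nxt
         * and does not reflect the bytes sitting on ingress_msg. */
        if (ioctl(fd, FIONREAD, &avail) == 0)
                printf("FIONREAD reports %d bytes\n", avail);
}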
[1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983
---
v1 -> v2: Use skmsg.sk instead of extending BPF_F_XXX macro
v1: https://lore.kernel.org/bpf/20251117110736.293040-1-jiayuan.chen@linux.dev/
Jiayuan Chen (3):
  bpf, sockmap: Fix incorrect copied_seq calculation
  bpf, sockmap: Fix FIONREAD for sockmap
  bpf, selftest: Add tests for FIONREAD and copied_seq
 include/linux/skmsg.h                              |  48 ++++-
 net/core/skmsg.c                                   |  29 ++-
 net/ipv4/tcp_bpf.c                                 |  26 ++-
 net/ipv4/udp_bpf.c                                 |  25 ++-
 .../selftests/bpf/prog_tests/sockmap_basic.c       | 203 +++++++++++++++++-
 .../bpf/progs/test_sockmap_pass_prog.c             |   8 +
 6 files changed, 323 insertions(+), 16 deletions(-)
A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets.
The issue is that when reading from ingress_msg, we update tp->copied_seq by default. However, if the data did not come from the socket's own protocol stack, tp->rcv_nxt was never increased for it. Later, if we convert this socket back to a native socket, reading from it may fail because copied_seq might be significantly larger than rcv_nxt.
This fix also addresses the syzkaller-reported bug referenced in the Closes tag.
This patch marks each skmsg queued on ingress_msg with its originating socket. When reading, we update copied_seq only if the data came from the socket's own protocol stack.
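Condensed from the diff below, the rule is roughly the following (simplified, error handling omitted):

        to_self = msg_rx->sk == sk;     /* msg->sk is set only for data that was
                                         * ingressed by sk's own protocol stack */
        ...
        copied += copy;
        if (to_self && from_self_copied)
                *from_self_copied += copy;
        ...
        /* tcp_bpf_recvmsg_parser(): only self-originated bytes move the seq */
        seq += from_self_copied;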
                                           FD1:read()
                                           -- FD1->copied_seq++
                                                |
                     [read data]                |
                     [enqueue data]             v
     [sockmap]     -> ingress to self  ->  ingress_msg queue
FD1 native stack ------>                        ^
-- FD1->rcv_nxt++  -> redirect to other         |  [enqueue data]
                           |                    |
                           |  ingress to FD1
                           v                    ^
                          ...                   |
                      [sockmap]
                   FD2 native stack
Closes: https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 include/linux/skmsg.h |  2 ++
 net/core/skmsg.c      | 25 ++++++++++++++++++++++---
 net/ipv4/tcp_bpf.c    |  5 +++--
 3 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 49847888c287..0323a2b6cf5e 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -141,6 +141,8 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
                             struct sk_msg *msg, u32 bytes);
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
                    int len, int flags);
+int __sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+                     int len, int flags, int *from_self_copied);
 bool sk_msg_is_readable(struct sock *sk);
 
 static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2ac7731e1e0a..d73e03f7713a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -409,14 +409,14 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 }
 EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter);
 
-/* Receive sk_msg from psock->ingress_msg to @msg. */
-int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
-                   int len, int flags)
+int __sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+                     int len, int flags, int *from_self_copied)
 {
        struct iov_iter *iter = &msg->msg_iter;
        int peek = flags & MSG_PEEK;
        struct sk_msg *msg_rx;
        int i, copied = 0;
+       bool to_self;
 
        msg_rx = sk_psock_peek_msg(psock);
        while (copied != len) {
@@ -425,6 +425,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
                if (unlikely(!msg_rx))
                        break;
 
+               to_self = msg_rx->sk == sk;
                i = msg_rx->sg.start;
                do {
                        struct page *page;
@@ -443,6 +444,9 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
                        }
 
                        copied += copy;
+                       if (to_self && from_self_copied)
+                               *from_self_copied += copy;
+
                        if (likely(!peek)) {
                                sge->offset += copy;
                                sge->length -= copy;
@@ -487,6 +491,14 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 out:
        return copied;
 }
+EXPORT_SYMBOL_GPL(__sk_msg_recvmsg);
+
+/* Receive sk_msg from psock->ingress_msg to @msg. */
+int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+                   int len, int flags)
+{
+       return __sk_msg_recvmsg(sk, psock, msg, len, flags, NULL);
+}
 EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
 
 bool sk_msg_is_readable(struct sock *sk)
@@ -616,6 +628,12 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb
        if (unlikely(!msg))
                return -EAGAIN;
        skb_set_owner_r(skb, sk);
+
+       /* This is used in tcp_bpf_recvmsg_parser() to determine whether the
+        * data originates from the socket's own protocol stack. No need to
+        * refcount sk because msg's lifetime is bound to sk via the ingress_msg.
+        */
+       msg->sk = sk;
        err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, take_ref);
        if (err < 0)
                kfree(msg);
@@ -909,6 +927,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
                sk_msg_compute_data_pointers(msg);
                msg->sk = sk;
                ret = bpf_prog_run_pin_on_cpu(prog, msg);
+               msg->sk = NULL;
                ret = sk_psock_map_verd(ret, msg->sk_redir);
                psock->apply_bytes = msg->apply_bytes;
                if (ret == __SK_REDIRECT) {
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index a268e1595b22..6332fc36ffe6 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -226,6 +226,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
        int peek = flags & MSG_PEEK;
        struct sk_psock *psock;
        struct tcp_sock *tcp;
+       int from_self_copied = 0;
        int copied = 0;
        u32 seq;
 
@@ -262,7 +263,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
        }
 
 msg_bytes_ready:
-       copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+       copied = __sk_msg_recvmsg(sk, psock, msg, len, flags, &from_self_copied);
        /* The typical case for EFAULT is the socket was gracefully
         * shutdown with a FIN pkt. So check here the other case is
         * some error on copy_page_to_iter which would be unexpected.
@@ -277,7 +278,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
                        goto out;
                }
        }
-       seq += copied;
+       seq += from_self_copied;
        if (!copied) {
                long timeo;
                int data;
A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets.
Therefore, for sockmap sockets, relying solely on copied_seq and rcv_nxt to calculate FIONREAD is not enough: a plain TCP socket reports rcv_nxt - copied_seq, but for a sockmap socket the readable bytes sit on ingress_msg, which that difference does not account for.
This patch adds a new ingress_size field to the psock structure to record the amount of data queued on ingress_msg, and implements ioctl callbacks for TCP and UDP to intercept FIONREAD. While Unix and vsock sockets also support sockmap and have similar FIONREAD calculation issues, fixing them would require more extensive changes (please let me know if that is wanted); I don't think those changes belong in this fix.
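For reference, the userspace-visible behaviour this aims for looks roughly like the following sketch (not part of the patch; tcp_fd/udp_fd are placeholder descriptors for sockets attached to a sockmap):

#include <stdio.h>
#include <sys/ioctl.h>

static void show_fionread(int tcp_fd, int udp_fd)
{
        int avail = 0;

        /* TCP: total number of bytes queued on the psock's ingress_msg list */
        if (ioctl(tcp_fd, FIONREAD, &avail) == 0)
                printf("tcp readable: %d\n", avail);

        /* UDP: per udp(7), only the size of the next pending datagram */
        if (ioctl(udp_fd, FIONREAD, &avail) == 0)
                printf("udp next datagram: %d\n", avail);
}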
Previous work by John Fastabend already made some efforts towards FIONREAD support in commit e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq"). Although the current patch builds on that work, it is acceptable for our Fixes tag to point to the same commit.
                                           FD1:read()
                                           -- FD1->copied_seq++
                                                |
                     [read data]                |
                     [enqueue data]             v
     [sockmap]     -> ingress to self  ->  ingress_msg queue
FD1 native stack ------>                        ^
-- FD1->rcv_nxt++  -> redirect to other         |  [enqueue data]
                           |                    |
                           |  ingress to FD1
                           v                    ^
                          ...                   |
                      [sockmap]
                   FD2 native stack
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 include/linux/skmsg.h | 46 ++++++++++++++++++++++++++++++++++++++++++-
 net/core/skmsg.c      |  4 ++++
 net/ipv4/tcp_bpf.c    | 21 ++++++++++++++++++++
 net/ipv4/udp_bpf.c    | 25 +++++++++++++++++++++----
 4 files changed, 91 insertions(+), 5 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 0323a2b6cf5e..49069b519499 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -97,6 +97,7 @@ struct sk_psock {
        struct sk_buff_head             ingress_skb;
        struct list_head                ingress_msg;
        spinlock_t                      ingress_lock;
+       ssize_t                         ingress_size;
        unsigned long                   state;
        struct list_head                link;
        spinlock_t                      link_lock;
@@ -321,6 +322,16 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
        kfree_skb(skb);
 }
 
+static inline ssize_t sk_psock_get_msg_size(struct sk_psock *psock)
+{
+       return psock->ingress_size;
+}
+
+static inline void sk_psock_inc_msg_size(struct sk_psock *psock, ssize_t diff)
+{
+       psock->ingress_size += diff;
+}
+
 static inline bool sk_psock_queue_msg(struct sk_psock *psock,
                                      struct sk_msg *msg)
 {
@@ -329,6 +340,7 @@ static inline bool sk_psock_queue_msg(struct sk_psock *psock,
        spin_lock_bh(&psock->ingress_lock);
        if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
                list_add_tail(&msg->list, &psock->ingress_msg);
+               sk_psock_inc_msg_size(psock, msg->sg.size);
                ret = true;
        } else {
                sk_msg_free(psock->sk, msg);
@@ -345,8 +357,10 @@ static inline struct sk_msg *sk_psock_dequeue_msg(struct sk_psock *psock)
        spin_lock_bh(&psock->ingress_lock);
        msg = list_first_entry_or_null(&psock->ingress_msg,
                                       struct sk_msg, list);
-       if (msg)
+       if (msg) {
                list_del(&msg->list);
+               sk_psock_inc_msg_size(psock, -msg->sg.size);
+       }
        spin_unlock_bh(&psock->ingress_lock);
        return msg;
 }
@@ -523,6 +537,36 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
        return !!psock->saved_data_ready;
 }
 
+static inline ssize_t sk_psock_msg_inq(struct sock *sk)
+{
+       struct sk_psock *psock;
+       ssize_t inq = 0;
+
+       psock = sk_psock_get(sk);
+       if (likely(psock)) {
+               inq = sk_psock_get_msg_size(psock);
+               sk_psock_put(sk, psock);
+       }
+       return inq;
+}
+
+/* for udp */
+static inline ssize_t sk_msg_first_length(struct sock *sk)
+{
+       struct sk_psock *psock;
+       struct sk_msg *msg;
+       ssize_t inq = 0;
+
+       psock = sk_psock_get(sk);
+       if (likely(psock)) {
+               msg = sk_psock_peek_msg(psock);
+               if (msg)
+                       inq = msg->sg.size;
+               sk_psock_put(sk, psock);
+       }
+       return inq;
+}
+
 #if IS_ENABLED(CONFIG_NET_SOCK_MSG)
 
 #define BPF_F_STRPARSER (1UL << 1)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index d73e03f7713a..4390deaeb890 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -455,6 +455,7 @@ int __sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg
                                        atomic_sub(copy, &sk->sk_rmem_alloc);
                                }
                                msg_rx->sg.size -= copy;
+                               sk_psock_inc_msg_size(psock, -copy);
 
                                if (!sge->length) {
                                        sk_msg_iter_var_next(i);
@@ -820,8 +821,11 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock)
                if (!msg->skb)
                        atomic_sub(msg->sg.size, &psock->sk->sk_rmem_alloc);
                sk_msg_free(psock->sk, msg);
+               sk_psock_inc_msg_size(psock, -msg->sg.size);
                kfree(msg);
        }
+
+       WARN_ON_ONCE(psock->ingress_size);
 }
 
 static void __sk_psock_zap_ingress(struct sk_psock *psock)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 6332fc36ffe6..a9c758868f13 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -10,6 +10,7 @@
 
 #include <net/inet_common.h>
 #include <net/tls.h>
+#include <asm/ioctls.h>
 
 void tcp_eat_skb(struct sock *sk, struct sk_buff *skb)
 {
@@ -332,6 +333,25 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
        return copied;
 }
 
+static int tcp_bpf_ioctl(struct sock *sk, int cmd, int *karg)
+{
+       bool slow;
+
+       /* we only care about FIONREAD */
+       if (cmd != SIOCINQ)
+               return tcp_ioctl(sk, cmd, karg);
+
+       /* works similar as tcp_ioctl */
+       if (sk->sk_state == TCP_LISTEN)
+               return -EINVAL;
+
+       slow = lock_sock_fast(sk);
+       *karg = sk_psock_msg_inq(sk);
+       unlock_sock_fast(sk, slow);
+
+       return 0;
+}
+
 static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
                           int flags, int *addr_len)
 {
@@ -610,6 +630,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
        prot[TCP_BPF_BASE].close                = sock_map_close;
        prot[TCP_BPF_BASE].recvmsg              = tcp_bpf_recvmsg;
        prot[TCP_BPF_BASE].sock_is_readable     = sk_msg_is_readable;
+       prot[TCP_BPF_BASE].ioctl                = tcp_bpf_ioctl;
 
        prot[TCP_BPF_TX]                        = prot[TCP_BPF_BASE];
        prot[TCP_BPF_TX].sendmsg                = tcp_bpf_sendmsg;
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 0735d820e413..7928bec7a53c 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -5,6 +5,7 @@
 #include <net/sock.h>
 #include <net/udp.h>
 #include <net/inet_common.h>
+#include <asm/ioctls.h>
 
 #include "udp_impl.h"
 
@@ -111,12 +112,28 @@ enum {
 static DEFINE_SPINLOCK(udpv6_prot_lock);
 static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS];
 
+static int udp_bpf_ioctl(struct sock *sk, int cmd, int *karg)
+{
+       /* we only care about FIONREAD */
+       if (cmd != SIOCINQ)
+               return tcp_ioctl(sk, cmd, karg);
+
+       /* works similar as udp_ioctl.
+        * man udp(7): "FIONREAD (SIOCINQ): Returns the size of the next
+        * pending datagram in the integer in bytes, or 0 when no datagram
+        * is pending."
+        */
+       *karg = sk_msg_first_length(sk);
+       return 0;
+}
+
 static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
 {
-       *prot = *base;
-       prot->close = sock_map_close;
-       prot->recvmsg = udp_bpf_recvmsg;
-       prot->sock_is_readable = sk_msg_is_readable;
+       *prot                   = *base;
+       prot->close             = sock_map_close;
+       prot->recvmsg           = udp_bpf_recvmsg;
+       prot->sock_is_readable  = sk_msg_is_readable;
+       prot->ioctl             = udp_bpf_ioctl;
 }
 
 static void udp_bpf_check_v6_needs_rebuild(struct proto *ops)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 0323a2b6c..49069b519 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
[ ... ]
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index d73e03f77..4390deaeb 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
[ ... ]
@@ -820,8 +821,11 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock)
                if (!msg->skb)
                        atomic_sub(msg->sg.size, &psock->sk->sk_rmem_alloc);
                sk_msg_free(psock->sk, msg);
+               sk_psock_inc_msg_size(psock, -msg->sg.size);
                kfree(msg);
        }
+
+       WARN_ON_ONCE(psock->ingress_size);
}
Can msg->sg.size be read after sk_msg_free()? Looking at sk_msg_free():
int sk_msg_free(struct sock *sk, struct sk_msg *msg)
{
        return __sk_msg_free(sk, msg, msg->sg.start, true);
}
And __sk_msg_free() at net/core/skmsg.c:207:
static int __sk_msg_free(struct sock *sk, struct sk_msg *msg, u32 i,
                         bool charge)
{
        ...
        while (msg->sg.size) {
                msg->sg.size -= sge->length;
                ...
        }
        consume_skb(msg->skb);
        sk_msg_init(msg); // <-- this zeros the entire msg structure
        return freed;
}
And sk_msg_init() at include/linux/skmsg.h:193:
static inline void sk_msg_init(struct sk_msg *msg)
{
        BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != NR_MSG_FRAG_IDS);
        memset(msg, 0, sizeof(*msg)); // <-- zeros everything
        sg_init_marker(msg->sg.data, NR_MSG_FRAG_IDS);
}
After sk_msg_free() completes, msg->sg.size is zero, so the line "sk_psock_inc_msg_size(psock, -msg->sg.size)" subtracts 0 instead of the actual message size. This means ingress_size will never be decremented during cleanup, and WARN_ON_ONCE(psock->ingress_size) will trigger whenever the ingress_msg queue contained messages.
The fix would be to save msg->sg.size before calling sk_msg_free():
                u32 msg_size = msg->sg.size;

                if (!msg->skb)
                        atomic_sub(msg_size, &psock->sk->sk_rmem_alloc);
                sk_msg_free(psock->sk, msg);
                sk_psock_inc_msg_size(psock, -msg_size);
                kfree(msg);
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19558612316
This commit adds two new test functions: one to reproduce the bug reported by syzkaller [1], and another to cover the calculation of copied_seq.
The tests primarily attach sockets to a sockmap and later detach them, then read data to verify that both the sockmap path and the restored native path work properly.
Additionally, extend the do_test_sockmap_skb_verdict_fionread() function to support UDP FIONREAD testing.
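Condensed from the diff below, the FIONREAD check in the extended helper has roughly this shape (xsend() and ASSERT_EQ() are existing selftest helpers; split_len, sotype and the socket fds come from the test):

        char buf[256] = "0123456789";
        int split_len = sizeof(buf) / 2, avail = 0;

        /* two sends, so the TCP and UDP expectations differ */
        xsend(p1, buf, split_len, 0);
        xsend(p1, buf, sizeof(buf) - split_len, 0);

        ioctl(c1, FIONREAD, &avail);
        /* TCP reports all queued bytes, UDP only the next pending datagram */
        ASSERT_EQ(avail, sotype == SOCK_DGRAM ? split_len : sizeof(buf),
                  "ioctl(FIONREAD)");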
[1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 203 +++++++++++++++++-
 .../bpf/progs/test_sockmap_pass_prog.c        |   8 +
 2 files changed, 205 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 1e3e4392dcca..8828f2b2f761 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -1,7 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 // Copyright (c) 2020 Cloudflare #include <error.h> -#include <netinet/tcp.h> +#include <linux/tcp.h> +#include <linux/socket.h> #include <sys/epoll.h>
#include "test_progs.h" @@ -22,6 +23,16 @@ #define TCP_REPAIR_ON 1 #define TCP_REPAIR_OFF_NO_WP -1 /* Turn off without window probes */
+/** + * SOL_TCP is defined in <netinet/tcp.h> while field + * copybuf_address of tcp_zerocopy_receive is not in it + * Although glibc has merged my patch to sync headers, + * the fix will take time to propagate, hence this workaround. + */ +#ifndef SOL_TCP +#define SOL_TCP 6 +#endif + static int connected_socket_v4(void) { struct sockaddr_in addr = { @@ -536,13 +547,14 @@ static void test_sockmap_skb_verdict_shutdown(void) }
-static void test_sockmap_skb_verdict_fionread(bool pass_prog) +static void do_test_sockmap_skb_verdict_fionread(int sotype, bool pass_prog) { int err, map, verdict, c0 = -1, c1 = -1, p0 = -1, p1 = -1; int expected, zero = 0, sent, recvd, avail; struct test_sockmap_pass_prog *pass = NULL; struct test_sockmap_drop_prog *drop = NULL; char buf[256] = "0123456789"; + int split_len = sizeof(buf) / 2;
if (pass_prog) { pass = test_sockmap_pass_prog__open_and_load(); @@ -550,7 +562,10 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog) return; verdict = bpf_program__fd(pass->progs.prog_skb_verdict); map = bpf_map__fd(pass->maps.sock_map_rx); - expected = sizeof(buf); + if (sotype == SOCK_DGRAM) + expected = split_len; /* FIONREAD for UDP is different from TCP */ + else + expected = sizeof(buf); } else { drop = test_sockmap_drop_prog__open_and_load(); if (!ASSERT_OK_PTR(drop, "open_and_load")) @@ -566,7 +581,7 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog) if (!ASSERT_OK(err, "bpf_prog_attach")) goto out;
- err = create_socket_pairs(AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1); + err = create_socket_pairs(AF_INET, sotype, &c0, &c1, &p0, &p1); if (!ASSERT_OK(err, "create_socket_pairs()")) goto out;
@@ -574,8 +589,9 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog) if (!ASSERT_OK(err, "bpf_map_update_elem(c1)")) goto out_close;
- sent = xsend(p1, &buf, sizeof(buf), 0); - ASSERT_EQ(sent, sizeof(buf), "xsend(p0)"); + sent = xsend(p1, &buf, split_len, 0); + sent += xsend(p1, &buf, sizeof(buf) - split_len, 0); + ASSERT_EQ(sent, sizeof(buf), "xsend(p1)"); err = ioctl(c1, FIONREAD, &avail); ASSERT_OK(err, "ioctl(FIONREAD) error"); ASSERT_EQ(avail, expected, "ioctl(FIONREAD)"); @@ -597,6 +613,12 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog) test_sockmap_drop_prog__destroy(drop); }
+static void test_sockmap_skb_verdict_fionread(bool pass_prog) +{ + do_test_sockmap_skb_verdict_fionread(SOCK_STREAM, pass_prog); + do_test_sockmap_skb_verdict_fionread(SOCK_DGRAM, pass_prog); +} + static void test_sockmap_skb_verdict_change_tail(void) { struct test_sockmap_change_tail *skel; @@ -1042,6 +1064,169 @@ static void test_sockmap_vsock_unconnected(void) xclose(map); }
+/* it used to reproduce WARNING */ +static void test_sockmap_zc(void) +{ + int map, err, sent, recvd, zero = 0, one = 1, on = 1; + char buf[10] = "0123456789", rcv[11], addr[100]; + struct test_sockmap_pass_prog *skel = NULL; + int c0 = -1, p0 = -1, c1 = -1, p1 = -1; + struct tcp_zerocopy_receive zc; + socklen_t zc_len = sizeof(zc); + struct bpf_program *prog; + + skel = test_sockmap_pass_prog__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + if (create_socket_pairs(AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1)) + goto end; + + prog = skel->progs.prog_skb_verdict_ingress; + map = bpf_map__fd(skel->maps.sock_map_rx); + + err = bpf_prog_attach(bpf_program__fd(prog), map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach")) + goto end; + + err = bpf_map_update_elem(map, &zero, &p0, BPF_ANY); + if (!ASSERT_OK(err, "bpf_map_update_elem")) + goto end; + + err = bpf_map_update_elem(map, &one, &p1, BPF_ANY); + if (!ASSERT_OK(err, "bpf_map_update_elem")) + goto end; + + sent = xsend(c0, buf, sizeof(buf), 0); + if (!ASSERT_EQ(sent, sizeof(buf), "xsend")) + goto end; + + /* trigger tcp_bpf_recvmsg_parser and inc copied_seq of p1 */ + recvd = recv_timeout(p1, rcv, sizeof(rcv), MSG_DONTWAIT, 1); + if (!ASSERT_EQ(recvd, sent, "recv_timeout(p1)")) + goto end; + + /* uninstall sockmap of p1 */ + bpf_map_delete_elem(map, &one); + + /* trigger tcp stack and the rcv_nxt of p1 is less than copied_seq */ + sent = xsend(c1, buf, sizeof(buf) - 1, 0); + if (!ASSERT_EQ(sent, sizeof(buf) - 1, "xsend")) + goto end; + + err = setsockopt(p1, SOL_SOCKET, SO_ZEROCOPY, &on, sizeof(on)); + if (!ASSERT_OK(err, "setsockopt")) + goto end; + + memset(&zc, 0, sizeof(zc)); + zc.copybuf_address = (__u64)((unsigned long)addr); + zc.copybuf_len = sizeof(addr); + + err = getsockopt(p1, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len); + if (!ASSERT_OK(err, "getsockopt")) + goto end; + +end: + if (c0 >= 0) + close(c0); + if (p0 >= 0) + close(p0); + if (c1 >= 0) + close(c1); + if (p1 >= 0) + close(p1); + test_sockmap_pass_prog__destroy(skel); +} + +/* it used to check whether copied_seq of sk is correct */ +static void test_sockmap_copied_seq(bool strp) +{ + int i, map, err, sent, recvd, zero = 0, one = 1; + struct test_sockmap_pass_prog *skel = NULL; + int c0 = -1, p0 = -1, c1 = -1, p1 = -1; + char buf[10] = "0123456789", rcv[11]; + struct bpf_program *prog; + + skel = test_sockmap_pass_prog__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + if (create_socket_pairs(AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1)) + goto end; + + prog = skel->progs.prog_skb_verdict_ingress; + map = bpf_map__fd(skel->maps.sock_map_rx); + + err = bpf_prog_attach(bpf_program__fd(prog), map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach verdict")) + goto end; + + if (strp) { + prog = skel->progs.prog_skb_parser; + err = bpf_prog_attach(bpf_program__fd(prog), map, BPF_SK_SKB_STREAM_PARSER, 0); + if (!ASSERT_OK(err, "bpf_prog_attach parser")) + goto end; + } + + err = bpf_map_update_elem(map, &zero, &p0, BPF_ANY); + if (!ASSERT_OK(err, "bpf_map_update_elem(p0)")) + goto end; + + err = bpf_map_update_elem(map, &one, &p1, BPF_ANY); + if (!ASSERT_OK(err, "bpf_map_update_elem(p1)")) + goto end; + + /* just trigger sockamp: data sent by c0 will be received by p1 */ + sent = xsend(c0, buf, sizeof(buf), 0); + if (!ASSERT_EQ(sent, sizeof(buf), "xsend(c0), bpf")) + goto end; + + recvd = recv_timeout(p1, rcv, sizeof(rcv), MSG_DONTWAIT, 1); + if (!ASSERT_EQ(recvd, sent, 
"recv_timeout(p1), bpf")) + goto end; + + /* uninstall sockmap of p1 and p0 */ + err = bpf_map_delete_elem(map, &one); + if (!ASSERT_OK(err, "bpf_map_delete_elem(1)")) + goto end; + + err = bpf_map_delete_elem(map, &zero); + if (!ASSERT_OK(err, "bpf_map_delete_elem(0)")) + goto end; + + /* now all sockets become plain socket, they should still work */ + for (i = 0; i < 5; i++) { + /* test copied_seq of p1 by running tcp native stack */ + sent = xsend(c1, buf, sizeof(buf), 0); + if (!ASSERT_EQ(sent, sizeof(buf), "xsend(c1), native")) + goto end; + + recvd = recv(p1, rcv, sizeof(rcv), MSG_DONTWAIT); + if (!ASSERT_EQ(recvd, sent, "recv_timeout(p1), native")) + goto end; + + /* p0 previously redirected skb to p1, we also check copied_seq of p0 */ + sent = xsend(c0, buf, sizeof(buf), 0); + if (!ASSERT_EQ(sent, sizeof(buf), "xsend(c0), native")) + goto end; + + recvd = recv(p0, rcv, sizeof(rcv), MSG_DONTWAIT); + if (!ASSERT_EQ(recvd, sent, "recv_timeout(p0), native")) + goto end; + } + +end: + if (c0 >= 0) + close(c0); + if (p0 >= 0) + close(p0); + if (c1 >= 0) + close(c1); + if (p1 >= 0) + close(p1); + test_sockmap_pass_prog__destroy(skel); +} + void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -1108,4 +1293,10 @@ void test_sockmap_basic(void) test_sockmap_skb_verdict_vsock_poll(); if (test__start_subtest("sockmap vsock unconnected")) test_sockmap_vsock_unconnected(); + if (test__start_subtest("sockmap with zc")) + test_sockmap_zc(); + if (test__start_subtest("sockmap recover")) + test_sockmap_copied_seq(false); + if (test__start_subtest("sockmap recover with strp")) + test_sockmap_copied_seq(true); } diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c index 69aacc96db36..085d7e61e78d 100644 --- a/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c +++ b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c @@ -44,4 +44,12 @@ int prog_skb_parser(struct __sk_buff *skb) return SK_PASS; }
+SEC("sk_skb/stream_verdict") +int prog_skb_verdict_ingress(struct __sk_buff *skb) +{ + int one = 1; + + return bpf_sk_redirect_map(skb, &sock_map_rx, one, BPF_F_INGRESS); +} + char _license[] SEC("license") = "GPL";
syzbot ci has tested the following series
[v2] bpf: Fix FIONREAD and copied_seq issues
https://lore.kernel.org/all/20251121030013.60133-1-jiayuan.chen@linux.dev
* [PATCH bpf-next v2 1/3] bpf, sockmap: Fix incorrect copied_seq calculation
* [PATCH bpf-next v2 2/3] bpf, sockmap: Fix FIONREAD for sockmap
* [PATCH bpf-next v2 3/3] bpf, selftest: Add tests for FIONREAD and copied_seq
and found the following issues:
* KASAN: slab-out-of-bounds Read in tcp_ioctl
* KASAN: slab-use-after-free Read in tcp_ioctl
* SYZFAIL: failed to recv rpc
* WARNING in sk_psock_destroy
Full report is available here: https://ci.syzbot.org/series/2dc8b7c6-6074-4e9e-a56e-dc5f34e678db
***
KASAN: slab-out-of-bounds Read in tcp_ioctl
tree:      bpf-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base:      4722981cca373a338bbcf3a93ecf7144a892b03b
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/ffa9b2b0-ceb4-4d19-bc45-58475d561b3b/config
syz repro: https://ci.syzbot.org/findings/5fbc614a-9eb9-426b-96c5-1f69b6874e56/syz_repr...
==================================================================
BUG: KASAN: slab-out-of-bounds in tcp_ioctl+0x673/0x860 net/ipv4/tcp.c:659
Read of size 2 at addr ffff8881114d5df6 by task syz.2.19/6006
CPU: 0 UID: 0 PID: 6006 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xca/0x240 mm/kasan/report.c:482 kasan_report+0x118/0x150 mm/kasan/report.c:595 tcp_ioctl+0x673/0x860 net/ipv4/tcp.c:659 sock_ioctl_out net/core/sock.c:4392 [inline] sk_ioctl+0x3c7/0x600 net/core/sock.c:4420 inet_ioctl+0x416/0x4c0 net/ipv4/af_inet.c:1007 sock_do_ioctl+0xdc/0x300 net/socket.c:1254 sock_ioctl+0x576/0x790 net/socket.c:1375 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f472558f6c9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f472647a038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f47257e5fa0 RCX: 00007f472558f6c9 RDX: 0000000000000000 RSI: 0000000000008905 RDI: 0000000000000004 RBP: 00007f4725611f91 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f47257e6038 R14: 00007f47257e5fa0 R15: 00007ffcbb3f8478 </TASK>
The buggy address belongs to the object at ffff8881114d5d80 which belongs to the cache UDP of size 1984 The buggy address is located 118 bytes inside of allocated 1984-byte region [ffff8881114d5d80, ffff8881114d6540)
The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1114d0 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff88811112a501 flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff) page_type: f5(slab) raw: 017ff00000000040 ffff888105697000 dead000000000122 0000000000000000 raw: 0000000000000000 00000000800f000f 00000000f5000000 ffff88811112a501 head: 017ff00000000040 ffff888105697000 dead000000000122 0000000000000000 head: 0000000000000000 00000000800f000f 00000000f5000000 ffff88811112a501 head: 017ff00000000003 ffffea0004453401 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5847, tgid 5847 (syz-executor), ts 60144423433, free_ts 60089454561 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850 prep_new_page mm/page_alloc.c:1858 [inline] get_page_from_freelist+0x2365/0x2440 mm/page_alloc.c:3884 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5183 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416 alloc_slab_page mm/slub.c:3059 [inline] allocate_slab+0x96/0x350 mm/slub.c:3232 new_slab mm/slub.c:3286 [inline] ___slab_alloc+0xf56/0x1990 mm/slub.c:4655 __slab_alloc+0x65/0x100 mm/slub.c:4778 __slab_alloc_node mm/slub.c:4854 [inline] slab_alloc_node mm/slub.c:5276 [inline] kmem_cache_alloc_noprof+0x3f9/0x6e0 mm/slub.c:5295 sk_prot_alloc+0x57/0x220 net/core/sock.c:2233 sk_alloc+0x3a/0x370 net/core/sock.c:2295 inet_create+0x7a0/0x1000 net/ipv4/af_inet.c:328 __sock_create+0x4b3/0x9f0 net/socket.c:1605 udp_sock_create4+0xbe/0x4b0 net/ipv4/udp_tunnel_core.c:19 udp_sock_create include/net/udp_tunnel.h:59 [inline] wg_socket_init+0x4e5/0xa60 drivers/net/wireguard/socket.c:387 wg_open+0x24f/0x420 drivers/net/wireguard/device.c:51 __dev_open+0x470/0x880 net/core/dev.c:1682 page last free pid 5847 tgid 5847 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1394 [inline] __free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2906 discard_slab mm/slub.c:3330 [inline] __put_partials+0x146/0x170 mm/slub.c:3876 put_cpu_partial+0x1f2/0x2e0 mm/slub.c:3951 __slab_free+0x2b9/0x390 mm/slub.c:5929 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x97/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:352 kasan_slab_alloc include/linux/kasan.h:252 [inline] slab_post_alloc_hook mm/slub.c:4978 [inline] slab_alloc_node mm/slub.c:5288 [inline] __do_kmalloc_node mm/slub.c:5649 [inline] __kmalloc_noprof+0x3c3/0x7f0 mm/slub.c:5662 kmalloc_noprof include/linux/slab.h:961 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] fib_create_info+0x172d/0x3210 net/ipv4/fib_semantics.c:1402 fib_table_insert+0xc6/0x1b50 net/ipv4/fib_trie.c:1212 fib_magic+0x2c4/0x390 net/ipv4/fib_frontend.c:1134 fib_add_ifaddr+0x144/0x5f0 net/ipv4/fib_frontend.c:1156 fib_netdev_event+0x382/0x490 net/ipv4/fib_frontend.c:1516 notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85 call_netdevice_notifiers_extack net/core/dev.c:2267 [inline] call_netdevice_notifiers net/core/dev.c:2281 [inline] __dev_notify_flags+0x18d/0x2e0 net/core/dev.c:-1 
netif_change_flags+0xe8/0x1a0 net/core/dev.c:9705
Memory state around the buggy address: ffff8881114d5c80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff8881114d5d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8881114d5d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff8881114d5e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881114d5e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ==================================================================
***
KASAN: slab-use-after-free Read in tcp_ioctl
tree:      bpf-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base:      4722981cca373a338bbcf3a93ecf7144a892b03b
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/ffa9b2b0-ceb4-4d19-bc45-58475d561b3b/config
C repro:   https://ci.syzbot.org/findings/956fd3c8-f06a-44eb-b0ff-510a6c8b378d/c_repro
syz repro: https://ci.syzbot.org/findings/956fd3c8-f06a-44eb-b0ff-510a6c8b378d/syz_repr...
==================================================================
BUG: KASAN: slab-use-after-free in tcp_ioctl+0x673/0x860 net/ipv4/tcp.c:659
Read of size 2 at addr ffff888101fe3bf6 by task syz.0.17/5959
CPU: 1 UID: 0 PID: 5959 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xca/0x240 mm/kasan/report.c:482 kasan_report+0x118/0x150 mm/kasan/report.c:595 tcp_ioctl+0x673/0x860 net/ipv4/tcp.c:659 sock_ioctl_out net/core/sock.c:4392 [inline] sk_ioctl+0x3c7/0x600 net/core/sock.c:4420 inet_ioctl+0x416/0x4c0 net/ipv4/af_inet.c:1007 sock_do_ioctl+0xdc/0x300 net/socket.c:1254 sock_ioctl+0x576/0x790 net/socket.c:1375 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f4a1458f6c9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f4a1545b038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f4a147e5fa0 RCX: 00007f4a1458f6c9 RDX: 0000000000000000 RSI: 0000000000008905 RDI: 0000000000000004 RBP: 00007f4a14611f91 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f4a147e6038 R14: 00007f4a147e5fa0 R15: 00007ffef33173a8 </TASK>
Allocated by task 5926: kasan_save_stack mm/kasan/common.c:56 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:77 unpoison_slab_object mm/kasan/common.c:342 [inline] __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:368 kasan_slab_alloc include/linux/kasan.h:252 [inline] slab_post_alloc_hook mm/slub.c:4978 [inline] slab_alloc_node mm/slub.c:5288 [inline] kmem_cache_alloc_noprof+0x367/0x6e0 mm/slub.c:5295 sk_prot_alloc+0x57/0x220 net/core/sock.c:2233 sk_alloc+0x3a/0x370 net/core/sock.c:2295 inet_create+0x7a0/0x1000 net/ipv4/af_inet.c:328 __sock_create+0x4b3/0x9f0 net/socket.c:1605 sock_create net/socket.c:1663 [inline] __sys_socket_create net/socket.c:1700 [inline] __sys_socket+0xd7/0x1b0 net/socket.c:1747 __do_sys_socket net/socket.c:1761 [inline] __se_sys_socket net/socket.c:1759 [inline] __x64_sys_socket+0x7a/0x90 net/socket.c:1759 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5926: kasan_save_stack mm/kasan/common.c:56 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:77 __kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:587 kasan_save_free_info mm/kasan/kasan.h:406 [inline] poison_slab_object mm/kasan/common.c:252 [inline] __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284 kasan_slab_free include/linux/kasan.h:234 [inline] slab_free_hook mm/slub.c:2543 [inline] slab_free mm/slub.c:6642 [inline] kmem_cache_free+0x19b/0x690 mm/slub.c:6752 sk_prot_free net/core/sock.c:2276 [inline] __sk_destruct+0x4d2/0x660 net/core/sock.c:2373 inet_release+0x144/0x190 net/ipv4/af_inet.c:437 __sock_release net/socket.c:662 [inline] sock_close+0xc3/0x240 net/socket.c:1455 __fput+0x44c/0xa70 fs/file_table.c:468 fput_close_sync+0x119/0x200 fs/file_table.c:573 __do_sys_close fs/open.c:1589 [inline] __se_sys_close fs/open.c:1574 [inline] __x64_sys_close+0x7f/0x110 fs/open.c:1574 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ffff888101fe3b80 which belongs to the cache UDP of size 1984 The buggy address is located 118 bytes inside of freed 1984-byte region [ffff888101fe3b80, ffff888101fe4340)
The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101fe0 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff8881086b4301 anon flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff) page_type: f5(slab) raw: 017ff00000000040 ffff888160ef5b40 0000000000000000 0000000000000001 raw: 0000000000000000 00000000800f000f 00000000f5000000 ffff8881086b4301 head: 017ff00000000040 ffff888160ef5b40 0000000000000000 0000000000000001 head: 0000000000000000 00000000800f000f 00000000f5000000 ffff8881086b4301 head: 017ff00000000003 ffffea000407f801 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 3581320107, free_ts 0 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850 prep_new_page mm/page_alloc.c:1858 [inline] get_page_from_freelist+0x2365/0x2440 mm/page_alloc.c:3884 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5183 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416 alloc_slab_page mm/slub.c:3059 [inline] allocate_slab+0x96/0x350 mm/slub.c:3232 new_slab mm/slub.c:3286 [inline] ___slab_alloc+0xf56/0x1990 mm/slub.c:4655 __slab_alloc+0x65/0x100 mm/slub.c:4778 __slab_alloc_node mm/slub.c:4854 [inline] slab_alloc_node mm/slub.c:5276 [inline] kmem_cache_alloc_noprof+0x3f9/0x6e0 mm/slub.c:5295 sk_prot_alloc+0x57/0x220 net/core/sock.c:2233 sk_alloc+0x3a/0x370 net/core/sock.c:2295 inet_create+0x7a0/0x1000 net/ipv4/af_inet.c:328 __sock_create+0x4b3/0x9f0 net/socket.c:1605 inet_ctl_sock_create+0x9a/0x220 net/ipv4/af_inet.c:1632 igmp_net_init+0xb9/0x150 net/ipv4/igmp.c:3125 ops_init+0x35c/0x5c0 net/core/net_namespace.c:137 __register_pernet_operations net/core/net_namespace.c:1314 [inline] register_pernet_operations+0x336/0x800 net/core/net_namespace.c:1391 page_owner free stack trace missing
Memory state around the buggy address: ffff888101fe3a80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff888101fe3b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888101fe3b80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888101fe3c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888101fe3c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
***
SYZFAIL: failed to recv rpc
tree:      bpf-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base:      4722981cca373a338bbcf3a93ecf7144a892b03b
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/ffa9b2b0-ceb4-4d19-bc45-58475d561b3b/config
C repro:   https://ci.syzbot.org/findings/f3ffc26d-6509-4e92-92bd-bfa118ae72e6/c_repro
syz repro: https://ci.syzbot.org/findings/f3ffc26d-6509-4e92-92bd-bfa118ae72e6/syz_repr...
SYZFAIL: failed to recv rpc fd=3 want=4 recv=0 n=0 (errno 9: Bad file descriptor)
***
WARNING in sk_psock_destroy
tree:      bpf-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base:      4722981cca373a338bbcf3a93ecf7144a892b03b
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/ffa9b2b0-ceb4-4d19-bc45-58475d561b3b/config
C repro:   https://ci.syzbot.org/findings/a13957c4-d51a-4472-ab31-37d57484d32e/c_repro
syz repro: https://ci.syzbot.org/findings/a13957c4-d51a-4472-ab31-37d57484d32e/syz_repr...
------------[ cut here ]------------ WARNING: CPU: 0 PID: 5871 at net/core/skmsg.c:828 __sk_psock_purge_ingress_msg net/core/skmsg.c:828 [inline] WARNING: CPU: 0 PID: 5871 at net/core/skmsg.c:828 __sk_psock_zap_ingress net/core/skmsg.c:839 [inline] WARNING: CPU: 0 PID: 5871 at net/core/skmsg.c:828 sk_psock_destroy+0xa24/0xaa0 net/core/skmsg.c:871 Modules linked in: CPU: 0 UID: 0 PID: 5871 Comm: kworker/0:4 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Workqueue: events sk_psock_destroy RIP: 0010:__sk_psock_purge_ingress_msg net/core/skmsg.c:828 [inline] RIP: 0010:__sk_psock_zap_ingress net/core/skmsg.c:839 [inline] RIP: 0010:sk_psock_destroy+0xa24/0xaa0 net/core/skmsg.c:871 Code: e8 51 78 7c f8 85 ed 7e 29 e8 08 74 7c f8 48 89 df 48 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d e9 32 e7 d6 f8 e8 ed 73 7c f8 90 <0f> 0b 90 e9 3f fb ff ff e8 df 73 7c f8 4c 89 f7 be 03 00 00 00 e8 RSP: 0018:ffffc900044679f0 EFLAGS: 00010293 RAX: ffffffff894392d3 RBX: dffffc0000000000 RCX: ffff88810f218000 RDX: 0000000000000000 RSI: 0000000000022fe0 RDI: 0000000000000000 RBP: ffff88810ba8b000 R08: ffffffff8f7cee77 R09: 1ffffffff1ef9dce R10: dffffc0000000000 R11: fffffbfff1ef9dcf R12: ffff88810ba890c0 R13: dead000000000100 R14: 0000000000022fe0 R15: ffff88810ba890c0 FS: 0000000000000000(0000) GS:ffff88818eb38000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c0002a58f8 CR3: 000000000dd38000 CR4: 00000000000006f0 Call Trace: <TASK> process_one_work kernel/workqueue.c:3263 [inline] process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3346 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3427 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.