As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto: ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be forced to create and use MPTCP sockets instead of TCP ones via the mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations [2]:
- it doesn't work if the application is not using libc (e.g. GoLang apps) - in some envs, it might not be easy to set env vars / change the way apps are launched, e.g. on Android - mptcpize needs to be launched with all apps that want MPTCP: we could have more control from BPF to enable MPTCP only for some apps or all the ones of a netns or a cgroup, etc. - it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1] https://github.com/multipath-tcp/mptcp_net-next/wiki [2] https://github.com/multipath-tcp/mptcp_net-next/issues/79
v12: - update diag_* log of update_socket_protocol. - add 'ip netns show' after 'ip netns del' to check if there is a test did not clean up its netns. - return libbpf_get_error() instead of -EIO for the error from open_and_load(). - Use getsockopt(SOL_PROTOCOL) to verify mptcp protocol intead of using 'ss -tOni'.
v11: - add comments about outputs of 'ss' and 'nstat'. - use "err = verify_mptcpify()" instead of using =+.
v10: - drop "#ifdef CONFIG_BPF_JIT". - include vmlinux.h and bpf_tracing_net.h to avoid defining some macros. - drop unneeded checks for mptcp.
v9: - update comment for 'update_socket_protocol'.
v8: - drop the additional checks on the 'protocol' value after the 'update_socket_protocol()' call.
v7: - add __weak and __diag_* for update_socket_protocol.
v6: - add update_socket_protocol.
v5: - add bpf_mptcpify helper.
v4: - use lsm_cgroup/socket_create
v3: - patch 8: char cmd[128]; -> char cmd[256];
v2: - Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Geliang Tang (5): bpf: Add update_socket_protocol hook selftests/bpf: Use random netns name for mptcp selftests/bpf: Add two mptcp netns helpers selftests/bpf: Fix error checks of mptcp open_and_load selftests/bpf: Add mptcpify test
net/mptcp/bpf.c | 15 ++ net/socket.c | 26 +++- .../testing/selftests/bpf/prog_tests/mptcp.c | 146 +++++++++++++++--- tools/testing/selftests/bpf/progs/mptcpify.c | 20 +++ 4 files changed, 186 insertions(+), 21 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
Add a hook named update_socket_protocol in __sys_socket(), for bpf progs to attach to and update socket protocol. One user case is to force legacy TCP apps to create and use MPTCP sockets instead of TCP ones.
Define a mod_ret set named bpf_mptcp_fmodret_ids, add the hook update_socket_protocol into this set, and register it in bpf_mptcp_kfunc_init().
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79 Acked-by: Matthieu Baerts matthieu.baerts@tessares.net Acked-by: Yonghong Song yonghong.song@linux.dev Signed-off-by: Geliang Tang geliang.tang@suse.com --- net/mptcp/bpf.c | 15 +++++++++++++++ net/socket.c | 26 +++++++++++++++++++++++++- 2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 5a0a84ad94af..8a16672b94e2 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -19,3 +19,18 @@ struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk)
return NULL; } + +BTF_SET8_START(bpf_mptcp_fmodret_ids) +BTF_ID_FLAGS(func, update_socket_protocol) +BTF_SET8_END(bpf_mptcp_fmodret_ids) + +static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { + .owner = THIS_MODULE, + .set = &bpf_mptcp_fmodret_ids, +}; + +static int __init bpf_mptcp_kfunc_init(void) +{ + return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); +} +late_initcall(bpf_mptcp_kfunc_init); diff --git a/net/socket.c b/net/socket.c index 2b0e54b2405c..e9b2e754a103 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1644,12 +1644,36 @@ struct file *__sys_socket_file(int family, int type, int protocol) return sock_alloc_file(sock, flags, NULL); }
+/* A hook for bpf progs to attach to and update socket protocol. + * + * A static noinline declaration here could cause the compiler to + * optimize away the function. A global noinline declaration will + * keep the definition, but may optimize away the callsite. + * Therefore, __weak is needed to ensure that the call is still + * emitted, by telling the compiler that we don't know what the + * function might eventually be. + * + * __diag_* below are needed to dismiss the missing prototype warning. + */ + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "A fmod_ret entry point for BPF programs"); + +__weak noinline int update_socket_protocol(int family, int type, int protocol) +{ + return protocol; +} + +__diag_pop(); + int __sys_socket(int family, int type, int protocol) { struct socket *sock; int flags;
- sock = __sys_socket_create(family, type, protocol); + sock = __sys_socket_create(family, type, + update_socket_protocol(family, type, protocol)); if (IS_ERR(sock)) return PTR_ERR(sock);
When running mptcp tests simultaneously, it fails sometimes with "Cannot create namespace file "/var/run/netns/mptcp_ns": File exists" errors. So this patch uses rand() to generate a random netns name instead of using the fixed name "mptcp_ns" for every test.
Add "ip netns show" after "ip netns del" to check if there is a test did not clean up its netns
By doing that, we can re-launch the test even if there was an issue removing the previous netns or if by accident, a netns with this generic name already existed on the system.
Note that using a different name each will also help adding more subtests in future commits.
Acked-by: Yonghong Song yonghong.song@linux.dev Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Geliang Tang geliang.tang@suse.com --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index cd0c42fff7c0..b2d41024c6c2 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -7,7 +7,7 @@ #include "network_helpers.h" #include "mptcp_sock.skel.h"
-#define NS_TEST "mptcp_ns" +char NS_TEST[32];
#ifndef TCP_CA_NAME_MAX #define TCP_CA_NAME_MAX 16 @@ -147,6 +147,8 @@ static void test_base(void) if (!ASSERT_GE(cgroup_fd, 0, "test__join_cgroup")) return;
+ srand(time(NULL)); + snprintf(NS_TEST, sizeof(NS_TEST), "mptcp_ns_%d", rand()); SYS(fail, "ip netns add %s", NS_TEST); SYS(fail, "ip -net %s link set dev lo up", NS_TEST);
@@ -178,6 +180,7 @@ static void test_base(void) close_netns(nstoken);
SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null"); + SYS_NOFAIL("ip netns show %s", NS_TEST);
close(cgroup_fd); }
Add two netns helpers for mptcp tests: create_netns() and cleanup_netns(). Use them in test_base().
These new helpers will be re-used in the following commits introducing new tests.
Acked-by: Yonghong Song yonghong.song@linux.dev Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Geliang Tang geliang.tang@suse.com --- .../testing/selftests/bpf/prog_tests/mptcp.c | 37 ++++++++++++------- 1 file changed, 24 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index b2d41024c6c2..51cbd1e14156 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -22,6 +22,27 @@ struct mptcp_storage { char ca_name[TCP_CA_NAME_MAX]; };
+static struct nstoken *create_netns(void) +{ + srand(time(NULL)); + snprintf(NS_TEST, sizeof(NS_TEST), "mptcp_ns_%d", rand()); + SYS(fail, "ip netns add %s", NS_TEST); + SYS(fail, "ip -net %s link set dev lo up", NS_TEST); + + return open_netns(NS_TEST); +fail: + return NULL; +} + +static void cleanup_netns(struct nstoken *nstoken) +{ + if (nstoken) + close_netns(nstoken); + + SYS_NOFAIL("ip netns del %s &> /dev/null", NS_TEST); + SYS_NOFAIL("ip netns show %s", NS_TEST); +} + static int verify_tsk(int map_fd, int client_fd) { int err, cfd = client_fd; @@ -147,13 +168,8 @@ static void test_base(void) if (!ASSERT_GE(cgroup_fd, 0, "test__join_cgroup")) return;
- srand(time(NULL)); - snprintf(NS_TEST, sizeof(NS_TEST), "mptcp_ns_%d", rand()); - SYS(fail, "ip netns add %s", NS_TEST); - SYS(fail, "ip -net %s link set dev lo up", NS_TEST); - - nstoken = open_netns(NS_TEST); - if (!ASSERT_OK_PTR(nstoken, "open_netns")) + nstoken = create_netns(); + if (!ASSERT_OK_PTR(nstoken, "create_netns")) goto fail;
/* without MPTCP */ @@ -176,12 +192,7 @@ static void test_base(void) close(server_fd);
fail: - if (nstoken) - close_netns(nstoken); - - SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null"); - SYS_NOFAIL("ip netns show %s", NS_TEST); - + cleanup_netns(nstoken); close(cgroup_fd); }
Return libbpf_get_error(), instead of -EIO, for the error from mptcp_sock__open_and_load().
Load success means prog_fd and map_fd are always valid. So drop these unneeded ASSERT_GE checks for them in mptcp run_test().
Acked-by: Yonghong Song yonghong.song@linux.dev Signed-off-by: Geliang Tang geliang.tang@suse.com --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index 51cbd1e14156..8cbd72981a01 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -121,24 +121,14 @@ static int run_test(int cgroup_fd, int server_fd, bool is_mptcp)
sock_skel = mptcp_sock__open_and_load(); if (!ASSERT_OK_PTR(sock_skel, "skel_open_load")) - return -EIO; + return libbpf_get_error(sock_skel);
err = mptcp_sock__attach(sock_skel); if (!ASSERT_OK(err, "skel_attach")) goto out;
prog_fd = bpf_program__fd(sock_skel->progs._sockops); - if (!ASSERT_GE(prog_fd, 0, "bpf_program__fd")) { - err = -EIO; - goto out; - } - map_fd = bpf_map__fd(sock_skel->maps.socket_storage_map); - if (!ASSERT_GE(map_fd, 0, "bpf_map__fd")) { - err = -EIO; - goto out; - } - err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0); if (!ASSERT_OK(err, "bpf_prog_attach")) goto out;
Implement a new test program mptcpify: if the family is AF_INET or AF_INET6, the type is SOCK_STREAM, and the protocol ID is 0 or IPPROTO_TCP, set it to IPPROTO_MPTCP. It will be hooked in update_socket_protocol().
Extend the MPTCP test base, add a selftest test_mptcpify() for the mptcpify case. Open and load the mptcpify test prog to mptcpify the TCP sockets dynamically, then use start_server() and connect_to_fd() to create a TCP socket, but actually what's created is an MPTCP socket, which can be verified through 'getsockopt(SOL_PROTOCOL)' and 'nstat' commands.
Acked-by: Yonghong Song yonghong.song@linux.dev Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Geliang Tang geliang.tang@suse.com --- .../testing/selftests/bpf/prog_tests/mptcp.c | 102 ++++++++++++++++++ tools/testing/selftests/bpf/progs/mptcpify.c | 20 ++++ 2 files changed, 122 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index 8cbd72981a01..16353a35d288 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -2,13 +2,19 @@ /* Copyright (c) 2020, Tessares SA. */ /* Copyright (c) 2022, SUSE. */
+#include <netinet/in.h> #include <test_progs.h> #include "cgroup_helpers.h" #include "network_helpers.h" #include "mptcp_sock.skel.h" +#include "mptcpify.skel.h"
char NS_TEST[32];
+#ifndef IPPROTO_MPTCP +#define IPPROTO_MPTCP 262 +#endif + #ifndef TCP_CA_NAME_MAX #define TCP_CA_NAME_MAX 16 #endif @@ -186,8 +192,104 @@ static void test_base(void) close(cgroup_fd); }
+static void send_byte(int fd) +{ + char b = 0x55; + + ASSERT_EQ(write(fd, &b, sizeof(b)), 1, "send single byte"); +} + +static int verify_mptcpify(int server_fd) +{ + socklen_t optlen; + char cmd[256]; + int protocol; + int err = 0; + + optlen = sizeof(protocol); + if (!ASSERT_OK(getsockopt(server_fd, SOL_SOCKET, SO_PROTOCOL, &protocol, &optlen), + "getsockopt(SOL_PROTOCOL)")) + return -1; + + if (!ASSERT_EQ(protocol, IPPROTO_MPTCP, "protocol isn't MPTCP")) + err++; + + /* Output of nstat: + * + * #kernel + * MPTcpExtMPCapableSYNACKRX 1 0.0 + */ + snprintf(cmd, sizeof(cmd), + "ip netns exec %s nstat -asz %s | awk '%s' | grep -q '%s'", + NS_TEST, "MPTcpExtMPCapableSYNACKRX", + "NR==1 {next} {print $2}", "1"); + if (!ASSERT_OK(system(cmd), "No MPTcpExtMPCapableSYNACKRX found!")) + err++; + + return err; +} + +static int run_mptcpify(int cgroup_fd) +{ + int server_fd, client_fd, err = 0; + struct mptcpify *mptcpify_skel; + + mptcpify_skel = mptcpify__open_and_load(); + if (!ASSERT_OK_PTR(mptcpify_skel, "skel_open_load")) + return libbpf_get_error(mptcpify_skel); + + err = mptcpify__attach(mptcpify_skel); + if (!ASSERT_OK(err, "skel_attach")) + goto out; + + /* without MPTCP */ + server_fd = start_server(AF_INET, SOCK_STREAM, NULL, 0, 0); + if (!ASSERT_GE(server_fd, 0, "start_server")) { + err = -EIO; + goto out; + } + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_GE(client_fd, 0, "connect to fd")) { + err = -EIO; + goto close_server; + } + + send_byte(client_fd); + err = verify_mptcpify(server_fd); + + close(client_fd); +close_server: + close(server_fd); +out: + mptcpify__destroy(mptcpify_skel); + return err; +} + +static void test_mptcpify(void) +{ + struct nstoken *nstoken = NULL; + int cgroup_fd; + + cgroup_fd = test__join_cgroup("/mptcpify"); + if (!ASSERT_GE(cgroup_fd, 0, "test__join_cgroup")) + return; + + nstoken = create_netns(); + if (!ASSERT_OK_PTR(nstoken, "create_netns")) + goto fail; + + ASSERT_OK(run_mptcpify(cgroup_fd), "run_mptcpify"); + +fail: + cleanup_netns(nstoken); + close(cgroup_fd); +} + void test_mptcp(void) { if (test__start_subtest("base")) test_base(); + if (test__start_subtest("mptcpify")) + test_mptcpify(); } diff --git a/tools/testing/selftests/bpf/progs/mptcpify.c b/tools/testing/selftests/bpf/progs/mptcpify.c new file mode 100644 index 000000000000..53301ae8a8f7 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcpify.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023, SUSE. */ + +#include "vmlinux.h" +#include <bpf/bpf_tracing.h> +#include "bpf_tracing_net.h" + +char _license[] SEC("license") = "GPL"; + +SEC("fmod_ret/update_socket_protocol") +int BPF_PROG(mptcpify, int family, int type, int protocol) +{ + if ((family == AF_INET || family == AF_INET6) && + type == SOCK_STREAM && + (!protocol || protocol == IPPROTO_TCP)) { + return IPPROTO_MPTCP; + } + + return protocol; +}
linux-kselftest-mirror@lists.linaro.org