As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto: ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be forced to create and use MPTCP sockets instead of TCP ones via the mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations [2]:
- it doesn't work if the application is not using libc (e.g. GoLang apps) - in some envs, it might not be easy to set env vars / change the way apps are launched, e.g. on Android - mptcpize needs to be launched with all apps that want MPTCP: we could have more control from BPF to enable MPTCP only for some apps or all the ones of a netns or a cgroup, etc. - it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1] https://github.com/multipath-tcp/mptcp_net-next/wiki [2] https://github.com/multipath-tcp/mptcp_net-next/issues/79
v7: - add __weak and __diag_* for update_socket_protocol.
v6: - add update_socket_protocol.
v5: - add bpf_mptcpify helper.
v4: - use lsm_cgroup/socket_create
v3: - patch 8: char cmd[128]; -> char cmd[256];
v2: - Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Geliang Tang (6): net: socket: add update_socket_protocol hook bpf: Register mptcp modret set selftests/bpf: Add mptcpify program selftests/bpf: use random netns name for mptcp selftests/bpf: add two mptcp netns helpers selftests/bpf: Add mptcpify selftest
net/mptcp/bpf.c | 17 +++ net/socket.c | 26 ++++ .../testing/selftests/bpf/prog_tests/mptcp.c | 125 ++++++++++++++++-- tools/testing/selftests/bpf/progs/mptcpify.c | 25 ++++ 4 files changed, 184 insertions(+), 9 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
Add a hook named update_socket_protocol in __sys_socket(), for bpf progs to attach to and update socket protocol. One user case is to force legacy TCP apps to create and use MPTCP sockets instead of TCP ones.
Signed-off-by: Geliang Tang geliang.tang@suse.com --- net/socket.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/net/socket.c b/net/socket.c index 2b0e54b2405c..6837a517ac9f 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1644,11 +1644,37 @@ struct file *__sys_socket_file(int family, int type, int protocol) return sock_alloc_file(sock, flags, NULL); }
+/* + * A hook for bpf progs to attach to and update socket protocol. + * + * A static noinline declaration here could cause the compiler to optimize away + * the function. A global noinline declaration will keep the definition, but may + * optimize away the callsite. Therefore, __weak is needed to ensure that the + * call is still emitted, by telling the compiler that we don't know what the + * function might eventually be. + * + * __diag_* below are needed to dismiss the missing prototype warning. + */ +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "kfuncs which will be used in BPF programs"); + +__weak noinline int update_socket_protocol(int family, int type, int protocol) +{ + return protocol; +} + +__diag_pop(); + int __sys_socket(int family, int type, int protocol) { struct socket *sock; int flags;
+ protocol = update_socket_protocol(family, type, protocol); + if (protocol < 0 || protocol >= IPPROTO_MAX) + return -EINVAL; + sock = __sys_socket_create(family, type, protocol); if (IS_ERR(sock)) return PTR_ERR(sock);
This patch defines a mod_ret set named bpf_mptcp_fmodret_ids, Add update_socket_protocol into this set, and register it in bpf_mptcp_kfunc_init().
Signed-off-by: Geliang Tang geliang.tang@suse.com --- net/mptcp/bpf.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 5a0a84ad94af..c43aee31014d 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -12,6 +12,23 @@ #include <linux/bpf.h> #include "protocol.h"
+#ifdef CONFIG_BPF_JIT +BTF_SET8_START(bpf_mptcp_fmodret_ids) +BTF_ID_FLAGS(func, update_socket_protocol) +BTF_SET8_END(bpf_mptcp_fmodret_ids) + +static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { + .owner = THIS_MODULE, + .set = &bpf_mptcp_fmodret_ids, +}; + +static int __init bpf_mptcp_kfunc_init(void) +{ + return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); +} +late_initcall(bpf_mptcp_kfunc_init); +#endif /* CONFIG_BPF_JIT */ + struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) { if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
This patch implements a new test program mptcpify: if the family is AF_INET or AF_INET6, the type is SOCK_STREAM, and the protocol ID is 0 or IPPROTO_TCP, set it to IPPROTO_MPTCP.
It will be hooked in update_socket_protocol().
Signed-off-by: Geliang Tang geliang.tang@suse.com --- tools/testing/selftests/bpf/progs/mptcpify.c | 25 ++++++++++++++++++++ 1 file changed, 25 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
diff --git a/tools/testing/selftests/bpf/progs/mptcpify.c b/tools/testing/selftests/bpf/progs/mptcpify.c new file mode 100644 index 000000000000..9cf1febe982d --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcpify.c @@ -0,0 +1,25 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023, SUSE. */ + +#include <linux/bpf.h> +#include <bpf/bpf_tracing.h> + +char _license[] SEC("license") = "GPL"; + +#define AF_INET 2 +#define AF_INET6 10 +#define SOCK_STREAM 1 +#define IPPROTO_TCP 6 +#define IPPROTO_MPTCP 262 + +SEC("fmod_ret/update_socket_protocol") +int BPF_PROG(mptcpify, int family, int type, int protocol) +{ + if ((family == AF_INET || family == AF_INET6) && + type == SOCK_STREAM && + (!protocol || protocol == IPPROTO_TCP)) { + return IPPROTO_MPTCP; + } + + return protocol; +}
Use rand() to generate a random netns name instead of using the fixed name "mptcp_ns" for every test.
By doing that, we can re-launch the test even if there was an issue removing the previous netns or if by accident, a netns with this generic name already existed on the system.
Note that using a different name each will also help adding more subtests in future commits.
Signed-off-by: Geliang Tang geliang.tang@suse.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index cd0c42fff7c0..4ccca3d39a8f 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -7,7 +7,7 @@ #include "network_helpers.h" #include "mptcp_sock.skel.h"
-#define NS_TEST "mptcp_ns" +char NS_TEST[32];
#ifndef TCP_CA_NAME_MAX #define TCP_CA_NAME_MAX 16 @@ -147,6 +147,8 @@ static void test_base(void) if (!ASSERT_GE(cgroup_fd, 0, "test__join_cgroup")) return;
+ srand(time(NULL)); + snprintf(NS_TEST, sizeof(NS_TEST), "mptcp_ns_%d", rand()); SYS(fail, "ip netns add %s", NS_TEST); SYS(fail, "ip -net %s link set dev lo up", NS_TEST);
Add two netns helpers for mptcp tests: create_netns() and cleanup_netns(). Use them in test_base().
These new helpers will be re-used in the following commits introducing new tests.
Signed-off-by: Geliang Tang geliang.tang@suse.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net --- .../testing/selftests/bpf/prog_tests/mptcp.c | 35 ++++++++++++------- 1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index 4ccca3d39a8f..4407bd5c9e9a 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -22,6 +22,26 @@ struct mptcp_storage { char ca_name[TCP_CA_NAME_MAX]; };
+static struct nstoken *create_netns(void) +{ + srand(time(NULL)); + snprintf(NS_TEST, sizeof(NS_TEST), "mptcp_ns_%d", rand()); + SYS(fail, "ip netns add %s", NS_TEST); + SYS(fail, "ip -net %s link set dev lo up", NS_TEST); + + return open_netns(NS_TEST); +fail: + return NULL; +} + +static void cleanup_netns(struct nstoken *nstoken) +{ + if (nstoken) + close_netns(nstoken); + + SYS_NOFAIL("ip netns del %s &> /dev/null", NS_TEST); +} + static int verify_tsk(int map_fd, int client_fd) { int err, cfd = client_fd; @@ -147,13 +167,8 @@ static void test_base(void) if (!ASSERT_GE(cgroup_fd, 0, "test__join_cgroup")) return;
- srand(time(NULL)); - snprintf(NS_TEST, sizeof(NS_TEST), "mptcp_ns_%d", rand()); - SYS(fail, "ip netns add %s", NS_TEST); - SYS(fail, "ip -net %s link set dev lo up", NS_TEST); - - nstoken = open_netns(NS_TEST); - if (!ASSERT_OK_PTR(nstoken, "open_netns")) + nstoken = create_netns(); + if (!ASSERT_OK_PTR(nstoken, "create_netns")) goto fail;
/* without MPTCP */ @@ -176,11 +191,7 @@ static void test_base(void) close(server_fd);
fail: - if (nstoken) - close_netns(nstoken); - - SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null"); - + cleanup_netns(nstoken); close(cgroup_fd); }
This patch extends the MPTCP test base, add a selftest test_mptcpify() for the mptcpify case.
Open and load the mptcpify test prog to mptcpify the TCP sockets dynamically, then use start_server() and connect_to_fd() to create a TCP socket, but actually what's created is an MPTCP socket, which can be verified through the outputs of 'ss' and 'nstat' commands.
Signed-off-by: Geliang Tang geliang.tang@suse.com --- .../testing/selftests/bpf/prog_tests/mptcp.c | 94 +++++++++++++++++++ 1 file changed, 94 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index 4407bd5c9e9a..caab3aa6a162 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -6,6 +6,7 @@ #include "cgroup_helpers.h" #include "network_helpers.h" #include "mptcp_sock.skel.h" +#include "mptcpify.skel.h"
char NS_TEST[32];
@@ -195,8 +196,101 @@ static void test_base(void) close(cgroup_fd); }
+static void send_byte(int fd) +{ + char b = 0x55; + + ASSERT_EQ(write(fd, &b, sizeof(b)), 1, "send single byte"); +} + +static int verify_mptcpify(void) +{ + char cmd[256]; + int err = 0; + + snprintf(cmd, sizeof(cmd), + "ip netns exec %s ss -tOni | grep -q '%s'", + NS_TEST, "tcp-ulp-mptcp"); + if (!ASSERT_OK(system(cmd), "No tcp-ulp-mptcp found!")) + err++; + + snprintf(cmd, sizeof(cmd), + "ip netns exec %s nstat -asz %s | awk '%s' | grep -q '%s'", + NS_TEST, "MPTcpExtMPCapableSYNACKRX", + "NR==1 {next} {print $2}", "1"); + if (!ASSERT_OK(system(cmd), "No MPTcpExtMPCapableSYNACKRX found!")) + err++; + + return err; +} + +static int run_mptcpify(int cgroup_fd) +{ + int server_fd, client_fd, prog_fd, err = 0; + struct mptcpify *mptcpify_skel; + + mptcpify_skel = mptcpify__open_and_load(); + if (!ASSERT_OK_PTR(mptcpify_skel, "skel_open_load")) + return -EIO; + + err = mptcpify__attach(mptcpify_skel); + if (!ASSERT_OK(err, "skel_attach")) + goto out; + + prog_fd = bpf_program__fd(mptcpify_skel->progs.mptcpify); + if (!ASSERT_GE(prog_fd, 0, "bpf_program__fd")) { + err = -EIO; + goto out; + } + + /* without MPTCP */ + server_fd = start_server(AF_INET, SOCK_STREAM, NULL, 0, 0); + if (!ASSERT_GE(server_fd, 0, "start_server")) { + err = -EIO; + goto out; + } + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_GE(client_fd, 0, "connect to fd")) { + err = -EIO; + goto close_server; + } + + send_byte(client_fd); + err += verify_mptcpify(); + + close(client_fd); +close_server: + close(server_fd); +out: + mptcpify__destroy(mptcpify_skel); + return err; +} + +static void test_mptcpify(void) +{ + struct nstoken *nstoken = NULL; + int cgroup_fd; + + cgroup_fd = test__join_cgroup("/mptcpify"); + if (!ASSERT_GE(cgroup_fd, 0, "test__join_cgroup")) + return; + + nstoken = create_netns(); + if (!ASSERT_OK_PTR(nstoken, "create_netns")) + goto fail; + + ASSERT_OK(run_mptcpify(cgroup_fd), "run_mptcpify"); + +fail: + cleanup_netns(nstoken); + close(cgroup_fd); +} + void test_mptcp(void) { if (test__start_subtest("base")) test_base(); + if (test__start_subtest("mptcpify")) + test_mptcpify(); }
On Sat, Jul 29, 2023 at 05:57:21PM +0800, Geliang Tang wrote:
The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
I still think it's a hack, but its blast radius is nicely contained. And since I cannot propose any better I'm ok with it.
Patches 1-2 can be squashed into one. Just like patches 3-6 as a single patch for selftests.
But before proceeding I'd like an explicit ack from netdev maintainers.
On Mon, Jul 31, 2023 at 05:43:23PM -0700, Alexei Starovoitov wrote:
On Sat, Jul 29, 2023 at 05:57:21PM +0800, Geliang Tang wrote:
The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
I still think it's a hack, but its blast radius is nicely contained. And since I cannot propose any better I'm ok with it.
Patches 1-2 can be squashed into one. Just like patches 3-6 as a single patch for selftests.
Thanks Alexei. I'll squash patch 1 and patch 2 into one, and squash patch 3 and patch 6 into one for selftests. But I prefer to keep patch 4 and patch 5 as is, since they were implemented in different times for different purposes. They were merged into MPTCP repo on May 17 for "run MPTCP sched tests in a dedicated ns" [1].
[1] https://patchwork.kernel.org/project/mptcp/cover/cover.1684362296.git.gelian...
-Geliang
But before proceeding I'd like an explicit ack from netdev maintainers.
On Tue, Aug 1, 2023 at 7:19 PM Geliang Tang geliang.tang@suse.com wrote:
On Mon, Jul 31, 2023 at 05:43:23PM -0700, Alexei Starovoitov wrote:
On Sat, Jul 29, 2023 at 05:57:21PM +0800, Geliang Tang wrote:
The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
I still think it's a hack, but its blast radius is nicely contained. And since I cannot propose any better I'm ok with it.
Patches 1-2 can be squashed into one. Just like patches 3-6 as a single patch for selftests.
Thanks Alexei. I'll squash patch 1 and patch 2 into one, and squash patch 3 and patch 6 into one for selftests. But I prefer to keep patch 4 and patch 5 as is, since they were implemented in different times for different purposes. They were merged into MPTCP repo on May 17 for "run MPTCP sched tests in a dedicated ns" [1].
since they were sent to a different tree than don't send them here. git will not like that during the merge window.
Hi Alexei,
On 02/08/2023 04:23, Alexei Starovoitov wrote:
On Tue, Aug 1, 2023 at 7:19 PM Geliang Tang geliang.tang@suse.com wrote:
On Mon, Jul 31, 2023 at 05:43:23PM -0700, Alexei Starovoitov wrote:
On Sat, Jul 29, 2023 at 05:57:21PM +0800, Geliang Tang wrote:
The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
I still think it's a hack, but its blast radius is nicely contained. And since I cannot propose any better I'm ok with it.
Patches 1-2 can be squashed into one. Just like patches 3-6 as a single patch for selftests.
Thanks Alexei. I'll squash patch 1 and patch 2 into one, and squash patch 3 and patch 6 into one for selftests. But I prefer to keep patch 4 and patch 5 as is, since they were implemented in different times for different purposes. They were merged into MPTCP repo on May 17 for "run MPTCP sched tests in a dedicated ns" [1].
since they were sent to a different tree than don't send them here. git will not like that during the merge window.
Thank you for the suggestion but that's OK to have these patches applied in BPF tree because on MPTCP side, they have been applied in a "devel" branch which is rebased on top of -net and net-next everyday. In other words, we don't use it in pull requests and it is fine to apply these patches in bpf-next or elsewhere.
Cheers, Matt
On Mon, 2023-07-31 at 17:43 -0700, Alexei Starovoitov wrote:
I still think it's a hack, but its blast radius is nicely contained. And since I cannot propose any better I'm ok with it.
Patches 1-2 can be squashed into one. Just like patches 3-6 as a single patch for selftests.
But before proceeding I'd like an explicit ack from netdev maintainers.
Just to state the obvious, I carry my personal bias on this topic due to my background ;)
My perspective is quite similar to Alexei's one: the solution is not extremely elegant, but is very self-contained; it looks viable to me.
WRT the specific code, I think the additional checks on the 'protocol' value after the 'update_socket_protocol()' call should be dropped: the user space can already provide an arbitrary value there and the later code deal with that.
Cheers,
Paolo
linux-kselftest-mirror@lists.linaro.org