[PATCH bpf-next/net v3 0/5] bpf: Add mptcp_subflow bpf_iter support

Matthieu Baerts (NGI0)

20 Mar 2025

5:48 p.m.

Here is a series from Geliang, adding mptcp_subflow bpf_iter support.

We are working on extending MPTCP with BPF, e.g. to control the path manager, which is in charge of the creation, deletion, and announcement of subflows (paths), and the packet scheduler, which is in charge of selecting the available path on which the next data will be sent. These extensions need to iterate over the list of subflows attached to an MPTCP connection and perform specific actions via new kfuncs that will be added later on.

This preparation work is split in different patches:

- Patch 1: register some "basic" MPTCP kfunc.

- Patch 2: add mptcp_subflow bpf_iter support. Note that previous versions of this single patch have already been shared on the BPF mailing list. The changelog has been kept with a comment, but the version number has been reset to avoid confusion.

- Patch 3: add more MPTCP endpoints in the selftests, in order to create more than 2 subflows.

- Patch 4: add a very simple test validating mptcp_subflow bpf_iter support. This test could be written without the new bpf_iter, but it is there only to make sure this specific feature works as expected.

- Patch 5: a small fix to drop an unused parameter in the selftests.

Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
---
Changes in v3:
- Previous patches 1, 2 and 5 were no longer needed. (Martin)
- Patch 2: Switch to 'struct sock' and drop unneeded checks. (Martin)
- Patch 4: Adapt the test accordingly.
- Patch 5: New small fix for the selftests.
- Examples and questions for BPF maintainers have been added in Patch 2.
- Link to v2: https://lore.kernel.org/r/20241219-bpf-next-net-mptcp-bpf_iter-subflows-v2-0...

Changes in v2:
- Patches 1-2: new ones.
- Patch 3: remove two kfunc, more restrictions. (Martin)
- Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin)
- Patch 7: adaptations due to modifications in patches 1-4.
- Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-0...

---
Geliang Tang (5):
      bpf: Register mptcp common kfunc set
      bpf: Add mptcp_subflow bpf_iter
      selftests/bpf: More endpoints for endpoint_init
      selftests/bpf: Add mptcp_subflow bpf_iter subtest
      selftests/bpf: Drop cgroup_fd of run_mptcpify

 net/mptcp/bpf.c                                    |  87 +++++++++++++-
 tools/testing/selftests/bpf/bpf_experimental.h     |   8 ++
 tools/testing/selftests/bpf/prog_tests/mptcp.c     | 133 +++++++++++++++++++--
 tools/testing/selftests/bpf/progs/mptcp_bpf.h      |   4 +
 .../testing/selftests/bpf/progs/mptcp_bpf_iters.c  |  59 +++++++++
 5 files changed, 282 insertions(+), 9 deletions(-)
---
base-commit: dad704ebe38642cd405e15b9c51263356391355c
change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e

Best regards,

-- Matthieu Baerts (NGI0) matttbe@kernel.org

Matthieu Baerts (NGI0)

20 Mar

5:48 p.m.

New subject: [PATCH bpf-next/net v3 1/5] bpf: Register mptcp common kfunc set

From: Geliang Tang tanggeliang@kylinos.cn

MPTCP helper mptcp_sk() is used to convert struct sock to mptcp_sock. Helpers mptcp_subflow_ctx() and mptcp_subflow_tcp_sock() are used to convert between struct mptcp_subflow_context and sock. They all will be used in MPTCP BPF programs too.

This patch defines corresponding wrappers for them, puts the wrappers into the MPTCP common kfunc set, and registers the set with the flag BPF_PROG_TYPE_UNSPEC to make them accessible to all types of BPF programs.
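
For readers unfamiliar with the pattern, here is a minimal user-space sketch of a guarded conversion that returns NULL on anything other than an MPTCP subflow socket, which is what the KF_RET_NULL flag advertises to the verifier. In the diff as posted, only bpf_mptcp_subflow_ctx() is registered, and only for BPF_PROG_TYPE_CGROUP_SOCKOPT. All model_* names and the struct layout below are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Protocol numbers mirror IPPROTO_TCP and IPPROTO_MPTCP. */
enum model_proto { MODEL_IPPROTO_TCP = 6, MODEL_IPPROTO_MPTCP = 262 };

/* Hypothetical stand-in for struct mptcp_subflow_context. */
struct model_subflow_ctx {
	unsigned int subflow_id;
};

/* Hypothetical stand-in for struct sock; the fields model the checks in the kfunc. */
struct model_sock {
	int protocol;			/* models sk->sk_protocol */
	bool full;			/* models sk_fullsock(sk) */
	bool is_mptcp;			/* models sk_is_mptcp(sk) */
	struct model_subflow_ctx *ctx;	/* context attached to a subflow socket */
};

/* Return the context only for a full TCP socket that is an MPTCP subflow. */
static struct model_subflow_ctx *model_subflow_ctx(const struct model_sock *sk)
{
	if (sk && sk->full && sk->protocol == MODEL_IPPROTO_TCP && sk->is_mptcp)
		return sk->ctx;

	return NULL;
}
```

Because the conversion can fail, every caller has to test the result before dereferencing it; with KF_RET_NULL the BPF verifier enforces that test.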

Signed-off-by: Geliang Tang tanggeliang@kylinos.cn
Reviewed-by: Mat Martineau martineau@kernel.org
Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
---
Notes:
- v2:
  - Thanks to the two new previous patches, bpf_mptcp_sk() and bpf_mptcp_subflow_tcp_sock() are no longer needed.
  - bpf_mptcp_subflow_ctx(): make sure the socket is an MPTCP subflow, and add KF_RET_NULL (Martin).
  - Restrict this kfunc to BPF_PROG_TYPE_CGROUP_SOCKOPT for the moment.
---
 net/mptcp/bpf.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 8a16672b94e2384f5263e1432296cbca1236bb30..2e4b8ddf81ab0bb9dc547ea8783b73767d553a18 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -29,8 +29,37 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
 	.set = &bpf_mptcp_fmodret_ids,
 };
 
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_mptcp_subflow_ctx(const struct sock *sk)
+{
+	if (sk && sk_fullsock(sk) &&
+	    sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
+		return mptcp_subflow_ctx(sk);
+
+	return NULL;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
+BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
+
+static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_mptcp_common_kfunc_ids,
+};
+
 static int __init bpf_mptcp_kfunc_init(void)
 {
-	return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+	int ret;
+
+	ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCKOPT,
+					       &bpf_mptcp_common_kfunc_set);
+
+	return ret;
 }
 late_initcall(bpf_mptcp_kfunc_init);

-- 2.48.1

Matthieu Baerts (NGI0)

5:48 p.m.

New subject: [PATCH bpf-next/net v3 2/5] bpf: Add mptcp_subflow bpf_iter

From: Geliang Tang tanggeliang@kylinos.cn

It's necessary to traverse all subflows on the conn_list of an MPTCP socket and then call kfunc to modify the fields of each subflow. In kernel space, mptcp_for_each_subflow() helper is used for this:

mptcp_for_each_subflow(msk, subflow) kfunc(subflow);

But in the MPTCP BPF program, this has not yet been implemented. As Martin suggested recently, this conn_list walking + modify-by-kfunc usage fits the bpf_iter use case.

So this patch adds a new bpf_iter type named "mptcp_subflow" for this purpose, implements its helpers bpf_iter_mptcp_subflow_new()/_next()/_destroy(), and registers them in the MPTCP common kfunc set. bpf_for_each() can then be used on mptcp_subflow in a BPF program like this:

bpf_for_each(mptcp_subflow, subflow, msk) kfunc(subflow);
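
To make the new/next contract concrete, here is a small user-space model of the same iteration scheme: the iterator starts on the list head, and next() stops when the current position is the last entry. It is a sketch only. The model_* types and list helpers are hypothetical stand-ins for mptcp_sock, mptcp_subflow_context and list_head, and the state is copied in and out of the opaque storage with memcpy() to stay within ISO C, whereas the kernel code casts the pointer directly. destroy() is omitted because it is empty in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_append(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical stand-ins for mptcp_subflow_context and mptcp_sock. */
struct model_subflow {
	struct list_node node;	/* first member: a node pointer is also a subflow pointer */
	int subflow_id;
};

struct model_msk {
	struct list_node conn_list;
};

/* Opaque storage handed to the program, and the real state kept inside it. */
struct model_iter {
	uint64_t opaque[2];
};

struct model_iter_state {
	struct model_msk *msk;
	struct list_node *pos;
};

/* Same idea as the BUILD_BUG_ON() checks in the patch: the state must fit. */
_Static_assert(sizeof(struct model_iter_state) <= sizeof(struct model_iter),
	       "iterator state must fit in the opaque storage");

static int model_iter_new(struct model_iter *it, struct model_msk *msk)
{
	struct model_iter_state st = { .msk = msk, .pos = NULL };

	if (msk)
		st.pos = &msk->conn_list;	/* start on the head, before the first entry */
	memcpy(it, &st, sizeof(st));
	return msk ? 0 : -1;
}

static struct model_subflow *model_iter_next(struct model_iter *it)
{
	struct model_iter_state st;

	memcpy(&st, it, sizeof(st));
	/* Stop once the current position is the last entry (or an empty list's head). */
	if (!st.msk || st.pos->next == &st.msk->conn_list)
		return NULL;
	st.pos = st.pos->next;
	memcpy(it, &st, sizeof(st));
	return (struct model_subflow *)st.pos;
}

/* Open-coded equivalent of the loop: visit each subflow once, summing the IDs. */
static int model_sum_ids(struct model_msk *msk)
{
	struct model_iter it;
	struct model_subflow *sf;
	int sum = 0;

	if (model_iter_new(&it, msk))
		return -1;
	while ((sf = model_iter_next(&it)) != NULL)
		sum += sf->subflow_id;
	return sum;
}
```

Starting on the head rather than on the first entry lets next() handle the empty list and the end of the list with the same test.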

Suggested-by: Martin KaFai Lau martin.lau@kernel.org
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn
Reviewed-by: Mat Martineau martineau@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
---
Notes:
- v2:
  - Add BUILD_BUG_ON() checks, similar to the ones done with other bpf_iter_(...) helpers.
  - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and !spin_is_locked() (Martin).
- v3:
  - Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)
  - Remove unneeded !msk check (Martin)
  - Remove locks checks, add msk_owned_by_me for lockdep (Martin)
  - The following note and 2 questions have been added below.

This new bpf_iter will be used by our future BPF packet schedulers and path managers. To see how we are going to use them, please check our export branch [1], especially these two commits:

- "bpf: Add mptcp packet scheduler struct_ops": introduce a new struct_ops.
- "selftests/bpf: Add bpf_burst scheduler & test": new test showing how the new struct_ops and bpf_iter are being used.

[1] https://github.com/multipath-tcp/mptcp_net-next/commits/export

@BPF maintainers: we would like to allow this new mptcp_subflow bpf_iter to be used with struct_ops, but only with the two new ones we are going to introduce that are specific to MPTCP, and not with other struct_ops (TCP CC, sched_ext, etc.). We are not sure how to do that. By chance, do you have examples or documentation you could point us to, to have this restriction in place, please?

Also, for one of the two future MPTCP struct_ops, not all callbacks should be allowed to use this new bpf_iter, because they are called from different contexts. How can we ensure such callbacks from a struct_ops cannot call mptcp_subflow bpf_iter without adding new dedicated checks looking if some locks are held for all callbacks? We understood that they wanted to have something similar with sched_ext, but we are not sure if this code is ready nor if it is going to be accepted.

---
A few versions of this single patch have been previously posted to the BPF mailing list by Geliang, before continuing to the MPTCP mailing list only, with other patches of this series. The version of the whole series has been reset to 1, but here is the ChangeLog for the previous ones:
- v2: remove msk->pm.lock in _new() and _destroy() (Martin)
      drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
- v3: drop bpf_iter__mptcp_subflow
- v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in _next() (Andrii)
- v5: use list_is_last() instead of list_entry_is_head()
      add KF_ITER_NEW/NEXT/DESTROY flags
      add msk_owned_by_me in _new()
- v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
---
 net/mptcp/bpf.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 2e4b8ddf81ab0bb9dc547ea8783b73767d553a18..d3b5597eddb915a19eca87d87c31a27dfbdda619 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
 	.set = &bpf_mptcp_fmodret_ids,
 };
 
+struct bpf_iter_mptcp_subflow {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_mptcp_subflow_kern {
+	struct mptcp_sock *msk;
+	struct list_head *pos;
+} __aligned(8);
+
 __bpf_kfunc_start_defs();
 
 __bpf_kfunc static struct mptcp_subflow_context *
@@ -41,10 +50,57 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
 	return NULL;
 }
 
+__bpf_kfunc static int
+bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+			   struct sock *sk)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+	struct mptcp_sock *msk;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
+		     sizeof(struct bpf_iter_mptcp_subflow));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
+		     __alignof__(struct bpf_iter_mptcp_subflow));
+
+	if (unlikely(!sk || !sk_fullsock(sk)))
+		return -EINVAL;
+
+	if (sk->sk_protocol != IPPROTO_MPTCP)
+		return -EINVAL;
+
+	msk = mptcp_sk(sk);
+
+	msk_owned_by_me(msk);
+
+	kit->msk = msk;
+	kit->pos = &msk->conn_list;
+	return 0;
+}
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+	if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
+		return NULL;
+
+	kit->pos = kit->pos->next;
+	return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+
+__bpf_kfunc static void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
+{
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {

-- 2.48.1

Martin KaFai Lau

16 May

10:34 p.m.

New subject: [PATCH bpf-next/net v3 2/5] bpf: Add mptcp_subflow bpf_iter

On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:

From: Geliang Tang tanggeliang@kylinos.cn

It's necessary to traverse all subflows on the conn_list of an MPTCP socket and then call kfunc to modify the fields of each subflow. In kernel space, mptcp_for_each_subflow() helper is used for this:

mptcp_for_each_subflow(msk, subflow) kfunc(subflow);

But in the MPTCP BPF program, this has not yet been implemented. As Martin suggested recently, this conn_list walking + modify-by-kfunc usage fits the bpf_iter use case.

So this patch adds a new bpf_iter type named "mptcp_subflow" to do this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/ _destroy(). And register these bpf_iter mptcp_subflow into mptcp common kfunc set. Then bpf_for_each() for mptcp_subflow can be used in BPF program like this:

bpf_for_each(mptcp_subflow, subflow, msk) kfunc(subflow);

Suggested-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org

Notes:

v2:

Add BUILD_BUG_ON() checks, similar to the ones done with other bpf_iter_(...) helpers.

Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and !spin_is_locked() (Martin).

v3:

Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)

Remove unneeded !msk check (Martin)

Remove locks checks, add msk_owned_by_me for lockdep (Martin)

The following note and 2 questions have been added below.

This new bpf_iter will be used by our future BPF packet schedulers and path managers. To see how we are going to use them, please check our export branch [1], especially these two commits:

"bpf: Add mptcp packet scheduler struct_ops": introduce a new struct_ops.

"selftests/bpf: Add bpf_burst scheduler & test": new test showing how the new struct_ops and bpf_iter are being used.

[1] https://github.com/multipath-tcp/mptcp_net-next/commits/export

@BPF maintainers: we would like to allow this new mptcp_subflow bpf_iter to be used with struct_ops, but only with the two new ones we are going to introduce that are specific to MPTCP, and with not others struct_ops (TCP CC, sched_ext, etc.). We are not sure how to do that. By chance, do you have examples or doc you could point to us to have this restriction in place, please?

The bpf_qdisc.c has done that. Take a look at the "bpf_qdisc_kfunc_filter()".

It is in net-next and bpf-next/net.

Also, for one of the two future MPTCP struct_ops, not all callbacks should be allowed to use this new bpf_iter, because they are called from different contexts. How can we ensure such callbacks from a struct_ops cannot call mptcp_subflow bpf_iter without adding new dedicated checks looking if some locks are held for all callbacks? We understood that they wanted to have something similar with sched_ext, but we are not sure if this code is ready nor if it is going to be accepted.

Same. Take a look at "bpf_qdisc_kfunc_filter()".
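
As a rough illustration of the filter approach Martin points to, the sketch below models a per-set callback that decides, for each program, whether a restricted kfunc may be used, based on which struct_ops the program belongs to and which callback it implements. This is a user-space model under stated assumptions, not the bpf_qdisc code: every enum, struct and function name is hypothetical, and the choice of which callback is allowed is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers, for this sketch only. */
enum model_ops { MODEL_OPS_TCP_CC, MODEL_OPS_MPTCP_SCHED };
enum model_cb { MODEL_CB_INIT, MODEL_CB_PICK_SUBFLOW, MODEL_CB_RELEASE };
enum model_kfunc { MODEL_KF_SUBFLOW_ITER_NEW, MODEL_KF_OTHER };

/* What the filter knows about the program being checked. */
struct model_prog {
	enum model_ops ops;	/* struct_ops the program is attached to */
	enum model_cb cb;	/* callback (member) the program implements */
};

/* Kfuncs that belong to the restricted set. */
static bool model_kfunc_is_restricted(enum model_kfunc id)
{
	return id == MODEL_KF_SUBFLOW_ITER_NEW;
}

/* Return 0 to allow the call, -EACCES to reject it. */
static int model_kfunc_filter(const struct model_prog *prog, enum model_kfunc id)
{
	/* Kfuncs outside the restricted set are not this filter's business. */
	if (!model_kfunc_is_restricted(id))
		return 0;

	/* Only the MPTCP-specific struct_ops may use the iterator. */
	if (prog->ops != MODEL_OPS_MPTCP_SCHED)
		return -EACCES;

	/* Assumed for the example: only this callback runs in a suitable context. */
	if (prog->cb != MODEL_CB_PICK_SUBFLOW)
		return -EACCES;

	return 0;
}
```

The same callback answers both questions from the patch notes: the first check restricts by struct_ops type, the second by callback within that type.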

Matthieu Baerts

19 May

10:05 a.m.

New subject: [PATCH bpf-next/net v3 2/5] bpf: Add mptcp_subflow bpf_iter

Hi Martin,

Thank you for your reply!

On 17/05/2025 00:34, Martin KaFai Lau wrote:

On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:

(...)

@BPF maintainers: we would like to allow this new mptcp_subflow bpf_iter to be used with struct_ops, but only with the two new ones we are going to introduce that are specific to MPTCP, and with not others struct_ops (TCP CC, sched_ext, etc.). We are not sure how to do that. By chance, do you have examples or doc you could point to us to have this restriction in place, please?

The bpf_qdisc.c has done that. Take a look at the "bpf_qdisc_kfunc_filter()".

It is in net-next and bpf-next/net.

Many thanks for the pointer! I see, some operations have specific kfunc, similar to our needs!

Also, for one of the two future MPTCP struct_ops, not all callbacks should be allowed to use this new bpf_iter, because they are called from different contexts. How can we ensure such callbacks from a struct_ops cannot call mptcp_subflow bpf_iter without adding new dedicated checks looking if some locks are held for all callbacks? We understood that they wanted to have something similar with sched_ext, but we are not sure if this code is ready nor if it is going to be accepted.

Same. Take a look at "bpf_qdisc_kfunc_filter()".

Excellent, thank you, we will look at that!

Cheers, Matt

-- Sponsored by the NGI0 Core fund.

Matthieu Baerts (NGI0)

20 Mar

5:48 p.m.

New subject: [PATCH bpf-next/net v3 3/5] selftests/bpf: More endpoints for endpoint_init

From: Geliang Tang tanggeliang@kylinos.cn

This patch changes ADDR_2 from "10.0.1.2" to "10.0.2.1", and adds two more IPv4 test addresses, ADDR_3 and ADDR_4, and four IPv6 addresses, ADDR6_1 to ADDR6_4. It also adds a new helper, address_init(), to initialize all these addresses.

Add a new parameter "endpoints" for endpoint_init() to control how many endpoints are used for the tests. This makes it more flexible. Update the parameters of endpoint_init() in test_subflow().
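
The counting rule behind the new parameter can be sketched as follows: the endpoint count includes the initial address, so N endpoints register N - 1 additional addresses, and registration stops at the first failure. This is a user-space model; struct endpoint_log and the model_* functions are hypothetical and only record what would be passed to "ip mptcp endpoint add". One deliberate difference, which is an assumption about the intent: in the patch as posted, ret starts at -1, so a count of 1 returns -1, while this sketch returns 0 in that case.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENDPOINTS 4

/* Records the addresses that would be registered as MPTCP endpoints. */
struct endpoint_log {
	const char *added[MAX_ENDPOINTS];
	size_t count;
};

/* Stand-in for running "ip mptcp endpoint add <addr> <flags>". */
static int model_endpoint_add(struct endpoint_log *log, const char *addr)
{
	if (log->count >= MAX_ENDPOINTS)
		return -1;
	log->added[log->count++] = addr;
	return 0;
}

/* The count includes the initial address: N endpoints add N - 1 extra ones. */
static int model_endpoint_init(struct endpoint_log *log, unsigned int endpoints)
{
	static const char *const extra[] = { "10.0.2.1", "10.0.3.1", "10.0.4.1" };
	int ret = 0;

	if (!endpoints || endpoints > MAX_ENDPOINTS)
		return -1;

	/* Stop at the first failure, like the "ret = ret ?: ..." chain in the patch. */
	for (unsigned int i = 0; i + 1 < endpoints && !ret; i++)
		ret = model_endpoint_add(log, extra[i]);

	return ret;
}
```

The loop is a portable spelling of the chained "ret ?: endpoint_add(...)" statements, which rely on the GNU conditional-with-omitted-operand extension.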

Signed-off-by: Geliang Tang tanggeliang@kylinos.cn
Reviewed-by: Mat Martineau martineau@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 56 +++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index f8eb7f9d4fd20bbb7ee018728f7ae0f0a09d4d30..85f3d4119802a85c86cde7b74a0b857252bad8b8 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -14,7 +14,13 @@
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1 "10.0.1.1"
-#define ADDR_2 "10.0.1.2"
+#define ADDR_2 "10.0.2.1"
+#define ADDR_3 "10.0.3.1"
+#define ADDR_4 "10.0.4.1"
+#define ADDR6_1 "dead:beef:1::1"
+#define ADDR6_2 "dead:beef:2::1"
+#define ADDR6_3 "dead:beef:3::1"
+#define ADDR6_4 "dead:beef:4::1"
 #define PORT_1 10001
 
 #ifndef IPPROTO_MPTCP
@@ -322,22 +328,60 @@ static void test_mptcpify(void)
 	close(cgroup_fd);
 }
 
-static int endpoint_init(char *flags)
+static int address_init(void)
 {
 	SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST);
 	SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
+	SYS(fail, "ip -net %s addr add %s/64 dev veth1 nodad", NS_TEST, ADDR6_1);
 	SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
 	SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
+	SYS(fail, "ip -net %s addr add %s/64 dev veth2 nodad", NS_TEST, ADDR6_2);
 	SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
-	if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) {
+
+	SYS(fail, "ip -net %s link add veth3 type veth peer name veth4", NS_TEST);
+	SYS(fail, "ip -net %s addr add %s/24 dev veth3", NS_TEST, ADDR_3);
+	SYS(fail, "ip -net %s addr add %s/64 dev veth3 nodad", NS_TEST, ADDR6_3);
+	SYS(fail, "ip -net %s link set dev veth3 up", NS_TEST);
+	SYS(fail, "ip -net %s addr add %s/24 dev veth4", NS_TEST, ADDR_4);
+	SYS(fail, "ip -net %s addr add %s/64 dev veth4 nodad", NS_TEST, ADDR6_4);
+	SYS(fail, "ip -net %s link set dev veth4 up", NS_TEST);
+
+	return 0;
+fail:
+	return -1;
+}
+
+static int endpoint_add(char *addr, char *flags)
+{
+	return SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, addr, flags);
+}
+
+static int endpoint_init(char *flags, u8 endpoints)
+{
+	int ret = -1;
+
+	if (!endpoints || endpoints > 4)
+		goto fail;
+
+	if (address_init())
+		goto fail;
+
+	if (SYS_NOFAIL("ip -net %s mptcp limits set add_addr_accepted 4 subflows 4",
+		       NS_TEST)) {
 		printf("'ip mptcp' not supported, skip this test.\n");
 		test__skip();
 		goto fail;
 	}
 
-	return 0;
+	if (endpoints > 1)
+		ret = endpoint_add(ADDR_2, flags);
+	if (endpoints > 2)
+		ret = ret ?: endpoint_add(ADDR_3, flags);
+	if (endpoints > 3)
+		ret = ret ?: endpoint_add(ADDR_4, flags);
+
 fail:
-	return -1;
+	return ret;
 }
 
 static void wait_for_new_subflows(int fd)
@@ -423,7 +467,7 @@ static void test_subflow(void)
 	if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_subflow"))
 		goto skel_destroy;
 
-	if (endpoint_init("subflow") < 0)
+	if (endpoint_init("subflow", 2) < 0)
 		goto close_netns;
 
 	run_subflow();

-- 2.48.1

Matthieu Baerts (NGI0)

5:48 p.m.

New subject: [PATCH bpf-next/net v3 4/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest

From: Geliang Tang tanggeliang@kylinos.cn

This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the newly added mptcp_subflow bpf_iter.

Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy and other helpers into bpf_experimental.h.

Use bpf_for_each() to walk the subflow list of an msk. From there, future MPTCP-specific kfuncs can be called in the loop. Because they are not there yet, this test doesn't do anything very "useful" for the moment, but it focuses on validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it simply adds all subflow IDs to the local variable local_ids to make sure all subflows have been seen, then invokes mptcp_subflow_tcp_sock() in the loop to pick a subflow socket.

Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context of the picked subflow socket and do some verifications. Finally, assign local_ids to the global variable ids so that the application can obtain this value.

A related subtest called test_iters_subflow is added to load and verify the newly added mptcp_subflow type bpf_iter example in test_mptcp. The endpoint_init() helper is used to add 3 new subflow endpoints. Then one byte of message is sent to trigger the creation of new subflows. getsockopt() is invoked once the subflows have been created to trigger the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is then checked to make sure it equals 10, the sum of each subflow ID: we should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the bpf_iter loop did the job as expected.
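
The expected value follows from the assumption stated above that the subflow IDs are consecutive and start at 1, so n subflows sum to n(n+1)/2. A minimal sketch of that check, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the IDs 1..n, the value the test expects to find in skel->bss->ids. */
static unsigned int expected_id_sum(unsigned int subflows)
{
	return subflows * (subflows + 1) / 2;
}

/* Sum of the IDs actually seen while walking the subflow list. */
static unsigned int observed_id_sum(const unsigned int *ids, size_t count)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < count; i++)
		sum += ids[i];

	return sum;
}
```

A sum is a cheap signal rather than a proof: a missed subflow changes the total, but a walk that saw IDs 1, 1, 4, 4 would also sum to 10.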

Signed-off-by: Geliang Tang tanggeliang@kylinos.cn
Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
---
Notes:
- v2:
  - explicit sk protocol checks are no longer needed, implicitly done in bpf_skc_to_mptcp_sock().
  - use bpf_skc_to_mptcp_sock() instead of bpf_mptcp_sk(), and mptcp_subflow_tcp_sock() instead of bpf_mptcp_subflow_tcp_sock().
  - bpf_mptcp_subflow_ctx() can now return NULL.
- v3:
  - Use bpf_core_cast to get the msk instead of bpf_skc_to_mptcp_sock.
  - Drop bpf_mptcp_sock_acquire and bpf_mptcp_sock_release (Martin).
  - Adapt the commit message accordingly.
  - Remove no longer needed export to the mptcp_bpf.h file and adapt bpf_iter_mptcp_subflow_new parameter in bpf_experimental.h.
---
 tools/testing/selftests/bpf/bpf_experimental.h     |  8 +++
 tools/testing/selftests/bpf/prog_tests/mptcp.c     | 73 ++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/mptcp_bpf.h      |  4 ++
 .../testing/selftests/bpf/progs/mptcp_bpf_iters.c  | 59 +++++++++++++++++
 4 files changed, 144 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index cd8ecd39c3f3c68d40c6e3e1465b42ed66537027..6a96c56f0725a86ab6e83675ca0e474c3d668b10 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -575,6 +575,14 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
 extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
 extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
 
+struct bpf_iter_mptcp_subflow;
+extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+				      struct sock *sk) __weak __ksym;
+extern struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+extern void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+
 extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym;
 extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
 extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 85f3d4119802a85c86cde7b74a0b857252bad8b8..f37574b5ef68d8f32f8002df317869dfdf1d4b2d 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -11,6 +11,7 @@
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
 #include "mptcp_subflow.skel.h"
+#include "mptcp_bpf_iters.skel.h"
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1 "10.0.1.1"
@@ -33,6 +34,9 @@
 #ifndef MPTCP_INFO
 #define MPTCP_INFO 1
 #endif
+#ifndef TCP_IS_MPTCP
+#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */
+#endif
 #ifndef MPTCP_INFO_FLAG_FALLBACK
 #define MPTCP_INFO_FLAG_FALLBACK _BITUL(0)
 #endif
@@ -480,6 +484,73 @@ static void test_subflow(void)
 	close(cgroup_fd);
 }
 
+static void run_iters_subflow(void)
+{
+	int server_fd, client_fd;
+	int is_mptcp, err;
+	socklen_t len;
+
+	server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+	if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+		return;
+
+	client_fd = connect_to_fd(server_fd, 0);
+	if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+		goto close_server;
+
+	send_byte(client_fd);
+	wait_for_new_subflows(client_fd);
+
+	len = sizeof(is_mptcp);
+	/* mainly to trigger the BPF program */
+	err = getsockopt(client_fd, SOL_TCP, TCP_IS_MPTCP, &is_mptcp, &len);
+	if (ASSERT_OK(err, "getsockopt(client_fd, TCP_IS_MPTCP)"))
+		ASSERT_EQ(is_mptcp, 1, "is_mptcp");
+
+	close(client_fd);
+close_server:
+	close(server_fd);
+}
+
+static void test_iters_subflow(void)
+{
+	struct mptcp_bpf_iters *skel;
+	struct netns_obj *netns;
+	int cgroup_fd;
+
+	cgroup_fd = test__join_cgroup("/iters_subflow");
+	if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: iters_subflow"))
+		return;
+
+	skel = mptcp_bpf_iters__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_load: iters_subflow"))
+		goto close_cgroup;
+
+	skel->links.iters_subflow = bpf_program__attach_cgroup(skel->progs.iters_subflow,
+								cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.iters_subflow, "attach getsockopt"))
+		goto skel_destroy;
+
+	netns = netns_new(NS_TEST, true);
+	if (!ASSERT_OK_PTR(netns, "netns_new: iters_subflow"))
+		goto skel_destroy;
+
+	if (endpoint_init("subflow", 4) < 0)
+		goto close_netns;
+
+	run_iters_subflow();
+
+	/* 1 + 2 + 3 + 4 = 10 */
+	ASSERT_EQ(skel->bss->ids, 10, "subflow ids");
+
+close_netns:
+	netns_free(netns);
+skel_destroy:
+	mptcp_bpf_iters__destroy(skel);
+close_cgroup:
+	close(cgroup_fd);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -488,4 +559,6 @@ void test_mptcp(void)
 		test_mptcpify();
 	if (test__start_subtest("subflow"))
 		test_subflow();
+	if (test__start_subtest("iters_subflow"))
+		test_iters_subflow();
 }
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
index 3b188ccdcc4041acb4f7ed38ae8ddf5a7305466a..aa897074de6f377e8cddc859c3b2dc3751f14381 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf.h
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
@@ -39,4 +39,8 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
 	return subflow->tcp_sock;
 }
 
+/* ksym */
+extern struct mptcp_subflow_context *
+bpf_mptcp_subflow_ctx(const struct sock *sk) __ksym;
+
 #endif
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1d8f9b20259a9cbdc46d58e0d18157564fa5acd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Kylin Software */
+
+/* vmlinux.h, bpf_helpers.h and other 'define' */
+#include "bpf_tracing_net.h"
+#include "mptcp_bpf.h"
+
+char _license[] SEC("license") = "GPL";
+int ids;
+
+#ifndef TCP_IS_MPTCP
+#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */
+#endif
+
+SEC("cgroup/getsockopt")
+int iters_subflow(struct bpf_sockopt *ctx)
+{
+	struct mptcp_subflow_context *subflow;
+	struct bpf_sock *sk = ctx->sk;
+	struct sock *ssk = NULL;
+	struct mptcp_sock *msk;
+	int local_ids = 0;
+
+	if (ctx->level != SOL_TCP || ctx->optname != TCP_IS_MPTCP)
+		return 1;
+
+	msk = bpf_core_cast(sk, struct mptcp_sock);
+	if (!msk || msk->pm.server_side || !msk->pm.subflows)
+		return 1;
+
+	bpf_for_each(mptcp_subflow, subflow, (struct sock *)sk) {
+		/* Here MPTCP-specific packet scheduler kfunc can be called:
+		 * this test is not doing anything really useful, only to
+		 * verify the iteration works.
+		 */
+
+		local_ids += subflow->subflow_id;
+
+		/* only to check the following helper works */
+		ssk = mptcp_subflow_tcp_sock(subflow);
+	}
+
+	if (!ssk)
+		goto out;
+
+	/* assert: if not OK, something wrong on the kernel side */
+	if (ssk->sk_dport != ((struct sock *)msk)->sk_dport)
+		goto out;
+
+	/* only to check the following kfunc works */
+	subflow = bpf_mptcp_subflow_ctx(ssk);
+	if (!subflow || subflow->token != msk->token)
+		goto out;
+
+	ids = local_ids;
+
+out:
+	return 1;
+}

-- 2.48.1

Martin KaFai Lau

16 May

10:48 p.m.

New subject: [PATCH bpf-next/net v3 4/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest

On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:

From: Geliang Tang tanggeliang@kylinos.cn

This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the newly added mptcp_subflow bpf_iter.

Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy and other helpers into bpf_experimental.h.

Use bpf_for_each() to walk the subflow list of an msk. From there, future MPTCP-specific kfunc can be called in the loop. Because they are not there yet, this test doesn't do anything very "useful" for the moment, but it focuses on validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it simply adds all subflow ids to local variable local_ids to make sure all subflows have been seen, then invoke mptcp_subflow_tcp_sock() in the loop to pick the subflow context.

Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context of the picked subflow context and do some verifications. Finally, assign local_ids to global variable ids so that the application can obtain this value.

A related subtest called test_iters_subflow is added to load and verify the newly added mptcp_subflow type bpf_iter example in test_mptcp. The endpoint_init() helper is used to add 3 new subflow endpoints. Then one byte of message is sent to trigger the creation of new subflows. getsockopt() is invoked once the subflows have been created to trigger the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is then checked to make sure it equals 10, the sum of each subflow ID: we should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the bpf_iter loop did the job as expected.

Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org

Notes:

v2:

explicit sk protocol checks are no longer needed, implicitly done in bpf_skc_to_mptcp_sock().

use bpf_skc_to_mptcp_sock() instead of bpf_mptcp_sk(), and mptcp_subflow_tcp_sock() instead of bpf_mptcp_subflow_tcp_sock().

bpf_mptcp_subflow_ctx() can now return NULL.

v3:

Use bpf_core_cast to get the msk instead of bpf_skc_to_mptcp_sock.

Drop bpf_mptcp_sock_acquire and bpf_mptcp_sock_release (Martin).

Adapt the commit message accordingly.

Remove no longer needed export to the mptcp_bpf.h file and adapt bpf_iter_mptcp_subflow_new parameter in bpf_experimental.h.

tools/testing/selftests/bpf/bpf_experimental.h | 8 +++ tools/testing/selftests/bpf/prog_tests/mptcp.c | 73 ++++++++++++++++++++++ tools/testing/selftests/bpf/progs/mptcp_bpf.h | 4 ++ .../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 59 +++++++++++++++++ 4 files changed, 144 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index cd8ecd39c3f3c68d40c6e3e1465b42ed66537027..6a96c56f0725a86ab6e83675ca0e474c3d668b10 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -575,6 +575,14 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
 extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
 extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
 
+struct bpf_iter_mptcp_subflow;
+extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+				      struct sock *sk) __weak __ksym;
+extern struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+extern void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+
 extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym;
 extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
 extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 85f3d4119802a85c86cde7b74a0b857252bad8b8..f37574b5ef68d8f32f8002df317869dfdf1d4b2d 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -11,6 +11,7 @@
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
 #include "mptcp_subflow.skel.h"
+#include "mptcp_bpf_iters.skel.h"
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1 "10.0.1.1"
@@ -33,6 +34,9 @@
 #ifndef MPTCP_INFO
 #define MPTCP_INFO 1
 #endif
+#ifndef TCP_IS_MPTCP
+#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */
+#endif
 #ifndef MPTCP_INFO_FLAG_FALLBACK
 #define MPTCP_INFO_FLAG_FALLBACK _BITUL(0)
 #endif
@@ -480,6 +484,73 @@ static void test_subflow(void)
 	close(cgroup_fd);
 }
 
+static void run_iters_subflow(void)
+{
+	int server_fd, client_fd;
+	int is_mptcp, err;
+	socklen_t len;
+
+	server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+	if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+		return;
+
+	client_fd = connect_to_fd(server_fd, 0);
+	if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+		goto close_server;
+
+	send_byte(client_fd);
+	wait_for_new_subflows(client_fd);
+
+	len = sizeof(is_mptcp);
+	/* mainly to trigger the BPF program */
+	err = getsockopt(client_fd, SOL_TCP, TCP_IS_MPTCP, &is_mptcp, &len);
+	if (ASSERT_OK(err, "getsockopt(client_fd, TCP_IS_MPTCP)"))
+		ASSERT_EQ(is_mptcp, 1, "is_mptcp");
+
+	close(client_fd);
+close_server:
+	close(server_fd);
+}
+
+static void test_iters_subflow(void)
+{
+	struct mptcp_bpf_iters *skel;
+	struct netns_obj *netns;
+	int cgroup_fd;
+
+	cgroup_fd = test__join_cgroup("/iters_subflow");
+	if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: iters_subflow"))
+		return;
+
+	skel = mptcp_bpf_iters__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_load: iters_subflow"))
+		goto close_cgroup;
+
+	skel->links.iters_subflow = bpf_program__attach_cgroup(skel->progs.iters_subflow,
+							       cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.iters_subflow, "attach getsockopt"))
+		goto skel_destroy;
+
+	netns = netns_new(NS_TEST, true);
+	if (!ASSERT_OK_PTR(netns, "netns_new: iters_subflow"))
+		goto skel_destroy;
+
+	if (endpoint_init("subflow", 4) < 0)
+		goto close_netns;
+
+	run_iters_subflow();
+
+	/* 1 + 2 + 3 + 4 = 10 */
+	ASSERT_EQ(skel->bss->ids, 10, "subflow ids");
+
+close_netns:
+	netns_free(netns);
+skel_destroy:
+	mptcp_bpf_iters__destroy(skel);
+close_cgroup:
+	close(cgroup_fd);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -488,4 +559,6 @@ void test_mptcp(void)
 		test_mptcpify();
 	if (test__start_subtest("subflow"))
 		test_subflow();
+	if (test__start_subtest("iters_subflow"))
+		test_iters_subflow();
 }
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
index 3b188ccdcc4041acb4f7ed38ae8ddf5a7305466a..aa897074de6f377e8cddc859c3b2dc3751f14381 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf.h
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
@@ -39,4 +39,8 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
 	return subflow->tcp_sock;
 }
 
+/* ksym */
+extern struct mptcp_subflow_context *
+bpf_mptcp_subflow_ctx(const struct sock *sk) __ksym;
+
 #endif

diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1d8f9b20259a9cbdc46d58e0d18157564fa5acd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Kylin Software */
+
+/* vmlinux.h, bpf_helpers.h and other 'define' */
+#include "bpf_tracing_net.h"
+#include "mptcp_bpf.h"
+
+char _license[] SEC("license") = "GPL";
+int ids;
+
+#ifndef TCP_IS_MPTCP
+#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */
+#endif
+
+SEC("cgroup/getsockopt")
+int iters_subflow(struct bpf_sockopt *ctx)
+{
+	struct mptcp_subflow_context *subflow;
+	struct bpf_sock *sk = ctx->sk;
+	struct sock *ssk = NULL;
+	struct mptcp_sock *msk;
+	int local_ids = 0;
+
+	if (ctx->level != SOL_TCP || ctx->optname != TCP_IS_MPTCP)
+		return 1;
+
+	msk = bpf_core_cast(sk, struct mptcp_sock);
+	if (!msk || msk->pm.server_side || !msk->pm.subflows)
+		return 1;
+
+	bpf_for_each(mptcp_subflow, subflow, (struct sock *)sk) {
+		/* Here MPTCP-specific packet scheduler kfunc can be called:
+		 * this test is not doing anything really useful, only to

Lets fold the bpf_iter_mptcp_subflow addition into the future "mptcp_sched_ops" set (the github link that you mentioned in patch 2). Post them as one set to have a more practical example.

+		 * verify the iteration works.
+		 */
+
+		local_ids += subflow->subflow_id;
+
+		/* only to check the following helper works */
+		ssk = mptcp_subflow_tcp_sock(subflow);
+	}
+
+	if (!ssk)
+		goto out;
+
+	/* assert: if not OK, something wrong on the kernel side */
+	if (ssk->sk_dport != ((struct sock *)msk)->sk_dport)
+		goto out;
+
+	/* only to check the following kfunc works */
+	subflow = bpf_mptcp_subflow_ctx(ssk);

bpf_core_cast should be as good instead of adding a new bpf_mptcp_subflow_ctx() kfunc, so patch 1 should not be needed.

+	if (!subflow || subflow->token != msk->token)
+		goto out;
+
+	ids = local_ids;
+
+out:
+	return 1;
+}
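
For readers less familiar with open-coded iterators: bpf_for_each(mptcp_subflow, ...) drives the new/next/destroy triplet declared in bpf_experimental.h in this patch. The following is only a userspace analogy of that control flow, not kernel or BPF code. All types and functions here (struct demo_subflow, demo_iter_new() and so on) are hypothetical stand-ins, and the real kfuncs walk the msk's subflow list rather than a hand-built list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel structures. */
struct demo_subflow {
	int subflow_id;
	struct demo_subflow *next;
};

struct demo_iter {
	struct demo_subflow *pos;
};

/* Plays the role of bpf_iter_mptcp_subflow_new(): bind the iterator to a list. */
static int demo_iter_new(struct demo_iter *it, struct demo_subflow *head)
{
	assert(it != NULL);
	it->pos = head;
	return 0;
}

/* Plays the role of bpf_iter_mptcp_subflow_next(): current element, or NULL at the end. */
static struct demo_subflow *demo_iter_next(struct demo_iter *it)
{
	struct demo_subflow *cur = it->pos;

	if (cur)
		it->pos = cur->next;
	return cur;
}

/* Plays the role of bpf_iter_mptcp_subflow_destroy(): nothing to release here. */
static void demo_iter_destroy(struct demo_iter *it)
{
	it->pos = NULL;
}

/* Sum the IDs of all subflows, as the test program does with local_ids. */
static int demo_sum_ids(struct demo_subflow *head)
{
	struct demo_iter it;
	struct demo_subflow *sf;
	int ids = 0;

	demo_iter_new(&it, head);
	while ((sf = demo_iter_next(&it)))
		ids += sf->subflow_id;
	demo_iter_destroy(&it);
	return ids;
}

/* The four-subflow case from the commit message: IDs 1, 2, 3 and 4. */
static int demo_four_subflows(void)
{
	struct demo_subflow s4 = { 4, NULL };
	struct demo_subflow s3 = { 3, &s4 };
	struct demo_subflow s2 = { 2, &s3 };
	struct demo_subflow s1 = { 1, &s2 };

	return demo_sum_ids(&s1);
}
```

With four subflows the sum is 1 + 2 + 3 + 4 = 10, which is the value the userspace side of the selftest expects to find in skel->bss->ids.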

Matthieu Baerts

19 May 2025

10:04 a.m.

New subject: [PATCH bpf-next/net v3 4/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest

Hi Martin,

On 17/05/2025 00:48, Martin KaFai Lau wrote:
> On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:
>> From: Geliang Tang tanggeliang@kylinos.cn
>>
>> This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the newly added mptcp_subflow bpf_iter.
>>
>> Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy and other helpers into bpf_experimental.h.
>>
>> Use bpf_for_each() to walk the subflow list of an msk. From there, future MPTCP-specific kfunc can be called in the loop. Because they are not there yet, this test doesn't do anything very "useful" for the moment, but it focuses on validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it simply adds all subflow ids to local variable local_ids to make sure all subflows have been seen, then invoke mptcp_subflow_tcp_sock() in the loop to pick the subflow context.
>>
>> Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context of the picked subflow context and do some verifications. Finally, assign local_ids to global variable ids so that the application can obtain this value.
>>
>> A related subtest called test_iters_subflow is added to load and verify the newly added mptcp_subflow type bpf_iter example in test_mptcp. The endpoint_init() helper is used to add 3 new subflow endpoints. Then one byte of message is sent to trigger the creation of new subflows. getsockopt() is invoked once the subflows have been created to trigger the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is then checked to make sure it equals 10, the sum of each subflow ID: we should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the bpf_iter loop did the job as expected.

(...)

>> diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..a1d8f9b20259a9cbdc46d58e0d18157564fa5acd
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c
>> @@ -0,0 +1,59 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (c) 2024, Kylin Software */
>> +
>> +/* vmlinux.h, bpf_helpers.h and other 'define' */
>> +#include "bpf_tracing_net.h"
>> +#include "mptcp_bpf.h"
>> +
>> +char _license[] SEC("license") = "GPL";
>> +int ids;
>> +
>> +#ifndef TCP_IS_MPTCP
>> +#define TCP_IS_MPTCP        43    /* Is MPTCP being used? */
>> +#endif
>> +
>> +SEC("cgroup/getsockopt")
>> +int iters_subflow(struct bpf_sockopt *ctx)
>> +{
>> +    struct mptcp_subflow_context *subflow;
>> +    struct bpf_sock *sk = ctx->sk;
>> +    struct sock *ssk = NULL;
>> +    struct mptcp_sock *msk;
>> +    int local_ids = 0;
>> +
>> +    if (ctx->level != SOL_TCP || ctx->optname != TCP_IS_MPTCP)
>> +        return 1;
>> +
>> +    msk = bpf_core_cast(sk, struct mptcp_sock);
>> +    if (!msk || msk->pm.server_side || !msk->pm.subflows)
>> +        return 1;
>> +
>> +    bpf_for_each(mptcp_subflow, subflow, (struct sock *)sk) {
>> +        /* Here MPTCP-specific packet scheduler kfunc can be called:
>> +         * this test is not doing anything really useful, only to
>
> Lets fold the bpf_iter_mptcp_subflow addition into the future "mptcp_sched_ops" set (the github link that you mentioned in patch 2). Post them as one set to have a more practical example.

Thank you for this suggestion. We can delay that if needed.

Note that we have two struct_ops in preparation: mptcp_sched_ops and mptcp_pm_ops. We don't know which one will be ready first. They are both "blocked" by internal API modifications we would like to do to ease the maintenance later before "exposing" such API's via BPF. That's why we suggested to upstream this common part first as it is ready. But we can of course wait if you prefer.

>> +         * verify the iteration works.
>> +         */
>> +
>> +        local_ids += subflow->subflow_id;
>> +
>> +        /* only to check the following helper works */
>> +        ssk = mptcp_subflow_tcp_sock(subflow);
>> +    }
>> +
>> +    if (!ssk)
>> +        goto out;
>> +
>> +    /* assert: if not OK, something wrong on the kernel side */
>> +    if (ssk->sk_dport != ((struct sock *)msk)->sk_dport)
>> +        goto out;
>> +
>> +    /* only to check the following kfunc works */
>> +    subflow = bpf_mptcp_subflow_ctx(ssk);
>
> bpf_core_cast should be as good instead of adding a new bpf_mptcp_subflow_ctx() kfunc, so patch 1 should not be needed.

OK, indeed, in this series we don't need it. We will need it later to modify some fields from the "subflow" structure directly. We can do the modification or drop this test when the new struct_ops will be ready.

>> +    if (!subflow || subflow->token != msk->token)
>> +        goto out;
>> +
>> +    ids = local_ids;
>> +
>> +out:
>> +    return 1;
>> +}

Cheers, Matt

-- Sponsored by the NGI0 Core fund.

Martin KaFai Lau

20 May 2025

10:18 p.m.

New subject: [PATCH bpf-next/net v3 4/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest

On 5/19/25 3:04 AM, Matthieu Baerts wrote:
>>> +SEC("cgroup/getsockopt")
>>> +int iters_subflow(struct bpf_sockopt *ctx)
>>> +{
>>> +    struct mptcp_subflow_context *subflow;
>>> +    struct bpf_sock *sk = ctx->sk;
>>> +    struct sock *ssk = NULL;
>>> +    struct mptcp_sock *msk;
>>> +    int local_ids = 0;
>>> +
>>> +    if (ctx->level != SOL_TCP || ctx->optname != TCP_IS_MPTCP)
>>> +        return 1;
>>> +
>>> +    msk = bpf_core_cast(sk, struct mptcp_sock);
>>> +    if (!msk || msk->pm.server_side || !msk->pm.subflows)
>>> +        return 1;
>>> +
>>> +    bpf_for_each(mptcp_subflow, subflow, (struct sock *)sk) {
>>> +        /* Here MPTCP-specific packet scheduler kfunc can be called:
>>> +         * this test is not doing anything really useful, only to
>>
>> Lets fold the bpf_iter_mptcp_subflow addition into the future "mptcp_sched_ops" set (the github link that you mentioned in patch 2). Post them as one set to have a more practical example.
>
> Thank you for this suggestion. We can delay that if needed.
>
> Note that we have two struct_ops in preparation: mptcp_sched_ops and mptcp_pm_ops. We don't know which one will be ready first. They are both "blocked" by internal API modifications we would like to do to ease the maintenance later before "exposing" such API's via BPF. That's why we suggested to upstream this common part first as it is ready. But we can of course wait if you prefer.

This set is useful for discussing the questions you raised in patch 2.

I still don't see it useful to upstream patch 2 alone. The existing selftests/bpf/progs/mptcp_subflow.c has already shown a way to do similar iteration in SEC("cgroup/getsockopt") without patch 2.

I would prefer to wait for a fuller picture on the main struct_ops use case first to ensure that we didn't overlook things. iiuc, improving the iteration in SEC("cgroup/getsockopt") is not the main objective.

>>> +         * verify the iteration works.
>>> +         */
>>> +
>>> +        local_ids += subflow->subflow_id;
>>> +
>>> +        /* only to check the following helper works */
>>> +        ssk = mptcp_subflow_tcp_sock(subflow);
>>> +    }
>>> +
>>> +    if (!ssk)
>>> +        goto out;
>>> +
>>> +    /* assert: if not OK, something wrong on the kernel side */
>>> +    if (ssk->sk_dport != ((struct sock *)msk)->sk_dport)
>>> +        goto out;
>>> +
>>> +    /* only to check the following kfunc works */
>>> +    subflow = bpf_mptcp_subflow_ctx(ssk);
>>
>> bpf_core_cast should be as good instead of adding a new bpf_mptcp_subflow_ctx() kfunc, so patch 1 should not be needed.
>
> OK, indeed, in this series we don't need it. We will need it later to modify some fields from the "subflow" structure directly. We can do the

The "ssk" here is not a trusted pointer. Note that in patch 1, the kfunc bpf_mptcp_subflow_ctx() does not specify KF_TRUSTED_ARGS. I suspect it should be KF_TRUSTED_ARGS based on what you described here.
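
Martin's point above is that a subflow list can already be walked without a dedicated bpf_iter. As a rough userspace analogy of such a walk (an assumption about the general technique, not a copy of what selftests/bpf/progs/mptcp_subflow.c does), the sketch below iterates over an intrusive circular list with an explicit bound and recovers each enclosing structure from its embedded node. Every name in it (struct demo_msk, demo_container_of, DEMO_MAX_SUBFLOWS, ...) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular intrusive list, in the spirit of the kernel's list_head. */
struct list_node {
	struct list_node *next;
};

/* Hypothetical stand-ins; these are NOT the kernel's MPTCP structures. */
struct demo_subflow {
	int subflow_id;
	struct list_node node;
};

struct demo_msk {
	struct list_node conn_list;	/* list head: points to itself when empty */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Explicit upper bound, mimicking the need for a loop that provably terminates. */
#define DEMO_MAX_SUBFLOWS 8

static void demo_msk_init(struct demo_msk *msk)
{
	msk->conn_list.next = &msk->conn_list;
}

/* Append at the tail by walking to the last node (fine for a tiny demo list). */
static void demo_add_subflow(struct demo_msk *msk, struct demo_subflow *sf)
{
	struct list_node *pos = &msk->conn_list;

	while (pos->next != &msk->conn_list)
		pos = pos->next;
	pos->next = &sf->node;
	sf->node.next = &msk->conn_list;
}

/* Bounded walk over the connection list, summing the subflow IDs. */
static int demo_walk_sum_ids(struct demo_msk *msk)
{
	struct list_node *pos;
	int seen = 0, ids = 0;

	assert(msk != NULL);
	for (pos = msk->conn_list.next;
	     pos != &msk->conn_list && seen < DEMO_MAX_SUBFLOWS;
	     pos = pos->next, seen++) {
		struct demo_subflow *sf =
			demo_container_of(pos, struct demo_subflow, node);

		ids += sf->subflow_id;
	}
	return ids;
}

/* Two subflows with IDs 1 and 2. */
static int demo_two_subflows(void)
{
	struct demo_msk msk;
	struct demo_subflow a = { .subflow_id = 1 }, b = { .subflow_id = 2 };

	demo_msk_init(&msk);
	demo_add_subflow(&msk, &a);
	demo_add_subflow(&msk, &b);
	return demo_walk_sum_ids(&msk);
}
```

The pointers reached by such a walk are the kind of untrusted pointers discussed in this message, which is part of why passing them to a kfunc expecting trusted arguments is a separate question.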

Matthieu Baerts

23 May 2025

11:07 a.m.

New subject: [PATCH bpf-next/net v3 4/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest

Hi Martin,

On 21/05/2025 00:18, Martin KaFai Lau wrote:
> On 5/19/25 3:04 AM, Matthieu Baerts wrote:
>>>> +SEC("cgroup/getsockopt")
>>>> +int iters_subflow(struct bpf_sockopt *ctx)
>>>> +{
>>>> +    struct mptcp_subflow_context *subflow;
>>>> +    struct bpf_sock *sk = ctx->sk;
>>>> +    struct sock *ssk = NULL;
>>>> +    struct mptcp_sock *msk;
>>>> +    int local_ids = 0;
>>>> +
>>>> +    if (ctx->level != SOL_TCP || ctx->optname != TCP_IS_MPTCP)
>>>> +        return 1;
>>>> +
>>>> +    msk = bpf_core_cast(sk, struct mptcp_sock);
>>>> +    if (!msk || msk->pm.server_side || !msk->pm.subflows)
>>>> +        return 1;
>>>> +
>>>> +    bpf_for_each(mptcp_subflow, subflow, (struct sock *)sk) {
>>>> +        /* Here MPTCP-specific packet scheduler kfunc can be called:
>>>> +         * this test is not doing anything really useful, only to
>>>
>>> Lets fold the bpf_iter_mptcp_subflow addition into the future "mptcp_sched_ops" set (the github link that you mentioned in patch 2). Post them as one set to have a more practical example.
>>
>> Thank you for this suggestion. We can delay that if needed.
>>
>> Note that we have two struct_ops in preparation: mptcp_sched_ops and mptcp_pm_ops. We don't know which one will be ready first. They are both "blocked" by internal API modifications we would like to do to ease the maintenance later before "exposing" such API's via BPF. That's why we suggested to upstream this common part first as it is ready. But we can of course wait if you prefer.
>
> This set is useful for discussing the questions you raised in patch 2.
>
> I still don't see it useful to upstream patch 2 alone. The existing selftests/bpf/progs/mptcp_subflow.c has already shown a way to do similar iteration in SEC("cgroup/getsockopt") without patch 2.
>
> I would prefer to wait for a fuller picture on the main struct_ops use case first to ensure that we didn't overlook things. iiuc, improving the iteration in SEC("cgroup/getsockopt") is not the main objective.

I understand, that makes sense. When the rest will be ready, we will upstream patches from this series, except this one ("useless" selftest), and restricting bpf_iter_mptcp_subflow_* and other new kfuncs to BPF_PROG_TYPE_STRUCT_OPS only. So not to BPF_PROG_TYPE_CGROUP_SOCKOPT any more which was only needed for this new test. I don't think this program type requires access to these new kfunc for useful use-cases. This can be changed later if required anyway.

>>>> +         * verify the iteration works.
>>>> +         */
>>>> +
>>>> +        local_ids += subflow->subflow_id;
>>>> +
>>>> +        /* only to check the following helper works */
>>>> +        ssk = mptcp_subflow_tcp_sock(subflow);
>>>> +    }
>>>> +
>>>> +    if (!ssk)
>>>> +        goto out;
>>>> +
>>>> +    /* assert: if not OK, something wrong on the kernel side */
>>>> +    if (ssk->sk_dport != ((struct sock *)msk)->sk_dport)
>>>> +        goto out;
>>>> +
>>>> +    /* only to check the following kfunc works */
>>>> +    subflow = bpf_mptcp_subflow_ctx(ssk);
>>>
>>> bpf_core_cast should be as good instead of adding a new bpf_mptcp_subflow_ctx() kfunc, so patch 1 should not be needed.
>>
>> OK, indeed, in this series we don't need it. We will need it later to modify some fields from the "subflow" structure directly. We can do the
>
> The "ssk" here is not a trusted pointer. Note that in patch 1, the kfunc bpf_mptcp_subflow_ctx() does not specify KF_TRUSTED_ARGS. I suspect it should be KF_TRUSTED_ARGS based on what you described here.

Good point, I think this flag is indeed missing.

Cheers, Matt

-- Sponsored by the NGI0 Core fund.

Matthieu Baerts (NGI0)

20 Mar 2025

5:48 p.m.

New subject: [PATCH bpf-next/net v3 5/5] selftests/bpf: Drop cgroup_fd of run_mptcpify

From: Geliang Tang tanggeliang@kylinos.cn

The parameter 'cgroup_fd' of run_mptcpify() is useless, drop it.

Signed-off-by: Geliang Tang tanggeliang@kylinos.cn
Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
---
Notes:
 - v3:
   - New patch, simply to remove an unused parameter in the selftests.
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index f37574b5ef68d8f32f8002df317869dfdf1d4b2d..f519c452884e7b6088de24b6c2a9e646954fe0d8 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -272,7 +272,7 @@ static int verify_mptcpify(int server_fd, int client_fd)
 	return err;
 }
 
-static int run_mptcpify(int cgroup_fd)
+static int run_mptcpify(void)
 {
 	int server_fd, client_fd, err = 0;
 	struct mptcpify *mptcpify_skel;
@@ -325,7 +325,7 @@ static void test_mptcpify(void)
 	if (!ASSERT_OK_PTR(netns, "netns_new"))
 		goto fail;
 
-	ASSERT_OK(run_mptcpify(cgroup_fd), "run_mptcpify");
+	ASSERT_OK(run_mptcpify(), "run_mptcpify");
 
 fail:
 	netns_free(netns);

-- 
2.48.1


linux-kselftest-mirror@lists.linaro.org
