From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Plesae find the v5 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
v5:
- Move previous #11 in v4 in latter patch after discussion with RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav(a)nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet(a)google.com>)
- Add fixes: tag into #7 (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message of #8 and if condition check (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #13 (Paolo Abeni <pabeni(a)redhat.com>)
v4:
- Add previous #13 in v2 back after dicussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.
v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni(a)redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into individual flag add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni(a)redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni(a)redhat.com>)
- Move patch #3 in v2 to a later Prague patch serise and remove patch #13 in v2. (Paolo Abeni <pabeni(a)redhat.com>)
---
Chia-Yu Chang (12):
net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
selftests/net: gro: add self-test for TCP CWR flag
tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: move increment of num_retrans
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: detect loss ACK w/ AccECN option and add
TCP_ACCECN_OPTION_PERSIST
tcp: accecn: enable AccECN
Ilpo Järvinen (2):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
Documentation/networking/ip-sysctl.rst | 4 +-
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/skbuff.h | 13 ++-
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 32 ++++++-
include/net/tcp_ecn.h | 92 ++++++++++++++-----
net/ipv4/sysctl_net_ipv4.c | 4 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 10 +-
net/ipv4/tcp_input.c | 37 +++++++-
net/ipv4/tcp_minisocks.c | 40 +++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 42 ++++++---
tools/testing/selftests/net/gro.c | 80 +++++++++++-----
15 files changed, 294 insertions(+), 90 deletions(-)
--
2.34.1
Hi all,
This series refactors the VMA count limit code to improve clarity,
test coverage, and observability.
The VMA count limit, controlled by sysctl_max_map_count, is a safeguard
that prevents a single process from consuming excessive kernel memory
by creating too many memory mappings.
A major change since v3 is the first patch in the series which instead of
attempting to fix overshooting the limit now documents that this is the
intended behavior. As Hugh pointed out, the lenient check (>) in do_mmap()
and do_brk_flags() is intentional to allow for potential VMA merges or
expansions when the process is at the sysctl_max_map_count limit.
The consensus is that this historical behavior is correct but non-obvious.
This series now focuses on making that behavior clear and the surrounding
code more robust. Based on feedback from Lorenzo and David, this series
retains the helper function and the rename of map_count.
The refined v4 series is now structured as follows:
1. Documents the lenient VMA count checks with comments to clarify
their purpose.
2. Adds a comprehensive selftest to codify the expected behavior at the
limit, including the lenient mmap case.
3. Introduces max_vma_count() to abstract the max map count sysctl,
making the sysctl static and converting all callers to use the new
helper.
4. Renames mm_struct->map_count to the more explicit vma_count for
better code clarity.
5. Adds a tracepoint for observability when a process fails to
allocate a VMA due to the count limit.
Tested on x86_64 and arm64:
1. Build test:
allyesconfig for rename
2. Selftests:
cd tools/testing/selftests/mm && \
make && \
./run_vmtests.sh -t max_vma_count
3. vma tests:
cd tools/testing/vma && \
make && \
./vma
Link to v3:
https://lore.kernel.org/r/20251013235259.589015-1-kaleshsingh@google.com/
Thanks to everyone for the valuable discussion on previous revisions.
-- Kalesh
Kalesh Singh (5):
mm: Document lenient map_count checks
mm/selftests: add max_vma_count tests
mm: Introduce max_vma_count() to abstract the max map count sysctl
mm: rename mm_struct::map_count to vma_count
mm/tracing: introduce trace_mm_insufficient_vma_slots event
MAINTAINERS | 2 +
fs/binfmt_elf.c | 2 +-
fs/coredump.c | 2 +-
include/linux/mm.h | 2 -
include/linux/mm_types.h | 2 +-
include/trace/events/vma.h | 32 +
kernel/fork.c | 2 +-
mm/debug.c | 2 +-
mm/internal.h | 3 +
mm/mmap.c | 25 +-
mm/mremap.c | 13 +-
mm/nommu.c | 8 +-
mm/util.c | 1 -
mm/vma.c | 42 +-
mm/vma_internal.h | 2 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/max_vma_count_tests.c | 716 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 5 +
tools/testing/vma/vma.c | 32 +-
tools/testing/vma/vma_internal.h | 13 +-
21 files changed, 856 insertions(+), 52 deletions(-)
create mode 100644 include/trace/events/vma.h
create mode 100644 tools/testing/selftests/mm/max_vma_count_tests.c
base-commit: b227c04932039bccc21a0a89cd6df50fa57e4716
--
2.51.1.851.g4ebd6896fd-goog
Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.
MPTCP creates subflows for data transmission between two endpoints.
However, BPF can use sockops to perform additional operations when TCP
completes the three-way handshake. The issue arose because we used sockmap
in sockops, which replaces sk->sk_prot and some handlers. Since subflows
also have their own specialized handlers, this creates a conflict and leads
to traffic failure. Therefore, we need to reject operations targeting
subflows.
This patchset simply prevents the combination of subflows and sockmap
without changing any functionality.
A complete integration of MPTCP and sockmap would require more effort, for
example, we would need to retrieve the parent socket from subflows in
sockmap and implement handlers like read_skb.
If maintainers don't object, we can further improve this in subsequent
work.
[1] truncated warning:
[ 18.234652] ------------[ cut here ]------------
[ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
[ 18.234726] Modules linked in:
[ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
[ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
[ 18.234800] PKRU: 55555554
[ 18.234806] Call Trace:
[ 18.234810] <TASK>
[ 18.234837] do_accept+0xeb/0x190
[ 18.234861] ? __x64_sys_pselect6+0x61/0x80
[ 18.234898] ? _raw_spin_unlock+0x12/0x30
[ 18.234915] ? alloc_fd+0x11e/0x190
[ 18.234925] __sys_accept4+0x8c/0x100
[ 18.234930] __x64_sys_accept+0x1f/0x30
[ 18.234933] x64_sys_call+0x202f/0x20f0
[ 18.234966] do_syscall_64+0x72/0x9a0
[ 18.234979] ? switch_fpu_return+0x60/0xf0
[ 18.234993] ? irqentry_exit_to_user_mode+0xdb/0x1e0
[ 18.235002] ? irqentry_exit+0x3f/0x50
[ 18.235005] ? clear_bhb_loop+0x50/0xa0
[ 18.235022] ? clear_bhb_loop+0x50/0xa0
[ 18.235025] ? clear_bhb_loop+0x50/0xa0
[ 18.235028] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 18.235066] </TASK>
[ 18.235109] ---[ end trace 0000000000000000 ]---
---
v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/…
Some advice suggested by Jakub Sitnicki
v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linu…
Some advice from Matthieu Baerts.
Jiayuan Chen (3):
net,mptcp: fix proto fallback detection with BPF sockmap
bpf,sockmap: disallow MPTCP sockets from sockmap
selftests/bpf: Add mptcp test with sockmap
net/core/sock_map.c | 27 ++++
net/mptcp/protocol.c | 9 +-
.../testing/selftests/bpf/prog_tests/mptcp.c | 150 ++++++++++++++++++
.../selftests/bpf/progs/mptcp_sockmap.c | 43 +++++
4 files changed, 227 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
--
2.43.0
Some network selftests defined variable-sized types defined at the end of
struct causing -Wgnu-variable-sized-type-not-at-end warning.
warning:
timestamping.c:285:18: warning: field 'cm' with variable sized type
'struct cmsghdr' not at the end of a struct or class is a GNU
extension [-Wgnu-variable-sized-type-not-at-end]
285 | struct cmsghdr cm;
| ^
ipsec.c:835:5: warning: field 'u' with variable sized type 'union
(unnamed union at ipsec.c:831:3)' not at the end of a struct or class
is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
835 | } u;
| ^
This patch move these field at the end of struct to fix these warnings.
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
tools/testing/selftests/net/ipsec.c | 2 +-
tools/testing/selftests/net/timestamping.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
index 0ccf484b1d9d..36083c8f884f 100644
--- a/tools/testing/selftests/net/ipsec.c
+++ b/tools/testing/selftests/net/ipsec.c
@@ -828,12 +828,12 @@ static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz,
struct xfrm_desc *desc)
{
struct {
+ char buf[XFRM_ALGO_KEY_BUF_SIZE];
union {
struct xfrm_algo alg;
struct xfrm_algo_aead aead;
struct xfrm_algo_auth auth;
} u;
- char buf[XFRM_ALGO_KEY_BUF_SIZE];
} alg = {};
size_t alen, elen, clen, aelen;
unsigned short type;
diff --git a/tools/testing/selftests/net/timestamping.c b/tools/testing/selftests/net/timestamping.c
index 044bc0e9ed81..ad2be2143698 100644
--- a/tools/testing/selftests/net/timestamping.c
+++ b/tools/testing/selftests/net/timestamping.c
@@ -282,8 +282,8 @@ static void recvpacket(int sock, int recvmsg_flags,
struct iovec entry;
struct sockaddr_in from_addr;
struct {
- struct cmsghdr cm;
char control[512];
+ struct cmsghdr cm;
} control;
int res;
--
2.51.0
This small patchset is about avoid verifier bug warning when tnum_overlap()
is called with zero mask intersection.
v2:
- fix runtime error
v1:
https://lore.kernel.org/all/20251026163806.3300636-1-kafai.wan@linux.dev/
---
KaFai Wan (2):
bpf: Fix tnum_overlap to check for zero mask intersection
selftests/bpf: Range analysis test case for JEQ
kernel/bpf/tnum.c | 2 ++
.../selftests/bpf/progs/verifier_bounds.c | 23 +++++++++++++++++++
2 files changed, 25 insertions(+)
--
2.43.0
Socket APIs like recvfrom(), accept(), and getsockname() expect socklen_t*
arg, but tests were using int variables. This causes -Wpointer-sign
warnings on platforms where socklen_t is unsigned.
Change the variable type from int to socklen_t to resolve the warning and
ensure type safety across platforms.
warning fixed:
sctp_collision.c:62:70: warning: passing 'int *' to parameter of
type 'socklen_t *' (aka 'unsigned int *') converts between pointers to
integer types with different sign [-Wpointer-sign]
62 | ret = recvfrom(sd, buf, sizeof(buf),
0, (struct sockaddr *)&daddr, &len);
| ^~~~
/usr/include/sys/socket.h:165:27: note: passing argument to
parameter '__addr_len' here
165 | socklen_t *__restrict __addr_len);
| ^
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
tools/testing/selftests/net/netfilter/sctp_collision.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/netfilter/sctp_collision.c b/tools/testing/selftests/net/netfilter/sctp_collision.c
index 21bb1cfd8a85..91df996367e9 100644
--- a/tools/testing/selftests/net/netfilter/sctp_collision.c
+++ b/tools/testing/selftests/net/netfilter/sctp_collision.c
@@ -9,7 +9,8 @@
int main(int argc, char *argv[])
{
struct sockaddr_in saddr = {}, daddr = {};
- int sd, ret, len = sizeof(daddr);
+ int sd, ret;
+ socklen_t len = sizeof(daddr);
struct timeval tv = {25, 0};
char buf[] = "hello";
--
2.51.0
From: Steven Rostedt <rostedt(a)goodmis.org>
The tracing selftest "event-filter-function.tc" was failing because it
first runs the "sample_events" function that triggers the kmem_cache_free
event and it looks at what function was used during a call to "ls".
But the first time it calls this, it could trigger events that are used to
pull pages into the page cache.
The rest of the test uses the function it finds during that call to see if
it will be called in subsequent "sample_events" calls. But if there's no
need to pull pages into the page cache, it will not trigger that function
and the test will fail.
Call the "sample_events" twice to trigger all the page cache work before
it calls it to find a function to use in subsequent checks.
Cc: stable(a)vger.kernel.org
Fixes: eb50d0f250e96 ("selftests/ftrace: Choose target function for filter test from samples")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index c62165fabd0c..003f612f57b0 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -20,6 +20,10 @@ sample_events() {
echo 0 > tracing_on
echo 0 > events/enable
+# Clear functions caused by page cache; run sample_events twice
+sample_events
+sample_events
+
echo "Get the most frequently calling function"
echo > trace
sample_events
--
2.51.0
This is a follow up patch for commit 495d2d8133fd("selftests/bpf: Attempt
to build BPF programs with -Wsign-compare") from Alexei Starovoitov[1]
to be able to enable -Wsign-compare C compilation flag for clang since
-Wall doesn't add it and BPF programs are built with clang.This has the
benefit to catch problematic comparisons in future tests as quoted from
the commit message:"
int i = -1;
unsigned int j = 1;
if (i < j) // this is false.
long i = -1;
unsigned int j = 1;
if (i < j) // this is true.
C standard for reference:
- If either operand is unsigned long the other shall be converted to
unsigned long.
- Otherwise, if one operand is a long int and the other unsigned int,
then if a long int can represent all the values of an unsigned int,
the unsigned int shall be converted to a long int;
otherwise both operands shall be converted to unsigned long int.
- Otherwise, if either operand is long, the other shall be
converted to long.
- Otherwise, if either operand is unsigned, the other shall be
converted to unsigned.
Unfortunately clang's -Wsign-compare is very noisy.
It complains about (s32)a == (u32)b which is safe and doen't
have surprising behavior."
This specific patch supresses the following warnings when
-Wsign-compare is enabled:
1 warning generated.
progs/bpf_iter_bpf_percpu_array_map.c:35:16: warning: comparison of
integers of different signs: 'int' and 'const volatile __u32'
(aka 'const volatile unsigned int') [-Wsign-compare]
35 | for (i = 0; i < num_cpus; i++) {
| ~ ^ ~~~~~~~~
1 warning generated.
progs/bpf_qdisc_fifo.c:93:2: warning: comparison of integers of
different signs: 'int' and '__u32'
(aka 'unsigned int') [-Wsign-compare]
93 | bpf_for(i, 0, sch->q.qlen) {
| ^ ~ ~~~~~~~~~~~
Should be noted that many more similar changes are still needed in order
to be able to enable the -Wsign-compare flag since -Werror is enabled and
would cause compilation of bpf selftests to fail.
[1].
Link:https://github.com/torvalds/linux/commit/495d2d8133fd1407519170a5238f4…
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa(a)gmail.com>
---
Changelog:
Changes from v3:
-Downsized the patch as suggested by vivek yadav[2].
-Changed the commit message as suggested by Daniel Borkmann[3].
Link:https://lore.kernel.org/all/20250925103559.14876-1-mehdi.benhadjkhelif…
Changes from v2:
-Split up the patch into a patch series as suggested by vivek
-Include only changes to variable types with no casting by my mentor
david
-Removed the -Wsign-compare in Makefile to avoid compilation errors
until adding casting for rest of comparisons.
Link:https://lore.kernel.org/bpf/20250924195731.6374-1-mehdi.benhadjkhelifa…
Changes from v1:
- Fix CI failed builds where it failed due to do missing .c and
.h files in my patch for working in mainline.
Link:https://lore.kernel.org/bpf/20250924162408.815137-1-mehdi.benhadjkheli…
[2]:https://lore.kernel.org/all/CABPSWR7_w3mxr74wCDEF=MYYuG2F_vMJeD-dqotc8MD…
[3]:https://lore.kernel.org/all/5ad26663-a3cc-4bf4-9d6f-8213ac8e8ce6@iogearb…
.../testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c | 2 +-
tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c
index 9fdea8cd4c6f..0baf00463f35 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c
@@ -24,7 +24,7 @@ int dump_bpf_percpu_array_map(struct bpf_iter__bpf_map_elem *ctx)
__u32 *key = ctx->key;
void *pptr = ctx->value;
__u32 step;
- int i;
+ __u32 i;
if (key == (void *)0 || pptr == (void *)0)
return 0;
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
index 1de2be3e370b..7a639dcb23a9 100644
--- a/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
@@ -88,7 +88,7 @@ void BPF_PROG(bpf_fifo_reset, struct Qdisc *sch)
{
struct bpf_list_node *node;
struct skb_node *skbn;
- int i;
+ __u32 i;
bpf_for(i, 0, sch->q.qlen) {
struct sk_buff *skb = NULL;
--
2.51.1.dirty