When I implemented virtio's hash-related features to tun/tap [1],
I found tun/tap does not fill the entire region reserved for the virtio
header, leaving some uninitialized hole in the middle of the buffer
after read()/recvmesg().
This series fills the uninitialized hole. More concretely, the
num_buffers field will be initialized with 1, and the other fields will
be inialized with 0. Setting the num_buffers field to 1 is mandated by
virtio 1.0 [2].
The change to virtio header is preceded by another change that refactors
tun and tap to unify their virtio-related code.
[1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
[2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v5:
- s/vnet_hdr_len_sz/vnet_hdr_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Changed to inline vnet implementations to TUN and TAP.
- Dropped patch "tun: Avoid double-tracking iov_iter length changes" and
"tap: Avoid double-tracking iov_iter length changes".
- Link to v4: https://lore.kernel.org/r/20250120-tun-v4-0-ee81dda03d7f@daynix.com
Changes in v4:
- s/sz/vnet_hdr_len_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Reverted to add CONFIG_TUN_VNET.
- Link to v3: https://lore.kernel.org/r/20250116-tun-v3-0-c6b2871e97f7@daynix.com
Changes in v3:
- Dropped changes to fill the vnet header.
- Splitted patch "tun: Unify vnet implementation".
- Reverted spurious changes in patch "tun: Unify vnet implementation".
- Merged tun_vnet.c into TAP.
- Link to v2: https://lore.kernel.org/r/20250109-tun-v2-0-388d7d5a287a@daynix.com
Changes in v2:
- Fixed num_buffers endian.
- Link to v1: https://lore.kernel.org/r/20250108-tun-v1-0-67d784b34374@daynix.com
---
Akihiko Odaki (7):
tun: Refactor CONFIG_TUN_VNET_CROSS_LE
tun: Keep hdr_len in tun_get_user()
tun: Decouple vnet from tun_struct
tun: Decouple vnet handling
tun: Extract the vnet handling code
tap: Keep hdr_len in tap_get_user()
tap: Use tun's vnet-related code
MAINTAINERS | 2 +-
drivers/net/tap.c | 168 ++++++------------------------------------
drivers/net/tun.c | 193 ++++++-------------------------------------------
drivers/net/tun_vnet.h | 184 ++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 231 insertions(+), 316 deletions(-)
---
base-commit: a32e14f8aef69b42826cf0998b068a43d486a9e9
change-id: 20241230-tun-66e10a49b0c7
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows Patches::
Patch "consolidate send buffers into netconsole_target struct"
=================================================
Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().
Patch "netconsole: Rename userdata to extradata"
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata
Patch "Include sysdata in extradata entry count"
================================================
Add the concept of sysdata when counting for available space in the
buffer. This will protect users from creating new userdata/sysdata if
there is no more space
Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
PS: This patchset is on top of the patch that created
netcons_fragmented_msg selftest:
https://lore.kernel.org/all/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@de…
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v4:
- Fixed Kernel doc for netconsole_target (Simon)
- Fixed a typo in disable_sysdata_feature (Simon)
- Improved sysdata_cpu_nr_show() to return !! in a bit-wise operation
- Link to v3: https://lore.kernel.org/r/20250124-netcon_cpu-v3-0-12a0d286ba1d@debian.org
Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handle all the extradata, consolidating it in
a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (8):
netconsole: consolidate send buffers into netconsole_target struct
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: Introduce configfs helpers for sysdata features
netconsole: Include sysdata in extradata entry count
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 ++++
drivers/net/netconsole.c | 261 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++
5 files changed, 427 insertions(+), 64 deletions(-)
---
base-commit: 145702ca2544bf82c6b244a49163f603fbe746f4
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Support for 32-bit s390 is very easy to implement and useful for
testing. For example I used to test some generic compat_ptr() logic,
which is only testable on 32-bit s390.
The series depends on my other series
"selftests/nolibc: test kernel configuration cleanups".
(It's not a hard dependency, only a minor diff conflict)
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Rebase unto nolibc-next
- Use 96 bytes of stack frame size
- Pick up Ack from Willy
- Link to v1: https://lore.kernel.org/r/20250122-nolibc-s390-v1-0-8c765f00e871@weissschuh…
---
Thomas Weißschuh (2):
selftests/nolibc: rename s390 to s390x
tools/nolibc: add support for 32-bit s390
tools/include/nolibc/arch-s390.h | 5 +++++
tools/include/nolibc/arch.h | 2 +-
tools/testing/selftests/nolibc/Makefile | 10 ++++++++--
tools/testing/selftests/nolibc/run-tests.sh | 7 ++++++-
4 files changed, 20 insertions(+), 4 deletions(-)
---
base-commit: c1f4a7a84037249d086a4114c0c4332a260e9091
change-id: 20250122-nolibc-s390-e57141682c88
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V8:
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty packet insertion into a reusable function
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 94 +++++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 359 insertions(+), 27 deletions(-)
--
2.34.1
The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE
by closing the connection with unread bytes in the receive queue.
The xdisconnect() helper was used to stop the connection, but it does
more than that: it will shut it down, then wait before reconnecting to
the same address. This causes the mptcp_join's "fastclose test" to fail
all the time.
This failure is due to a recent change, with commit 218cc166321f
("selftests: mptcp: avoid spurious errors on disconnect"), but that went
unnoticed because the test is currently ignored. The recent modification
only shown an existing issue: xdisconnect() doesn't need to be used
here, only the shutdown() part is needed.
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- The failure was not clearly visible on NIPA, because the results for
the two impacted sub-tests are currently ignored (unstable). Still,
it looks important to fix that, as this will help when the tests will
be improved not to be unstable any more.
---
tools/testing/selftests/net/mptcp/mptcp_connect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 414addef9a4514c489ecd09249143fe0ce2af649..d240d02fa443a1cd802f0e705ab36db5c22063a8 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -1302,7 +1302,7 @@ int main_loop(void)
return ret;
if (cfg_truncate > 0) {
- xdisconnect(fd);
+ shutdown(fd, SHUT_WR);
} else if (--cfg_repeat > 0) {
xdisconnect(fd);
---
base-commit: 4241a702e0d0c2ca9364cfac08dbf134264962de
change-id: 20250204-net-mptcp-sft-conn-f-d1c14274ba66
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
When compiling selftests files under tools/testing/selftests/pidfd/ ,
some compiling errors and warnings will pop out as the following.
CC pidfd_fdinfo_test
pidfd_fdinfo_test.c: In function ‘child_fdinfo_nspid_test’:
pidfd_fdinfo_test.c:231:13: warning: implicit declaration of function \
‘mount’ [-Wimplicit-function-declaration]
231 | r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
| ^~~~~
pidfd_fdinfo_test.c:231:36: error: ‘MS_REC’ undeclared \
(first use in this function)
231 | r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
| ^~~~~~
pidfd_fdinfo_test.c:231:36: note: each undeclared identifier is \
reported only once for each function it appears in
pidfd_fdinfo_test.c:231:45: error: ‘MS_PRIVATE’ undeclared \
(first use in this function); did you mean ‘MAP_PRIVATE’?
231 | r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
| ^~~~~~~~~~
| MAP_PRIVATE
pidfd_fdinfo_test.c:237:15: warning: implicit declaration of function \
‘umount2’; did you mean ‘SYS_umount2’? \
[-Wimplicit-function-declaration]
237 | (void)umount2("/proc", MNT_DETACH);
| ^~~~~~~
| SYS_umount2
pidfd_fdinfo_test.c:237:32: error: ‘MNT_DETACH’ undeclared \
(first use in this function)
237 | (void)umount2("/proc", MNT_DETACH);
| ^~~~~~~~~~
make: *** [../lib.mk:222: \
~/linux/tools/testing/selftests/pidfd/pidfd_fdinfo_test] Error 1
Solve these errors and warnings by including header file <sys/mount.h>.
Signed-off-by: I Hsin Cheng <richard120310(a)gmail.com>
---
tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
index f062a986e..f718aac75 100644
--- a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
@@ -13,6 +13,7 @@
#include <syscall.h>
#include <sys/wait.h>
#include <sys/mman.h>
+#include <sys/mount.h>
#include "pidfd.h"
#include "../kselftest.h"
--
2.43.0
6.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
commit 336d02bc4c6bec5c3d933e5d470a94970f830957 upstream.
When porting librseq commit:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
from librseq to the kernel selftests, the following line was missed
at the end of rseq_init():
rseq_size = get_rseq_kernel_feature_size();
which effectively leaves rseq_size initialized to -1U when glibc does not
have rseq support. glibc supports rseq from version 2.35 onwards.
In a following librseq commit
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
to mimic the libc behavior, a new approach is taken: don't set the
feature size in 'rseq_size' until at least one thread has successfully
registered. This allows using 'rseq_size' in fast-paths to test for both
registration status and available features. The caveat is that on libc
either all threads are registered or none are, while with bare librseq
it is the responsability of the user to register all threads using rseq.
This combines the changes from the following librseq git commits:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
Fixes: a0cc649353bb ("selftests/rseq: Fix mm_cid test failure")
Reported-by: Raghavendra Rao Ananta <rananta(a)google.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Carlos O'Donell <carlos(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Michael Jeanson <mjeanson(a)efficios.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/rseq/rseq.c | 32 +++++++++++++++++++++++++-------
tools/testing/selftests/rseq/rseq.h | 9 ++++++++-
2 files changed, 33 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -61,7 +61,6 @@ unsigned int rseq_size = -1U;
unsigned int rseq_flags;
static int rseq_ownership;
-static int rseq_reg_success; /* At least one rseq registration has succeded. */
/* Allocate a large area for the TLS. */
#define RSEQ_THREAD_AREA_ALLOC_SIZE 1024
@@ -152,14 +151,27 @@ int rseq_register_current_thread(void)
}
rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG);
if (rc) {
- if (RSEQ_READ_ONCE(rseq_reg_success)) {
+ /*
+ * After at least one thread has registered successfully
+ * (rseq_size > 0), the registration of other threads should
+ * never fail.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) > 0) {
/* Incoherent success/failure within process. */
abort();
}
return -1;
}
assert(rseq_current_cpu_raw() >= 0);
- RSEQ_WRITE_ONCE(rseq_reg_success, 1);
+
+ /*
+ * The first thread to register sets the rseq_size to mimic the libc
+ * behavior.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) == 0) {
+ RSEQ_WRITE_ONCE(rseq_size, get_rseq_kernel_feature_size());
+ }
+
return 0;
}
@@ -235,12 +247,18 @@ void rseq_init(void)
return;
}
rseq_ownership = 1;
- if (!rseq_available()) {
- rseq_size = 0;
- return;
- }
+
+ /* Calculate the offset of the rseq area from the thread pointer. */
rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
+
+ /* rseq flags are deprecated, always set to 0. */
rseq_flags = 0;
+
+ /*
+ * Set the size to 0 until at least one thread registers to mimic the
+ * libc behavior.
+ */
+ rseq_size = 0;
}
static __attribute__((destructor))
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -60,7 +60,14 @@
extern ptrdiff_t rseq_offset;
/*
- * Size of the registered rseq area. 0 if the registration was
+ * The rseq ABI is composed of extensible feature fields. The extensions
+ * are done by appending additional fields at the end of the structure.
+ * The rseq_size defines the size of the active feature set which can be
+ * used by the application for the current rseq registration. Features
+ * starting at offset >= rseq_size are inactive and should not be used.
+ *
+ * The rseq_size is the intersection between the available allocation
+ * size for the rseq area and the feature size supported by the kernel.
* unsuccessful.
*/
extern unsigned int rseq_size;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
commit 336d02bc4c6bec5c3d933e5d470a94970f830957 upstream.
When porting librseq commit:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
from librseq to the kernel selftests, the following line was missed
at the end of rseq_init():
rseq_size = get_rseq_kernel_feature_size();
which effectively leaves rseq_size initialized to -1U when glibc does not
have rseq support. glibc supports rseq from version 2.35 onwards.
In a following librseq commit
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
to mimic the libc behavior, a new approach is taken: don't set the
feature size in 'rseq_size' until at least one thread has successfully
registered. This allows using 'rseq_size' in fast-paths to test for both
registration status and available features. The caveat is that on libc
either all threads are registered or none are, while with bare librseq
it is the responsability of the user to register all threads using rseq.
This combines the changes from the following librseq git commits:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
Fixes: a0cc649353bb ("selftests/rseq: Fix mm_cid test failure")
Reported-by: Raghavendra Rao Ananta <rananta(a)google.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Carlos O'Donell <carlos(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Michael Jeanson <mjeanson(a)efficios.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/rseq/rseq.c | 32 +++++++++++++++++++++++++-------
tools/testing/selftests/rseq/rseq.h | 9 ++++++++-
2 files changed, 33 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -61,7 +61,6 @@ unsigned int rseq_size = -1U;
unsigned int rseq_flags;
static int rseq_ownership;
-static int rseq_reg_success; /* At least one rseq registration has succeded. */
/* Allocate a large area for the TLS. */
#define RSEQ_THREAD_AREA_ALLOC_SIZE 1024
@@ -152,14 +151,27 @@ int rseq_register_current_thread(void)
}
rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG);
if (rc) {
- if (RSEQ_READ_ONCE(rseq_reg_success)) {
+ /*
+ * After at least one thread has registered successfully
+ * (rseq_size > 0), the registration of other threads should
+ * never fail.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) > 0) {
/* Incoherent success/failure within process. */
abort();
}
return -1;
}
assert(rseq_current_cpu_raw() >= 0);
- RSEQ_WRITE_ONCE(rseq_reg_success, 1);
+
+ /*
+ * The first thread to register sets the rseq_size to mimic the libc
+ * behavior.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) == 0) {
+ RSEQ_WRITE_ONCE(rseq_size, get_rseq_kernel_feature_size());
+ }
+
return 0;
}
@@ -235,12 +247,18 @@ void rseq_init(void)
return;
}
rseq_ownership = 1;
- if (!rseq_available()) {
- rseq_size = 0;
- return;
- }
+
+ /* Calculate the offset of the rseq area from the thread pointer. */
rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
+
+ /* rseq flags are deprecated, always set to 0. */
rseq_flags = 0;
+
+ /*
+ * Set the size to 0 until at least one thread registers to mimic the
+ * libc behavior.
+ */
+ rseq_size = 0;
}
static __attribute__((destructor))
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -60,7 +60,14 @@
extern ptrdiff_t rseq_offset;
/*
- * Size of the registered rseq area. 0 if the registration was
+ * The rseq ABI is composed of extensible feature fields. The extensions
+ * are done by appending additional fields at the end of the structure.
+ * The rseq_size defines the size of the active feature set which can be
+ * used by the application for the current rseq registration. Features
+ * starting at offset >= rseq_size are inactive and should not be used.
+ *
+ * The rseq_size is the intersection between the available allocation
+ * size for the rseq area and the feature size supported by the kernel.
* unsuccessful.
*/
extern unsigned int rseq_size;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
commit 336d02bc4c6bec5c3d933e5d470a94970f830957 upstream.
When porting librseq commit:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
from librseq to the kernel selftests, the following line was missed
at the end of rseq_init():
rseq_size = get_rseq_kernel_feature_size();
which effectively leaves rseq_size initialized to -1U when glibc does not
have rseq support. glibc supports rseq from version 2.35 onwards.
In a following librseq commit
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
to mimic the libc behavior, a new approach is taken: don't set the
feature size in 'rseq_size' until at least one thread has successfully
registered. This allows using 'rseq_size' in fast-paths to test for both
registration status and available features. The caveat is that on libc
either all threads are registered or none are, while with bare librseq
it is the responsability of the user to register all threads using rseq.
This combines the changes from the following librseq git commits:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
Fixes: a0cc649353bb ("selftests/rseq: Fix mm_cid test failure")
Reported-by: Raghavendra Rao Ananta <rananta(a)google.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Carlos O'Donell <carlos(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Michael Jeanson <mjeanson(a)efficios.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/rseq/rseq.c | 32 ++++++++++++++++++++++-------
tools/testing/selftests/rseq/rseq.h | 9 +++++++-
2 files changed, 33 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 5b9772cdf265..f6156790c3b4 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -61,7 +61,6 @@ unsigned int rseq_size = -1U;
unsigned int rseq_flags;
static int rseq_ownership;
-static int rseq_reg_success; /* At least one rseq registration has succeded. */
/* Allocate a large area for the TLS. */
#define RSEQ_THREAD_AREA_ALLOC_SIZE 1024
@@ -152,14 +151,27 @@ int rseq_register_current_thread(void)
}
rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG);
if (rc) {
- if (RSEQ_READ_ONCE(rseq_reg_success)) {
+ /*
+ * After at least one thread has registered successfully
+ * (rseq_size > 0), the registration of other threads should
+ * never fail.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) > 0) {
/* Incoherent success/failure within process. */
abort();
}
return -1;
}
assert(rseq_current_cpu_raw() >= 0);
- RSEQ_WRITE_ONCE(rseq_reg_success, 1);
+
+ /*
+ * The first thread to register sets the rseq_size to mimic the libc
+ * behavior.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) == 0) {
+ RSEQ_WRITE_ONCE(rseq_size, get_rseq_kernel_feature_size());
+ }
+
return 0;
}
@@ -235,12 +247,18 @@ void rseq_init(void)
return;
}
rseq_ownership = 1;
- if (!rseq_available()) {
- rseq_size = 0;
- return;
- }
+
+ /* Calculate the offset of the rseq area from the thread pointer. */
rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
+
+ /* rseq flags are deprecated, always set to 0. */
rseq_flags = 0;
+
+ /*
+ * Set the size to 0 until at least one thread registers to mimic the
+ * libc behavior.
+ */
+ rseq_size = 0;
}
static __attribute__((destructor))
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index 4e217b620e0c..062d10925a10 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -60,7 +60,14 @@
extern ptrdiff_t rseq_offset;
/*
- * Size of the registered rseq area. 0 if the registration was
+ * The rseq ABI is composed of extensible feature fields. The extensions
+ * are done by appending additional fields at the end of the structure.
+ * The rseq_size defines the size of the active feature set which can be
+ * used by the application for the current rseq registration. Features
+ * starting at offset >= rseq_size are inactive and should not be used.
+ *
+ * The rseq_size is the intersection between the available allocation
+ * size for the rseq area and the feature size supported by the kernel.
* unsuccessful.
*/
extern unsigned int rseq_size;
--
2.48.1
The thuge-gen test_mmap() and test_shmget() tests are repeatedly run for a
variety of sizes but always report the result of their test with the same
name, meaning that automated sysetms running the tests are unable to
distinguish between the various tests. Add the supplied sizes to the logged
test names to distinguish between runs.
Fixes: b38bd9b2c448 ("selftests/mm: thuge-gen: conform to TAP format output")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/thuge-gen.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/thuge-gen.c b/tools/testing/selftests/mm/thuge-gen.c
index e4370b79b62ffb133056eb843cdd1eaeba6503df..cd5174d735be405220d99ae796a3768f53df6ea4 100644
--- a/tools/testing/selftests/mm/thuge-gen.c
+++ b/tools/testing/selftests/mm/thuge-gen.c
@@ -127,7 +127,7 @@ void test_mmap(unsigned long size, unsigned flags)
show(size);
ksft_test_result(size == getpagesize() || (before - after) == NUM_PAGES,
- "%s mmap\n", __func__);
+ "%s mmap %lu\n", __func__, size);
if (munmap(map, size * NUM_PAGES))
ksft_exit_fail_msg("%s: unmap %s\n", __func__, strerror(errno));
@@ -165,7 +165,7 @@ void test_shmget(unsigned long size, unsigned flags)
show(size);
ksft_test_result(size == getpagesize() || (before - after) == NUM_PAGES,
- "%s: mmap\n", __func__);
+ "%s: mmap %lu\n", __func__, size);
if (shmdt(map))
ksft_exit_fail_msg("%s: shmdt: %s\n", __func__, strerror(errno));
}
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250204-kselftest-mm-fix-dups-076a48577184
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The mac address on backup slave should be convert from Solicited-Node
Multicast address, not from bonding unicast target address.
Hangbin Liu (2):
bonding: fix incorrect MAC address setting to receive NA messages
selftests: bonding: fix incorrect mac address
drivers/net/bonding/bond_options.c | 4 +++-
tools/testing/selftests/drivers/net/bonding/bond_options.sh | 4 ++--
2 files changed, 5 insertions(+), 3 deletions(-)
--
2.39.5 (Apple Git-154)
Support for 32-bit s390 is very easy to implement and useful for
testing. For example I used to test some generic compat_ptr() logic,
which is only testable on 32-bit s390.
The series depends on my other series
"selftests/nolibc: test kernel configuration cleanups".
(It's not a hard dependency, only a minor diff conflict)
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
selftests/nolibc: rename s390 to s390x
tools/nolibc: add support for 32-bit s390
tools/include/nolibc/arch-s390.h | 5 +++++
tools/include/nolibc/arch.h | 2 +-
tools/testing/selftests/nolibc/Makefile | 10 ++++++++--
tools/testing/selftests/nolibc/run-tests.sh | 8 +++++++-
4 files changed, 21 insertions(+), 4 deletions(-)
---
base-commit: 0597614d84c8593ba906418bf3c0c0de1e02e82a
change-id: 20250122-nolibc-s390-e57141682c88
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Add a new selftest to verify netconsole's handling of messages that
exceed the packet size limit and require fragmentation. The test sends
messages with varying sizes and userdata, validating that:
1. Large messages are correctly fragmented and reassembled
2. Userdata fields are properly preserved across fragments
3. Messages work correctly with and without kernel release version
appending
The test creates a networking environment using netdevsim, sends
messages through /dev/kmsg, and verifies the received fragments maintain
message integrity.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
---
Changes compared to RFC
- Removed the long-line comment (Simon)
- Added Simon's reviwed-by tag.
- Link to v1: https://lore.kernel.org/r/20250131-netcons_frag_msgs-v1-1-0de83bf2a7e6@debi…
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 7 ++
.../drivers/net/netcons_fragmented_msg.sh | 122 +++++++++++++++++++++
3 files changed, 130 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 137470bdee0c7fd2517bd1baafc12d575de4b4ac..c7f1c443f2af091aa13f67dd1df9ae05d7a43f40 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -7,6 +7,7 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \
TEST_PROGS := \
netcons_basic.sh \
+ netcons_fragmented_msg.sh \
netcons_overflow.sh \
ping.py \
queues.py \
diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
index 3acaba41ac7b21aa2fd8457ed640a5ac8a41bc12..0c262b123fdd3082c40b2bd899ec626d223226ed 100644
--- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
+++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
@@ -110,6 +110,13 @@ function create_dynamic_target() {
echo 1 > "${NETCONS_PATH}"/enabled
}
+# Do not append the release to the header of the message
+function disable_release_append() {
+ echo 0 > "${NETCONS_PATH}"/enabled
+ echo 0 > "${NETCONS_PATH}"/release
+ echo 1 > "${NETCONS_PATH}"/enabled
+}
+
function cleanup() {
local NSIM_DEV_SYS_DEL="/sys/bus/netdevsim/del_device"
diff --git a/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
new file mode 100755
index 0000000000000000000000000000000000000000..4a71e01a230c75c14147a9937920fe0b68a0926a
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Test netconsole's message fragmentation functionality.
+#
+# When a message exceeds the maximum packet size, netconsole splits it into
+# multiple fragments for transmission. This test verifies:
+# - Correct fragmentation of large messages
+# - Proper reassembly of fragments at the receiver
+# - Preservation of userdata across fragments
+# - Behavior with and without kernel release version appending
+#
+# Author: Breno Leitao <leitao(a)debian.org>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# The content of kmsg will be save to the following file
+OUTPUT_FILE="/tmp/${TARGET}"
+
+# set userdata to a long value. In this case, it is "1-2-3-4...50-"
+USERDATA_VALUE=$(printf -- '%.2s-' {1..60})
+
+# Convert the header string in a regexp, so, we can remove
+# the second header as well.
+# A header looks like "13,468,514729715,-,ncfrag=0/1135;". If
+# release is appended, you might find something like:L
+# "6.13.0-04048-g4f561a87745a,13,468,514729715,-,ncfrag=0/1135;"
+function header_to_regex() {
+ # header is everything before ;
+ local HEADER="${1}"
+ REGEX=$(echo "${HEADER}" | cut -d'=' -f1)
+ echo "${REGEX}=[0-9]*\/[0-9]*;"
+}
+
+# We have two headers in the message. Remove both to get the full message,
+# and extract the full message.
+function extract_msg() {
+ local MSGFILE="${1}"
+ # Extract the header, which is the very first thing that arrives in the
+ # first list.
+ HEADER=$(sed -n '1p' "${MSGFILE}" | cut -d';' -f1)
+ HEADER_REGEX=$(header_to_regex "${HEADER}")
+
+ # Remove the two headers from the received message
+ # This will return the message without any header, similarly to what
+ # was sent.
+ sed "s/""${HEADER_REGEX}""//g" "${MSGFILE}"
+}
+
+# Validate the message, which has two messages glued together.
+# unwrap them to make sure all the characters were transmitted.
+# File will look like the following:
+# 13,468,514729715,-,ncfrag=0/1135;<message>
+# key=<part of key>-13,468,514729715,-,ncfrag=967/1135;<rest of the key>
+function validate_fragmented_result() {
+ # Discard the netconsole headers, and assemble the full message
+ RCVMSG=$(extract_msg "${1}")
+
+ # check for the main message
+ if ! echo "${RCVMSG}" | grep -q "${MSG}"; then
+ echo "Message body doesn't match." >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+
+ # check userdata
+ if ! echo "${RCVMSG}" | grep -q "${USERDATA_VALUE}"; then
+ echo "message userdata doesn't match" >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+ # test passed. hooray
+}
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Set userdata "key" with the "value" value
+set_user_data
+
+
+# TEST 1: Send message and userdata. They will fragment
+# =======
+MSG=$(printf -- 'MSG%.3s=' {1..150})
+
+# Listen for netconsole port inside the namespace and destination interface
+listen_port_and_save_to "${OUTPUT_FILE}" &
+# Wait for socat to start and listen to the port.
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+# Send the message
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+# Wait until socat saves the file to disk
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+# Check if the message was not corrupted
+validate_fragmented_result "${OUTPUT_FILE}"
+
+# TEST 2: Test with smaller message, and without release appended
+# =======
+MSG=$(printf -- 'FOOBAR%.3s=' {1..100})
+# Let's disable release and test again.
+disable_release_append
+
+listen_port_and_save_to "${OUTPUT_FILE}" &
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+validate_fragmented_result "${OUTPUT_FILE}"
+exit "${ksft_pass}"
---
base-commit: 0ad9617c78acbc71373fb341a6f75d4012b01d69
change-id: 20250129-netcons_frag_msgs-91506d136f50
Best regards,
--
Breno Leitao <leitao(a)debian.org>
The script uses non-POSIX features like `[[` for conditionals and hence
does not work when run with a POSIX /bin/sh.
Change the shebang to /bin/bash instead, like the other tests in cgroup.
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777(a)gmail.com>
---
tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh b/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
index 3f45512fb512..7406c24be1ac 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# Test the special cpuset v1 hotplug case where a cpuset become empty of
--
2.43.0
Greetings:
Welcome to v3. No functional changes, see changelog below.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v3:
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (2):
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 11 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 89 +++++++++++++++++++
8 files changed, 162 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: c2933b2befe25309f4c5cfbea0ca80909735fd76
--
2.43.0
Good day,
I sent you a message a few hours ago but no reply yet, or you didn't receive it? Kindly read my letter and reply back. I want to make an inquiry
Thanks.
Dr.Allen Cheng
Human Resource Manager | Product Research Assistant
FC Industrial Laboratories Ltd
This patch fixes a grammatical error in a test log message in
reuseaddr_ports_exhausted.c for better clarity as a part of lfx
application tasks
Signed-off-by: Pranav Tyagi <pranav.tyagi03(a)gmail.com>
---
tools/testing/selftests/net/reuseaddr_ports_exhausted.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
index 066efd30e294..7b9bf8a7bbe1 100644
--- a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
+++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
@@ -112,7 +112,7 @@ TEST(reuseaddr_ports_exhausted_reusable_same_euid)
ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind.");
if (opts->reuseport[0] && opts->reuseport[1]) {
- EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets succeed to be listened.");
+ EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets successfully listened.");
} else {
EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind to connect to different destinations.");
}
--
2.47.1
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V7:
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty packet insertion into a reusable function
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc_main.c | 84 ++++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
14 files changed, 348 insertions(+), 27 deletions(-)
--
2.34.1
From: Kevin Brodsky <kevin.brodsky(a)arm.com>
[ Upstream commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 ]
The mm kselftests are currently built with no optimisation (-O0). It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended. Let's build all the tests with
-O2.
[kevin.brodsky(a)arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: Keith Lucas <keith.lucas(a)oracle.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 46036188ea1f5266df23a6149dea0df1c77cd1c7)
[Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
which is also in linux-6.12.y, thus backport this commit]
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
tools/testing/selftests/mm/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 02e1204971b0..c0138cb19705 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -33,9 +33,16 @@ endif
# LDLIBS.
MAKEFLAGS += --no-builtin-rules
-CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
LDLIBS = -lrt -lpthread -lm
+# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is
+# automatically enabled at -O1 or above. This triggers various unused-result
+# warnings where functions such as read() or write() are called and their
+# return value is not checked. Disable _FORTIFY_SOURCE to silence those
+# warnings.
+CFLAGS += -U_FORTIFY_SOURCE
+
TEST_GEN_FILES = cow
TEST_GEN_FILES += compaction_test
TEST_GEN_FILES += gup_longterm
--
2.46.0
For platforms on powerpc architecture with a default page size greater
than 4096, there was an inconsistency in fragment size calculation.
This caused the BPF selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow
to fail on powerpc.
The issue occurred because the fragment buffer size in
bpf_prog_test_run_xdp() was set to 4096, while the actual data size in
the fragment within the shared skb was checked against PAGE_SIZE
(65536 on powerpc) in min_t, causing it to exceed 4096 and be set
accordingly. This discrepancy led to an overflow when
bpf_xdp_frags_increase_tail() checked for tailroom, as skb_frag_size(frag)
could be greater than rxq->frag_size (when PAGE_SIZE > 4096).
This commit updates the page size references to 4096 to ensure consistency
and prevent overflow issues in fragment size calculations.
Fixes: 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com>
---
net/bpf/test_run.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 501ec4249..eb5476184 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -124,7 +124,7 @@ struct xdp_test_data {
* must be updated accordingly this gets changed, otherwise BPF selftests
* will fail.
*/
-#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head))
+#define TEST_XDP_FRAME_SIZE (4096 - sizeof(struct xdp_page_head))
#define TEST_XDP_MAX_BATCH 256
static void xdp_test_run_init_page(netmem_ref netmem, void *arg)
@@ -660,7 +660,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
void *data;
- if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
+ if (size < ETH_HLEN || size > 4096 - headroom - tailroom)
return ERR_PTR(-EINVAL);
if (user_size > size)
@@ -1297,7 +1297,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
frag = &sinfo->frags[sinfo->nr_frags++];
data_len = min_t(u32, kattr->test.data_size_in - size,
- PAGE_SIZE);
+ 4096);
skb_frag_fill_page_desc(frag, page, 0, data_len);
if (copy_from_user(page_address(page), data_in + size,
--
2.43.5
Greetings:
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v2:
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (2):
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 11 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 90 +++++++++++++++++++
8 files changed, 163 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: c2933b2befe25309f4c5cfbea0ca80909735fd76
--
2.25.1
This series introduces support in the ARM PMUv3 driver for
partitioning PMU counters into two separate ranges by taking advantage
of the MDCR_EL2.HPMN register field.
The advantage of a partitioned PMU would be to allow KVM guests direct
access to a subset of PMU functionality, greatly reducing the overhead
of performance monitoring in guests.
While this series could be accepted on its own merits, practically
there is a lot more to be done before it will be fully useful, so I'm
sending as an RFC for now.
This patch is based on v6.13-rc7. It needs a small additional change
after Oliver's Debug cleanups series going into 6.14, specifically
this patch [1], because it changes kvm_arm_setup_mdcr_el2() to
initialize HPMN from a cached value read early in the boot process
instead of reading from the register. The only sensible way I can see
to deal with this is returning to reading the register.
[1] https://lore.kernel.org/kvmarm/20241219224116.3941496-3-oliver.upton@linux.…
Colton Lewis (4):
perf: arm_pmuv3: Introduce module param to partition the PMU
KVM: arm64: Make guests see only counters they can access
perf: arm_pmuv3: Generalize counter bitmasks
perf: arm_pmuv3: Keep out of guest counter partition
arch/arm/include/asm/arm_pmuv3.h | 10 ++
arch/arm64/include/asm/arm_pmuv3.h | 10 ++
arch/arm64/kvm/pmu-emul.c | 8 +-
drivers/perf/arm_pmuv3.c | 113 ++++++++++++++++--
include/linux/perf/arm_pmu.h | 2 +
include/linux/perf/arm_pmuv3.h | 34 +++++-
.../kvm/aarch64/vpmu_counter_access.c | 2 +-
7 files changed, 160 insertions(+), 19 deletions(-)
base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532
--
2.48.1.262.g85cc9f2d1e-goog
On Wed, Nov 13, 2024 at 2:31 AM Paolo Bonzini <pbonzini(a)redhat.com> wrote:
>
>
>
> Il mar 12 nov 2024, 21:44 Doug Covelli <doug.covelli(a)broadcom.com> ha scritto:
>>
>> > Split irqchip should be the best tradeoff. Without it, moves from cr8
>> > stay in the kernel, but moves to cr8 always go to userspace with a
>> > KVM_EXIT_SET_TPR exit. You also won't be able to use Intel
>> > flexpriority (in-processor accelerated TPR) because KVM does not know
>> > which bits are set in IRR. So it will be *really* every move to cr8
>> > that goes to userspace.
>>
>> Sorry to hijack this thread but is there a technical reason not to allow CR8
>> based accesses to the TPR (not MMIO accesses) when the in-kernel local APIC is
>> not in use?
>
>
> No worries, you're not hijacking :) The only reason is that it would be more code for a seldom used feature and anyway with worse performance. (To be clear, CR8 based accesses are allowed, but stores cause an exit in order to check the new TPR against IRR. That's because KVM's API does not have an equivalent of the TPR threshold as you point out below).
I have not really looked at the code but it seems like it could also
simplify things as CR8 would be handled more uniformly regardless of
who is virtualizing the local APIC.
>> Also I could not find these documented anywhere but with MSFT's APIC our monitor
>> relies on extensions for trapping certain events such as INIT/SIPI plus LINT0
>> and SVR writes:
>>
>> UINT64 X64ApicInitSipiExitTrap : 1; // WHvRunVpExitReasonX64ApicInitSipiTrap
>> UINT64 X64ApicWriteLint0ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>> UINT64 X64ApicWriteLint1ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>> UINT64 X64ApicWriteSvrExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>
>
> There's no need for this in KVM's in-kernel APIC model. INIT and SIPI are handled in the hypervisor and you can get the current state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR and NMI blocking respectively, plus the interrupt shadow; so there's no need for userspace to know when LINT0/LINT1 themselves change. The spurious interrupt vector register is also handled completely in kernel.
I realize that KVM can handle LINT0/SVR updates themselves but our
interrupt subsystem relies on knowing the current values of these
registers even when not virtualizing the local APIC. I suppose we
could use KVM_GET_LAPIC to sync things up on demand but that seems
like it might nor be great from a performance point of view.
>> I did not see any similar functionality for KVM. Does anything like that exist?
>> In any case we would be happy to add support for handling CR8 accesses w/o
>> exiting w/o the in-kernel APIC along with some sort of a way to configure the
>> TPR threshold if folks are not opposed to that.
>
>
> As far I know everybody who's using KVM (whether proprietary or open source) has had no need for that, so I don't think it's a good idea to make the API more complex. Performance of Windows guests is going to be bad anyway with userspace APIC.
From what I have seen the exit cost with KVM is significantly lower
than with WHP/Hyper-V. I don't think performance of Windows guests
with userspace APIC emulation would be bad if CR8 exits could be
avoided (Linux guests perf isn't bad from what I have observed and the
main difference is the astronomical number of CR8 exits). It seems
like it would be pretty decent although I agree if you want the
absolute best performance then you would want to use the in kernel
APIC to speed up handling of ICR/EOI writes but those are relatively
infrequent compared to CR8 accesses .
Anyway I just saw Sean's response while writing this and it seems he
is not in favor of avoiding CR8 exits w/o the in kernel APIC either so
I suppose we will have to look into making use of the in kernel APIC.
Doug
> Paolo
>
>> Doug
>>
>> > > For now I think it makes sense to handle BDOOR_CMD_GET_VCPU_INFO at userlevel
>> > > like we do on Windows and macOS.
>> > >
>> > > BDOOR_CMD_GETTIME/BDOOR_CMD_GETTIMEFULL are similar with the former being
>> > > deprecated in favor of the latter. Both do essentially the same thing which is
>> > > to return the host OS's time - on Linux this is obtained via gettimeofday. I
>> > > believe this is mainly used by tools to fix up the VM's time when resuming from
>> > > suspend. I think it is fine to continue handling these at userlevel.
>> >
>> > As long as the TSC is not involved it should be okay.
>> >
>> > Paolo
>> >
>> > > > >> Anyway, one question apart from this: is the API the same for the I/O
>> > > > >> port and hypercall backdoors?
>> > > > >
>> > > > > Yeah the calls and arguments are the same. The hypercall based
>> > > > > interface is an attempt to modernize the backdoor since as you pointed
>> > > > > out the I/O based interface is kind of hacky as it bypasses the normal
>> > > > > checks for an I/O port access at CPL3. It would be nice to get rid of
>> > > > > it but unfortunately I don't think that will happen in the foreseeable
>> > > > > future as there are a lot of existing VMs out there with older SW that
>> > > > > still uses this interface.
>> > > >
>> > > > Yeah, but I think it still justifies that the KVM_ENABLE_CAP API can
>> > > > enable the hypercall but not the I/O port.
>> > > >
>> > > > Paolo
>> >
>>
>> --
>> This electronic communication and the information and any files transmitted
>> with it, or attached to it, are confidential and are intended solely for
>> the use of the individual or entity to whom it is addressed and may contain
>> information that is confidential, legally privileged, protected by privacy
>> laws, or otherwise restricted from disclosure to anyone else. If you are
>> not the intended recipient or the person responsible for delivering the
>> e-mail to the intended recipient, you are hereby notified that any use,
>> copying, distributing, dissemination, forwarding, printing, or copying of
>> this e-mail is strictly prohibited. If you received this e-mail in error,
>> please return the e-mail to the sender, delete it from your computer, and
>> destroy any printed copy of it.
>>
--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.
Nolibc has support for riscv32. But the testsuite did not allow to test
it so far. Add a test configuration.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (6):
tools/nolibc: add support for waitid()
selftests/nolibc: use waitid() over waitpid()
selftests/nolibc: use a pipe to in vfprintf tests
selftests/nolibc: skip tests for unimplemented syscalls
selftests/nolibc: rename riscv to riscv64
selftests/nolibc: add configurations for riscv32
tools/include/nolibc/sys.h | 18 ++++++++++++
tools/testing/selftests/nolibc/Makefile | 11 +++++++
tools/testing/selftests/nolibc/nolibc-test.c | 44 ++++++++++++++++------------
tools/testing/selftests/nolibc/run-tests.sh | 2 +-
4 files changed, 56 insertions(+), 19 deletions(-)
---
base-commit: 499551201b5f4fd3c0618a3e95e3d0d15ea18f31
change-id: 20241219-nolibc-rv32-cff8a3e22394
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Two fixes for nullness elision. See commits for more details.
Daniel Xu (3):
bpf: verifier: Do not extract constant map keys for irrelevant maps
bpf: selftests: Test constant key extraction on irrelevant maps
bpf: verifier: Disambiguate get_constant_map_key() errors
kernel/bpf/verifier.c | 29 ++++++++++++++-----
.../bpf/progs/verifier_array_access.c | 15 ++++++++++
2 files changed, 36 insertions(+), 8 deletions(-)
--
2.47.1
Add a new selftest to verify netconsole's handling of messages that
exceed the packet size limit and require fragmentation. The test sends
messages with varying sizes and userdata, validating that:
1. Large messages are correctly fragmented and reassembled
2. Userdata fields are properly preserved across fragments
3. Messages work correctly with and without kernel release version
appending
The test creates a networking environment using netdevsim, sends
messages through /dev/kmsg, and verifies the received fragments maintain
message integrity.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 7 ++
.../drivers/net/netcons_fragmented_msg.sh | 122 +++++++++++++++++++++
3 files changed, 130 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 137470bdee0c7fd2517bd1baafc12d575de4b4ac..c7f1c443f2af091aa13f67dd1df9ae05d7a43f40 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -7,6 +7,7 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \
TEST_PROGS := \
netcons_basic.sh \
+ netcons_fragmented_msg.sh \
netcons_overflow.sh \
ping.py \
queues.py \
diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
index 3acaba41ac7b21aa2fd8457ed640a5ac8a41bc12..0c262b123fdd3082c40b2bd899ec626d223226ed 100644
--- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
+++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
@@ -110,6 +110,13 @@ function create_dynamic_target() {
echo 1 > "${NETCONS_PATH}"/enabled
}
+# Do not append the release to the header of the message
+function disable_release_append() {
+ echo 0 > "${NETCONS_PATH}"/enabled
+ echo 0 > "${NETCONS_PATH}"/release
+ echo 1 > "${NETCONS_PATH}"/enabled
+}
+
function cleanup() {
local NSIM_DEV_SYS_DEL="/sys/bus/netdevsim/del_device"
diff --git a/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
new file mode 100755
index 0000000000000000000000000000000000000000..d175d5b9db662ab9a6ee203794569cc620801a4f
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Test netconsole's message fragmentation functionality.
+#
+# When a message exceeds the maximum packet size, netconsole splits it into
+# multiple fragments for transmission. This test verifies:
+# - Correct fragmentation of large messages
+# - Proper reassembly of fragments at the receiver
+# - Preservation of userdata across fragments
+# - Behavior with and without kernel release version appending
+#
+# Author: Breno Leitao <leitao(a)debian.org>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# The content of kmsg will be save to the following file
+OUTPUT_FILE="/tmp/${TARGET}"
+
+# set userdata to a long value. In this case, it is "1-2-3-4...50-"
+USERDATA_VALUE=$(printf -- '%.2s-' {1..60})
+
+# Convert the header string in a regexp, so, we can remove
+# the second header as well.
+# A header looks like "13,468,514729715,-,ncfrag=0/1135;". If
+# release is appended, you might find something like:L
+# "6.13.0-04048-g4f561a87745a,13,468,514729715,-,ncfrag=0/1135;"
+function header_to_regex() {
+ # header is everything before ;
+ local HEADER="${1}"
+ REGEX=$(echo "${HEADER}" | cut -d'=' -f1)
+ echo "${REGEX}=[0-9]*\/[0-9]*;"
+}
+
+# We have two headers in the message. Remove both to get the full message,
+# and extract the full message.
+function extract_msg() {
+ local MSGFILE="${1}"
+ # Extract the header, which is the very first thing that arrives in the
+ # first list.
+ HEADER=$(sed -n '1p' "${MSGFILE}" | cut -d';' -f1)
+ HEADER_REGEX=$(header_to_regex "${HEADER}")
+
+ # Remove the two headers from the received message
+ # This will return the message without any header, similarly to what
+ # was sent.
+ sed "s/""${HEADER_REGEX}""//g" "${MSGFILE}"
+}
+
+# Validate the message, which has two messages glued together.
+# unwrap them to make sure all the characters were transmitted.
+# File will look like the following:
+# 13,468,514729715,-,ncfrag=0/1135;MSG1=MSG2=MSG3=MSG4=MSG5=MSG6=MSG7=MSG8=MSG9=MSG10=MSG11=MSG12=MSG13=MSG14=MSG15=MSG16=MSG17=MSG18=MSG19=MSG20=MSG21=MSG22=MSG23=MSG24=MSG25=MSG26=MSG27=MSG28=MSG29=MSG30=MSG31=MSG32=MSG33=MSG34=MSG35=MSG36=MSG37=MSG38=MSG39=MSG40=MSG41=MSG42=MSG43=MSG44=MSG45=MSG46=MSG47=MSG48=MSG49=MSG50=MSG51=MSG52=MSG53=MSG54=MSG55=MSG56=MSG57=MSG58=MSG59=MSG60=MSG61=MSG62=MSG63=MSG64=MSG65=MSG66=MSG67=MSG68=MSG69=MSG70=MSG71=MSG72=MSG73=MSG74=MSG75=MSG76=MSG77=MSG78=MSG79=MSG80=MSG81=MSG82=MSG83=MSG84=MSG85=MSG86=MSG87=MSG88=MSG89=MSG90=MSG91=MSG92=MSG93=MSG94=MSG95=MSG96=MSG97=MSG98=MSG99=MSG100=MSG101=MSG102=MSG103=MSG104=MSG105=MSG106=MSG107=MSG108=MSG109=MSG110=MSG111=MSG112=MSG113=MSG114=MSG115=MSG116=MSG117=MSG118=MSG119=MSG120=MSG121=MSG122=MSG123=MSG124=MSG125=MSG126=MSG127=MSG128=MSG129=MSG130=MSG131=MSG132=MSG133=MSG134=MSG135=MSG136=MSG137=MSG138=MSG139=MSG140=MSG141=MSG142=MSG143=MSG144=MSG145=MSG146=MSG147=MSG148=MSG149=MSG150=: netcons_nzmJQ
+# key=1-2-13,468,514729715,-,ncfrag=967/1135;3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-60-
+function validate_fragmented_result() {
+ # Discard the netconsole headers, and assemble the full message
+ RCVMSG=$(extract_msg "${1}")
+
+ # check for the main message
+ if ! echo "${RCVMSG}" | grep -q "${MSG}"; then
+ echo "Message body doesn't match." >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+
+ # check userdata
+ if ! echo "${RCVMSG}" | grep -q "${USERDATA_VALUE}"; then
+ echo "message userdata doesn't match" >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+ # test passed. hooray
+}
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Set userdata "key" with the "value" value
+set_user_data
+
+
+# TEST 1: Send message and userdata. They will fragment
+# =======
+MSG=$(printf -- 'MSG%.3s=' {1..150})
+
+# Listen for netconsole port inside the namespace and destination interface
+listen_port_and_save_to "${OUTPUT_FILE}" &
+# Wait for socat to start and listen to the port.
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+# Send the message
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+# Wait until socat saves the file to disk
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+# Check if the message was not corrupted
+validate_fragmented_result "${OUTPUT_FILE}"
+
+# TEST 2: Test with smaller message, and without release appended
+# =======
+MSG=$(printf -- 'FOOBAR%.3s=' {1..100})
+# Let's disable release and test again.
+disable_release_append
+
+listen_port_and_save_to "${OUTPUT_FILE}" &
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+validate_fragmented_result "${OUTPUT_FILE}"
+exit "${ksft_pass}"
---
base-commit: 0ad9617c78acbc71373fb341a6f75d4012b01d69
change-id: 20250129-netcons_frag_msgs-91506d136f50
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
Hence, this patch series add test cases to test_xdp_veth.c to get rid of
the test_xdp_redirect_multi.sh:
- PATCH 1 Add an helper to generate unique names
- PATCH 2 to 9 rework test_xdp_veth to make it more generic and allow to
configure different test cases
- PATCH 10 adds test cases for 'classic' bpf_redirect_map()
- PATCH 11 and 12 cover the broadcast flags
- PATCH 13 covers the XDP egress programs
- PATCH 14 removes test_xdp_redirect_multi.sh
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v4:
- Remove the NO_IP #define
- append_tid() takes string's size as input to ensure there is enough
space to fit the thread ID at the end
- Fix PATCH 12's commit log
- Link to v3: https://lore.kernel.org/r/20250128-redirect-multi-v3-0-c1ce69997c01@bootlin…
Changes in v3:
- Add append_tid() helper and use unique names to allow parallel testing
- Check create_network()'s return value through ASSERT_OK()
- Remove check_ping() and unused defines
- Change next_veth type (from string to int)
- Link to v2: https://lore.kernel.org/r/20250121-redirect-multi-v2-0-fc9cacabc6b2@bootlin…
Changes in v2:
- Use serial_test_* to avoid conflict between tests
- Link to v1: https://lore.kernel.org/r/20250121-redirect-multi-v1-0-b215e35ff505@bootlin…
---
Bastien Curutchet (eBPF Foundation) (14):
selftests/bpf: helpers: Add append_tid()
selftests/bpf: test_xdp_veth: Remove unused defines
selftests/bpf: test_xdp_veth: Remove unecessarry check_ping()
selftests/bpf: test_xdp_veth: Use int to describe next veth
selftests/bpf: test_xdp_veth: Split network configuration
selftests/bpf: test_xdp_veth: Rename config[]
selftests/bpf: test_xdp_veth: Add prog_config[] table
selftests/bpf: test_xdp_veth: Add XDP flags to prog_configuration
selftests/bpf: test_xdp_veth: Use unique names
selftests/bpf: test_xdp_veth: Add new test cases for XDP flags
selftests/bpf: Optionally select broadcasting flags
selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
selftests/bpf: test_xdp_veth: Add XDP program on egress test
selftests/bpf: Remove test_xdp_redirect_multi.sh
tools/testing/selftests/bpf/Makefile | 2 -
tools/testing/selftests/bpf/network_helpers.c | 17 +
tools/testing/selftests/bpf/network_helpers.h | 12 +
.../selftests/bpf/prog_tests/test_xdp_veth.c | 588 ++++++++++++++++-----
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 89 ++++
.../selftests/bpf/progs/xdp_redirect_multi_kern.c | 41 +-
.../selftests/bpf/test_xdp_redirect_multi.sh | 214 --------
tools/testing/selftests/bpf/xdp_redirect_multi.c | 226 --------
8 files changed, 615 insertions(+), 574 deletions(-)
---
base-commit: 421ec9c8f46a25743870a8cbaff76de293752e00
change-id: 20250103-redirect-multi-245d6eafb5d1
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.
This API allows ptracers to obtain and modify system call details
in a straightforward and architecture-agnostic way.
Current implementation supports changing only those bits of system call
information that are used by strace, namely, syscall number, syscall
arguments, and syscall return value.
Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later if
needed, by using struct ptrace_syscall_info.flags to specify the additional
details that should be set. Currently, "flags" and "reserved" fields of
struct ptrace_syscall_info must be initialized with zeroes; "arch",
"instruction_pointer", and "stack_pointer" fields are ignored.
PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure
to provide an API of changing the first system call argument on riscv
architecture [1].
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).
[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Notes:
v4:
* Split out syscall_set_return_value() for hexagon into a separate patch
* s390: Change the style of syscall_set_arguments() implementation as
requested
* Add more Reviewed-by
* v3: https://lore.kernel.org/all/20250128091445.GA8257@strace.io/
v3:
* powerpc: Submit syscall_set_return_value() fix for "sc" case separately
* mips: Do not introduce erroneous argument truncation on mips n32,
add a detailed description to the commit message of the
mips_get_syscall_arg() change
* ptrace: Add explicit padding to the end of struct ptrace_syscall_info,
simplify obtaining of user ptrace_syscall_info,
do not introduce PTRACE_SYSCALL_INFO_SIZE_VER0
* ptrace: Change the return type of ptrace_set_syscall_info_* functions
from "unsigned long" to "int"
* ptrace: Add -ERANGE check to ptrace_set_syscall_info_exit(),
add comments to -ERANGE checks
* ptrace: Update comments about supported syscall stops
* selftests: Extend set_syscall_info test, fix for mips n32
* Add Tested-by and Reviewed-by
v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
and sparc
* Remove redundant ptrace_syscall_info.op assignments in
ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
in ptrace_set_syscall_info_entry()
* Split ptrace_syscall_info.reserved into ptrace_syscall_info.reserved
and ptrace_syscall_info.flags
* Use __kernel_ulong_t instead of unsigned long in set_syscall_info test
v1:
Dmitry V. Levin (7):
mips: fix mips_get_syscall_arg() for o32
hexagon: add syscall_set_return_value()
syscall.h: add syscall_set_arguments()
syscall.h: introduce syscall_set_nr()
ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
ptrace: introduce PTRACE_SET_SYSCALL_INFO request
selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO
arch/arc/include/asm/syscall.h | 25 +
arch/arm/include/asm/syscall.h | 37 ++
arch/arm64/include/asm/syscall.h | 29 +
arch/csky/include/asm/syscall.h | 13 +
arch/hexagon/include/asm/syscall.h | 21 +
arch/loongarch/include/asm/syscall.h | 15 +
arch/m68k/include/asm/syscall.h | 7 +
arch/microblaze/include/asm/syscall.h | 7 +
arch/mips/include/asm/syscall.h | 70 ++-
arch/nios2/include/asm/syscall.h | 16 +
arch/openrisc/include/asm/syscall.h | 13 +
arch/parisc/include/asm/syscall.h | 19 +
arch/powerpc/include/asm/syscall.h | 20 +
arch/riscv/include/asm/syscall.h | 16 +
arch/s390/include/asm/syscall.h | 21 +
arch/sh/include/asm/syscall_32.h | 24 +
arch/sparc/include/asm/syscall.h | 22 +
arch/um/include/asm/syscall-generic.h | 19 +
arch/x86/include/asm/syscall.h | 43 ++
arch/xtensa/include/asm/syscall.h | 18 +
include/asm-generic/syscall.h | 30 +
include/uapi/linux/ptrace.h | 7 +-
kernel/ptrace.c | 179 +++++-
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/set_syscall_info.c | 514 ++++++++++++++++++
25 files changed, 1140 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c
--
ldv
Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
for small packets. But the kernel currently dismisses GSO requests only
after checking MTU/PMTU on gso_size. This means any packets, regardless
of their payload sizes, could be dropped when PMTU becomes smaller than
requested gso_size. We encountered this issue in production and it
caused a reliability problem that new QUIC connection cannot be
established before PMTU cache expired, while non GSO sockets still
worked fine at the same time.
Ideally, do not check any GSO related constraints when payload size is
smaller than requested gso_size, and return EMSGSIZE instead of EINVAL
on MTU/PMTU check failure to be more specific on the error cause.
Fixes: 4094871db1d6 ("udp: only do GSO if # of segs > 1")
Signed-off-by: Yan Zhai <yan(a)cloudflare.com>
Suggested-by: Willem de Bruijn <willemdebruijn.kernel(a)gmail.com>
---
v2->v3: simplify the code; adding two test cases
v1->v2: add a missing MTU check when fall back to no GSO mode; Fixed up
commit message to be more precise.
v2: https://lore.kernel.org/netdev/Z5swit7ykNRbJFMS@debian.debian/T/#u
v1: https://lore.kernel.org/all/Z5cgWh%2F6bRQm9vVU@debian.debian/
---
net/ipv4/udp.c | 4 ++--
net/ipv6/udp.c | 4 ++--
tools/testing/selftests/net/udpgso.c | 26 ++++++++++++++++++++++++++
3 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c472c9a57cf6..a9bb9ce5438e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1141,9 +1141,9 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
const int hlen = skb_network_header_len(skb) +
sizeof(struct udphdr);
- if (hlen + cork->gso_size > cork->fragsize) {
+ if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
kfree_skb(skb);
- return -EINVAL;
+ return -EMSGSIZE;
}
if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
kfree_skb(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6671daa67f4f..c6ea438b5c75 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1389,9 +1389,9 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
const int hlen = skb_network_header_len(skb) +
sizeof(struct udphdr);
- if (hlen + cork->gso_size > cork->fragsize) {
+ if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
kfree_skb(skb);
- return -EINVAL;
+ return -EMSGSIZE;
}
if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
kfree_skb(skb);
diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
index 3f2fca02fec5..36ff28af4b19 100644
--- a/tools/testing/selftests/net/udpgso.c
+++ b/tools/testing/selftests/net/udpgso.c
@@ -102,6 +102,19 @@ struct testcase testcases_v4[] = {
.gso_len = CONST_MSS_V4,
.r_num_mss = 1,
},
+ {
+ /* datalen <= MSS < gso_len: will fall back to no GSO */
+ .tlen = CONST_MSS_V4,
+ .gso_len = CONST_MSS_V4 + 1,
+ .r_num_mss = 0,
+ .r_len_last = CONST_MSS_V4,
+ },
+ {
+ /* MSS < datalen < gso_len: fail */
+ .tlen = CONST_MSS_V4 + 1,
+ .gso_len = CONST_MSS_V4 + 2,
+ .tfail = true,
+ },
{
/* send a single MSS + 1B */
.tlen = CONST_MSS_V4 + 1,
@@ -205,6 +218,19 @@ struct testcase testcases_v6[] = {
.gso_len = CONST_MSS_V6,
.r_num_mss = 1,
},
+ {
+ /* datalen <= MSS < gso_len: will fall back to no GSO */
+ .tlen = CONST_MSS_V6,
+ .gso_len = CONST_MSS_V6 + 1,
+ .r_num_mss = 0,
+ .r_len_last = CONST_MSS_V6,
+ },
+ {
+ /* MSS < datalen < gso_len: fail */
+ .tlen = CONST_MSS_V6 + 1,
+ .gso_len = CONST_MSS_V6 + 2,
+ .tfail = true
+ },
{
/* send a single MSS + 1B */
.tlen = CONST_MSS_V6 + 1,
--
2.30.2
After submitting the patch implementing the SO_RCVPRIORITY socket option
(https://lore.kernel.org/netdev/20241213084457.45120-5-annaemesenyiri@gmail.…),
it was requested to include a test for the functionality. As a first step, write
a test that also validates the SO_RCVMARK value, since no existing test covers
it. If this combined test is not suitable, I will provide a standalone
test specifically for SO_RCVPRIORITY and submit it separately.
Anna Emese Nyiri (1):
add support for testing SO_RCVMARK and SO_RCVPRIORITY
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/so_rcv_listener.c | 147 ++++++++++++++++++
tools/testing/selftests/net/test_so_rcv.sh | 56 +++++++
3 files changed, 204 insertions(+)
create mode 100644 tools/testing/selftests/net/so_rcv_listener.c
create mode 100755 tools/testing/selftests/net/test_so_rcv.sh
--
2.43.0
This patch allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail)
unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with large number of unrolled loops,
we are starting to hit jump complexity verification errors. These
percpu lookups cannot fail anyways, as we only use static key values.
Eliding nullness probably results in less work for verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the
currrent stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.
=== Changelog ===
Changes in v7:
* Use more accurate frame number when marking precise
* Add test for non-stack key
* Test for marking stack slot precise
Changes in v6:
* Use is_spilled_scalar_reg() helper and remove unnecessary comment
* Add back deleted selftest with different helper to dirty dst buffer
* Check size of spill is exactly key_size and update selftests
* Read slot_type from correct offset into the spi
* Rewrite selftests in C where possible
* Mark constant map keys as precise
Changes in v5:
* Dropped all acks
* Use s64 instead of long for const_map_key
* Ensure stack slot contains spilled reg before accessing spilled_ptr
* Ensure spilled reg is a scalar before accessing tnum const value
* Fix verifier selftest for 32-bit write to write at 8 byte alignment
to ensure spill is tracked
* Introduce more precise tracking of helper stack accesses
* Do constant map key extraction as part of helper argument processing
and then remove duplicated stack checks
* Use ret_flag instead of regs[BPF_REG_0].type
* Handle STACK_ZERO
* Fix bug in bpf_load_hdr_opt() arg annotation
Changes in v4:
* Only allow for CAP_BPF
* Add test for stack growing upwards
* Improve comment about stack growing upwards
Changes in v3:
* Check if stack is (erroneously) growing upwards
* Mention in commit message why existing tests needed change
Changes in v2:
* Added a check for when R2 is not a ptr to stack
* Added a check for when stack is uninitialized (no stack slot yet)
* Updated existing tests to account for null elision
* Added test case for when R2 can be both const and non-const
Daniel Xu (5):
bpf: verifier: Add missing newline on verbose() call
bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
bpf: verifier: Refactor helper access type tracking
bpf: verifier: Support eliding map lookup nullness
bpf: selftests: verifier: Add nullness elision tests
kernel/bpf/verifier.c | 139 ++++++++++---
net/core/filter.c | 2 +-
.../testing/selftests/bpf/progs/dynptr_fail.c | 6 +-
tools/testing/selftests/bpf/progs/iters.c | 14 +-
.../selftests/bpf/progs/map_kptr_fail.c | 2 +-
.../selftests/bpf/progs/test_global_func10.c | 2 +-
.../selftests/bpf/progs/uninit_stack.c | 5 +-
.../bpf/progs/verifier_array_access.c | 188 ++++++++++++++++++
.../bpf/progs/verifier_basic_stack.c | 2 +-
.../selftests/bpf/progs/verifier_const_or.c | 4 +-
.../progs/verifier_helper_access_var_len.c | 12 +-
.../selftests/bpf/progs/verifier_int_ptr.c | 2 +-
.../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
.../selftests/bpf/progs/verifier_mtu.c | 2 +-
.../selftests/bpf/progs/verifier_raw_stack.c | 4 +-
.../selftests/bpf/progs/verifier_unpriv.c | 2 +-
.../selftests/bpf/progs/verifier_var_off.c | 8 +-
tools/testing/selftests/bpf/verifier/calls.c | 2 +-
.../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
19 files changed, 331 insertions(+), 69 deletions(-)
--
2.47.1
Fix the build failure caused by the undefined `CLONE_NEWTIME`.
Include the `linux/sched.h` header file where the function is defined to
ensure successful compilation of the selftests.
Signed-off-by: Purva Yeshi <purvayeshi550(a)gmail.com>
---
tools/testing/selftests/vDSO/vdso_test_getrandom.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/vDSO/vdso_test_getrandom.c b/tools/testing/selftests/vDSO/vdso_test_getrandom.c
index 95057f7567db..b2c9cf15878b 100644
--- a/tools/testing/selftests/vDSO/vdso_test_getrandom.c
+++ b/tools/testing/selftests/vDSO/vdso_test_getrandom.c
@@ -29,6 +29,8 @@
#include "vdso_config.h"
#include "vdso_call.h"
+#include <linux/sched.h>
+
#ifndef timespecsub
#define timespecsub(tsp, usp, vsp) \
do { \
--
2.34.1
A few cleanups and optimizations for the management of the kernel
configuration.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Rebase on current torvalds/master
- Call "make defconfig" separately
- Link to v1: https://lore.kernel.org/r/20250122-nolibc-config-v1-0-a697db968b49@weisssch…
---
Thomas Weißschuh (5):
selftests/nolibc: drop custom EXTRACONFIG functionality
selftests/nolibc: drop call to prepare target
selftests/nolibc: drop call to mrproper target
selftests/nolibc: execute defconfig before other targets
selftests/nolibc: always keep test kernel configuration up to date
tools/testing/selftests/nolibc/Makefile | 17 +++++------------
tools/testing/selftests/nolibc/run-tests.sh | 7 +++----
2 files changed, 8 insertions(+), 16 deletions(-)
---
base-commit: 21266b8df5224c4f677acf9f353eecc9094731f0
change-id: 20250122-nolibc-config-d639e1612c93
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>