1. Issue
Syzkaller reported this issue [1].
2. Reproduce
We can reproduce this issue by using the test_sockmap_with_close_on_write()
test I provided in selftest, also you need to apply the following patch to
ensure 100% reproducibility (sleep after checking sock):
'''
static void sk_psock_verdict_data_ready(struct sock *sk)
{
.......
if (unlikely(!sock))
return;
+ if (!strcmp("test_progs", current->comm)) {
+ printk("sleep 2s to wait socket freed\n");
+ mdelay(2000);
+ printk("sleep end\n");
+ }
ops = READ_ONCE(sock->ops);
if (!ops || !ops->read_skb)
return;
}
'''
Then running './test_progs -v sockmap_basic', and if the kernel has KASAN
enabled [2], you will see the following warning:
'''
BUG: KASAN: slab-use-after-free in sk_psock_verdict_data_ready+0x29b/0x2d0
Read of size 8 at addr ffff88813a777020 by task test_progs/47055
Tainted: [O]=OOT_MODULE
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
print_address_description.constprop.0+0x30/0x420
? sk_psock_verdict_data_ready+0x29b/0x2d0
print_report+0xb7/0x270
? sk_psock_verdict_data_ready+0x29b/0x2d0
? kasan_addr_to_slab+0xd/0xa0
? sk_psock_verdict_data_ready+0x29b/0x2d0
kasan_report+0xca/0x100
? sk_psock_verdict_data_ready+0x29b/0x2d0
sk_psock_verdict_data_ready+0x29b/0x2d0
unix_stream_sendmsg+0x4a6/0xa40
? __pfx_unix_stream_sendmsg+0x10/0x10
? fdget+0x2c1/0x3a0
__sys_sendto+0x39c/0x410
'''
3. Reason
'''
CPU0 CPU1
unix_stream_sendmsg(sk):
other = unix_peer(sk)
other->sk_data_ready(other):
socket *sock = sk->sk_socket
if (unlikely(!sock))
return;
close(other):
...
other->close()
free(socket)
READ_ONCE(sock->ops)
^
use 'sock' after free
'''
For TCP, UDP, or other protocols, we have already performed
rcu_read_lock() when the network stack receives packets in ip_input.c:
'''
ip_local_deliver_finish():
rcu_read_lock()
ip_protocol_deliver_rcu()
xxx_rcv
rcu_read_unlock()
'''
However, for Unix sockets, sk_data_ready is called directly from the
process context without rcu_read_lock() protection.
4. Solution
Based on the fact that the 'struct socket' is released using call_rcu(),
We add rcu_read_{un}lock() at the entrance and exit of our sk_data_ready.
It will not increase performance overhead, at least for TCP and UDP, they
are already in a relatively large critical section.
Of course, we can also add a custom callback for Unix sockets and call
rcu_read_lock() before calling _verdict_data_ready like this:
'''
if (sk_is_unix(sk))
sk->sk_data_ready = sk_psock_verdict_data_ready_rcu;
else
sk->sk_data_ready = sk_psock_verdict_data_ready;
sk_psock_verdict_data_ready_rcu():
rcu_read_lock()
sk_psock_verdict_data_ready()
rcu_read_unlock()
'''
However, this will cause too many branches, and it's not suitable to
distinguish network protocols in skmsg.c.
[1] https://syzkaller.appspot.com/bug?extid=dd90a702f518e0eac072
[2] https://syzkaller.appspot.com/text?tag=KernelConfig&x=1362a5aee630ff34
---
v1 -> v2:
1. Add Fixes tag.
2. Extend selftest of edge case for TCP/UDP sockets.
3. Add Reviewed-by and Acked-by tag.
https://lore.kernel.org/bpf/20250226132242.52663-1-jiayuan.chen@linux.dev/T…
Jiayuan Chen (3):
bpf, sockmap: avoid using sk_socket after free
selftests/bpf: Add socketpair to create_pair to support unix socket
selftests/bpf: Add edge case tests for sockmap
net/core/skmsg.c | 18 ++++--
.../selftests/bpf/prog_tests/socket_helpers.h | 13 +++-
.../selftests/bpf/prog_tests/sockmap_basic.c | 59 +++++++++++++++++++
3 files changed, 84 insertions(+), 6 deletions(-)
--
2.47.1
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
v12:
- Fix the comment in include/uapi/linux/if_xdp.h to allign with what is
generated by ./tools/net/ynl/ynl-regen.sh to avoid dirty tree error in
the netdev/ynl checks.
v11: https://lore.kernel.org/netdev/20250216074302.956937-1-yoong.siang.song@int…
- regenerate netdev_xsk_flags based on latest netdev.yaml (Jakub)
v10: https://lore.kernel.org/netdev/20250207021943.814768-1-yoong.siang.song@int…
- use net_err_ratelimited(), instead of net_ratelimit() (Maciej)
- accumulate the amount of used descs in local variable and update the
igc_metadata_request::used_desc once (Maciej)
- Ensure reverse christmas tree rule (Maciej)
V9: https://lore.kernel.org/netdev/20250206060408.808325-1-yoong.siang.song@int…
- Remove the igc_desc_unused() checking (Maciej)
- Ensure that skb allocation and DMA mapping work before proceeding to
fill in igc_tx_buffer info, context desc, and data desc (Maciej)
- Rate limit the error messages (Maciej)
- Update the comment to indicate that the 2 descriptors needed by the
empty frame are already taken into consideration (Maciej)
- Handle the case where the insertion of an empty frame fails and
explain the reason behind (Maciej)
- put self SOB tag as last tag (Maciej)
V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int…
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
- submit another bug fix patch to set the buffer type for empty frame (Maciej):
https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int…
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 143 +++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 396 insertions(+), 39 deletions(-)
--
2.34.1
Hi all,
This patch series continues the work to migrate the script tests into
prog_tests.
test_lwt_seg6local.sh tests some bpf_lwt_* helpers. It contains only one
test that uses a network topology quite different than the ones that
can be found in others prog_tests/lwt_*.c files so I add a new
prog_tests/lwt_seg6local.c file.
While working on the migration I noticed that some routes present in the
script weren't needed so PATCH 1 deletes them and then PATCH 2 migrates
the test into the test_progs framework.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Bastien Curutchet (eBPF Foundation) (2):
selftests/bpf: lwt_seg6local: Remove unused routes
selftests/bpf: lwt_seg6local: Move test to test_progs
tools/testing/selftests/bpf/Makefile | 1 -
.../selftests/bpf/prog_tests/lwt_seg6local.c | 176 +++++++++++++++++++++
tools/testing/selftests/bpf/test_lwt_seg6local.sh | 156 ------------------
3 files changed, 176 insertions(+), 157 deletions(-)
---
base-commit: 86eb3a47230a41c6ccf5cdae8ee0a7e7292aa29d
change-id: 20250214-seg6local-64bcde44b66e
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
A bug was identified where the KTAP below caused an infinite loop:
TAP version 13
ok 4 test_case
1..4
The infinite loop was caused by the parser not parsing a test plan
if following a test result line.
Fix this bug to correctly parse test plan line.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v1:
- Remove error reported when test plan is missing.
tools/testing/kunit/kunit_parser.py | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 29fc27e8949b..da53a709773a 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -759,7 +759,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# If parsing the main/top-level test, parse KTAP version line and
# test plan
test.name = "main"
- ktap_line = parse_ktap_header(lines, test, printer)
+ parse_ktap_header(lines, test, printer)
test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
parent_test = True
@@ -768,13 +768,12 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# the KTAP version line and/or subtest header line
ktap_line = parse_ktap_header(lines, test, printer)
subtest_line = parse_test_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
+ parse_test_plan(lines, test)
parent_test = (ktap_line or subtest_line)
if parent_test:
- # If KTAP version line and/or subtest header is found, attempt
- # to parse test plan and print test header
- test.log.extend(parse_diagnostic(lines))
- parse_test_plan(lines, test)
print_test_header(test, printer)
+
expected_count = test.expected_count
subtests = []
test_num = 1
base-commit: 0619a4868fc1b32b07fb9ed6c69adc5e5cf4e4b2
--
2.49.0.rc0.332.g42c0ae87b1-goog
Hello,
While trying to implement an eBPF gatekeeper program, we ran into an
issue whereas the LSM hooks are missing some relevant data.
Certain subcommands passed to the bpf() syscall can be invoked from
either the kernel or userspace. Additionally, some fields in the
bpf_attr struct contain pointers, and depending on where the
subcommand was invoked, they could point to either user or kernel
memory. One example of this is the bpf_prog_load subcommand and its
fd_array. This data is made available and used by the verifier but not
made available to the LSM subsystem. This patchset simply exposes that
information to applicable LSM hooks.
Change list:
- v4 -> v5
- merge v4 selftest breakout patch back into a single patch
- change "is_kernel" to "kernel"
- add selftest using new kernel flag
- v3 -> v4
- split out selftest changes into a separate patch
- v2 -> v3
- reorder params so that the new boolean flag is the last param
- fixup function signatures in bpf selftests
- v1 -> v2
- Pass a boolean flag in lieu of bpfptr_t
Revisions:
- v4
https://lore.kernel.org/bpf/20250304203123.3935371-1-bboscaccy@linux.micros…
- v3
https://lore.kernel.org/bpf/20250303222416.3909228-1-bboscaccy@linux.micros…
- v2
https://lore.kernel.org/bpf/20250228165322.3121535-1-bboscaccy@linux.micros…
- v1
https://lore.kernel.org/bpf/20250226003055.1654837-1-bboscaccy@linux.micros…
Blaise Boscaccy (2):
security: Propagate caller information in bpf hooks
selftests/bpf: Add a kernel flag test for LSM bpf hook
include/linux/lsm_hook_defs.h | 6 +--
include/linux/security.h | 12 +++---
kernel/bpf/syscall.c | 10 ++---
security/security.c | 15 ++++---
security/selinux/hooks.c | 6 +--
.../selftests/bpf/prog_tests/kernel_flag.c | 43 +++++++++++++++++++
.../selftests/bpf/progs/rcu_read_lock.c | 3 +-
.../bpf/progs/test_cgroup1_hierarchy.c | 4 +-
.../selftests/bpf/progs/test_kernel_flag.c | 32 ++++++++++++++
.../bpf/progs/test_kfunc_dynptr_param.c | 6 +--
.../selftests/bpf/progs/test_lookup_key.c | 2 +-
.../selftests/bpf/progs/test_ptr_untrusted.c | 2 +-
.../bpf/progs/test_task_under_cgroup.c | 2 +-
.../bpf/progs/test_verify_pkcs7_sig.c | 2 +-
14 files changed, 112 insertions(+), 33 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/kernel_flag.c
create mode 100644 tools/testing/selftests/bpf/progs/test_kernel_flag.c
--
2.48.1
This is one of just 3 remaining "Test Module" kselftests (the others
being bitmap and scanf), the rest having been converted to KUnit.
I tested this using:
$ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 printf
I have also sent out a series converting scanf[0].
Link: https://lore.kernel.org/all/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@… [0]
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v5:
- Update `do_test` `__printf` annotation (Rasmus Villemoes).
- Link to v4: https://lore.kernel.org/r/20250214-printf-kunit-convert-v4-0-c254572f1565@g…
Changes in v4:
- Add patch "implicate test line in failure messages".
- Rebase on linux-next, move scanf_kunit.c into lib/tests/.
- Link to v3: https://lore.kernel.org/r/20250210-printf-kunit-convert-v3-0-ee6ac5500f5e@g…
Changes in v3:
- Remove extraneous trailing newlines from failure messages.
- Replace `pr_warn` with `kunit_warn`.
- Drop arch changes.
- Remove KUnit boilerplate from CONFIG_PRINTF_KUNIT_TEST help text.
- Restore `total_tests` counting.
- Remove tc_fail macro in last patch.
- Link to v2: https://lore.kernel.org/r/20250207-printf-kunit-convert-v2-0-057b23860823@g…
Changes in v2:
- Incorporate code review from prior work[0] by Arpitha Raghunandan.
- Link to v1: https://lore.kernel.org/r/20250204-printf-kunit-convert-v1-0-ecf1b846a4de@g…
Link: https://lore.kernel.org/lkml/20200817043028.76502-1-98.arpi@gmail.com/t/#u [0]
---
Tamir Duberstein (3):
printf: convert self-test to KUnit
printf: break kunit into test cases
printf: implicate test line in failure messages
Documentation/core-api/printk-formats.rst | 4 +-
MAINTAINERS | 2 +-
lib/Kconfig.debug | 12 +-
lib/Makefile | 1 -
lib/tests/Makefile | 1 +
lib/{test_printf.c => tests/printf_kunit.c} | 437 ++++++++++++----------------
tools/testing/selftests/lib/config | 1 -
tools/testing/selftests/lib/printf.sh | 4 -
8 files changed, 200 insertions(+), 262 deletions(-)
---
base-commit: d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa
change-id: 20250131-printf-kunit-convert-fd4012aa2ec6
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes to the kernel in order to overcome
thse challenges.
An alternative solution is to extend the eBPF steering program so that it
will be able to report to the userspace, but it is based on context
rewrites, which is in feature freeze. We can adopt kfuncs, but they will
not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
and vhost_net).
The patches for QEMU to use this new feature was submitted as RFC and
is available at:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v8:
- Disabled IPv6 to eliminate noises in tests.
- Added a branch in tap to avoid unnecessary dissection when hash
reporting is disabled.
- Removed unnecessary rtnl_lock().
- Extracted code to handle new ioctls into separate functions to avoid
adding extra NULL checks to the code handling other ioctls.
- Introduced variable named "fd" to __tun_chr_ioctl().
- s/-/=/g in a patch message to avoid confusing Git.
- Link to v7: https://lore.kernel.org/r/20250228-rss-v7-0-844205cbbdd6@daynix.com
Changes in v7:
- Ensured to set hash_report to VIRTIO_NET_HASH_REPORT_NONE for
VHOST_NET_F_VIRTIO_NET_HDR.
- s/4/sizeof(u32)/ in patch "virtio_net: Add functions for hashing".
- Added tap_skb_cb type.
- Rebased.
- Link to v6: https://lore.kernel.org/r/20250109-rss-v6-0-b1c90ad708f6@daynix.com
Changes in v6:
- Extracted changes to fill vnet header holes into another series.
- Squashed patches "skbuff: Introduce SKB_EXT_TUN_VNET_HASH", "tun:
Introduce virtio-net hash reporting feature", and "tun: Introduce
virtio-net RSS" into patch "tun: Introduce virtio-net hash feature".
- Dropped the RFC tag.
- Link to v5: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
Changes in v5:
- Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE.
- Optimized the calculation of the hash value according to:
https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca…
- Added patch "tun: Unify vnet implementation".
- Dropped patch "tap: Pad virtio header with zero".
- Added patch "selftest: tun: Test vnet ioctls without device".
- Reworked selftests to skip for older kernels.
- Documented the case when the underlying device is deleted and packets
have queue_mapping set by TC.
- Reordered test harness arguments.
- Added code to handle fragmented packets.
- Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (6):
virtio_net: Add functions for hashing
net: flow_dissector: Export flow_keys_dissector_symmetric
tun: Introduce virtio-net hash feature
selftest: tun: Test vnet ioctls without device
selftest: tun: Add tests for virtio-net hashing
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
drivers/net/Kconfig | 1 +
drivers/net/tap.c | 67 +++-
drivers/net/tun.c | 98 +++++-
drivers/net/tun_vnet.h | 159 ++++++++-
drivers/vhost/net.c | 49 +--
include/linux/if_tap.h | 2 +
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 188 ++++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 75 ++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tun.c | 656 ++++++++++++++++++++++++++++++++++-
15 files changed, 1254 insertions(+), 61 deletions(-)
---
base-commit: dd83757f6e686a2188997cb58b5975f744bb7786
change-id: 20240403-rss-e737d89efa77
prerequisite-change-id: 20241230-tun-66e10a49b0c7:v6
prerequisite-patch-id: 871dc5f146fb6b0e3ec8612971a8e8190472c0fb
prerequisite-patch-id: 2797ed249d32590321f088373d4055ff3f430a0e
prerequisite-patch-id: ea3370c72d4904e2f0536ec76ba5d26784c0cede
prerequisite-patch-id: 837e4cf5d6b451424f9b1639455e83a260c4440d
prerequisite-patch-id: ea701076f57819e844f5a35efe5cbc5712d3080d
prerequisite-patch-id: 701646fb43ad04cc64dd2bf13c150ccbe6f828ce
prerequisite-patch-id: 53176dae0c003f5b6c114d43f936cf7140d31bb5
prerequisite-change-id: 20250116-buffers-96e14bf023fc:v2
prerequisite-patch-id: 25fd4f99d4236a05a5ef16ab79f3e85ee57e21cc
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
This series adds a fix for KVM PMU code and improves the pmu selftest
by allowing generating precise number of interrupts. It also provided
another additional option to the overflow test that allows user to
generate custom number of LCOFI interrupts.
Signed-off-by: Atish Patra <atishp(a)rivosinc.com>
---
Changes in v2:
- Initialized the local overflow irq variable to 0 indicate that it's not a
allowed value.
- Moved the introduction of argument option `n` to the last patch.
- Link to v1: https://lore.kernel.org/r/20250226-kvm_pmu_improve-v1-0-74c058c2bf6d@rivosi…
---
Atish Patra (4):
RISC-V: KVM: Disable the kernel perf counter during configure
KVM: riscv: selftests: Do not start the counter in the overflow handler
KVM: riscv: selftests: Change command line option
KVM: riscv: selftests: Allow number of interrupts to be configurable
arch/riscv/kvm/vcpu_pmu.c | 1 +
tools/testing/selftests/kvm/riscv/sbi_pmu_test.c | 81 ++++++++++++++++--------
2 files changed, 57 insertions(+), 25 deletions(-)
---
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
change-id: 20250225-kvm_pmu_improve-fffd038b2404
--
Regards,
Atish patra
The first fixes setting incorrect skb->truesize.
When xdp-mb prog returns XDP_PASS, skb is allocated and initialized.
Currently, The truesize is calculated as BNXT_RX_PAGE_SIZE *
sinfo->nr_frags, but sinfo->nr_frags is flushed by napi_build_skb().
So, it stores sinfo before calling napi_build_skb() and then use it
for calculate truesize.
The second fixes kernel panic in the bnxt_queue_mem_alloc().
The bnxt_queue_mem_alloc() accesses rx ring descriptor.
rx ring descriptors are allocated when the interface is up and it's
freed when the interface is down.
So, if bnxt_queue_mem_alloc() is called when the interface is down,
kernel panic occurs.
This patch makes the bnxt_queue_mem_alloc() return -ENETDOWN if rx ring
descriptors are not allocated.
The third patch fixes kernel panic in the bnxt_queue_{start | stop}().
When a queue is restarted bnxt_queue_{start | stop}() are called.
These functions set MRU to 0 to stop packet flow and then to set up the
remaining things.
MRU variable is a member of vnic_info[] the first vnic_info is for
default and the second is for ntuple.
The first vnic_info is always allocated when interface is up, but the
second is allocated only when ntuple is enabled.
(ethtool -K eth0 ntuple <on | off>).
Currently, the bnxt_queue_{start | stop}() access
vnic_info[BNXT_VNIC_NTUPLE] regardless of whether ntuple is enabled or
not.
So kernel panic occurs.
This patch make the bnxt_queue_{start | stop}() use bp->nr_vnics instead
of BNXT_VNIC_NTUPLE.
The fourth patch fixes a warning due to checksum state.
The bnxt_rx_pkt() checks whether skb->ip_summed is not CHECKSUM_NONE
before updating ip_summed. if ip_summed is not CHECKSUM_NONE, it WARNS
about it. However, the bnxt_xdp_build_skb() is called in XDP-MB-PASS
path and it updates ip_summed earlier than bnxt_rx_pkt().
So, in the XDP-MB-PASS path, the bnxt_rx_pkt() always warns about
checksum.
Updating ip_summed at the bnxt_xdp_build_skb() is unnecessary and
duplicate, so it is removed.
The fifth patch makes net_devmem_unbind_dmabuf() ignore -ENETDOWN.
When devmem socket is closed, net_devmem_unbind_dmabuf() is called to
unbind/release resources.
If interface is down, the driver returns -ENETDOWN.
The -ENETDOWN return value is not an actual error, because the interface
will release resources when the interface is down.
So, net_devmem_unbind_dmabuf() needs to ignore -ENETDOWN.
The last patch adds XDP testcases to
tools/testing/selftests/drivers/net/ping.py.
v2:
- Do not use num_frags in the bnxt_xdp_build_skb(). (1/6)
- Add Review tags from Somnath and Jakub. (2/6)
- Add new patch for fixing checksum warning. (4/6)
- Add new patch for fixing warning in net_devmem_unbind_dmabuf(). (5/6)
- Add new XDP testcases to ping.py (6/6)
Taehee Yoo (6):
eth: bnxt: fix truesize for mb-xdp-pass case
eth: bnxt: return fail if interface is down in bnxt_queue_mem_alloc()
eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue
restart logic
eth: bnxt: do not update checksum in bnxt_xdp_build_skb()
net: devmem: do not WARN conditionally after netdev_rx_queue_restart()
selftests: drv-net: add xdp cases for ping.py
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 36 ++--
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 18 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 6 +-
net/core/devmem.c | 4 +-
tools/testing/selftests/drivers/net/ping.py | 200 ++++++++++++++++--
.../testing/selftests/net/lib/xdp_dummy.bpf.c | 6 +
6 files changed, 221 insertions(+), 49 deletions(-)
--
2.34.1
The first patch fixes the incorrect locks using in bond driver.
The second patch fixes the xfrm offload feature during setup active-backup
mode. The third patch add a ipsec offload testing.
v4: hold xs->lock for bond_ipsec_{del, add}_sa_all (Cosmin Ratiu)
use the defer helpers in lib.sh for selftest (Petr Machata)
v3: move the ipsec deletion to bond_ipsec_free_sa (Cosmin Ratiu)
v2: do not turn carrier on if bond change link failed (Nikolay Aleksandrov)
move the mutex lock to a work queue (Cosmin Ratiu)
Hangbin Liu (3):
bonding: move IPsec deletion to bond_ipsec_free_sa
bonding: fix xfrm offload feature setup on active-backup mode
selftests: bonding: add ipsec offload test
drivers/net/bonding/bond_main.c | 55 +++++--
drivers/net/bonding/bond_netlink.c | 16 +-
include/net/bonding.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_ipsec_offload.sh | 154 ++++++++++++++++++
.../selftests/drivers/net/bonding/config | 4 +
6 files changed, 208 insertions(+), 25 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
--
2.46.0
We hit a following exception on timeout, nmaps is never set:
Test bpftool bound info reporting (own ns)...
Traceback (most recent call last):
File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 1128, in <module>
check_dev_info(False, "")
File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 583, in check_dev_info
maps = bpftool_map_list_wait(expected=2, ns=ns)
File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 215, in bpftool_map_list_wait
raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps))
NameError: name 'nmaps' is not defined
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
CC: bpf(a)vger.kernel.org
---
tools/testing/selftests/net/bpf_offload.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/bpf_offload.py b/tools/testing/selftests/net/bpf_offload.py
index fd0d959914e4..4a9be8c49561 100755
--- a/tools/testing/selftests/net/bpf_offload.py
+++ b/tools/testing/selftests/net/bpf_offload.py
@@ -207,9 +207,11 @@ netns = [] # net namespaces to be removed
raise Exception("Time out waiting for program counts to stabilize want %d, have %d" % (expected, nprogs))
def bpftool_map_list_wait(expected=0, n_retry=20, ns=""):
+ nmaps = None
for i in range(n_retry):
maps = bpftool_map_list(ns=ns)
- if len(maps) == expected:
+ nmaps = len(maps)
+ if nmaps == expected:
return maps
time.sleep(0.05)
raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps))
--
2.48.1
v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===
v6 has no major changes. Addressed a few issues from Paolo and David,
and collected Acks from Stan. Thank you everyone for the review!
Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is
invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label).
(Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the
same one which implements the TX path so we don't run into future
errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works
(David).
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===
v5 has no major changes; it clears up the relatively minor issues
pointed out to in v4, and rebases the series on top of net-next to
resolve the conflict with a patch that raced to the tree. It also
collects the review tags from v4.
Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===
v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.
Additionally other feedback from v3 is addressed.
Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the
tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===
Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Full outline on usage of the TX path is detailed in the documentation
included with this series.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
1. Add netmem refcounting needed for the TX path.
2. Devmem TX netlink API.
3. Devmem TX net stack implementation.
4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
freed from contexts where we can't sleep.
5. Add devmem TX documentation.
6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.
7. Guard netmem_tx against being enabled against drivers that don't
support it.
8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
devmem.py.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Mina Almasry (7):
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: add devmem TCP TX documentation
net: enable driver support for netmem TX
gve: add netmem TX support to GVE DQO-RDA mode
net: check for driver support in netmem TX
selftests: ncdevmem: Implement devmem TCP TX
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 150 ++++++++-
.../networking/net_cachelines/net_device.rst | 1 +
Documentation/networking/netdev-features.rst | 5 +
Documentation/networking/netmem.rst | 23 +-
drivers/net/ethernet/google/gve/gve_main.c | 4 +
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +-
include/linux/netdevice.h | 2 +
include/linux/skbuff.h | 17 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 23 ++
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
net/core/datagram.c | 48 ++-
net/core/dev.c | 3 +
net/core/devmem.c | 115 ++++++-
net/core/devmem.h | 77 ++++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 73 ++++-
net/core/skbuff.c | 48 ++-
net/core/sock.c | 6 +
net/ipv4/ip_output.c | 3 +-
net/ipv4/tcp.c | 50 ++-
net/ipv6/ip6_output.c | 3 +-
net/vmw_vsock/virtio_transport_common.c | 5 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/devmem.py | 26 +-
.../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++-
29 files changed, 950 insertions(+), 73 deletions(-)
base-commit: 80c4a0015ce249cf0917a04dbb3cc652a6811079
--
2.48.1.658.g4767266eb4-goog
Hi all,
this v5 of the patch series is very similar to v4, but rebased onto the
bpf-next/net branch instead of bpf-next/master.
Because the commit c047e0e0e435 ("selftests/bpf: Optionally open a
dedicated namespace to run test in it") is not yet included in this branch,
I changed the xdp_context_tuntap test to manually create a namespace to run
the test in.
Not so successful pipeline: https://github.com/kernel-patches/bpf/actions/runs/13682405154
The CI pipeline failed because of veristat changes in seemingly unrelated
eBPF programs. I don't think this has to do with the changes from this
patch series, but if it does, please let me know what I may have to do
different to make the CI pass.
---
v5:
- rebase onto bpf-next/net
- resolve rebase conflicts
- change xdp_context_tuntap test to manually create and open a network
namespace using netns_new
v4: https://lore.kernel.org/bpf/20250227142330.1605996-1-marcus.wichelmann@hetz…
- strip unrelated changes from the selftest patches
- extend commit message for "selftests/bpf: refactor xdp_context_functional
test and bpf program"
- the NOARP flag was not effective to prevent other packets from
interfering with the tests, add a filter to the XDP program instead
- run xdp_context_tuntap in a separate namespace to avoid conflicts with
other tests
v3: https://lore.kernel.org/bpf/20250224152909.3911544-1-marcus.wichelmann@hetz…
- change the condition to handle xdp_buffs without metadata support, as
suggested by Willem de Bruijn <willemb(a)google.com>
- add clarifying comment why that condition is needed
- set NOARP flag in selftests to ensure that the kernel does not send
packets on the test interfaces that may interfere with the tests
v2: https://lore.kernel.org/bpf/20250217172308.3291739-1-marcus.wichelmann@hetz…
- submit against bpf-next subtree
- split commits and improved commit messages
- remove redundant metasize check and add clarifying comment instead
- use max() instead of ternary operator
- add selftest for metadata support in the tun driver
v1: https://lore.kernel.org/all/20250130171614.1657224-1-marcus.wichelmann@hetz…
Marcus Wichelmann (6):
net: tun: enable XDP metadata support
net: tun: enable transfer of XDP metadata to skb
selftests/bpf: move open_tuntap to network helpers
selftests/bpf: refactor xdp_context_functional test and bpf program
selftests/bpf: add test for XDP metadata support in tun driver
selftests/bpf: fix file descriptor assertion in open_tuntap helper
drivers/net/tun.c | 28 +++-
tools/testing/selftests/bpf/network_helpers.c | 28 ++++
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../selftests/bpf/prog_tests/lwt_helpers.h | 29 ----
.../bpf/prog_tests/xdp_context_test_run.c | 145 +++++++++++++++++-
.../selftests/bpf/progs/test_xdp_meta.c | 53 +++++--
6 files changed, 230 insertions(+), 56 deletions(-)
--
2.43.0
A bug was identified where the KTAP below caused an infinite loop:
TAP version 13
ok 4 test_case
1..4
The infinite loop was caused by the parser not parsing a test plan
if following a test result line.
Fix bug to correctly parse test plan and add error if test plan is
missing.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
tools/testing/kunit/kunit_parser.py | 12 +++++++-----
tools/testing/kunit/kunit_tool_test.py | 5 ++---
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 29fc27e8949b..5dcbc670e1dc 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -761,20 +761,22 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
test.name = "main"
ktap_line = parse_ktap_header(lines, test, printer)
test.log.extend(parse_diagnostic(lines))
- parse_test_plan(lines, test)
+ plan_line = parse_test_plan(lines, test)
parent_test = True
else:
# If not the main test, attempt to parse a test header containing
# the KTAP version line and/or subtest header line
ktap_line = parse_ktap_header(lines, test, printer)
subtest_line = parse_test_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
+ plan_line = parse_test_plan(lines, test)
parent_test = (ktap_line or subtest_line)
if parent_test:
- # If KTAP version line and/or subtest header is found, attempt
- # to parse test plan and print test header
- test.log.extend(parse_diagnostic(lines))
- parse_test_plan(lines, test)
print_test_header(test, printer)
+
+ if parent_test and not plan_line:
+ test.add_error(printer, 'missing test plan!')
+
expected_count = test.expected_count
subtests = []
test_num = 1
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 0bcb0cc002f8..e1e142c1a850 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -181,8 +181,7 @@ class KUnitParserTest(unittest.TestCase):
result = kunit_parser.parse_run_tests(
kunit_parser.extract_tap_lines(
file.readlines()), stdout)
- # A missing test plan is not an error.
- self.assertEqual(result.counts, kunit_parser.TestCounts(passed=10, errors=0))
+ self.assertEqual(result.counts, kunit_parser.TestCounts(passed=10, errors=2))
self.assertEqual(kunit_parser.TestStatus.SUCCESS, result.status)
def test_no_tests(self):
@@ -203,7 +202,7 @@ class KUnitParserTest(unittest.TestCase):
self.assertEqual(
kunit_parser.TestStatus.NO_TESTS,
result.subtests[0].subtests[0].status)
- self.assertEqual(result.counts, kunit_parser.TestCounts(passed=1, errors=1))
+ self.assertEqual(result.counts, kunit_parser.TestCounts(passed=1, errors=2))
def test_no_kunit_output(self):
base-commit: 0619a4868fc1b32b07fb9ed6c69adc5e5cf4e4b2
--
2.48.1.711.g2feabab25a-goog
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes to the kernel in order to overcome
thse challenges.
An alternative solution is to extend the eBPF steering program so that it
will be able to report to the userspace, but it is based on context
rewrites, which is in feature freeze. We can adopt kfuncs, but they will
not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
and vhost_net).
The patches for QEMU to use this new feature was submitted as RFC and
is available at:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v7:
- Ensured to set hash_report to VIRTIO_NET_HASH_REPORT_NONE for
VHOST_NET_F_VIRTIO_NET_HDR.
- s/4/sizeof(u32)/ in patch "virtio_net: Add functions for hashing".
- Added tap_skb_cb type.
- Rebased.
- Link to v6: https://lore.kernel.org/r/20250109-rss-v6-0-b1c90ad708f6@daynix.com
Changes in v6:
- Extracted changes to fill vnet header holes into another series.
- Squashed patches "skbuff: Introduce SKB_EXT_TUN_VNET_HASH", "tun:
Introduce virtio-net hash reporting feature", and "tun: Introduce
virtio-net RSS" into patch "tun: Introduce virtio-net hash feature".
- Dropped the RFC tag.
- Link to v5: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
Changes in v5:
- Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE.
- Optimized the calculation of the hash value according to:
https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca…
- Added patch "tun: Unify vnet implementation".
- Dropped patch "tap: Pad virtio header with zero".
- Added patch "selftest: tun: Test vnet ioctls without device".
- Reworked selftests to skip for older kernels.
- Documented the case when the underlying device is deleted and packets
have queue_mapping set by TC.
- Reordered test harness arguments.
- Added code to handle fragmented packets.
- Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (6):
virtio_net: Add functions for hashing
net: flow_dissector: Export flow_keys_dissector_symmetric
tun: Introduce virtio-net hash feature
selftest: tun: Test vnet ioctls without device
selftest: tun: Add tests for virtio-net hashing
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
drivers/net/Kconfig | 1 +
drivers/net/tap.c | 62 +++-
drivers/net/tun.c | 89 ++++-
drivers/net/tun_vnet.h | 180 +++++++++-
drivers/vhost/net.c | 49 +--
include/linux/if_tap.h | 2 +
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 188 +++++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 75 +++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tun.c | 627 ++++++++++++++++++++++++++++++++++-
15 files changed, 1231 insertions(+), 62 deletions(-)
---
base-commit: dd83757f6e686a2188997cb58b5975f744bb7786
change-id: 20240403-rss-e737d89efa77
prerequisite-change-id: 20241230-tun-66e10a49b0c7:v6
prerequisite-patch-id: 871dc5f146fb6b0e3ec8612971a8e8190472c0fb
prerequisite-patch-id: 2797ed249d32590321f088373d4055ff3f430a0e
prerequisite-patch-id: ea3370c72d4904e2f0536ec76ba5d26784c0cede
prerequisite-patch-id: 837e4cf5d6b451424f9b1639455e83a260c4440d
prerequisite-patch-id: ea701076f57819e844f5a35efe5cbc5712d3080d
prerequisite-patch-id: 701646fb43ad04cc64dd2bf13c150ccbe6f828ce
prerequisite-patch-id: 53176dae0c003f5b6c114d43f936cf7140d31bb5
prerequisite-change-id: 20250116-buffers-96e14bf023fc:v2
prerequisite-patch-id: 25fd4f99d4236a05a5ef16ab79f3e85ee57e21cc
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
[cc'ing linux-kselftest and kunit-dev]
Hi,
On Wed, 5 Mar 2025 01:47:55 +0800, kernel test robot wrote:
> tree: https://github.com/brauner/linux.git vfs.all
> head: ea47e99a3a234837d5fea0d1a20bb2ad1eaa6dd4
> commit: b6736cfccb582b7c016cba6cd484fbcf30d499af [205/231] initramfs_test: kunit tests for initramfs unpacking
> config: x86_64-buildonly-randconfig-002-20250304 (https://download.01.org/0day-ci/archive/20250305/202503050109.t5Ab93hX-lkp@…)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250305/202503050109.t5Ab93hX-lkp@…)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp(a)intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202503050109.t5Ab93hX-lkp@intel.com/
>
> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>
> >> WARNING: modpost: vmlinux: section mismatch in reference: initramfs_test_cases+0x0 (section: .data) -> initramfs_test_extract (section: .init.text)
> >> WARNING: modpost: vmlinux: section mismatch in reference: initramfs_test_cases+0x30 (section: .data) -> initramfs_test_fname_overrun (section: .init.text)
> >> WARNING: modpost: vmlinux: section mismatch in reference: initramfs_test_cases+0x60 (section: .data) -> initramfs_test_data (section: .init.text)
> >> WARNING: modpost: vmlinux: section mismatch in reference: initramfs_test_cases+0x90 (section: .data) -> initramfs_test_csum (section: .init.text)
> >> WARNING: modpost: vmlinux: section mismatch in reference: initramfs_test_cases+0xc0 (section: .data) -> initramfs_test_hardlink (section: .init.text)
> >> WARNING: modpost: vmlinux: section mismatch in reference: initramfs_test_cases+0xf0 (section: .data) -> initramfs_test_many (section: .init.text)
These new warnings are covered in the commit message. The
kunit_test_init_section_suites() registered tests aren't in the .init
section as debugfs entries are retained for results reporting (without
an ability to rerun them).
IIUC, the __kunit_init_test_suites->CONCATENATE(..., _probe) suffix is
intended to suppress the modpost warning - @kunit-dev: any ideas why
this isn't working as intended?
Thanks, David
Vector registers are zero initialized by the kernel. Stop accepting
"all ones" as a clean value.
Note that this was not working as expected given that
value == 0xff
can be assumed to be always false by the compiler as value's range is
[-128, 127]. Both GCC (-Wtype-limits) and clang
(-Wtautological-constant-out-of-range-compare) warn about this.
Signed-off-by: Ignacio Encinas <ignacio(a)iencinas.com>
---
I tried looking why "all ones" was previously deemed a "clean" value but
couldn't find any information. It looks like the kernel always
zero-initializes the vector registers.
If "all ones" is still acceptable for any reason, my intention is to
spin a v2 changing the types of `value` and `prev_value` to unsigned
char.
---
tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c b/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
index 35c0812e32de0c82a54f84bd52c4272507121e35..b712c4d258a6cb045aa96de4a75299714866f5e6 100644
--- a/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
+++ b/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
@@ -6,7 +6,7 @@
* the values. To further ensure consistency, this file is compiled without
* libc and without auto-vectorization.
*
- * To be "clean" all values must be either all ones or all zeroes.
+ * To be "clean" all values must be all zeroes.
*/
#define __stringify_1(x...) #x
@@ -46,7 +46,7 @@ int main(int argc, char **argv)
: "=r" (value)); \
if (first) { \
first = 0; \
- } else if (value != prev_value || !(value == 0x00 || value == 0xff)) { \
+ } else if (value != prev_value || value != 0x00) { \
printf("Register " __stringify(register) \
" values not clean! value: %u\n", value); \
exit(-1); \
---
base-commit: 03d38806a902b36bf364cae8de6f1183c0a35a67
change-id: 20250301-fix-v_exec_initval_nolibc-498d976c372d
Best regards,
--
Ignacio Encinas <ignacio(a)iencinas.com>
Documentation/dev-tools/kselftest.rst says you can use the "TARGETS"
variable on the make command line to run only tests targeted for a
single subsystem:
$ make TARGETS="size timers" kselftest
A natural way to narrow down further to a particular test in a subsystem
is to specify e.g., TEST_GEN_PROGS:
$ make TARGETS=net TEST_PROGS= TEST_GEN_PROGS=tun kselftest
However, this does not work well because the following statement in
tools/testing/selftests/lib.mk gets ignored:
TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
Add the override directive to make it and similar ones will be effective
even when TEST_GEN_PROGS and similar variables are specified in the
command line.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
tools/testing/selftests/lib.mk | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index d6edcfcb5be832ddee4c3d34b5ad221e9295f878..68116e51f97d62376c63f727ba3fd1f616c67562 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -93,9 +93,9 @@ TOOLS_INCLUDES := -isystem $(top_srcdir)/tools/include/uapi
# TEST_PROGS are for test shell scripts.
# TEST_CUSTOM_PROGS and TEST_PROGS will be run by common run_tests
# and install targets. Common clean doesn't touch them.
-TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
-TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
-TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
+override TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
+override TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
+override TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) \
$(if $(TEST_GEN_MODS_DIR),gen_mods_dir)
---
base-commit: dd83757f6e686a2188997cb58b5975f744bb7786
change-id: 20250306-lib-4ac9711c10a2
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
The mac address on backup slave should be convert from Solicited-Node
Multicast address, not from bonding unicast target address.
v3: also fix the mac setting for slave_set_ns_maddr. (Jay)
Add function description for slave_set_ns_maddr/slave_set_ns_maddrs (Jay)
v2: fix patch 01's subject
Hangbin Liu (2):
bonding: fix incorrect MAC address setting to receive NS messages
selftests: bonding: fix incorrect mac address
drivers/net/bonding/bond_options.c | 55 ++++++++++++++++---
.../drivers/net/bonding/bond_options.sh | 4 +-
2 files changed, 49 insertions(+), 10 deletions(-)
--
2.46.0
Fixes an issue where out-of-tree kselftest builds fail when building
the BPF and bpftools components. The failure occurs because the top-level
Makefile passes a relative srctree path to its sub-Makefiles, which
leads to errors in locating necessary files.
For example, the following error is encountered:
```
$ make V=1 O=$build/ TARGETS=hid kselftest-all
...
make -C ../tools/testing/selftests all
make[4]: Entering directory '/path/to/linux/tools/testing/selftests/hid'
make -C /path/to/linux/tools/testing/selftests/../../../tools/lib/bpf OUTPUT=/path/to/linux/O/kselftest/hid/tools/build/libbpf/ \
EXTRA_CFLAGS='-g -O0' \
DESTDIR=/path/to/linux/O/kselftest/hid/tools prefix= all install_headers
make[5]: Entering directory '/path/to/linux/tools/lib/bpf'
...
make[5]: Entering directory '/path/to/linux/tools/bpf/bpftool'
Makefile:127: ../tools/build/Makefile.feature: No such file or directory
make[5]: *** No rule to make target '../tools/build/Makefile.feature'. Stop.
```
To resolve this, override the srctree in the kselftests's top Makefile
when performing an out-of-tree build. This ensures that all sub-Makefiles
have the correct path to the source tree, preventing directory resolution
errors.
Cc: Andrii Nakryiko <andrii.nakryiko(a)gmail.com>
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Tested-by: Quentin Monnet <qmo(a)kernel.org>
---
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
V3:
collected Tested-by and rebased on bpf-next
V2:
- handle srctree in selftests itself rather than the linux' top Makefile # Masahiro Yamada <masahiroy(a)kernel.org>
V1: https://lore.kernel.org/lkml/20241217031052.69744-1-lizhijian@fujitsu.com/
---
tools/testing/selftests/Makefile | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 2401e973c359..f04a3b0003f6 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -154,15 +154,19 @@ override LDFLAGS =
override MAKEFLAGS =
endif
+top_srcdir ?= ../../..
+
# Append kselftest to KBUILD_OUTPUT and O to avoid cluttering
# KBUILD_OUTPUT with selftest objects and headers installed
# by selftests Makefile or lib.mk.
+# Override the `srctree` variable to ensure it is correctly resolved in
+# sub-Makefiles, such as those within `bpf`, when managing targets like
+# `net` and `hid`.
ifdef building_out_of_srctree
override LDFLAGS =
+override srctree := $(top_srcdir)
endif
-top_srcdir ?= ../../..
-
ifeq ("$(origin O)", "command line")
KBUILD_OUTPUT := $(O)
endif
--
2.44.0
From: Jeff Xu <jeffxu(a)chromium.org>
This is V9 version, addressing comments from V8, without code logic
change.
-------------------------------------------------------------------
As discussed during mseal() upstream process [1], mseal() protects
the VMAs of a given virtual memory range against modifications, such
as the read/write (RW) and no-execute (NX) bits. For complete
descriptions of memory sealing, please see mseal.rst [2].
The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped.
The system mappings are readonly only, memory sealing can protect
them from ever changing to writable or unmmap/remapped as different
attributes.
System mappings such as vdso, vvar, vvar_vclock,
vectors (arm compat-mode), sigpage (arm compat-mode),
are created by the kernel during program initialization, and could
be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [3]. It could be sealed from creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000),
which is outside the mm managed range. This means mprotect, munmap, and
mremap won't work on the vsyscall. Since sealing doesn't enhance
the vsyscall's security, it is skipped in this patch. If we ever seal
the vsyscall, it is probably only for decorative purpose, i.e. showing
the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mappings calls to pass mseal flag. Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime. The existence of this flag for an architecture implies that
it does not require the remapping of thest system mappings during
process lifetime, so sealing these mappings is safe from a kernel
perspective.
This version covers x86-64 and arm64 archiecture as minimum viable feature.
While no specific CPU hardware features are required for enable this
feature on an archiecture, memory sealing requires a 64-bit kernel. Other
architectures can choose whether or not to adopt this feature. Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace. The PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain if this will ever fail for PPC - this needs to be
investigated by PPC in the future [4]. The UML kernel can add this support
when KUnit tests require it [5].
In this version, we've improved the handling of system mapping sealing from
previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it. This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.
Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for vdso, (smaller
than its actual size). The dynamic linker, which relies on PT_LOAD
information to determine mapping size, would then split and partially
seal the vdso mapping. Since each architecture has its own vdso/vvar
code, fixing this in the kernel would require going through each
archiecture. Our initial goal was to enable sealing readonly mappings,
e.g. .text, across all architectures, sealing vdso from kernel since
creation appears to be simpler than sealing vdso at glibc.
- The [vvar] mapping header only contains address information, not length
information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker,
and there is no effective solution for them.
This feature's security enhancements will benefit ChromeOS, Android,
and other high security systems.
Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
Android to ensure the sealing doesn't affect the functionality of
Chromebook and Android phone.
I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
normal operations such as browsing the web, open/edit doc are OK.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZx… [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQS… [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
-------------------------------------------
History:
V9:
- Add negative test in selftest (Kees Cook)
- fx typos in text (Kees Cook)
V8:
- Change ARCH_SUPPORTS_MSEAL_X to ARCH_SUPPORTS_MSEAL_X (Liam R. Howlett)
- Update comments in Kconfig and mseal.rst (Lorenzo Stoakes, Liam R. Howlett)
- Change patch header perfix to "mseal sysmap" (Lorenzo Stoakes)
- Remove "vm_flags =" (Kees Cook, Liam R. Howlett, Oleg Nesterov)
- Drop uml architecture (Lorenzo Stoakes, Kees Cook)
- Add a selftest to verify system mappings are sealed (Lorenzo Stoakes)
V7:
https://lore.kernel.org/all/20250224225246.3712295-1-jeffxu@google.com/
- Remove cover letter from the first patch (Liam R. Howlett)
- Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
- logging and fclose() in selftest (Liam R. Howlett)
V6:
https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/
- mseal.rst: fix a typo (Randy Dunlap)
- security/Kconfig: add rr into note (Liam R. Howlett)
- remove mseal_system_mappings() and use macro instead (Liam R. Howlett)
- mseal.rst: add incompatible userland software (Lorenzo Stoakes)
- remove RFC from title (Kees Cook)
V5
https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/
- Remove kernel cmd line (Lorenzo Stoakes)
- Add test info (Lorenzo Stoakes)
- Add threat model info (Lorenzo Stoakes)
- Fix x86 selftest: test_mremap_vdso
- Restrict code change to ARM64/x86-64/UM arch only.
- Add userprocess.h to include seal_system_mapping().
- Remove sealing vsyscall.
- Split the patch.
V4:
https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
- ARCH_HAS_SEAL_SYSTEM_MAPPINGS (Lorenzo Stoakes)
- test info (Lorenzo Stoakes)
- Update mseal.rst (Liam R. Howlett)
- Update test_mremap_vdso.c (Liam R. Howlett)
- Misc. style, comments, doc update (Liam R. Howlett)
V3:
https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
- Revert uprobe to v1 logic (Oleg Nesterov)
- use CONFIG_SEAL_SYSTEM_MAPPINGS instead of _ALWAYS/_NEVER (Kees Cook)
- Move kernel cmd line from fs/exec.c to mm/mseal.c and
misc. (Liam R. Howlett)
V2:
https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
- Seal uprobe always (Oleg Nesterov)
- Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov)
- Rebase to linux_main
V1:
- https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
--------------------------------------------------
Jeff Xu (7):
mseal sysmap: kernel config and header change
selftests: x86: test_mremap_vdso: skip if vdso is msealed
mseal sysmap: enable x86-64
mseal sysmap: enable arm64
mseal sysmap: uprobe mapping
mseal sysmap: update mseal.rst
selftest: test system mappings are sealed.
Documentation/userspace-api/mseal.rst | 20 +++
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/vdso.c | 12 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 7 +-
include/linux/mm.h | 10 ++
init/Kconfig | 22 ++++
kernel/events/uprobes.c | 3 +-
security/Kconfig | 21 ++++
tools/testing/selftests/Makefile | 1 +
.../mseal_system_mappings/.gitignore | 2 +
.../selftests/mseal_system_mappings/Makefile | 6 +
.../selftests/mseal_system_mappings/config | 1 +
.../mseal_system_mappings/sysmap_is_sealed.c | 119 ++++++++++++++++++
.../testing/selftests/x86/test_mremap_vdso.c | 43 +++++++
15 files changed, 261 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
create mode 100644 tools/testing/selftests/mseal_system_mappings/config
create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
--
2.48.1.711.g2feabab25a-goog
The first patch makes use of the correct terminology for synchronous and
asynchronous errors. The second patch checks whether PROT_MTE is
supported on hugetlb mappings before continuing with the tests. Such
support was added in 6.13 but people tend to use current kselftests on
older kernels. Avoid the failure reporting on such kernels, just skip
the tests.
Catalin Marinas (2):
kselftest/arm64: mte: Use the correct naming for tag check modes in
check_hugetlb_options.c
kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on
such mappings
.../arm64/mte/check_hugetlb_options.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
On Sat 2025-02-15 14:52:22, Tamir Duberstein wrote:
> On Sat, Feb 15, 2025 at 1:51 PM kernel test robot <lkp(a)intel.com> wrote:
> >
> > Hi Tamir,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on 7b7a883c7f4de1ee5040bd1c32aabaafde54d209]
> >
> > url:
> https://github.com/intel-lab-lkp/linux/commits/Tamir-Duberstein/scanf-impli…
> > base: 7b7a883c7f4de1ee5040bd1c32aabaafde54d209
> > patch link:
> https://lore.kernel.org/r/20250214-scanf-kunit-convert-v8-3-5ea50f95f83c%40…
> > patch subject: [PATCH v8 3/4] scanf: convert self-test to KUnit
> > config: sh-randconfig-002-20250216 (
> https://download.01.org/0day-ci/archive/20250216/202502160245.KUrryBJR-lkp@…
> )
> > compiler: sh4-linux-gcc (GCC) 14.2.0
> > reproduce (this is a W=1 build): (
> https://download.01.org/0day-ci/archive/20250216/202502160245.KUrryBJR-lkp@…
> )
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new
> version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp(a)intel.com>
> > | Closes:
> https://lore.kernel.org/oe-kbuild-all/202502160245.KUrryBJR-lkp@intel.com/
> >
> > All warnings (new ones prefixed by >>):
> >
> > In file included from <command-line>:
> > lib/tests/scanf_kunit.c: In function 'numbers_list_ll':
> > >> include/linux/compiler.h:197:61: warning: function 'numbers_list_ll'
> might be a candidate for 'gnu_scanf' format attribute
> [-Wsuggest-attribute=format]
>
> I am not able to reproduce these warnings with clang 19.1.7. They also
> don't obviously make sense to me.
I have reproduced the problem with gcc:
$> gcc --version
gcc (SUSE Linux) 14.2.1 20250220 [revision 9ffecde121af883b60bbe60d00425036bc873048]
$> make W=1 lib/test_scanf.ko
CALL scripts/checksyscalls.sh
DESCEND objtool
INSTALL libsubcmd_headers
CC [M] lib/test_scanf.o
In file included from <command-line>:
lib/test_scanf.c: In function ‘numbers_list_ll’:
./include/linux/compiler.h:197:61: warning: function ‘numbers_list_ll’ might be a candidate for ‘gnu_scanf’ format attribute [-Wsuggest-attribute=format]
197 | #define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))
| ^
[...]
It seems that it is a regression introduced by the first
patch of this patch set. And the fix is:
diff --git a/lib/test_scanf.c b/lib/test_scanf.c
index d1664e0d0138..e65b10c3dc11 100644
--- a/lib/test_scanf.c
+++ b/lib/test_scanf.c
@@ -27,7 +27,7 @@ static struct rnd_state rnd_state __initdata;
typedef int (*check_fn)(const char *file, const int line, const void *check_data,
const char *string, const char *fmt, int n_args, va_list ap);
-static void __scanf(6, 0) __init
+static void __scanf(6, 8) __init
_test(const char *file, const int line, check_fn fn, const void *check_data, const char *string,
const char *fmt, int n_args, ...)
{
Best Regards,
Petr
Nolibc is useful for selftests as the test programs can be very small,
and compiled with just a kernel crosscompiler, without userspace support.
Currently nolibc is only usable with kselftest.h, not the more
convenient to use kselftest_harness.h
This series provides this compatibility by adding new features to nolibc
and removing the usage of problematic features from the harness.
The first half of the series are changes to the harness, the second one
are for nolibc. Both parts are very independent and can go through
different trees.
The last patch is not meant to be applied and serves as test that
everything works correctly.
Based on the next branch of the nolibc tree:
https://web.git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc.git…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (32):
selftests: harness: Add harness selftest
selftests: harness: Use C89 comment style
selftests: harness: Ignore unused variant argument warning
selftests: harness: Mark functions without prototypes static
selftests: harness: Remove inline qualifier for wrappers
selftests: harness: Guard includes on nolibc
selftests: harness: Remove dependency on libatomic
selftests: harness: Implement test timeouts through pidfd
selftests: harness: Don't set setup_completed for fixtureless tests
selftests: harness: Always provide "self" and "variant"
selftests: harness: Move teardown conditional into test metadata
selftests: harness: Add teardown callback to test metadata
selftests: harness: Stop using setjmp()/longjmp()
tools/nolibc: handle intmax_t/uintmax_t in printf
tools/nolibc: use intmax definitions from compiler
tools/nolibc: use pselect6_time64 if available
tools/nolibc: use ppoll_time64 if available
tools/nolibc: add tolower() and toupper()
tools/nolibc: add _exit()
tools/nolibc: add setpgrp()
tools/nolibc: implement waitpid() in terms of waitid()
Revert "selftests/nolibc: use waitid() over waitpid()"
tools/nolibc: add dprintf() and vdprintf()
tools/nolibc: add getopt()
tools/nolibc: allow different write callbacks in printf
tools/nolibc: allow limiting of printf destination size
tools/nolibc: add snprintf() and friends
selftests/nolibc: use snprintf() for printf tests
selftests/nolibc: rename vfprintf test suite
selftests/nolibc: add test for snprintf() truncation
tools/nolibc: implement width padding in printf()
HACK: selftests/nolibc: demonstrate usage of the kselftest harness
tools/include/nolibc/Makefile | 1 +
tools/include/nolibc/getopt.h | 105 ++
tools/include/nolibc/nolibc.h | 1 +
tools/include/nolibc/stdint.h | 4 +-
tools/include/nolibc/stdio.h | 127 +-
tools/include/nolibc/string.h | 17 +
tools/include/nolibc/sys.h | 102 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kselftest/.gitignore | 1 +
tools/testing/selftests/kselftest/Makefile | 6 +
.../testing/selftests/kselftest/harness-selftest.c | 129 ++
.../selftests/kselftest/harness-selftest.expected | 62 +
.../selftests/kselftest/harness-selftest.sh | 14 +
tools/testing/selftests/kselftest_harness.h | 188 +--
tools/testing/selftests/nolibc/Makefile | 17 +-
tools/testing/selftests/nolibc/harness-selftest.c | 1 +
tools/testing/selftests/nolibc/nolibc-test.c | 1712 +-------------------
tools/testing/selftests/nolibc/run-tests.sh | 2 +-
18 files changed, 639 insertions(+), 1851 deletions(-)
---
base-commit: cb839e0cc881b4abd4a2e64cd06c2e313987a189
change-id: 20250130-nolibc-kselftest-harness-8b2c8cac43bf
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Hi,
Please find the upcoming changes for CONFIG_PREEMPT_LAZY in RCU. The
changes can also be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git lazypreempt.2025.02.24a
Paul & Ankur, I put patch #7 and #8 (bug fixes in rcutorture) before
patch #9 (which is the one that enables non-preemptible RCU in
preemptible kernel), because I want to avoid introduce a bug in-between
a series, appreciate it if you can double check on this. Thanks!
Regards,
Boqun
Ankur Arora (7):
rcu: fix header guard for rcu_all_qs()
rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
sched: update __cond_resched comment about RCU quiescent states
rcu: handle unstable rdp in rcu_read_unlock_strict()
rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
osnoise: provide quiescent states
rcu: limit PREEMPT_RCU configurations
Boqun Feng (1):
rcutorture: Update ->extendables check for lazy preemption
Paul E. McKenney (3):
rcutorture: Update rcutorture_one_extend_check() for lazy preemption
rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
include/linux/rcupdate.h | 2 +-
include/linux/rcutree.h | 2 +-
include/linux/srcutiny.h | 2 +-
kernel/rcu/Kconfig | 4 +--
kernel/rcu/rcutorture.c | 26 ++++++++++++---
kernel/rcu/srcutiny.c | 14 ++++----
kernel/rcu/tree_plugin.h | 22 ++++++++++---
kernel/sched/core.c | 4 ++-
kernel/trace/trace_osnoise.c | 32 +++++++++----------
.../selftests/rcutorture/configs/rcu/TREE07 | 3 +-
.../selftests/rcutorture/configs/rcu/TREE10 | 3 +-
11 files changed, 73 insertions(+), 41 deletions(-)
--
2.39.5 (Apple Git-154)
Commit 29b036be1b0b ("selftests: drv-net: test XDP, HDS auto and
the ioctl path") added a new test case in the net tree, now that
this code has made its way to net-next convert it to use the env.rpath()
helper instead of manually computing the relative path.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
tools/testing/selftests/drivers/net/hds.py | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hds.py b/tools/testing/selftests/drivers/net/hds.py
index 873f5219e41d..7cc74faed743 100755
--- a/tools/testing/selftests/drivers/net/hds.py
+++ b/tools/testing/selftests/drivers/net/hds.py
@@ -20,8 +20,7 @@ from lib.py import defer, ethtool, ip
def _xdp_onoff(cfg):
- test_dir = os.path.dirname(os.path.realpath(__file__))
- prog = test_dir + "/../../net/lib/xdp_dummy.bpf.o"
+ prog = cfg.rpath("../../net/lib/xdp_dummy.bpf.o")
ip("link set dev %s xdp obj %s sec xdp" %
(cfg.ifname, prog))
ip("link set dev %s xdp off" % cfg.ifname)
--
2.48.1
This small series have various unrelated patches:
- Patch 1 and 2: improve code coverage by validating mptcp_diag_dump_one
thanks to a new tool displaying MPTCP info for a specific token.
- Patch 3: a fix for a commit which is only in net-next.
- Patch 4: reduce parameters for one in-kernel PM helper.
- Patch 5: exit early when processing an ADD_ADDR echo to avoid unneeded
operations.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Gang Yan (2):
selftests: mptcp: Add a tool to get specific msk_info
selftests: mptcp: add a test for mptcp_diag_dump_one
Geliang Tang (2):
mptcp: pm: in-kernel: avoid access entry without lock
mptcp: pm: in-kernel: reduce parameters of set_flags
Matthieu Baerts (NGI0) (1):
mptcp: pm: exit early with ADD_ADDR echo if possible
net/mptcp/pm.c | 3 +
net/mptcp/pm_netlink.c | 15 +-
tools/testing/selftests/net/mptcp/Makefile | 2 +-
tools/testing/selftests/net/mptcp/diag.sh | 27 +++
tools/testing/selftests/net/mptcp/mptcp_diag.c | 272 +++++++++++++++++++++++++
5 files changed, 311 insertions(+), 8 deletions(-)
---
base-commit: 56794b5862c5a9aefcf2b703257c6fb93f76573e
change-id: 20250228-net-next-mptcp-coverage-small-opti-70d8dc1d329d
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
From: Jeff Xu <jeffxu(a)google.com>
This is V8 version, addressing comments from V7, without code logic
change.
-------------------------------------------------------------------
As discussed during mseal() upstream process [1], mseal() protects
the VMAs of a given virtual memory range against modifications, such
as the read/write (RW) and no-execute (NX) bits. For complete
descriptions of memory sealing, please see mseal.rst [2].
The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped.
The system mappings are readonly only, memory sealing can protect
them from ever changing to writable or unmmap/remapped as different
attributes.
System mappings such as vdso, vvar, vvar_vclock,
vectors (arm compact-mode), sigpage (arm compact-mode),
are created by the kernel during program initialization, and could
be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [3]. It could be sealed from creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000),
which is outside the mm managed range. This means mprotect, munmap, and
mremap won't work on the vsyscall. Since sealing doesn't enhance
the vsyscall's security, it is skipped in this patch. If we ever seal
the vsyscall, it is probably only for decorative purpose, i.e. showing
the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mappings calls to pass mseal flag. Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime. The existence of this flag for an architecture implies that
it does not require the remapping of thest system mappings during
process lifetime, so sealing these mappings is safe from a kernel
perspective.
This version covers x86-64 and arm64 archiecture as minimum viable feature.
While no specific CPU hardware features are required for enable this
feature on an archiecture, memory sealing requires a 64-bit kernel. Other
architectures can choose whether or not to adopt this feature. Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace. The PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain if this will ever fail for PPC - this needs to be
investigated by PPC in the future [4]. The UML kernel can add this support
when KUnit tests require it [5].
In this version, we've improved the handling of system mapping sealing from
previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it. This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.
Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for vdso, (smaller
than its actual size). The dynamic linker, which relies on PT_LOAD
information to determine mapping size, would then split and partially
seal the vdso mapping. Since each architecture has its own vdso/vvar
code, fixing this in the kernel would require going through each
archiecture. Our initial goal was to enable sealing readonly mappings,
e.g. .text, across all architectures, sealing vdso from kernel since
creation appears to be simpler than sealing vdso at glibc.
- The [vvar] mapping header only contains address information, not length
information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker,
and there is no effective solution for them.
This feature's security enhancements will benefit ChromeOS, Android,
and other high security systems.
Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
Android to ensure the sealing doesn't affect the functionality of
Chromebook and Android phone.
I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
normal operations such as browsing the web, open/edit doc are OK.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZx… [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQS… [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
-------------------------------------------
History:
V8:
- Change ARCH_SUPPORTS_MSEAL_X to ARCH_SUPPORTS_MSEAL_X (Liam R. Howlett)
- Update comments in Kconfig and mseal.rst (Lorenzo Stoakes, Liam R. Howlett)
- Change patch header perfix to "mseal sysmap" (Lorenzo Stoakes)
- Remove "vm_flags =" (Kees Cook, Liam R. Howlett, Oleg Nesterov)
- Drop uml architecture (Lorenzo Stoakes, Kees Cook)
- Add a selftest to verify system mappings are sealed (Lorenzo Stoakes)
V7:
https://lore.kernel.org/all/20250224225246.3712295-1-jeffxu@google.com/
- Remove cover letter from the first patch (Liam R. Howlett)
- Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
- logging and fclose() in selftest (Liam R. Howlett)
V6:
https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/
- mseal.rst: fix a typo (Randy Dunlap)
- security/Kconfig: add rr into note (Liam R. Howlett)
- remove mseal_system_mappings() and use macro instead (Liam R. Howlett)
- mseal.rst: add incompatible userland software (Lorenzo Stoakes)
- remove RFC from title (Kees Cook)
V5
https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/
- Remove kernel cmd line (Lorenzo Stoakes)
- Add test info (Lorenzo Stoakes)
- Add threat model info (Lorenzo Stoakes)
- Fix x86 selftest: test_mremap_vdso
- Restrict code change to ARM64/x86-64/UM arch only.
- Add userprocess.h to include seal_system_mapping().
- Remove sealing vsyscall.
- Split the patch.
V4:
https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
- ARCH_HAS_SEAL_SYSTEM_MAPPINGS (Lorenzo Stoakes)
- test info (Lorenzo Stoakes)
- Update mseal.rst (Liam R. Howlett)
- Update test_mremap_vdso.c (Liam R. Howlett)
- Misc. style, comments, doc update (Liam R. Howlett)
V3:
https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
- Revert uprobe to v1 logic (Oleg Nesterov)
- use CONFIG_SEAL_SYSTEM_MAPPINGS instead of _ALWAYS/_NEVER (Kees Cook)
- Move kernel cmd line from fs/exec.c to mm/mseal.c and
misc. (Liam R. Howlett)
V2:
https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
- Seal uprobe always (Oleg Nesterov)
- Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov)
- Rebase to linux_main
V1:
- https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
--------------------------------------------------
Jeff Xu (7):
mseal sysmap: kernel config and header change
selftests: x86: test_mremap_vdso: skip if vdso is msealed
mseal sysmap: enable x86-64
mseal sysmap: enable arm64
mseal sysmap: uprobe mapping
mseal sysmap: update mseal.rst
selftest: test system mappings are sealed.
Documentation/userspace-api/mseal.rst | 20 ++++
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/vdso.c | 12 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 7 +-
include/linux/mm.h | 10 ++
init/Kconfig | 22 ++++
kernel/events/uprobes.c | 3 +-
security/Kconfig | 21 ++++
.../mseal_system_mappings/.gitignore | 2 +
.../selftests/mseal_system_mappings/Makefile | 6 +
.../selftests/mseal_system_mappings/config | 1 +
.../mseal_system_mappings/sysmap_is_sealed.c | 113 ++++++++++++++++++
.../testing/selftests/x86/test_mremap_vdso.c | 43 +++++++
14 files changed, 254 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
create mode 100644 tools/testing/selftests/mseal_system_mappings/config
create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
--
2.48.1.711.g2feabab25a-goog
Currently testing of userspace and in-kernel API use two different
frameworks. kselftests for the userspace ones and Kunit for the
in-kernel ones. Besides their different scopes, both have different
strengths and limitations:
Kunit:
* Tests are normal kernel code.
* They use the regular kernel toolchain.
* They can be packaged and distributed as modules conveniently.
Kselftests:
* Tests are normal userspace code
* They need a userspace toolchain.
A kernel cross toolchain is likely not enough.
* A fair amout of userland is required to run the tests,
which means a full distro or handcrafted rootfs.
* There is no way to conveniently package and run kselftests with a
given kernel image.
* The kselftests makefiles are not as powerful as regular kbuild.
For example they are missing proper header dependency tracking or more
complex compiler option modifications.
Therefore kunit is much easier to run against different kernel
configurations and architectures.
This series aims to combine kselftests and kunit, avoiding both their
limitations. It works by compiling the userspace kselftests as part of
the regular kernel build, embedding them into the kunit kernel or module
and executing them from there. If the kernel toolchain is not fit to
produce userspace because of a missing libc, the kernel's own nolibc can
be used instead.
The structured TAP output from the kselftest is integrated into the
kunit KTAP output transparently, the kunit parser can parse the combined
logs together.
Further room for improvements:
* Call each test in its completely dedicated namespace
* Handle additional test files besides the test executable through
archives. CPIO, cramfs, etc.
* Compatibility with kselftest_harness.h (in progress)
* Expose the blobs in debugfs
* Provide some convience wrappers around compat userprogs
* Figure out a migration path/coexistence solution for
kunit UAPI and tools/testing/selftests/
Output from the kunit example testcase, note the output of
"example_uapi_tests".
$ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit example
...
Running tests with:
$ .kunit/linux kunit.filter_glob=example kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[11:53:53] ================== example (10 subtests) ===================
[11:53:53] [PASSED] example_simple_test
[11:53:53] [SKIPPED] example_skip_test
[11:53:53] [SKIPPED] example_mark_skipped_test
[11:53:53] [PASSED] example_all_expect_macros_test
[11:53:53] [PASSED] example_static_stub_test
[11:53:53] [PASSED] example_static_stub_using_fn_ptr_test
[11:53:53] [PASSED] example_priv_test
[11:53:53] =================== example_params_test ===================
[11:53:53] [SKIPPED] example value 3
[11:53:53] [PASSED] example value 2
[11:53:53] [PASSED] example value 1
[11:53:53] [SKIPPED] example value 0
[11:53:53] =============== [PASSED] example_params_test ===============
[11:53:53] [PASSED] example_slow_test
[11:53:53] ======================= (4 subtests) =======================
[11:53:53] [PASSED] procfs
[11:53:53] [PASSED] userspace test 2
[11:53:53] [SKIPPED] userspace test 3: some reason
[11:53:53] [PASSED] userspace test 4
[11:53:53] ================ [PASSED] example_uapi_test ================
[11:53:53] ===================== [PASSED] example =====================
[11:53:53] ============================================================
[11:53:53] Testing complete. Ran 16 tests: passed: 11, skipped: 5
[11:53:53] Elapsed time: 67.543s total, 1.823s configuring, 65.655s building, 0.058s running
Based on v6.14-rc1 and the series
"tools/nolibc: compatibility with -Wmissing-prototypes" [0].
For compatibility with LLVM/clang another series is needed [1].
[0] https://lore.kernel.org/lkml/20250123-nolibc-prototype-v1-0-e1afc5c1999a@we…
[1] https://lore.kernel.org/lkml/20250213-kbuild-userprog-fixes-v1-0-f255fb477d…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (12):
kconfig: implement CONFIG_HEADERS_INSTALL for Usermode Linux
kconfig: introduce CONFIG_ARCH_HAS_NOLIBC
kbuild: userprogs: respect CONFIG_WERROR
kbuild: userprogs: add nolibc support
kbuild: introduce blob framework
kunit: tool: Add test for nested test result reporting
kunit: tool: Don't overwrite test status based on subtest counts
kunit: tool: Parse skipped tests from kselftest.h
kunit: Introduce UAPI testing framework
kunit: uapi: Add example for UAPI tests
kunit: uapi: Introduce preinit executable
kunit: uapi: Validate usability of /proc
Documentation/kbuild/makefiles.rst | 12 +
Makefile | 5 +-
include/kunit/uapi.h | 17 ++
include/linux/blob.h | 21 ++
init/Kconfig | 2 +
lib/Kconfig.debug | 1 -
lib/kunit/Kconfig | 9 +
lib/kunit/Makefile | 17 +-
lib/kunit/kunit-example-test.c | 17 ++
lib/kunit/kunit-uapi-example.c | 58 +++++
lib/kunit/uapi-preinit.c | 61 +++++
lib/kunit/uapi.c | 250 +++++++++++++++++++++
scripts/Makefile.blobs | 19 ++
scripts/Makefile.build | 6 +
scripts/Makefile.clean | 2 +-
scripts/Makefile.userprogs | 18 +-
scripts/blob-wrap.c | 27 +++
tools/include/nolibc/Kconfig.nolibc | 18 ++
tools/testing/kunit/kunit_parser.py | 13 +-
tools/testing/kunit/kunit_tool_test.py | 9 +
.../test_is_test_passed-failure-nested.log | 10 +
.../test_data/test_is_test_passed-kselftest.log | 3 +-
22 files changed, 584 insertions(+), 11 deletions(-)
---
base-commit: 20e952894066214a80793404c9578d72ef89c5e0
change-id: 20241015-kunit-kselftests-56273bc40442
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
The quiet infrastructure was moved out of Makefile.build to accomidate
the new syscall table generation scripts in perf. Syscall table
generation wanted to also be able to be quiet, so instead of again
copying the code to set the quiet variables, the code was moved into
Makefile.perf to be used globally. This was not the right solution. It
should have been moved even further upwards in the call chain.
Makefile.include is imported in many files so this seems like a proper
place to put it.
To:
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v3:
- Add back erroneously removed "silent=1" (Jiri)
- Link to v2: https://lore.kernel.org/r/20250210-quiet_tools-v2-0-b2f18cbf72af@rivosinc.c…
Changes in v2:
- Fix spacing around Q= (Andrii)
- Link to v1: https://lore.kernel.org/r/20250203-quiet_tools-v1-0-d25c8956e59a@rivosinc.c…
---
Charlie Jenkins (2):
tools: Unify top-level quiet infrastructure
tools: Remove redundant quiet setup
tools/arch/arm64/tools/Makefile | 6 -----
tools/bpf/Makefile | 6 -----
tools/bpf/bpftool/Documentation/Makefile | 6 -----
tools/bpf/bpftool/Makefile | 6 -----
tools/bpf/resolve_btfids/Makefile | 2 --
tools/bpf/runqslower/Makefile | 5 +---
tools/build/Makefile | 8 +-----
tools/lib/bpf/Makefile | 13 ----------
tools/lib/perf/Makefile | 13 ----------
tools/lib/thermal/Makefile | 13 ----------
tools/objtool/Makefile | 6 -----
tools/perf/Makefile.perf | 41 -------------------------------
tools/scripts/Makefile.include | 30 ++++++++++++++++++++++
tools/testing/selftests/bpf/Makefile.docs | 6 -----
tools/testing/selftests/hid/Makefile | 2 --
tools/thermal/lib/Makefile | 13 ----------
tools/tracing/latency/Makefile | 6 -----
tools/tracing/rtla/Makefile | 6 -----
tools/verification/rv/Makefile | 6 -----
19 files changed, 32 insertions(+), 162 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250203-quiet_tools-9a6ea9d65a19
--
- Charlie
This patchset introduces a new feature to the netconsole extradata
subsystem that enables the inclusion of the current task's name in the
sysdata output of netconsole messages.
This enhancement is particularly valuable for large-scale deployments,
such as Meta's, where netconsole collects messages from millions of
servers and stores them in a data warehouse for analysis. Engineers
often rely on these messages to investigate issues and assess kernel
health.
One common challenge we face is determining the context in which
a particular message was generated. By including the task name
(task->comm) with each message, this feature provides a direct answer to
the frequently asked question: "What was running when this message was
generated?"
This added context will significantly improve our ability to diagnose
and troubleshoot issues, making it easier to interpret output of
netconsole.
The patchset consists of seven patches that implement the following changes:
* Refactor CPU number formatting into a separate function
* Prefix CPU_NR sysdata feature with SYSDATA_
* Patch to covert a bitwise operation into boolean
* Add configfs controls for taskname sysdata feature
* Add taskname to extradata entry count
* Add support for including task name in netconsole's extra data output
* Document the task name feature in Documentation/networking/netconsole.rst
* Add test coverage for the task name feature to the existing sysdata selftest script
These changes allow users to enable or disable the task name feature via
configfs and provide additional context for kernel messages by showing
which task generated each console message.
I have tested these patches on some servers and they seem to work as
expected.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v2:
- Add an extra patch to convert the comparison more stable. (Paolo)
- Changed the argument of a function (Simon)
- Removed the warn on `current == NULLL` since it shouldn't be the case.
(Simon and Paolo)
- Link to v1: https://lore.kernel.org/r/20250221-netcons_current-v1-0-21c86ae8fc0d@debian…
---
Breno Leitao (8):
netconsole: prefix CPU_NR sysdata feature with SYSDATA_
netconsole: Make boolean comparison consistent
netconsole: refactor CPU number formatting into separate function
netconsole: add taskname to extradata entry count
netconsole: add configfs controls for taskname sysdata feature
netconsole: add task name to extra data fields
netconsole: docs: document the task name feature
netconsole: selftest: add task name append testing
Documentation/networking/netconsole.rst | 28 +++++++
drivers/net/netconsole.c | 95 ++++++++++++++++++----
.../selftests/drivers/net/netcons_sysdata.sh | 51 ++++++++++--
3 files changed, 153 insertions(+), 21 deletions(-)
---
base-commit: 56794b5862c5a9aefcf2b703257c6fb93f76573e
change-id: 20250217-netcons_current-2c635fa5beda
prerequisite-change-id: 20250212-netdevsim-258d2d628175:v3
prerequisite-patch-id: 4ecfdbc58dd599d2358655e4ad742cbb9dde39f3
Best regards,
--
Breno Leitao <leitao(a)debian.org>
This patchset introduces a new feature to the netconsole extradata
subsystem that enables the inclusion of the current task's name in the
sysdata output of netconsole messages.
This enhancement is particularly valuable for large-scale deployments,
such as Meta's, where netconsole collects messages from millions of
servers and stores them in a data warehouse for analysis. Engineers
often rely on these messages to investigate issues and assess kernel
health.
One common challenge we face is determining the context in which
a particular message was generated. By including the task name
(task->comm) with each message, this feature provides a direct answer to
the frequently asked question: "What was running when this message was
generated?"
This added context will significantly improve our ability to diagnose
and troubleshoot issues, making it easier to interpret output of
netconsole.
The patchset consists of seven patches that implement the following changes:
* Refactor CPU number formatting into a separate function
* Prefix CPU_NR sysdata feature with SYSDATA_
* Add configfs controls for taskname sysdata feature
* Add taskname to extradata entry count
* Add support for including task name in netconsole's extra data output
* Document the task name feature in Documentation/networking/netconsole.rst
* Add test coverage for the task name feature to the existing sysdata selftest script
These changes allow users to enable or disable the task name feature via
configfs and provide additional context for kernel messages by showing
which task generated each console message.
I have tested these patches on some servers and they seem to work as
expected.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Breno Leitao (7):
netconsole: prefix CPU_NR sysdata feature with SYSDATA_
netconsole: refactor CPU number formatting into separate function
netconsole: add taskname to extradata entry count
netconsole: add configfs controls for taskname sysdata feature
netconsole: add task name to extra data fields
netconsole: docs: document the task name feature
netconsole: selftest: add task name append testing
Documentation/networking/netconsole.rst | 28 +++++++
drivers/net/netconsole.c | 98 ++++++++++++++++++----
.../selftests/drivers/net/netcons_sysdata.sh | 51 +++++++++--
3 files changed, 156 insertions(+), 21 deletions(-)
---
base-commit: bb3bb6c92e5719c0f5d7adb9d34db7e76705ac33
change-id: 20250217-netcons_current-2c635fa5beda
prerequisite-change-id: 20250212-netdevsim-258d2d628175:v3
prerequisite-patch-id: 4ecfdbc58dd599d2358655e4ad742cbb9dde39f3
Best regards,
--
Breno Leitao <leitao(a)debian.org>
The first patch fixes the incorrect locks using in bond driver.
The second patch fixes the xfrm offload feature during setup active-backup
mode. The third patch add a ipsec offload testing.
v3: move the ipsec deletion to bond_ipsec_free_sa (Cosmin Ratiu)
v2: do not turn carrier on if bond change link failed (Nikolay Aleksandrov)
move the mutex lock to a work queue (Cosmin Ratiu)
Hangbin Liu (3):
bonding: move IPsec deletion to bond_ipsec_free_sa
bonding: fix xfrm offload feature setup on active-backup mode
selftests: bonding: add ipsec offload test
drivers/net/bonding/bond_main.c | 36 ++--
drivers/net/bonding/bond_netlink.c | 16 +-
include/net/bonding.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_ipsec_offload.sh | 155 ++++++++++++++++++
.../selftests/drivers/net/bonding/config | 4 +
6 files changed, 195 insertions(+), 20 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
--
2.46.0
While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds
access in get_imix_entries' ([2], [3]) and doing some tests and code review
I detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...
Patch set splited into part I (now already applied to net-next)
- net: pktgen: replace ENOTSUPP with EOPNOTSUPP
- net: pktgen: enable 'param=value' parsing
- net: pktgen: fix hex32_arg parsing for short reads
- net: pktgen: fix 'rate 0' error handling (return -EINVAL)
- net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
- net: pktgen: fix ctrl interface command parsing
- net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
nd part II (this one):
- net: pktgen: use defines for the various dec/hex number parsing digits lengths
- net: pktgen: fix mix of int/long
- net: pktgen: remove extra tmp variable (re-use len instead)
- net: pktgen: remove some superfluous variable initializing
- net: pktgen: fix mpls maximum labels list parsing
- net: pktgen: fix access outside of user given buffer in pktgen_if_write()
- net: pktgen: fix mpls reset parsing
- net: pktgen: remove all superfluous index assignements
- selftest: net: add proc_net_pktgen
Regards,
Peter
Changes v7 -> v8:
- rebased on actual net-next/main
- add rev-by Simon Horman
- net: pktgen: fix mpls maximum labels list parsing
- slightly rephrase commit message, omit '/16' (suggested by Paolo Abeni)
- net: pktgen: fix mpls reset parsing
- fix c99 comment (suggested by Paolo Abeni)
- selftest: net: add proc_net_pktgen
- fix c99 comments (suggested by Paolo Abeni)
Changes v6 -> v7:
- rebased on actual net-next/main
- selftest: net: add proc_net_pktgen
- fixed conflict in tools/testing/selftests/net/config
Changes v5 -> v6:
- add rev-by Simon Horman
- drop patch 'net: pktgen: use defines for the various dec/hex number
parsing digits lengths'
- adjust to dropped patch 'net: pktgen: use defines for the various
dec/hex number parsing digits lengths'
- net: pktgen: fix mix of int/long
- fix line break (suggested by Simon Horman)
Changes v4 -> v5:
- split up patchset into part i/ii (suggested by Simon Horman)
- add rev-by Simon Horman
- net: pktgen: align some variable declarations to the most common pattern
-> net: pktgen: fix mix of int/long
- instead of align to most common pattern (int) adjust all usages to
size_t for i and max and ssize_t for len and adjust function signatures
of hex32_arg(), count_trail_chars(), num_arg() and strn_len() accordingly
- respect reverse xmas tree order for local variable declarations (where
possible without too much code churn)
- update subject line and patch description
- dropped net: pktgen: hex32_arg/num_arg error out in case no characters are
available
- keep empty hex/num arg is implicit assumed as zero value
- dropped net: pktgen: num_arg error out in case no valid character is parsed
- keep empty hex/num arg is implicit assumed as zero value
- Change patch description ('Fixes:' -> 'Addresses the following:',
suggested by Simon Horman)
- net: pktgen: remove all superfluous index assignements
- new patch (suggested by Simon Horman)
- selftest: net: add proc_net_pktgen
- addapt to dropped patch 'net: pktgen: hex32_arg/num_arg error out in case
no characters are available', empty hex/num arg is now implicit assumed as
zero value (instead of failure)
Changes v3 -> v4:
- add rev-by Simon Horman
- new patch 'net: pktgen: use defines for the various dec/hex number parsing
digits lengths' (suggested by Simon Horman)
- replace C99 comment (suggested by Paolo Abeni)
- drop available characters check in strn_len() (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: align some variable declarations to the
most common pattern' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove extra tmp variable (re-use len
instead)' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove some superfluous variable
initializing' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: fix mpls maximum labels list parsing'
(suggested by Paolo Abeni)
- factored out 'net: pktgen: hex32_arg/num_arg error out in case no
characters are available' (suggested by Paolo Abeni)
- factored out 'net: pktgen: num_arg error out in case no valid character
is parsed' (suggested by Paolo Abeni)
Changes v2 -> v3:
- new patch: 'net: pktgen: fix ctrl interface command parsing'
- new patch: 'net: pktgen: fix mpls reset parsing'
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix typo in change description ('v1 -> v1' and tyop)
- rename some vars to better match usage
add_loopback_0 -> thr_cmd_add_loopback_0
rm_loopback_0 -> thr_cmd_rm_loopback_0
wrong_ctrl_cmd -> wrong_thr_cmd
legacy_ctrl_cmd -> legacy_thr_cmd
ctrl_fd -> thr_fd
- add ctrl interface tests
Changes v1 -> v2:
- new patch: 'net: pktgen: fix hex32_arg parsing for short reads'
- new patch: 'net: pktgen: fix 'rate 0' error handling (return -EINVAL)'
- new patch: 'net: pktgen: fix 'ratep 0' error handling (return -EINVAL)'
- net/core/pktgen.c: additional fix get_imix_entries() and get_labels()
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix typo not vs. nod (suggested by Jakub Kicinski)
- fix misaligned line (suggested by Jakub Kicinski)
- enable fomerly commented out CONFIG_XFRM dependent test (command spi),
as CONFIG_XFRM is enabled via tools/testing/selftests/net/config
CONFIG_XFRM_INTERFACE/CONFIG_XFRM_USER (suggestex by Jakub Kicinski)
- add CONFIG_NET_PKTGEN=m to tools/testing/selftests/net/config
(suggested by Jakub Kicinski)
- add modprobe pktgen to FIXTURE_SETUP() (suggested by Jakub Kicinski)
- fix some checkpatch warnings (Missing a blank line after declarations)
- shrink line length by re-naming some variables (command -> cmd,
device -> dev)
- add 'rate 0' testcase
- add 'ratep 0' testcase
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (8):
net: pktgen: fix mix of int/long
net: pktgen: remove extra tmp variable (re-use len instead)
net: pktgen: remove some superfluous variable initializing
net: pktgen: fix mpls maximum labels list parsing
net: pktgen: fix access outside of user given buffer in
pktgen_if_write()
net: pktgen: fix mpls reset parsing
net: pktgen: remove all superfluous index assignements
selftest: net: add proc_net_pktgen
net/core/pktgen.c | 288 ++++----
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/proc_net_pktgen.c | 646 ++++++++++++++++++
4 files changed, 805 insertions(+), 131 deletions(-)
create mode 100644 tools/testing/selftests/net/proc_net_pktgen.c
--
2.48.1