Linux-kselftest-mirror - lists.linaro.org

[PATCH v2 00/32] kselftest harness and nolibc compatibility

by Thomas Weißschuh

Nolibc is useful for selftests because the test programs can be very small and can be compiled with just a kernel cross-compiler, without any userspace support. Currently nolibc is only usable with kselftest.h, not with the more convenient kselftest_harness.h. This series provides that compatibility by adding new features to nolibc and by removing the harness's use of problematic features.

The first half of the series changes the harness; the second half changes nolibc. The two parts are largely independent and should go through different trees. The last patch is not meant to be applied; it serves as a test that everything works together correctly.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Rebase onto v6.15-rc1
- Rename internal nolibc symbols
- Handle edge case of waitpid(INT_MIN) == ESRCH
- Fix arm configurations for final testing patch
- Clean up global getopt.h variable declarations
- Add Acks from Willy
- Link to v1: https://lore.kernel.org/r/20250304-nolibc-kselftest-harness-v1-0-adca7cd231…

---
Thomas Weißschuh (32):
      selftests: harness: Add harness selftest
      selftests: harness: Use C89 comment style
      selftests: harness: Ignore unused variant argument warning
      selftests: harness: Mark functions without prototypes static
      selftests: harness: Remove inline qualifier for wrappers
      selftests: harness: Remove dependency on libatomic
      selftests: harness: Implement test timeouts through pidfd
      selftests: harness: Don't set setup_completed for fixtureless tests
      selftests: harness: Always provide "self" and "variant"
      selftests: harness: Move teardown conditional into test metadata
      selftests: harness: Add teardown callback to test metadata
      selftests: harness: Stop using setjmp()/longjmp()
      selftests: harness: Guard includes on nolibc
      tools/nolibc: handle intmax_t/uintmax_t in printf
      tools/nolibc: use intmax definitions from compiler
      tools/nolibc: use pselect6_time64 if available
      tools/nolibc: use ppoll_time64 if available
      tools/nolibc: add tolower() and toupper()
      tools/nolibc: add _exit()
      tools/nolibc: add setpgrp()
      tools/nolibc: implement waitpid() in terms of waitid()
      Revert "selftests/nolibc: use waitid() over waitpid()"
      tools/nolibc: add dprintf() and vdprintf()
      tools/nolibc: add getopt()
      tools/nolibc: allow different write callbacks in printf
      tools/nolibc: allow limiting of printf destination size
      tools/nolibc: add snprintf() and friends
      selftests/nolibc: use snprintf() for printf tests
      selftests/nolibc: rename vfprintf test suite
      selftests/nolibc: add test for snprintf() truncation
      tools/nolibc: implement width padding in printf()
      HACK: selftests/nolibc: demonstrate usage of the kselftest harness

 tools/include/nolibc/Makefile | 1 +
 tools/include/nolibc/getopt.h | 101 ++
 tools/include/nolibc/nolibc.h | 1 +
 tools/include/nolibc/stdint.h | 4 +-
 tools/include/nolibc/stdio.h | 127 +-
 tools/include/nolibc/string.h | 17 +
 tools/include/nolibc/sys.h | 105 +-
 tools/testing/selftests/Makefile | 1 +
 tools/testing/selftests/kselftest/.gitignore | 1 +
 tools/testing/selftests/kselftest/Makefile | 6 +
 .../testing/selftests/kselftest/harness-selftest.c | 129 ++
 .../selftests/kselftest/harness-selftest.expected | 62 +
 .../selftests/kselftest/harness-selftest.sh | 14 +
 tools/testing/selftests/kselftest_harness.h | 181 +-
 tools/testing/selftests/nolibc/Makefile | 13 +-
 tools/testing/selftests/nolibc/harness-selftest.c | 1 +
 tools/testing/selftests/nolibc/nolibc-test.c | 1729 +-------------------
 tools/testing/selftests/nolibc/run-tests.sh | 2 +-
 18 files changed, 635 insertions(+), 1860 deletions(-)
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250130-nolibc-kselftest-harness-8b2c8cac43bf

Best regards,
-- 
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
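For readers unfamiliar with the approach named in "tools/nolibc: implement waitpid() in terms of waitid()", here is a minimal userspace sketch of the idea; it is not the nolibc code. It maps waitpid()'s pid argument onto waitid()'s idtype/id pair and rebuilds the classic status word from siginfo_t. Assumptions: Linux with glibc headers, P_PGID with id 0 meaning the caller's own process group (Linux 5.4 and later), and the traditional Linux wait-status encoding. The name my_waitpid() is hypothetical. The INT_MIN check mirrors the edge case listed in the v2 changelog: -INT_MIN cannot be represented as a pid_t, so the sketch reports ESRCH instead of negating it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: waitpid() emulated on top of waitid(). */
static pid_t my_waitpid(pid_t pid, int *status, int options)
{
	siginfo_t info;
	idtype_t idtype;
	id_t id;

	if (pid == INT_MIN) {
		/* -INT_MIN overflows, so report "no such process group". */
		errno = ESRCH;
		return -1;
	} else if (pid < -1) {
		idtype = P_PGID;	/* any child in process group -pid */
		id = (id_t)-pid;
	} else if (pid == -1) {
		idtype = P_ALL;		/* any child at all */
		id = 0;
	} else if (pid == 0) {
		idtype = P_PGID;	/* id 0: caller's own group (Linux >= 5.4) */
		id = 0;
	} else {
		idtype = P_PID;		/* one specific child */
		id = (id_t)pid;
	}

	/* si_pid stays 0 if WNOHANG finds no child that changed state. */
	memset(&info, 0, sizeof(info));

	/* waitpid() always reports terminated children; WUNTRACED == WSTOPPED. */
	if (waitid(idtype, id, &info, options | WEXITED) < 0)
		return -1;
	if (info.si_pid == 0)
		return 0;

	if (status) {
		/* Rebuild the traditional Linux wait status word. */
		switch (info.si_code) {
		case CLD_EXITED:
			*status = (info.si_status & 0xff) << 8;
			break;
		case CLD_KILLED:
			*status = info.si_status & 0x7f;
			break;
		case CLD_DUMPED:
			*status = (info.si_status & 0x7f) | 0x80;
			break;
		case CLD_STOPPED:
		case CLD_TRAPPED:
			*status = ((info.si_status & 0xff) << 8) | 0x7f;
			break;
		case CLD_CONTINUED:
			*status = 0xffff;
			break;
		default:
			*status = 0;
			break;
		}
	}
	return info.si_pid;
}
```

Passing the options straight through works here because glibc defines WUNTRACED and WSTOPPED with the same value; a freestanding implementation would have to confirm that for its own headers.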

4 months, 2 weeks


[PATCH] firmware: cs_dsp: test_bin_error: Fix uninitialized data used as fw version

by Richard Fitzgerald

Call cs_dsp_mock_xm_header_get_fw_version() to get the firmware version from the dummy XM header data in cs_dsp_bin_err_test_common_init(). Make the same change to cs_dsp_bin_test_common_init() and remove the cs_dsp_mock_xm_header_get_fw_version_from_regmap() function.

The code in cs_dsp_test_bin.c was correctly calling cs_dsp_mock_xm_header_get_fw_version_from_regmap() to fetch the fw version from a dummy header it wrote to XM registers.

However, in cs_dsp_test_bin_error.c the test doesn't write a dummy header into XM; it populates XM the normal way, using a wmfw file. It should have called cs_dsp_mock_xm_header_get_fw_version() to get the data from its blob buffer, but it was calling cs_dsp_mock_xm_header_get_fw_version_from_regmap(). Because nothing had been written to the registers, this returned uninitialized data.

The only other user of cs_dsp_mock_xm_header_get_fw_version_from_regmap() was cs_dsp_test_bin.c, but it doesn't need it. It already has a blob buffer containing the dummy XM header, so it can use cs_dsp_mock_xm_header_get_fw_version() to read from that.

Fixes: cd8c058499b6 ("firmware: cs_dsp: Add KUnit testing of bin error cases")
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
 .../cirrus/test/cs_dsp_mock_mem_maps.c | 30 -------------------
 .../firmware/cirrus/test/cs_dsp_test_bin.c | 2 +-
 .../cirrus/test/cs_dsp_test_bin_error.c | 2 +-
 .../linux/firmware/cirrus/cs_dsp_test_utils.h | 1 -
 4 files changed, 2 insertions(+), 33 deletions(-)

diff --git a/drivers/firmware/cirrus/test/cs_dsp_mock_mem_maps.c b/drivers/firmware/cirrus/test/cs_dsp_mock_mem_maps.c
index 161272e47bda..73412bcef50c 100644
--- a/drivers/firmware/cirrus/test/cs_dsp_mock_mem_maps.c
+++ b/drivers/firmware/cirrus/test/cs_dsp_mock_mem_maps.c
@@ -461,36 +461,6 @@ unsigned int cs_dsp_mock_xm_header_get_alg_base_in_words(struct cs_dsp_test *pri
 }
 EXPORT_SYMBOL_NS_GPL(cs_dsp_mock_xm_header_get_alg_base_in_words, "FW_CS_DSP_KUNIT_TEST_UTILS");
 
-/**
- * cs_dsp_mock_xm_header_get_fw_version_from_regmap() - Firmware version.
- *
- * @priv: Pointer to struct cs_dsp_test.
- *
- * Return: Firmware version word value.
- */
-unsigned int cs_dsp_mock_xm_header_get_fw_version_from_regmap(struct cs_dsp_test *priv)
-{
-	unsigned int xm = cs_dsp_mock_base_addr_for_mem(priv, WMFW_ADSP2_XM);
-	union {
-		struct wmfw_id_hdr adsp2;
-		struct wmfw_v3_id_hdr halo;
-	} hdr;
-
-	switch (priv->dsp->type) {
-	case WMFW_ADSP2:
-		regmap_raw_read(priv->dsp->regmap, xm, &hdr.adsp2, sizeof(hdr.adsp2));
-		return be32_to_cpu(hdr.adsp2.ver);
-	case WMFW_HALO:
-		regmap_raw_read(priv->dsp->regmap, xm, &hdr.halo, sizeof(hdr.halo));
-		return be32_to_cpu(hdr.halo.ver);
-	default:
-		KUNIT_FAIL(priv->test, NULL);
-		return 0;
-	}
-}
-EXPORT_SYMBOL_NS_GPL(cs_dsp_mock_xm_header_get_fw_version_from_regmap,
-		     "FW_CS_DSP_KUNIT_TEST_UTILS");
-
 /**
  * cs_dsp_mock_xm_header_get_fw_version() - Firmware version.
  *
diff --git a/drivers/firmware/cirrus/test/cs_dsp_test_bin.c b/drivers/firmware/cirrus/test/cs_dsp_test_bin.c
index 1e161bbc5b4a..163b7faecff4 100644
--- a/drivers/firmware/cirrus/test/cs_dsp_test_bin.c
+++ b/drivers/firmware/cirrus/test/cs_dsp_test_bin.c
@@ -2198,7 +2198,7 @@ static int cs_dsp_bin_test_common_init(struct kunit *test, struct cs_dsp *dsp)
 
 	priv->local->bin_builder =
 		cs_dsp_mock_bin_init(priv, 1,
-				     cs_dsp_mock_xm_header_get_fw_version_from_regmap(priv));
+				     cs_dsp_mock_xm_header_get_fw_version(xm_hdr));
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv->local->bin_builder);
 
 	/* We must provide a dummy wmfw to load */
diff --git a/drivers/firmware/cirrus/test/cs_dsp_test_bin_error.c b/drivers/firmware/cirrus/test/cs_dsp_test_bin_error.c
index 8748874f0552..a7ec956d2724 100644
--- a/drivers/firmware/cirrus/test/cs_dsp_test_bin_error.c
+++ b/drivers/firmware/cirrus/test/cs_dsp_test_bin_error.c
@@ -451,7 +451,7 @@ static int cs_dsp_bin_err_test_common_init(struct kunit *test, struct cs_dsp *ds
 
 	local->bin_builder =
 		cs_dsp_mock_bin_init(priv, 1,
-				     cs_dsp_mock_xm_header_get_fw_version_from_regmap(priv));
+				     cs_dsp_mock_xm_header_get_fw_version(local->xm_header));
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, local->bin_builder);
 
 	/* Init cs_dsp */
diff --git a/include/linux/firmware/cirrus/cs_dsp_test_utils.h b/include/linux/firmware/cirrus/cs_dsp_test_utils.h
index 4f87a908ab4f..ecd821ed8064 100644
--- a/include/linux/firmware/cirrus/cs_dsp_test_utils.h
+++ b/include/linux/firmware/cirrus/cs_dsp_test_utils.h
@@ -104,7 +104,6 @@ unsigned int cs_dsp_mock_num_dsp_words_to_num_packed_regs(unsigned int num_dsp_w
 unsigned int cs_dsp_mock_xm_header_get_alg_base_in_words(struct cs_dsp_test *priv,
 							  unsigned int alg_id,
 							  int mem_type);
-unsigned int cs_dsp_mock_xm_header_get_fw_version_from_regmap(struct cs_dsp_test *priv);
 unsigned int cs_dsp_mock_xm_header_get_fw_version(struct cs_dsp_mock_xm_header *header);
 void cs_dsp_mock_xm_header_drop_from_regmap_cache(struct cs_dsp_test *priv);
 int cs_dsp_mock_xm_header_write_to_regmap(struct cs_dsp_mock_xm_header *header);
-- 
2.39.5
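The core of this fix is where the firmware version is read from: the test must decode the header it actually populated (its blob buffer), not device registers that nothing wrote. The sketch below illustrates that idea in plain C, outside KUnit. It decodes a big-endian 32-bit word from a byte buffer, which is what be32_to_cpu() does for a __be32 field in the kernel, and it refuses to read past the end of the buffer. blob_get_fw_version() and its ver_offset parameter are hypothetical; the real header layouts (struct wmfw_id_hdr, struct wmfw_v3_id_hdr) are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 32-bit word, as be32_to_cpu() does in the kernel. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: read the version word from the in-memory header
 * blob that the test itself built, instead of from registers that may
 * never have been written. ver_offset is an assumption: the real offset
 * depends on which header layout is in use.
 */
static int blob_get_fw_version(const uint8_t *blob, size_t blob_len,
			       size_t ver_offset, uint32_t *ver)
{
	if (!blob || !ver || ver_offset > blob_len || blob_len - ver_offset < 4)
		return -1;	/* never read past the end of the blob */
	*ver = get_be32(blob + ver_offset);
	return 0;
}
```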

4 months, 2 weeks


[PATCH v3 2/2] rcutorture: Fix issue with re-using old images on ARM64

by Joel Fernandes

On ARM64, when running with --configs '36*SRCU-P', I noticed that only 1 instance started instead of 36. Fix it by also checking for an Image file, since ARM64 builds do not seem to produce a bzImage. With this change, all 36 instances run at the same time in the batch.

Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
 tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index ad79784e552d..957800c9ffba 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -73,7 +73,7 @@ config_override_param "$config_dir/CFcommon.$(uname -m)" KcList \
 cp $T/KcList $resdir/ConfigFragment
 
 base_resdir=`echo $resdir | sed -e 's/\.[0-9]\+$//'`
-if test "$base_resdir" != "$resdir" && test -f $base_resdir/bzImage && test -f $base_resdir/vmlinux
+if test "$base_resdir" != "$resdir" && (test -f $base_resdir/bzImage || test -f $base_resdir/Image) && test -f $base_resdir/vmlinux
 then
 	# Rerunning previous test, so use that test's kernel.
 	QEMU="`identify_qemu $base_resdir/vmlinux`"
-- 
2.43.0

4 months, 2 weeks


[PATCH bpf-next v1 0/4] bpf, sockmap: Fix data loss and panic issues

by Jiayuan Chen

I was writing a benchmark based on sockmap + TCP and discovered several issues:

1. When EAGAIN occurs, the direction of the skb is incorrect, causing data loss on retry.
2. When partial data is sent, the offset is not recorded, leading to duplicate data being sent on retry.
3. An unexpected BUG_ON() check in skb_linearize() is triggered.
4. The memory of psock->ingress_skb is not limited by the socket buffer or by memcg.

Issues 1, 2, and 3 are described in each patch's commit message. This patchset does not cover issue 4, as it is difficult to handle in practice and I am still working on it. Here is a brief description of the issue:

When sockmap is used for skb/stream redirection and the receiving end does not perform any reads, all data is buffered in ingress_skb. For example:

'''
// set memory limit to 5000M
cgcreate -g memory:myGroup
cgset -r memory.max="5000M" myGroup

// start benchmark and disable consumer from reading
cgexec -g "memory:myGroup" ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --delay-consumer=-1 -d 100
Iter 0 ( 29.179us): Send Speed 2668.548 MB/s (20360.406 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 1 ( -7.237us): Send Speed 2694.467 MB/s (20557.149 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 2 ( -1.918us): Send Speed 2693.404 MB/s (20548.039 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 3 ( -0.684us): Send Speed 2693.138 MB/s (20548.014 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 4 ( 7.879us): Send Speed 2698.620 MB/s (20588.838 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 5 ( -3.224us): Send Speed 2696.553 MB/s (20573.066 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 6 ( -5.409us): Send Speed 2699.705 MB/s (20597.111 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 7 ( -0.439us): Send Speed 2699.691 MB/s (20597.009 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
...

// memory usage is not limited
cat /proc/slabinfo | grep skb
skbuff_small_head 11824024 11824024 704 46 8 : tunables 0 0 0 : slabdata 257044 257044 0
skbuff_fclone_cache 11822080 11822080 512 32 4 : tunables 0 0 0 : slabdata 369440 369440 0
'''

Thus, a simple socket in a large file upload/download scenario can consume all of the system's memory. We must charge the skb memory to psock->sk, and, if we do not want to drop skbs, we need to feed the error back to read_sock/read_skb when enqueueing to psock->ingress_skb fails.

---
Another of my patches, related to stability, also needs some review time from the maintainers:
https://lore.kernel.org/bpf/20250317092257.68760-1-jiayuan.chen@linux.dev/T…

Jiayuan Chen (4):
  bpf, sockmap: Fix data lost during EAGAIN retries
  bpf, sockmap: fix duplicated data transmission
  bpf, sockmap: Fix panic when calling skb_linearize
  selftest/bpf/benchs: Add benchmark for sockmap usage

 net/core/skmsg.c | 48 +-
 tools/testing/selftests/bpf/Makefile | 2 +
 tools/testing/selftests/bpf/bench.c | 4 +
 .../selftests/bpf/benchs/bench_sockmap.c | 599 ++++++++++++++++++
 .../selftests/bpf/progs/bench_sockmap_prog.c | 65 ++
 5 files changed, 697 insertions(+), 21 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_sockmap.c
 create mode 100644 tools/testing/selftests/bpf/progs/bench_sockmap_prog.c
-- 
2.47.1
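Issues 1 and 2 both come down to retry bookkeeping: a retry after EAGAIN or after a short write must resume exactly where the previous attempt stopped. The sketch below shows that rule in userspace C with write(); it is an analogy, not the skmsg.c fix. send_resumable() is a hypothetical helper that keeps its progress in *off, so the caller can retry after EAGAIN without losing or resending bytes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf[*off .. len) to fd, advancing *off after
 * every successful (possibly partial) write. On EAGAIN or any other error
 * it returns -1 with *off still pointing at the first unsent byte, so a
 * later retry neither drops nor duplicates data.
 * Returns 0 once all len bytes have been written.
 */
static int send_resumable(int fd, const char *buf, size_t len, size_t *off)
{
	while (*off < len) {
		ssize_t n = write(fd, buf + *off, len - *off);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: just try again */
			return -1;		/* e.g. EAGAIN: caller retries later */
		}
		*off += (size_t)n;		/* record partial progress */
	}
	return 0;
}
```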

4 months, 2 weeks


[PATCH bpf-next v4 0/3] bpf: Fix use-after-free of sockmap

by Jiayuan Chen

Syzkaller reported this issue [1].

The current sockmap depends on sk_socket in both the read and the write paths, but sk->sk_socket may be released in the middle of these operations, leading to a panic.

For a detailed reproduction, please refer to the description in v2:
https://lore.kernel.org/bpf/20250228055106.58071-1-jiayuan.chen@linux.dev/

The corresponding fixes are described in the commit message of each patch.

By the way, sockmap currently lacks statistics, especially global ones such as the number of successful or failed rx and tx operations. These cannot be obtained from the socket interface itself, yet they would be of great help in troubleshooting issues and observing sockmap behavior. If the maintainers/reviewers do not object, I think we can provide these statistics in the future, through proc, tracing, or bpftool.

[1] https://syzkaller.appspot.com/bug?extid=dd90a702f518e0eac072
---
v3 -> v4:
1. Rebase on -rc.
2. Incorporated valuable feedback from the v3 thread into the commit message, making it easier to review.
https://lore.kernel.org/all/20250317092257.68760-3-jiayuan.chen@linux.dev/

v2 -> v3:
1. Michal Luczaj reported a similar race in the sockmap sending path.
2. The RCU lock conflicts with mutex_lock() in the unix socket read implementation.
https://lore.kernel.org/bpf/20250228055106.58071-1-jiayuan.chen@linux.dev/

v1 -> v2:
1. Add Fixes tag.
2. Extend selftest of edge case for TCP/UDP sockets.
3. Add Reviewed-by and Acked-by tag.
https://lore.kernel.org/bpf/20250226132242.52663-1-jiayuan.chen@linux.dev/T…

Jiayuan Chen (3):
  bpf, sockmap: avoid using sk_socket after free when sending
  bpf, sockmap: avoid using sk_socket after free when reading
  selftests/bpf: Add edge case tests for sockmap

 net/core/skmsg.c | 22 ++++++-
 .../selftests/bpf/prog_tests/socket_helpers.h | 13 +++-
 .../selftests/bpf/prog_tests/sockmap_basic.c | 60 +++++++++++++++++++
 3 files changed, 91 insertions(+), 4 deletions(-)
-- 
2.47.1

4 months, 2 weeks


[PATCH bpf-next v2 0/2] bpf: fix ktls panic with sockmap and add tests

by Jiayuan Chen

We can reproduce the issue using the existing test program:
'./test_sockmap --ktls'
Or use the selftest I provided, which will cause a panic:

------------[ cut here ]------------
kernel BUG at lib/iov_iter.c:629!
PKRU: 55555554
Call Trace:
 <TASK>
 ? die+0x36/0x90
 ? do_trap+0xdd/0x100
 ? iov_iter_revert+0x178/0x180
 ? iov_iter_revert+0x178/0x180
 ? do_error_trap+0x7d/0x110
 ? iov_iter_revert+0x178/0x180
 ? exc_invalid_op+0x50/0x70
 ? iov_iter_revert+0x178/0x180
 ? asm_exc_invalid_op+0x1a/0x20
 ? iov_iter_revert+0x178/0x180
 ? iov_iter_revert+0x5c/0x180
 tls_sw_sendmsg_locked.isra.0+0x794/0x840
 tls_sw_sendmsg+0x52/0x80
 ? inet_sendmsg+0x1f/0x70
 __sys_sendto+0x1cd/0x200
 ? find_held_lock+0x2b/0x80
 ? syscall_trace_enter+0x140/0x270
 ? __lock_release.isra.0+0x5e/0x170
 ? find_held_lock+0x2b/0x80
 ? syscall_trace_enter+0x140/0x270
 ? lockdep_hardirqs_on_prepare+0xda/0x190
 ? ktime_get_coarse_real_ts64+0xc2/0xd0
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x90/0x170

1. The issue appears to have existed since bpf support was introduced to ktls; the later addition of assertions to iov_iter turned it into a panic. If my Fixes tag is incorrect, please help me correct it.
2. I made minimal changes for now; they are enough to make ktls work correctly.
---
v1->v2: Added more content to the commit message
https://lore.kernel.org/all/20250123171552.57345-1-mrpre@163.com/#r
---
Jiayuan Chen (2):
  bpf: fix ktls panic with sockmap
  selftests/bpf: add ktls selftest

 net/tls/tls_sw.c | 8 +-
 .../selftests/bpf/prog_tests/sockmap_ktls.c | 174 +++++++++++++++++-
 .../selftests/bpf/progs/test_sockmap_ktls.c | 26 +++
 3 files changed, 205 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
-- 
2.47.1

4 months, 2 weeks


[PATCH] selftests/bpf: Fix bpf_nf selftest failure

by Saket Kumar Bhaskar

On systems without the iptables-legacy tool, this selftest fails. Add a check for the iptables-legacy tool and skip the test if it is missing.

Fixes: de9c8d848d90 ("selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test")
Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com>
---
 tools/testing/selftests/bpf/prog_tests/bpf_nf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
index dbd13f8e42a7..dd6512fa652b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
@@ -63,6 +63,12 @@ static void test_bpf_nf_ct(int mode)
 		.repeat = 1,
 	);
 
+	if (SYS_NOFAIL("iptables-legacy --version")) {
+		fprintf(stdout, "Missing required iptables-legacy tool\n");
+		test__skip();
+		return;
+	}
+
 	skel = test_bpf_nf__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "test_bpf_nf__open_and_load"))
 		return;
-- 
2.43.5
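The patch relies on the selftest helpers SYS_NOFAIL() and test__skip(). As a rough standalone analogue of the same probe-then-skip pattern, the hypothetical tool_available() below runs a command through system() and treats any non-zero exit status as "tool not available", including the exit status 127 that the shell uses when a command is not found.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: return true if `cmd` runs via /bin/sh and exits
 * with status 0. Output is discarded. A missing command makes the shell
 * exit with 127, which counts as "not available".
 */
static bool tool_available(const char *cmd)
{
	char buf[256];
	int n, ret;

	n = snprintf(buf, sizeof(buf), "%s >/dev/null 2>&1", cmd);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return false;	/* command does not fit in the buffer */

	ret = system(buf);
	return ret != -1 && WIFEXITED(ret) && WEXITSTATUS(ret) == 0;
}
```

A test would call tool_available("iptables-legacy --version") before doing any setup and skip itself when it returns false.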

4 months, 2 weeks


[PATCH] selftests/bpf: close the file descriptor to avoid resource leaks

by Malaya Kumar Rout

Static analysis with cppcheck reports:

tools/testing/selftests/bpf/benchs/bench_htab_mem.c:284:3: error: Resource leak: fd [resourceLeak]
tools/testing/selftests/bpf/prog_tests/sk_assign.c:41:3: error: Resource leak: tc [resourceLeak]

Fix the issues by closing the file descriptor (fd) and the pipe stream (tc) when read() or fgets() fails.

Signed-off-by: Malaya Kumar Rout <malayarout91(a)gmail.com>
---
 tools/testing/selftests/bpf/benchs/bench_htab_mem.c | 1 +
 tools/testing/selftests/bpf/prog_tests/sk_assign.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
index 926ee822143e..59746fd2c23a 100644
--- a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
+++ b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
@@ -281,6 +281,7 @@ static void htab_mem_read_mem_cgrp_file(const char *name, unsigned long *value)
 	got = read(fd, buf, sizeof(buf) - 1);
 	if (got <= 0) {
 		*value = 0;
+		close(fd);
 		return;
 	}
 	buf[got] = 0;
diff --git a/tools/testing/selftests/bpf/prog_tests/sk_assign.c b/tools/testing/selftests/bpf/prog_tests/sk_assign.c
index 0b9bd1d6f7cc..10a0ab954b8a 100644
--- a/tools/testing/selftests/bpf/prog_tests/sk_assign.c
+++ b/tools/testing/selftests/bpf/prog_tests/sk_assign.c
@@ -37,8 +37,10 @@ configure_stack(void)
 	tc = popen("tc -V", "r");
 	if (CHECK_FAIL(!tc))
 		return false;
-	if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc)))
+	if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc))) {
+		pclose(tc);
 		return false;
+	}
 	if (strstr(tc_version, ", libbpf "))
 		prog = "test_sk_assign_libbpf.bpf.o";
 	else
-- 
2.43.0
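The sk_assign.c half of the fix follows a simple rule: once popen() has succeeded, every return path must go through pclose(). The hypothetical first_output_line() below keeps that rule structurally, with a single pclose() after the only read, so there is no early return to forget. It is a sketch of the pattern, not the selftest code.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: run `cmd` and copy its first output line
 * (including the newline, as fgets() does) into buf. Once popen()
 * succeeds there is exactly one pclose() on every path, so the stream
 * cannot leak even when fgets() fails.
 */
static bool first_output_line(const char *cmd, char *buf, int size)
{
	FILE *p;
	bool ok;

	p = popen(cmd, "r");
	if (!p)
		return false;

	ok = fgets(buf, size, p) != NULL;
	pclose(p);	/* runs whether or not fgets() succeeded */
	return ok;
}
```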

4 months, 2 weeks


[GIT PULL] Kselftest fixes update for Linux 6.15-rc2

by Shuah Khan

Hi Linus,

Please pull the following kselftest fixes update for Linux 6.15-rc2.

Fixes tpm2, futex, and mincore tests. Creates a dedicated .gitignore for tpm2.

Details:
- selftests: tpm2: test_smoke: use POSIX-conformant expression operator
- selftests/futex: futex_waitv wouldblock test should fail
- selftests: tpm2: create a dedicated .gitignore
- selftests/mincore: Allow read-ahead pages to reach the end of the file

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 0af2f6be1b4281385b618cb86ad946eded089ac8:

  Linux 6.15-rc1 (2025-04-06 13:11:33 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.15-rc2

for you to fetch changes up to 197c1eaa7ba633a482ed7588eea6fd4aa57e08d4:

  selftests/mincore: Allow read-ahead pages to reach the end of the file (2025-04-08 17:08:50 -0600)

----------------------------------------------------------------
linux_kselftest-fixes-6.15-rc2

Fixes tpm2, futex, and mincore tests. Creates a dedicated .gitignore for tpm2.

Details:
- selftests: tpm2: test_smoke: use POSIX-conformant expression operator
- selftests/futex: futex_waitv wouldblock test should fail
- selftests: tpm2: create a dedicated .gitignore
- selftests/mincore: Allow read-ahead pages to reach the end of the file

----------------------------------------------------------------
Ahmed Salem (1):
      selftests: tpm2: test_smoke: use POSIX-conformant expression operator

Edward Liaw (1):
      selftests/futex: futex_waitv wouldblock test should fail

Khaled Elnaggar (1):
      selftests: tpm2: create a dedicated .gitignore

Qiuxu Zhuo (1):
      selftests/mincore: Allow read-ahead pages to reach the end of the file

 tools/testing/selftests/.gitignore | 1 -
 tools/testing/selftests/futex/functional/futex_wait_wouldblock.c | 2 +-
 tools/testing/selftests/mincore/mincore_selftest.c | 3 ---
 tools/testing/selftests/tpm2/.gitignore | 3 +++
 tools/testing/selftests/tpm2/test_smoke.sh | 2 +-
 5 files changed, 5 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/tpm2/.gitignore
----------------------------------------------------------------

4 months, 2 weeks


[PATCH v2] selftests: pid_namespace: Add missing sys/mount.h

by T.J. Mercier

pid_max.c uses mount(), umount2() and the MS_*/MNT_* flags without including <sys/mount.h>, so the build fails:

pid_max.c: In function ‘pid_max_cb’:
pid_max.c:42:15: error: implicit declaration of function ‘mount’ [-Wimplicit-function-declaration]
   42 |         ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
      |               ^~~~~
pid_max.c:42:36: error: ‘MS_PRIVATE’ undeclared (first use in this function); did you mean ‘MAP_PRIVATE’?
   42 |         ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
      |                                    ^~~~~~~~~~
      |                                    MAP_PRIVATE
pid_max.c:42:49: error: ‘MS_REC’ undeclared (first use in this function)
   42 |         ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
      |                                                 ^~~~~~
pid_max.c:48:9: error: implicit declaration of function ‘umount2’; did you mean ‘SYS_umount2’? [-Wimplicit-function-declaration]
   48 |         umount2("/proc", MNT_DETACH);
      |         ^~~~~~~
      |         SYS_umount2
pid_max.c:48:26: error: ‘MNT_DETACH’ undeclared (first use in this function)
   48 |         umount2("/proc", MNT_DETACH);

Fixes: 615ab43b838b ("tests/pid_namespace: add pid_max tests")
Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
---
 tools/testing/selftests/pid_namespace/pid_max.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/pid_namespace/pid_max.c b/tools/testing/selftests/pid_namespace/pid_max.c
index 51c414faabb0..96f274f0582b 100644
--- a/tools/testing/selftests/pid_namespace/pid_max.c
+++ b/tools/testing/selftests/pid_namespace/pid_max.c
@@ -10,6 +10,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <syscall.h>
+#include <sys/mount.h>
 #include <sys/wait.h>
 
 #include "../kselftest_harness.h"
-- 
2.49.0.504.g3bcea36a83-goog

4 months, 2 weeks


Re: [PATCH net-next] configs/debug: run and debug PREEMPT

by Jakub Kicinski

On Tue, 8 Apr 2025 20:18:26 +0200 Matthieu Baerts wrote:
> On 02/04/2025 19:23, Stanislav Fomichev wrote:
> > Recent change [0] resulted in a "BUG: using __this_cpu_read() in
> > preemptible" splat [1]. PREEMPT kernels have additional requirements
> > on what can and can not run with/without preemption enabled.
> > Expose those constraints in the debug kernels.
>
> Good idea to suggest this to find more bugs!
>
> I did some quick tests on my side with our CI, and the MPTCP selftests
> seem to take a bit more time, but without impacting the results.
> Hopefully, there will be no impact in slower/busy environments :)

What kind of slowdown do you see? I think we get up to 50% more time
spent in the longer tests. Not sure how bad is too bad.
I'm leaning towards applying this to net-next, and we can see if people
running on linux-next complain?

Let me CC kselftests, patch in question:
https://lore.kernel.org/all/20250402172305.1775226-1-sdf@fomichev.me/

4 months, 2 weeks


[PATCH 0/2] ftrace: Fix subops accounting

by Steven Rostedt

A fix [1] came in that fixed the notrace_filter side of the subops processing of the function graph tracer. When I started testing that fix, I discovered that many more functions were being enabled than were being traced.

The function graph infrastructure uses ftrace to hook into functions. It has a single ftrace_ops to manage all the users of function graph. Each individual user (tracing, bpf, fprobes, etc.) has its own ftrace_ops to track the functions from which its callback will be called. These ftrace_ops are "subops" of the main ftrace_ops of the function graph infrastructure.

Each ftrace_ops has a filter_hash and a notrace_hash, defined as follows: only trace functions that are in the filter_hash but not in the notrace_hash. If the filter_hash is empty, trace all functions. If the notrace_hash is empty, do not exclude any function.

The main function graph ftrace_ops needs to be a superset containing all the functions traced by all of its subops. The algorithm performing this merge was incorrect. It merged the filter_hashes of all the subops and took the intersection of all the subops' notrace_hashes. But by taking the intersection of the notrace_hashes, it ignored the fact that each notrace_hash depends on the filter_hash of the same subops.

Instead, modify the algorithm to be simpler and correct. First, when adding a new subops whose filter_hash is not empty, do not add its notrace_hash. Instead, add only the functions that are in the subops filter_hash but not in its notrace_hash to the main ops filter_hash. There is no reason to add anything to the main ops notrace_hash in this case.

The main ops notrace_hash should be non-empty only when all subops filter_hashes are empty (meaning trace all functions) and the subops notrace_hashes have functions in common. That is, the main ops notrace_hash is empty if any subops filter_hash is non-empty. The main ops notrace_hash only has content if all subops filter_hashes are empty, and that content is the intersection of all the subops notrace_hashes. If any subops notrace_hash is empty, so is the main ops notrace_hash.

[1] https://lore.kernel.org/all/20250408160258.48563-1-andybnac@gmail.com/

Steven Rostedt (2):
  ftrace: Fix accounting of subop hashes
  tracing/selftest: Add test to better test subops filtering of function graph

----
 kernel/trace/ftrace.c | 314 ++++++++++++---------
 .../ftrace/test.d/ftrace/fgraph-multi-filter.tc | 177 ++++++++++++
 2 files changed, 354 insertions(+), 137 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi-filter.tc

4 months, 2 weeks


Re: [PATCH] ASoC: cs-amp-lib-test: Don't select SND_SOC_CS_AMP_LIB

by Richard Fitzgerald

On 09/04/2025 3:24 pm, Mark Brown wrote:
> On Wed, Apr 09, 2025 at 11:45:44AM +0100, Richard Fitzgerald wrote:
>> Depend on SND_SOC_CS_AMP_LIB instead of selecting it.
>>
>> KUNIT_ALL_TESTS should only build tests for components that are
>> already being built, it should not cause other stuff to be added
>> to the build.
>
>>  config SND_SOC_CS_AMP_LIB_TEST
>> -	tristate "KUnit test for Cirrus Logic cs-amp-lib"
>> -	depends on KUNIT
>> +	tristate "KUnit test for Cirrus Logic cs-amp-lib" if !KUNIT_ALL_TESTS
>> +	depends on SND_SOC_CS_AMP_LIB && KUNIT
>>  	default KUNIT_ALL_TESTS
>> -	select SND_SOC_CS_AMP_LIB
>>  	help
>>  	  This builds KUnit tests for the Cirrus Logic common
>>  	  amplifier library.
>
> This by itself results in the Cirrus tests being removed from a kunit
> --alltests run which is a regression in coverage. I'd expect to see
> some corresponding updates in the KUnit all_tests.config to keep them
> enabled.

That's the defined behaviour of KUNIT_ALL_TESTS. It shouldn't have been
running as part of an alltests if nothing had selected it.

That seems to make people angry. Probably the same people who would
complain if there was a bug in the code that they didn't want to test.

4 months, 2 weeks


[PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev

by Cosmin Ratiu

This patch series was motivated by fixing a few bugs in the bonding driver related to xfrm state migration on device failover.

struct xfrm_dev_offload has two net_device pointers: dev and real_dev. The first is the device the xfrm_state is offloaded on, and the second is used by the bonding driver to manage the underlying device that xfrm_states are actually offloaded on. When bonding isn't used, the two pointers are the same.

This causes confusion in drivers: which device pointer should they use? If they want to support bonding, they need to use only real_dev and never look at dev.

Furthermore, real_dev is used without proper locking from multiple code paths, and changing it is dangerous. See commit [1] for an example.

This patch series cleans things up by removing all uses of real_dev from outside the bonding driver. Then the bonding driver is refactored to fix a couple of long-standing races and the original bug that motivated this series.

[1] commit f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")

Cosmin Ratiu (6):
 Cleaning up unnecessary uses of xso.real_dev:
  net/mlx5: Avoid using xso.real_dev unnecessarily
  xfrm: Use xdo.dev instead of xdo.real_dev
  xfrm: Remove unneeded device check from validate_xmit_xfrm
 Refactoring device operations to get an explicit device pointer:
  xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
 Fixing a bonding xfrm state migration bug:
  bonding: Mark active offloaded xfrm_states
 Fixing long standing races in bonding:
  bonding: Fix multiple long standing offload races

 Documentation/networking/xfrm_device.rst | 10 +-
 drivers/net/bonding/bond_main.c | 93 +++++++++++-------
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 20 ++--
 .../inline_crypto/ch_ipsec/chcr_ipsec.c | 18 ++--
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 40 ++++----
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 20 ++--
 .../marvell/octeontx2/nic/cn10k_ipsec.c | 18 ++--
 .../mellanox/mlx5/core/en_accel/ipsec.c | 28 +++---
 .../mellanox/mlx5/core/en_accel/ipsec.h | 1 +
 .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +--
 drivers/net/netdevsim/ipsec.c | 15 ++-
 include/linux/netdevice.h | 10 +-
 include/net/xfrm.h | 8 ++
 net/xfrm/xfrm_device.c | 13 +--
 net/xfrm/xfrm_state.c | 16 ++--
 15 files changed, 175 insertions(+), 146 deletions(-)
-- 
2.45.0

4 months, 2 weeks

[PATCH] selftests/x86/lam: fix memory leak and resource leak in lam.c

by Malaya Kumar Rout

Static analysis of lam.c with cppcheck reports:

tools/testing/selftests/x86/lam.c:585:3: error: Resource leak: file_fd [resourceLeak]
tools/testing/selftests/x86/lam.c:593:3: error: Resource leak: file_fd [resourceLeak]
tools/testing/selftests/x86/lam.c:600:3: error: Memory leak: fi [memleak]
tools/testing/selftests/x86/lam.c:1066:2: error: Resource leak: fd [resourceLeak]

Fix the issues by closing the file descriptors and releasing the allocated memory.

Signed-off-by: Malaya Kumar Rout <malayarout91(a)gmail.com>
---
 tools/testing/selftests/x86/lam.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 4d4a76532dc9..0b43b83ad142 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -581,24 +581,28 @@ int do_uring(unsigned long lam)
 	if (file_fd < 0)
 		return 1;
 
-	if (fstat(file_fd, &st) < 0)
+	if (fstat(file_fd, &st) < 0) {
+		close(file_fd);
 		return 1;
-
+	}
 	off_t file_sz = st.st_size;
 
 	int blocks = (int)(file_sz + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
 
 	fi = malloc(sizeof(*fi) + sizeof(struct iovec) * blocks);
-	if (!fi)
+	if (!fi) {
+		close(file_fd);
 		return 1;
-
+	}
 	fi->file_sz = file_sz;
 	fi->file_fd = file_fd;
 
 	ring = malloc(sizeof(*ring));
-	if (!ring)
+	if (!ring) {
+		close(file_fd);
+		free(fi);
 		return 1;
-
+	}
 	memset(ring, 0, sizeof(struct io_ring));
 
 	if (setup_io_uring(ring))
@@ -1060,8 +1064,10 @@ void *allocate_dsa_pasid(void)
 
 	wq = mmap(NULL, 0x1000, PROT_WRITE,
 		  MAP_SHARED | MAP_POPULATE, fd, 0);
-	if (wq == MAP_FAILED)
+	if (wq == MAP_FAILED) {
+		close(fd);
 		perror("mmap");
+	}
 
 	return wq;
 }
-- 
2.43.0
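The patch adds an explicit close()/free() before each early return. A common alternative in C, widely used in kernel code, is a single exit path with goto labels that release resources in the reverse order they were acquired. The following is a minimal sketch of that idiom under stated assumptions: read_whole_file() is a hypothetical helper, not code from lam.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from lam.c): read a whole file into a
 * malloc()ed buffer. Each failure after open() jumps to the label that
 * releases exactly what has been acquired so far, so no error path can
 * leak the descriptor or the buffer.
 */
static char *read_whole_file(const char *path, size_t *size_out)
{
	struct stat st;
	char *buf;
	size_t done = 0;
	int fd;

	assert(path != NULL && size_out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;		/* nothing acquired yet */

	if (fstat(fd, &st) < 0)
		goto err_close;

	/* +1 so that an empty file never leads to malloc(0) */
	buf = malloc((size_t)st.st_size + 1);
	if (!buf)
		goto err_close;

	while (done < (size_t)st.st_size) {
		ssize_t n = read(fd, buf + done, (size_t)st.st_size - done);

		if (n <= 0)		/* read error or unexpected EOF */
			goto err_free;
		done += (size_t)n;
	}

	close(fd);
	*size_out = done;
	return buf;

err_free:
	free(buf);
err_close:
	close(fd);
	return NULL;
}
```

The labels fall through from the most recently acquired resource to the oldest, so adding another allocation only needs one new label; for a function with just a few exits, the explicit calls used in the patch are equally correct.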

4 months, 2 weeks

[PATCH] selftests/mm: Fix compiler -Wmaybe-uninitialized warning

by Anshuman Khandual

The following build warning shows up for the cow test because the 'transferred' variable is not initialized. Fix the warning by zero-initializing the variable.

CC cow
cow.c: In function ‘do_test_vmsplice_in_parent’:
cow.c:365:61: warning: ‘transferred’ may be used uninitialized [-Wmaybe-uninitialized]
  365 |                 cur = read(fds[0], new + total, transferred - total);
      |                                                 ~~~~~~~~~~~~^~~~~~~
cow.c:296:29: note: ‘transferred’ was declared here
  296 |         ssize_t cur, total, transferred;
      |                             ^~~~~~~~~~~
CC compaction_test
CC gup_longterm

Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
---
 tools/testing/selftests/mm/cow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index f0cb14ea8608..b6cfe0a4b7df 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -293,7 +293,7 @@ static void do_test_vmsplice_in_parent(char *mem, size_t size,
 		.iov_base = mem,
 		.iov_len = size,
 	};
-	ssize_t cur, total, transferred;
+	ssize_t cur, total, transferred = 0;
 	struct comm_pipes comm_pipes;
 	char *old, *new;
 	int ret, fds[2];
-- 
2.43.0
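-Wmaybe-uninitialized is a flow-sensitive heuristic: GCC warns when it cannot prove that every read of a variable is preceded by a write. The hypothetical reduction below (not the actual cow.c code) shows a typical false-positive shape and the same zero-initialization fix; whether a reduction this small actually triggers the warning depends on the GCC version and optimization level.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical reduction: 'transferred' is written only when 'have_data'
 * is true and read only under the same condition, so every real read sees
 * an initialized value. GCC may still fail to correlate the two branches
 * and warn. Zero-initializing removes the warning without changing the
 * result on any path.
 */
static long consume(bool have_data, long value)
{
	long transferred = 0;	/* "= 0" mirrors the fix in the patch */
	long total = 0;

	if (have_data)
		transferred = value;

	if (have_data)
		total += transferred;

	return total;
}
```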

4 months, 2 weeks

[RFC bpf-next 00/13] bpf: Introduce modular verifier

by Daniel Xu

This patchset adds the base infrastructure for a modular BPF verifier. The motivation remains unchanged from the LSFMMBPF25 proposal [0], but the design has diverged. Rather than going straight for the facade described in [0], we first stop at an intermediate step: continuously exported copies of the verifier in an out-of-tree repository, with a separate copy for each kernel release.

Each copy will receive as many verifier backports as possible within the "boundary" of the modular portions. For example, a patch that changes the verifier together with one of the kernel symbols it depends on cannot be applied, because at runtime only the verifier portion can be updated. A patch that only changes verifier.c, however, can be applied, as it stays within the boundary. A rough analysis of past data shows that most verifier changes fall into the latter category; the Jupyter notebook for this analysis is at [1].

From here, we'll gradually enlarge the "boundary" to enable backports of more and more patches, with the facade described in the proposal as the north star. Ideally, completing the facade will make the out-of-tree repository unnecessary.
[0]: https://lore.kernel.org/bpf/nahst74z46ov7ii3vmriyhk25zo6tkf2f3hsulzjzselvob…
[1]: https://github.com/danobi/verifier-analysis/blob/master/analysis.ipynb

Daniel Xu (13):
  bpf: Move bpf_prog_ctx_arg_info_init() body into header
  bpf: Move BTF related globals out of verifier.c
  bpf: Move percpu memory allocator definition into core
  bpf: Move bpf_check_attach_target() to core
  bpf: Remove map_set_for_each_callback_args callback for maps
  bpf: Move kfunc definitions out of verifier.c
  bpf: Make bpf_free_kfunc_btf_tab() static in core
  selftests: bpf: Avoid attaching to bpf_check()
  perf: Export perf_snapshot_branch_stack static key
  bpf: verifier: Add indirection to kallsyms_lookup_name()
  treewide: bpf: Export symbols used by verifier
  bpf: verifier: Make verifier loadable
  bpf: Supporting building verifier.ko out-of-tree

 arch/x86/net/bpf_jit_comp.c | 2 +
 drivers/media/rc/bpf-lirc.c | 1 +
 fs/bpf_fs_kfuncs.c | 4 +
 include/linux/bpf.h | 82 ++-
 include/linux/bpf_verifier.h | 7 -
 include/linux/btf.h | 4 +
 kernel/bpf/Kbuild | 8 +
 kernel/bpf/Kconfig | 12 +
 kernel/bpf/Makefile | 3 +-
 kernel/bpf/arraymap.c | 2 -
 kernel/bpf/bpf_iter.c | 1 +
 kernel/bpf/bpf_lsm.c | 5 +
 kernel/bpf/bpf_struct_ops.c | 2 +
 kernel/bpf/btf.c | 61 +-
 kernel/bpf/cgroup.c | 4 +
 kernel/bpf/core.c | 463 ++++++++++++++++
 kernel/bpf/disasm.c | 4 +
 kernel/bpf/hashtab.c | 4 -
 kernel/bpf/helpers.c | 2 +
 kernel/bpf/local_storage.c | 2 +
 kernel/bpf/log.c | 12 +
 kernel/bpf/map_iter.c | 1 +
 kernel/bpf/memalloc.c | 3 +
 kernel/bpf/offload.c | 10 +
 kernel/bpf/syscall.c | 52 +-
 kernel/bpf/tnum.c | 20 +
 kernel/bpf/token.c | 1 +
 kernel/bpf/trampoline.c | 5 +
 kernel/bpf/verifier.c | 521 ++----------------
 kernel/events/callchain.c | 3 +
 kernel/events/core.c | 1 +
 kernel/trace/bpf_trace.c | 9 +
 lib/error-inject.c | 2 +
 net/core/filter.c | 26 +
 net/core/xdp.c | 2 +
 net/netfilter/nf_bpf_link.c | 1 +
 .../selftests/bpf/progs/exceptions_assert.c | 2 +-
 .../selftests/bpf/progs/exceptions_fail.c | 4 +-
 38 files changed, 834 insertions(+), 514 deletions(-)
 create mode 100644 kernel/bpf/Kbuild

-- 
2.47.1

4 months, 2 weeks

[PATCH 1/1] selftests/mincore: Allow read-ahead pages to reach the end of the file

by Qiuxu Zhuo

When running mincore_selftest on a system with an XFS file system, the "check_file_mmap" test case failed because the read-ahead pages reached the end of the file. The failure log is as follows:

RUN global.check_file_mmap ...
mincore_selftest.c:264:check_file_mmap:Expected i (1024) < vec_size (1024)
mincore_selftest.c:265:check_file_mmap:Read-ahead pages reached the end of the file
check_file_mmap: Test failed
FAIL global.check_file_mmap

This is because the read-ahead window of the XFS file system on this machine is 4 MB (8192 sectors of 512 bytes), which is larger than the distance from the #PF address to the end of the file. As a result, all the pages of this file are populated:

blockdev --getra /dev/nvme0n1p5
8192
blockdev --getbsz /dev/nvme0n1p5
512

Increasing FILE_SIZE beyond the current 4 MB would fix this case, but the test would still fail on any file system whose read-ahead window is large enough. Moreover, read-ahead pages reaching the end of the file can happen in the real world and is expected behavior. Therefore, allowing read-ahead pages to reach the end of the file is the better choice for the "check_file_mmap" test case.

Reported-by: Yi Lai <yi1.lai(a)intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo(a)intel.com>
---
 tools/testing/selftests/mincore/mincore_selftest.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/mincore/mincore_selftest.c b/tools/testing/selftests/mincore/mincore_selftest.c
index e949a43a6145..efabfcbe0b49 100644
--- a/tools/testing/selftests/mincore/mincore_selftest.c
+++ b/tools/testing/selftests/mincore/mincore_selftest.c
@@ -261,9 +261,6 @@ TEST(check_file_mmap)
 		TH_LOG("No read-ahead pages found in memory");
 	}
 
-	EXPECT_LT(i, vec_size) {
-		TH_LOG("Read-ahead pages reached the end of the file");
-	}
 	/*
 	 * End of the readahead window. The rest of the pages shouldn't
 	 * be in memory.
-- 
2.17.1
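The test relies on mincore(2) to learn which pages of a mapping are resident. Below is a minimal sketch of that call on a private anonymous mapping, which has no read-ahead, so untouched pages are expected to be reported as non-resident. count_resident_after_touch() is a hypothetical helper, not code from mincore_selftest.c. On kernels with multi-size anonymous THP enabled, one write fault may also populate neighbouring pages, so a check on the touched case should only rely on a lower bound.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64

/*
 * Hypothetical helper: map 'npages' private anonymous pages, write one
 * byte into each page listed in 'touch', then return how many pages
 * mincore() reports as resident, or -1 on error.
 */
static int count_resident_after_touch(size_t npages, const size_t *touch,
				      size_t ntouch)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char vec[MAX_PAGES];
	size_t len;
	char *addr;
	int resident = 0;

	assert(page > 0 && npages > 0 && npages <= MAX_PAGES);
	len = npages * (size_t)page;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return -1;

	for (size_t i = 0; i < ntouch; i++) {
		assert(touch[i] < npages);
		addr[touch[i] * (size_t)page] = 1;	/* write fault populates it */
	}

	if (mincore(addr, len, vec) < 0) {
		munmap(addr, len);
		return -1;
	}

	/* Only the least significant bit of each vector entry is defined. */
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1;

	munmap(addr, len);
	return resident;
}
```

On a file mapping, the same call can report pages well beyond the faulting one as resident, because the page cache may already hold read-ahead pages; that is the behavior the patch stops treating as a failure.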

4 months, 2 weeks

[GIT PULL] kunit fixes update for Linux 6.15-rc2

by Shuah Khan

Hi Linus,

Please pull the following kunit fixes update for Linux 6.15-rc2.

Fixes the tool to report the test count in case of a late test plan, i.e. when tests are specified before the test plan.

Fixes a spelling error in a commit that went into 6.15-rc1.

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 0af2f6be1b4281385b618cb86ad946eded089ac8:

  Linux 6.15-rc1 (2025-04-06 13:11:33 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-6.15-rc2

for you to fetch changes up to d1be0cf3b8aeae75bc8fff5b7a3e01ebfe276008:

  kunit: Spelling s/slowm/slow/ (2025-04-08 14:57:24 -0600)

----------------------------------------------------------------
linux_kselftest-kunit-6.15-rc2

Fixes tool to report test count in case of a late test plan when tests are specified before the test plan.

Fixes spelling error in the commit that went into 6.15-rc1.

----------------------------------------------------------------
Geert Uytterhoeven (1):
      kunit: Spelling s/slowm/slow/

Rae Moar (1):
      kunit: tool: fix count of tests if late test plan

 include/kunit/test.h | 2 +-
 tools/testing/kunit/kunit_parser.py | 4 ++++
 tools/testing/kunit/kunit_tool_test.py | 4 ++--
 3 files changed, 7 insertions(+), 3 deletions(-)
----------------------------------------------------------------

4 months, 2 weeks

[PATCH net-next] selftests: tc-testing: Pre-load IFE action and its submodules

by Victor Nogueira

Recently, we had issues in parallel TDC where some of the IFE tests failed because some of IFE's submodules (such as act_meta_skbtcindex and act_meta_skbprio) took too long to load [1]. To avoid that, pre-load IFE and all of its submodules in tdc.sh before running any of the tests.

[1] https://lore.kernel.org/netdev/e909b2a0-244e-4141-9fa9-1b7d96ab7d71@mojatat…

Signed-off-by: Victor Nogueira <victor(a)mojatatu.com>
---
 tools/testing/selftests/tc-testing/tdc.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tdc.sh b/tools/testing/selftests/tc-testing/tdc.sh
index cddff1772e10..589b18ed758a 100755
--- a/tools/testing/selftests/tc-testing/tdc.sh
+++ b/tools/testing/selftests/tc-testing/tdc.sh
@@ -31,6 +31,10 @@ try_modprobe act_skbedit
 try_modprobe act_skbmod
 try_modprobe act_tunnel_key
 try_modprobe act_vlan
+try_modprobe act_ife
+try_modprobe act_meta_mark
+try_modprobe act_meta_skbtcindex
+try_modprobe act_meta_skbprio
 try_modprobe cls_basic
 try_modprobe cls_bpf
 try_modprobe cls_cgroup
-- 
2.49.0

4 months, 2 weeks

[PATCH net 0/2] mptcp: only inc MPJoinAckHMacFailure for HMAC failures

by Matthieu Baerts (NGI0)

Recently, during a debugging session using local MPTCP connections, I noticed MPJoinAckHMacFailure was strangely not zero on the server side.

The first patch fixes this issue -- present since v5.9 -- and the second one validates it in the selftests.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (2):
      mptcp: only inc MPJoinAckHMacFailure for HMAC failures
      selftests: mptcp: validate MPJoin HMacFailure counters

 net/mptcp/subflow.c | 8 ++++++--
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 18 ++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)
---
base-commit: 61f96e684edd28ca40555ec49ea1555df31ba619
change-id: 20250407-net-mptcp-hmac-failure-mib-66f599305ff3

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

4 months, 2 weeks

[PATCH] tests/pid_namespace: Add missing sys/mount.h

by T.J. Mercier

pid_max.c: In function ‘pid_max_cb’:
pid_max.c:42:15: error: implicit declaration of function ‘mount’ [-Wimplicit-function-declaration]
   42 |         ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
      |               ^~~~~
pid_max.c:42:36: error: ‘MS_PRIVATE’ undeclared (first use in this function); did you mean ‘MAP_PRIVATE’?
   42 |         ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
      |                                    ^~~~~~~~~~
      |                                    MAP_PRIVATE
pid_max.c:42:49: error: ‘MS_REC’ undeclared (first use in this function)
   42 |         ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
      |                                                 ^~~~~~
pid_max.c:48:9: error: implicit declaration of function ‘umount2’; did you mean ‘SYS_umount2’? [-Wimplicit-function-declaration]
   48 |         umount2("/proc", MNT_DETACH);
      |         ^~~~~~~
      |         SYS_umount2
pid_max.c:48:26: error: ‘MNT_DETACH’ undeclared (first use in this function)
   48 |         umount2("/proc", MNT_DETACH);

Fixes: 615ab43b838b ("tests/pid_namespace: add pid_max tests")
Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
---
 tools/testing/selftests/pid_namespace/pid_max.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/pid_namespace/pid_max.c b/tools/testing/selftests/pid_namespace/pid_max.c
index 51c414faabb0..96f274f0582b 100644
--- a/tools/testing/selftests/pid_namespace/pid_max.c
+++ b/tools/testing/selftests/pid_namespace/pid_max.c
@@ -10,6 +10,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <syscall.h>
+#include <sys/mount.h>
 #include <sys/wait.h>
 
 #include "../kselftest_harness.h"
-- 
2.49.0.504.g3bcea36a83-goog

4 months, 2 weeks

[PATCH] selftests/futex: futex_waitv wouldblock test should fail

by Edward Liaw

The test case should fail if -EWOULDBLOCK is not returned when the expected value passed by the waiter differs from the actual value of the futex.

Fixes: 9d57f7c79748920636f8293d2f01192d702fe390 ("selftests: futex: Test sys_futex_waitv() wouldblock")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
---
 .../testing/selftests/futex/functional/futex_wait_wouldblock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
index 7d7a6a06cdb7..2d8230da9064 100644
--- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
@@ -98,7 +98,7 @@ int main(int argc, char *argv[])
 	info("Calling futex_waitv on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
 	res = futex_waitv(&waitv, 1, 0, &to, CLOCK_MONOTONIC);
 	if (!res || errno != EWOULDBLOCK) {
-		ksft_test_result_pass("futex_waitv returned: %d %s\n",
+		ksft_test_result_fail("futex_waitv returned: %d %s\n",
 				      res ? errno : res,
 				      res ? strerror(errno) : "");
 		ret = RET_FAIL;
-- 
2.49.0.504.g3bcea36a83-goog
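The condition the fix relies on can be reproduced with the older single-futex FUTEX_WAIT operation, swapped in here for futex_waitv() because SYS_futex_waitv may be missing from older headers: when the value at the futex address does not match the expected value, the kernel returns immediately with EWOULDBLOCK (equal to EAGAIN on Linux) instead of sleeping. A minimal sketch, assuming an architecture where SYS_futex is available; the helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if FUTEX_WAIT with a mismatched expected
 * value fails at once with EWOULDBLOCK, 0 otherwise. The check mirrors the
 * corrected selftest: success, or any other errno, counts as a failure.
 */
static int mismatch_gives_ewouldblock(void)
{
	uint32_t f1 = 0;
	long res;

	/* Expected value f1 + 1 differs from the current value, so no sleep. */
	res = syscall(SYS_futex, &f1, FUTEX_WAIT_PRIVATE, f1 + 1, NULL, NULL, 0);

	if (res == 0 || errno != EWOULDBLOCK)
		return 0;
	return 1;
}
```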

4 months, 2 weeks

[RFC PATCH v1 nf-next] selftests: netfilter: Add bridge_fastpath.sh

by Eric Woudstra

Add a script to test various scenarios where a bridge is involved in the fastpath. It runs tests in the forward path and in a bridged path. The setup is similar to a basic home router with multiple LAN ports and uses 3 pairs of veth devices. Any or all of the pairs can be replaced by a pair of real interfaces connected by a cable. This is necessary to test the behavior with DSA ports, foreign (DSA) ports and switchdev user ports that support SWITCHDEV_OBJ_ID_PORT_VLAN.

See the head of the script for a detailed description. Run it without arguments to perform all tests on veth devices.

Signed-off-by: Eric Woudstra <ericwouds(a)gmail.com>
---
This test script was written for the proposed bridge-fastpath patch sets, but its use is more general and it can easily be extended. Because developing this script helped me find and fix a few issues in my latest version of the bridge-fastpath patches, I am sending the whole set again (split up into smaller patch sets), including the latest fixes.
Some example outputs of this last version of patches from different hardware, without and with patches: ALL VETH: ========= ./bridge_fastpath.sh -t Setup: CLIENT 0 veth0cl | veth0rt WAN ROUTER LAN1 LAN2 veth1rt veth2rt | | veth1cl veth2cl CLIENT 1 CLIENT 2 Without patches: PASS: unaware bridge, without encaps, without fastpath PASS: unaware bridge, with single vlan encap, without fastpath ERROR: unaware bridge, with double q vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304 ERROR: unaware bridge, with 802.1ad vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304 PASS: aware bridge, without/without vlan encap, without fastpath PASS: aware bridge, with/without vlan encap, without fastpath PASS: aware bridge, with/with vlan encap, without fastpath PASS: aware bridge, without/with vlan encap, without fastpath PASS: forward, without vlan-device, without vlan encap, client1, without fastpath PASS: forward, without vlan-device, without vlan encap, client1, with fastpath PASS: forward, without vlan-device, with vlan encap, client1, without fastpath ERROR: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken PASS: forward, with vlan-device, with vlan encap, client1, without fastpath PASS: forward, with vlan-device, with vlan encap, client1, with fastpath PASS: forward, with vlan-device, without vlan encap, client1, without fastpath PASS: forward, with vlan-device, without vlan encap, client1, with fastpath ERROR: bridge fastpath test has failed With patches: PASS: unaware bridge, without encaps, without fastpath PASS: unaware bridge, without encaps, with fastpath PASS: unaware bridge, with single vlan encap, without fastpath PASS: unaware bridge, with single vlan encap, with fastpath PASS: unaware bridge, with double q vlan encaps, without fastpath PASS: unaware bridge, with double q vlan encaps, with fastpath PASS: unaware bridge, with 802.1ad vlan encaps, without fastpath PASS: unaware bridge, with 
802.1ad vlan encaps, with fastpath PASS: aware bridge, without/without vlan encap, without fastpath PASS: aware bridge, without/without vlan encap, with fastpath PASS: aware bridge, with/without vlan encap, without fastpath PASS: aware bridge, with/without vlan encap, with fastpath PASS: aware bridge, with/with vlan encap, without fastpath PASS: aware bridge, with/with vlan encap, with fastpath PASS: aware bridge, without/with vlan encap, without fastpath PASS: aware bridge, without/with vlan encap, with fastpath PASS: forward, without vlan-device, without vlan encap, client1, without fastpath PASS: forward, without vlan-device, without vlan encap, client1, with fastpath PASS: forward, without vlan-device, with vlan encap, client1, without fastpath PASS: forward, without vlan-device, with vlan encap, client1, with fastpath PASS: forward, with vlan-device, with vlan encap, client1, without fastpath PASS: forward, with vlan-device, with vlan encap, client1, with fastpath PASS: forward, with vlan-device, without vlan encap, client1, without fastpath PASS: forward, with vlan-device, without vlan encap, client1, with fastpath PASS: all tests passed BANANAPI-R3 (lan1 & lan2 are dsa): ============ Without patches: ./bridge_fastpath.sh -t -0 enu1u2,lan2 -1 enu1u1,lan1 -2 lan4,eth1 Setup: CLIENT 0 enu1u2 | lan2 WAN ROUTER LAN1 LAN2 lan1 eth1 | | enu1u1 lan4 CLIENT 1 CLIENT 2 PASS: unaware bridge, without encaps, without fastpath PASS: unaware bridge, with single vlan encap, without fastpath PASS: aware bridge, without/without vlan encap, without fastpath PASS: aware bridge, with/without vlan encap, without fastpath PASS: aware bridge, with/with vlan encap, without fastpath PASS: aware bridge, without/with vlan encap, without fastpath PASS: forward, without vlan-device, without vlan encap, client1, without fastpath ERROR: forward, without vlan-device, without vlan encap, client1, with fastpath: ipv4: counted bytes 2118540 > 2097152 ERROR: forward, without vlan-device, 
without vlan encap, client1, with fastpath: ipv6: counted bytes 2117904 > 2097152 PASS: forward, without vlan-device, without vlan encap, client2, without fastpath PASS: forward, without vlan-device, without vlan encap, client2, with fastpath PASS: forward, without vlan-device, without vlan encap, client2, with hw_fastpath PASS: forward, without vlan-device, with vlan encap, client1, without fastpath ERROR: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken PASS: forward, without vlan-device, with vlan encap, client2, without fastpath ERROR: forward, without vlan-device, with vlan encap, client2, with fastpath: ipv4/6: tcp broken PASS: forward, with vlan-device, with vlan encap, client1, without fastpath PASS: forward, with vlan-device, with vlan encap, client1, with fastpath PASS: forward, with vlan-device, with vlan encap, client1, with hw_fastpath PASS: forward, with vlan-device, with vlan encap, client2, without fastpath PASS: forward, with vlan-device, with vlan encap, client2, with fastpath PASS: forward, with vlan-device, with vlan encap, client2, with hw_fastpath PASS: forward, with vlan-device, without vlan encap, client1, without fastpath PASS: forward, with vlan-device, without vlan encap, client1, with fastpath PASS: forward, with vlan-device, without vlan encap, client1, with hw_fastpath PASS: forward, with vlan-device, without vlan encap, client2, without fastpath ERROR: forward, with vlan-device, without vlan encap, client2, with fastpath: ipv4: counted bytes 2109596 > 2097152 ERROR: forward, with vlan-device, without vlan encap, client2, with fastpath: ipv6: counted bytes 2121432 > 2097152 ERROR: bridge fastpath test has failed With patches: PASS: unaware bridge, without encaps, without fastpath PASS: unaware bridge, without encaps, with fastpath PASS: unaware bridge, without encaps, with hw_fastpath PASS: unaware bridge, with single vlan encap, without fastpath PASS: unaware bridge, with single vlan encap, 
with fastpath PASS: unaware bridge, with single vlan encap, with hw_fastpath PASS: aware bridge, without/without vlan encap, without fastpath PASS: aware bridge, without/without vlan encap, with fastpath PASS: aware bridge, without/without vlan encap, with hw_fastpath PASS: aware bridge, with/without vlan encap, without fastpath PASS: aware bridge, with/without vlan encap, with fastpath PASS: aware bridge, with/without vlan encap, with hw_fastpath PASS: aware bridge, with/with vlan encap, without fastpath PASS: aware bridge, with/with vlan encap, with fastpath PASS: aware bridge, with/with vlan encap, with hw_fastpath PASS: aware bridge, without/with vlan encap, without fastpath PASS: aware bridge, without/with vlan encap, with fastpath PASS: aware bridge, without/with vlan encap, with hw_fastpath PASS: forward, without vlan-device, without vlan encap, client1, without fastpath PASS: forward, without vlan-device, without vlan encap, client1, with fastpath PASS: forward, without vlan-device, without vlan encap, client1, with hw_fastpath PASS: forward, without vlan-device, without vlan encap, client2, without fastpath PASS: forward, without vlan-device, without vlan encap, client2, with fastpath PASS: forward, without vlan-device, without vlan encap, client2, with hw_fastpath PASS: forward, without vlan-device, with vlan encap, client1, without fastpath PASS: forward, without vlan-device, with vlan encap, client1, with fastpath PASS: forward, without vlan-device, with vlan encap, client1, with hw_fastpath PASS: forward, without vlan-device, with vlan encap, client2, without fastpath PASS: forward, without vlan-device, with vlan encap, client2, with fastpath PASS: forward, without vlan-device, with vlan encap, client2, with hw_fastpath PASS: forward, with vlan-device, with vlan encap, client1, without fastpath PASS: forward, with vlan-device, with vlan encap, client1, with fastpath PASS: forward, with vlan-device, with vlan encap, client1, with hw_fastpath PASS: 
forward, with vlan-device, with vlan encap, client2, without fastpath PASS: forward, with vlan-device, with vlan encap, client2, with fastpath PASS: forward, with vlan-device, with vlan encap, client2, with hw_fastpath PASS: forward, with vlan-device, without vlan encap, client1, without fastpath PASS: forward, with vlan-device, without vlan encap, client1, with fastpath PASS: forward, with vlan-device, without vlan encap, client1, with hw_fastpath PASS: forward, with vlan-device, without vlan encap, client2, without fastpath PASS: forward, with vlan-device, without vlan encap, client2, with fastpath PASS: forward, with vlan-device, without vlan encap, client2, with hw_fastpath PASS: all tests passed AM3359 (end1 supports SWITCHDEV_OBJ_ID_PORT_VLAN, ipv4 only for now): ======= ./bridge_fastpath.sh -t -a -4 -d -1 enu1u4c2,end1 Without patches: Setup: CLIENT 0 veth0cl | veth0rt WAN ROUTER LAN1 LAN2 end1 veth2rt | | enu1u4c2 veth2cl CLIENT 1 CLIENT 2 INFO: Skipping unaware bridge PASS: aware bridge, without/without vlan encap, without fastpath PASS: aware bridge, with/without vlan encap, without fastpath PASS: aware bridge, with/with vlan encap, without fastpath PASS: aware bridge, without/with vlan encap, without fastpath PASS: forward, without vlan-device, without vlan encap, client1, without fastpath ERROR: forward, without vlan-device, without vlan encap, client1, with fastpath: ipv4: counted bytes 2190092 > 2097152 PASS: forward, without vlan-device, without vlan encap, client2, without fastpath PASS: forward, without vlan-device, without vlan encap, client2, with fastpath PASS: forward, without vlan-device, with vlan encap, client1, without fastpath ERROR: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4: tcp broken PASS: forward, without vlan-device, with vlan encap, client2, without fastpath ERROR: forward, without vlan-device, with vlan encap, client2, with fastpath: ipv4: tcp broken PASS: forward, with vlan-device, with vlan encap, 
client1, without fastpath PASS: forward, with vlan-device, with vlan encap, client1, with fastpath PASS: forward, with vlan-device, with vlan encap, client2, without fastpath PASS: forward, with vlan-device, with vlan encap, client2, with fastpath PASS: forward, with vlan-device, without vlan encap, client1, without fastpath PASS: forward, with vlan-device, without vlan encap, client1, with fastpath PASS: forward, with vlan-device, without vlan encap, client2, without fastpath PASS: forward, with vlan-device, without vlan encap, client2, with fastpath ERROR: bridge fastpath test has failed With patches: INFO: Skipping unaware bridge PASS: aware bridge, without/without vlan encap, without fastpath PASS: aware bridge, without/without vlan encap, with fastpath PASS: aware bridge, with/without vlan encap, without fastpath PASS: aware bridge, with/without vlan encap, with fastpath PASS: aware bridge, with/with vlan encap, without fastpath PASS: aware bridge, with/with vlan encap, with fastpath PASS: aware bridge, without/with vlan encap, without fastpath PASS: aware bridge, without/with vlan encap, with fastpath PASS: forward, without vlan-device, without vlan encap, client1, without fastpath PASS: forward, without vlan-device, without vlan encap, client1, with fastpath PASS: forward, without vlan-device, without vlan encap, client2, without fastpath PASS: forward, without vlan-device, without vlan encap, client2, with fastpath PASS: forward, without vlan-device, with vlan encap, client1, without fastpath PASS: forward, without vlan-device, with vlan encap, client1, with fastpath PASS: forward, without vlan-device, with vlan encap, client2, without fastpath PASS: forward, without vlan-device, with vlan encap, client2, with fastpath PASS: forward, with vlan-device, with vlan encap, client1, without fastpath PASS: forward, with vlan-device, with vlan encap, client1, with fastpath PASS: forward, with vlan-device, with vlan encap, client2, without fastpath PASS: forward, 
with vlan-device, with vlan encap, client2, with fastpath PASS: forward, with vlan-device, without vlan encap, client1, without fastpath PASS: forward, with vlan-device, without vlan encap, client1, with fastpath PASS: forward, with vlan-device, without vlan encap, client2, without fastpath PASS: forward, with vlan-device, without vlan encap, client2, with fastpath PASS: all tests passed (Some problem still to figure out for my AM3359 hardware: On the second run of the command the tcp traffic is ok on all tests ipv4. On the first run the hardware is not setup correctly, some tests report broken tcp even without fastpath. Also ipv6 tcp broken even on second run even without fastpath. This may be a problem with my hardware or the test-script, but anyway it shows the fastpath is functional) .../testing/selftests/net/netfilter/Makefile | 1 + .../net/netfilter/bridge_fastpath.sh | 922 ++++++++++++++++++ 2 files changed, 923 insertions(+) create mode 100755 tools/testing/selftests/net/netfilter/bridge_fastpath.sh diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile index ffe161fac8b5..104dd9e5e02a 100644 --- a/tools/testing/selftests/net/netfilter/Makefile +++ b/tools/testing/selftests/net/netfilter/Makefile @@ -8,6 +8,7 @@ MNL_LDLIBS := $(shell $(HOSTPKG_CONFIG) --libs libmnl 2>/dev/null || echo -lmnl) TEST_PROGS := br_netfilter.sh bridge_brouter.sh TEST_PROGS += br_netfilter_queue.sh +TEST_PROGS += bridge_fastpath.sh TEST_PROGS += conntrack_dump_flush.sh TEST_PROGS += conntrack_icmp_related.sh TEST_PROGS += conntrack_ipip_mtu.sh diff --git a/tools/testing/selftests/net/netfilter/bridge_fastpath.sh b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh new file mode 100755 index 000000000000..68e2f9e70951 --- /dev/null +++ b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh @@ -0,0 +1,922 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Check if conntrack, nft chain and fastpath is functional in 
setups +# where a bridge is in the fastpath. +# +# Commandline options make it possible to use real ethernet pairs +# instead of veth-device pairs. Any, or all, pairs can be tested using +# real hardware pairs. This is can be useful to test dsa-ports, +# switchdev (dsa) foreign ports and switchdev ports supporting +# SWITCHDEV_OBJ_ID_PORT_VLAN. +# +# First tcp is tested. Conntrack and nft chain are tested using a counter. +# When there is a fastpath possible between the interfaces then the +# fastpath is also tested. +# When there is a hardware offloaded fastpath possible between the +# interfaces then the hardware offloaded path is also tested. +# +# Setup is as a typical router: +# +# nsclientwan +# | +# nsrt +# | | +# nsclient1 nsclient2 +# +# Masquerading for ipv4 only. +# +# First check if a bridge table forward chain can be setup, skip +# these tests if this is not possible. +# Then check if a inet table forward chain can be setup, skip +# these tests if this is not possible. +# +# Different setups of paths are tested that involve a bridge in the +# fastpath. This can be in the forward-fastpath or in the bridge-fastpath. +# +# The first series, in the bridge-fastpath, using a vlan-unaware bridge. +# Traffic with the following vlan-tags is checked: +# - without vlan +# - single vlan +# - double q vlan (only on veth-devices) +# - 802.1ad vlan (only on veth-devices) +# - pppoe (when available) +# - pppoe-in-q (when available) +# +# (double tag testing results in broken tcp traffic on most hardware, +# in this test setup, use '-a' argument to test it anyway) +# (pppoe testing takes place if pppd and pppoe-server are installed) +# +# The second series, in the bridge-fastpath, using a vlan-aware bridge. +# Here we test all combinations of ingress/egress with or without single +# vlan encaps. +# +# The third series, in the forward-fastpath, using a vlan-aware bridge, +# without a vlan-device linked to the master port. 
We test the same combinations +# of ingress/egress with or without single vlan encaps. +# +# The fourth series runs in the forward-fastpath, using a vlan-aware bridge, +# with a vlan-device linked to the master port. We test the same combinations +# of ingress/egress with or without single vlan encaps. +# +# Note 1: Using dsa userports on both sides of eth-pairs client1 or client2 +# gives erratic and unpredictable results. Use, for example, a usb-eth device +# on the client side to test a dsa-userport. +# +# Note 2: When testing the hardware offloaded fastpath, it is not checked +# whether the packets follow the software fastpath instead. A universal way to +# check this should be added at some point. +# +# Note 3: Some interfaces to test on the router side are netns-immutable. +# Use the -d or --defaultnsrouter option so that the interfaces of the router +# do not have to change netns. The router is then built up in the default netns. +# + +source lib.sh + +checktool "nft --version" "run test without nft" +checktool "socat -h" "run test without socat" +checktool "bridge -V" "run test without bridge" + +VID1=100 +VID2=101 +BRWAN=brwan +BRLAN=brlan +BRCL=brcl +LINKUP_TIMEOUT=10 +PING_TIMEOUT=10 +SOCAT_TIMEOUT=10 +filesize=2 # MiB + +filein=$(mktemp) +file1out=$(mktemp) +file2out=$(mktemp) +pppoeserveroptions=$(mktemp) +pppoeserverpid=$(mktemp) + +setup_ns nsclientwan nsclientlan1 nsclientlan2 + + WAN=0 ; LAN1=1 ; LAN2=2 ; ADWAN=3 ; ADLAN=4 +nsa=( $nsclientwan $nsclientlan1 $nsclientlan2 ) # $nsrt $nsrt +AD4=( '192.168.1.1' '192.168.2.101' '192.168.2.102' '192.168.1.2' '192.168.2.1' ) +AD6=( 'dead:1::1' 'dead:2::101' 'dead:2::102' 'dead:1::2' 'dead:2::1' ) + +while [ "${1:-}" != '' ]; do + case "$1" in + '-0' | '--pairwan') + shift + vethcl[$WAN]="${1%,*}" + vethrt[$WAN]="${1#*,}" + ;; + '-1' | '--pairlan1') + shift + vethcl[$LAN1]="${1%,*}" + vethrt[$LAN1]="${1#*,}" + ;; + '-2' | '--pairlan2') + shift + vethcl[$LAN2]="${1%,*}" + vethrt[$LAN2]="${1#*,}" + ;; + '-s' |
'--filesize') + shift + filesize=$1 + ;; + '-4' | '--ipv4') + do_ipv4=1 + ;; + '-6' | '--ipv6') + do_ipv6=1 + ;; + '-a' | '--aware') + skip_unaware=1 + ;; + '-n' | '--noskip') + noskip=1 + ;; + '-d' | '--defaultnsrouter') + defaultnsrouter=1 + ;; + '-f' | '--fixmac') + fixmac=1 + ;; + '-t' | '--showtree') + showtree=1 + ;; + *) + cat <<-EOF + Usage: $(basename $0) [OPTION]... + -0 --pairwan eth0cl,eth0rt pair of real interfaces to use on wan side + -1 --pairlan1 eth1cl,eth1rt pair of real interfaces to use on lan1 side + -2 --pairlan2 eth2cl,eth2rt pair of real interfaces to use on lan2 side + -s --filesize filesize to use for testing + -4|-6 --ipv4|--ipv6 test ipv4/6 only + -a --aware only test vlan aware bridge + -d --defaultnsrouter router in default network namespace, caution! + -f --fixmac change mac address when conflict found + -n --noskip also perform the normally skipped tests + -t --showtree show the tree of used interfaces + EOF + ;; + esac + shift +done + +if [ -n "$defaultnsrouter" ]; then + nsrt="nsrt-$(mktemp -u XXXXXX)" + touch /var/run/netns/$nsrt + mount --bind /proc/1/ns/net /var/run/netns/$nsrt +else + setup_ns nsrt +fi +nsa+=($nsrt $nsrt) + +cleanup() { + if [ -n "$defaultnsrouter" ]; then + umount /var/run/netns/$nsrt + rm -f /var/run/netns/$nsrt + fi + cleanup_all_ns + rm -f "$filein" "$file1out" "$file2out" "$pppoeserveroptions" "$pppoeserverpid" +} + +trap cleanup EXIT + +head -c $(($filesize * 1024 * 1024)) < /dev/urandom > "$filein" + +check_mac() +{ + local ns=$1 + local dev=$2 + local othermacs=$3 + local mac + + mac=$(ip -net "$ns" -br link show dev "$dev" | \ + grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}') + + if [[ ! 
"$othermacs" =~ "$mac" ]]; then + echo $mac + return 0 + fi + echo "WARN: Conflicting mac address $dev $mac" 1>&2 + + [ -z "$fixmac" ] && return 1 + + for (( j = 0 ; j < 10 ; j++ )); do + mac="${mac::6}$(printf %02x:%02x:%02x:%02x $(($RANDOM%256)) \ + $(($RANDOM%256)) $(($RANDOM%256)) $(($RANDOM%256)))" + [[ "$othermacs" =~ "$mac" ]] && continue + echo $mac + ip -net "$ns" link set dev "$dev" address "$mac" 1>&2 + return $? + done + return 1 +} + +is_linkup() +{ + local ns=$1 + local dev=$2 + + if [ -n "$(ip -net "$ns" link show dev "$dev" up 2>/dev/null | \ + grep 'state UP')" ]; then + return 0 + fi + return 1 +} + +wait_ping() +{ + local i1=$1 + local i2=$2 + local ns1=${nsa[$i1]} + local j + + for j in $(seq 1 $(($PING_TIMEOUT * 5 ))); do + ip netns exec "$ns1" ping -c 1 -w $PING_TIMEOUT -i 0.2 \ + -q "${AD4[$i2]}" >/dev/null 2>&1 + [ $? -le 1 ] && return $? + sleep 0.2 + done + return 1 +} + +add_addr() +{ + local i=$1 + local dev=$2 + local ns=${nsa[$i]} + local ad4=${AD4[$i]} + local ad6=${AD6[$i]} + + ip -net "$ns" addr add "${ad4}/24" dev "$dev" + ip -net "$ns" addr add "${ad6}/64" dev "$dev" nodad + if [[ "$ns" == "nsclientlan"* ]]; then + ip -net "$ns" route add default via "${AD4[$ADLAN]}" + ip -net "$ns" route add default via "${AD6[$ADLAN]}" + elif [[ "$ns" == "nsclientwan"* ]]; then + ip -net "$ns" route add default via "${AD6[$ADWAN]}" + fi + +} + +del_addr() +{ + local i=$1 + local dev=$2 + local ns=${nsa[$i]} + local ad4=${AD4[$i]} + local ad6=${AD6[$i]} + + if [[ "$ns" == "nsclientlan"* ]]; then + ip -net "$ns" route del default via "${AD6[$ADLAN]}" + ip -net "$ns" route del default via "${AD4[$ADLAN]}" + elif [[ "$ns" == "nsclientwan"* ]]; then + ip -net "$ns" route del default via "${AD6[$ADWAN]}" + fi + ip -net "$ns" addr del "${ad6}/64" dev "$dev" nodad + ip -net "$ns" addr del "${ad4}/24" dev "$dev" +} + +set_client() +{ + local i=$1 + local vlan=$2 + local arg=$3 + local ns=${nsa[$i]} + local vdev="${vethcl[$i]}" + local brdev="$BRCL" + 
 local proto="" + local pvidslave="" + + unset_client $i + + if [[ "$vlan" == "qq" ]]; then + ip -net "$ns" link add link "$vdev" name "$vdev.$VID1" type vlan id $VID1 + ip -net "$ns" link add link "$vdev.$VID1" name "$vdev.$VID1.$VID2" \ + type vlan id $VID2 + ip -net "$ns" link set "$vdev.$VID1" up + ip -net "$ns" link set "$vdev.$VID1.$VID2" up + add_addr $i "$vdev.$VID1.$VID2" + return + fi + + [[ "$vlan" == "none" ]] && pvidslave="pvid untagged" + [[ "$vlan" == "ad" ]] && proto="vlan_protocol 802.1ad" + + ip -net "$ns" link add "$brdev" type bridge vlan_filtering 1 vlan_default_pvid 0 $proto + ip -net "$ns" link set "$vdev" master "$brdev" + ip -net "$ns" link set "$brdev" up + + bridge -net "$ns" vlan add dev "$brdev" vid $VID1 pvid untagged self + bridge -net "$ns" vlan add dev "$vdev" vid $VID1 $pvidslave + + if [[ "$vlan" == "ad" ]]; then + ip -net "$ns" link add link "$brdev" name "$brdev.$VID2" type vlan id $VID2 + brdev="$brdev.$VID2" + ip -net "$ns" link set "$brdev" up + fi + + if [[ "$arg" != "noaddress" ]]; then + add_addr $i "$brdev" + fi +} + +unset_client() +{ + local i=$1 + local ns=${nsa[$i]} + local vdev="${vethcl[$i]}" + local brdev="$BRCL" + + ip -net "$ns" link del "$brdev" type bridge 2>/dev/null + ip -net "$ns" link del "$vdev.$VID1" 2>/dev/null +} + +add_pppoe() +{ + local i1=$1 + local i2=$2 + local dev1=$3 + local dev2=$4 + local desc=$5 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + + ppp1=0 + while [ -n "$(ip -net "$ns1" link show ppp$ppp1 2>/dev/null)" ] + do ((ppp1++)); done + echo "noauth defaultroute noipdefault unit $ppp1" >"$pppoeserveroptions" + ppp1="ppp$ppp1" + + if ! ip netns exec "$ns1" pppoe-server -k -L "${AD4[$i1]}" -R "${AD4[$i2]}" \ + -I $dev1 -X "$pppoeserverpid" -O "$pppoeserveroptions" >/dev/null; then + echo "ERROR: $desc: failed to setup pppoe server" 1>&2 + return 1 + fi + + if !
ip netns exec "$ns2" pppd plugin pppoe.so nic-$dev2 persist holdoff 0 noauth \ + defaultroute noipdefault noaccomp nodeflate noproxyarp nopcomp \ + novj novjccomp linkname "selftest-$$" >/dev/null; then + echo "ERROR: $desc: failed to setup pppoe client" 1>&2 + return 1 + fi + + if ! wait_ping $i1 $i2; then + echo "ERROR: $desc: failed to setup functional pppoe connection" 1>&2 + return 1 + fi + + ppp2=$(cat "/run/pppd/ppp-selftest-$$.pid" | tail -n 1) + + ip -net "$ns1" addr add "${AD6[$i1]}/64" dev "$ppp1" nodad + ip -net "$ns2" addr add "${AD6[$i2]}/64" dev "$ppp2" nodad + + return 0 +} + +del_pppoe() +{ + local i1=$1 + local i2=$2 + local dev1=$3 + local dev2=$4 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + + [[ -n "$ppp1" ]] && ip -net "$ns1" addr del "${AD6[$i1]}/64" dev "$ppp1" + [[ -n "$ppp2" ]] && ip -net "$ns2" addr del "${AD6[$i2]}/64" dev "$ppp2" + + kill -9 $(cat "/run/pppd/ppp-selftest-$$.pid" | head -n 1) \ + $(cat "$pppoeserverpid" | head -n 1) +} + +listener_ready() +{ + local ns=$1 + local ipv=$2 + + ss -N "$ns" --ipv$ipv -lnt -o "sport = :8080" | grep -q 8080 +} + +test_tcp() { + local i1=$1 + local i2=$2 + local dofast=$3 + local desc=$4 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + local i=-1 + local lret=0 + local ads="" + local ipv ad a lpid bytes limit error + + if [ -n "$do_ipv4" ]; then ads="${AD4[$i2]}" + elif [ -n "$do_ipv6" ]; then ads="${AD6[$i2]}" + else ads="${AD4[$i2]} ${AD6[$i2]}" + fi + for ad in $ads; do + ((i++)) + if [[ "$ad" =~ ":" ]] + then ipv="6"; a="[${ad}]" + else ipv="4"; a="${ad}" + fi + + rm -f "$file1out" "$file2out" + + # ip netns exec "$nsrt" nft reset counters >/dev/null + # But on some systems this results in 4GB values in packet and byte count, so: + (echo "flush ruleset"; ip netns exec "$nsrt" nft --stateless list ruleset) | \ + ip netns exec "$nsrt" nft -f - + + timeout "$SOCAT_TIMEOUT" ip netns exec "$ns2" socat TCP$ipv-LISTEN:8080,reuseaddr \ + STDIO <"$filein" >"$file2out" 2>/dev/null & + lpid=$! 
+ busywait 1000 listener_ready "$ns2" "$ipv" + + timeout "$SOCAT_TIMEOUT" ip netns exec "$ns1" socat TCP$ipv:$a:8080 \ + STDIO <"$filein" >"$file1out" 2>/dev/null + wait $lpid + + if [ $? -ne 0 ]; then + error[$i]="ipv$ipv: tcp broken" + continue + fi + if ! cmp "$filein" "$file1out" >/dev/null 2>&1; then + error[$i]="ipv$ipv: file mismatch to ${ad}" + continue + fi + if ! cmp "$filein" "$file2out" >/dev/null 2>&1; then + error[$i]="ipv$ipv: file mismatch from ${ad}" + continue + fi + + limit=$((2 * $filesize * 1024 * 1024)) + bytes=$(ip netns exec "$nsrt" nft list counter $family filter "check" | \ + grep "packets" | cut -d' ' -f4) + if [ -z "$dofast" ] && [ "$bytes" -lt "$limit" ]; then + + error[$i]="ipv$ipv: established bytes $bytes < $limit" + continue + fi + if [ -n "$dofast" ] && [ "$bytes" -gt "$((limit/2))" ]; then + # Significant reduction of bytes expected + error[$i]="ipv$ipv: counted bytes $bytes > $((limit/2))" + continue + fi + done + + if [ -n "${error[0]}" ]; then + if [[ "${error[0]#*:}" == "${error[1]#*:}" ]]; then + echo "ERROR: $desc: ipv4/6:${error[0]#*:}" 1>&2 + return 1 + fi + echo "ERROR: $desc: ${error[0]}" 1>&2 + lret=1 + fi + if [ -n "${error[1]}" ]; then + echo "ERROR: $desc: ${error[1]}" 1>&2 + lret=1 + fi + if [ $lret -eq 0 ]; then + echo "PASS: $desc" + fi + return $lret +} + +test_paths() { + local i1=$1 + local i2=$2 + local desc=$3 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + + + if ! setup_nftables $i1 $i2; then + echo "ERROR: $desc: cannot setup nftables" 1>&2 + return 1 + fi + if ! test_tcp $i1 $i2 "" "$desc without fastpath"; then + return 1 + fi + + if ! setup_fastpath $i1 $i2 "" 2>/dev/null; then + return 0 + fi + if ! test_tcp $i1 $i2 "fast" "$desc with fastpath"; then + return 1 + fi + + if ! setup_fastpath $i1 $i2 "hw" 2>/dev/null; then + return 0 + fi + if ! 
test_tcp $i1 $i2 "fast" "$desc with hw_fastpath"; then + return 1 + fi + + return 0 + +} + +add_masq() +{ + if [[ $family != "bridge" ]]; then + ip netns exec "$nsrt" nft -f - <<-EOF + table ip nat { + chain postrouting { + type nat hook postrouting priority 0; + oifname ${BRWAN} masquerade + } + } + EOF + else + return 0 + fi +} + +setup_nftables() +{ + local i1=$1 + local i2=$2 + + ip netns exec "$nsrt" nft flush ruleset + + if ! add_masq; then + return 1 + fi + + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + counter check { } + chain forward { + type filter hook forward priority 0; policy accept; + ct state established ip saddr ${AD4[$i1]} tcp dport 8080 counter name "check" + ct state established ip saddr ${AD4[$i2]} tcp sport 8080 counter name "check" + ct state established ip6 saddr ${AD6[$i1]} tcp dport 8080 counter name "check" + ct state established ip6 saddr ${AD6[$i2]} tcp sport 8080 counter name "check" + } + } + EOF +} + +setup_fastpath() +{ + local devs="${vethrt[$1]} , ${vethrt[$2]}" + local arg=$3 + local flags="" + + [[ "$arg" == "hw" ]] && flags="flags offload" + + ip netns exec "$nsrt" nft flush ruleset + + if ! 
add_masq; then + return 1 + fi + + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + counter check { } + flowtable f { + hook ingress priority filter + devices = { ${devs} } + ${flags} + } + chain forward { + type filter hook forward priority 0; policy accept; + counter name "check" + ct state established flow add @f + } + } + EOF +} + +ret=0 +### Start Initial Setup ### + +for i in 4 6; do + ip netns exec "$nsrt" sysctl -q net.ipv$i.conf.all.forwarding=1 +done + +### Setup brlan as vlan unaware bridge ### +### Use brwan to make sure software fastpath is ### +### direct xmit in other direction also ### + +ip -net "$nsrt" link add $BRWAN type bridge +ret=$(($ret | $?)) +ip -net "$nsrt" link set $BRWAN up +ret=$(($ret | $?)) +if [ $ret -ne 0 ]; then + echo "SKIP: Can't create bridge" + exit $ksft_skip +fi + +# If both lan clients are veth-devices, only test 1 in the forward path +if [ -z "${vethcl[$LAN1]}" ] && [ -z "${vethcl[$LAN2]}" ]; then + lan_all_veth=1 +fi + +for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + if [ -z "${vethcl[$i]}" ]; then + vethcl[$i]="veth${i}cl" + vethrt[$i]="veth${i}rt" + ip link add "${vethcl[$i]}" netns "$ns" type veth \ + peer name "${vethrt[$i]}" netns "$nsrt" + ret=$(($ret | $?)) + else # Use pair of interconnected hardware interfaces + ip link set "${vethrt[$i]}" netns "$nsrt" + ret=$(($ret | $?)) + ip link set "${vethcl[$i]}" netns "$ns" + ret=$(($ret | $?)) + fi +done +if [ $ret -ne 0 ]; then + echo "SKIP: (v)eth pairs cannot be used" + exit $ksft_skip +fi + +if [ -n "$showtree" ]; then + cat <<-EOF + Setup: + CLIENT 0 + ${vethcl[$WAN]} + | + ${vethrt[$WAN]} + WAN + ROUTER + LAN1 LAN2 + $(printf "%14.14s" ${vethrt[$LAN1]}) ${vethrt[$LAN2]} + | | + $(printf "%14.14s" ${vethcl[$LAN1]}) ${vethcl[$LAN2]} + CLIENT 1 CLIENT 2 + + EOF +fi + +for n in nsclientwan nsclientlan; do + routerside=""; clientside="" + for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + [[ "$ns" != "$n"* ]] && continue + mac=$(check_mac $ns 
${vethcl[$i]} "$routerside $clientside") + ret=$(($ret | $?)) + clientside+=" $mac" + mac=$(check_mac $nsrt ${vethrt[$i]} "$clientside") + ret=$(($ret | $?)) + routerside+=" $mac" + done +done +if [ $ret -ne 0 ]; then + echo "SKIP: because of conflicting mac address" + exit $ksft_skip +fi + +for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + ip -net "$ns" link set "${vethcl[$i]}" up + ret=$(($ret | $?)) + ip -net "$nsrt" link set "${vethrt[$i]}" up + ret=$(($ret | $?)) +done +if [ $ret -ne 0 ]; then + echo "SKIP: setting (v)eth pairs link up failed" + exit $ksft_skip +fi + +for j in $(seq 1 $(($LINKUP_TIMEOUT * 5 ))); do + ret=0 + for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + is_linkup $ns "${vethcl[$i]}" + ret=$(($ret | $?)) + is_linkup $nsrt "${vethrt[$i]}" + ret=$(($ret | $?)) + done + [ $ret -eq 0 ] && break + sleep 0.2 +done +if [ $ret -ne 0 ]; then + echo "SKIP: waiting for (v)eth pairs link up failed" + exit $ksft_skip +fi + +i=$WAN +ip -net "$nsrt" link set "${vethrt[$i]}" master $BRWAN + +### End Initial Setup ### + +family="bridge" +setup_nftables $LAN1 $LAN2 2>/dev/null +if [ $? 
-ne 0 ]; then + echo "INFO: Cannot add nftables table $family" + skip_family_bridge_part2=1 +elif [ -n "$skip_unaware" ]; then + echo "INFO: Skipping unaware bridge" +else + +### Start nft family bridge test part 1 ### + +ip -net "$nsrt" link add $BRLAN type bridge +ip -net "$nsrt" link set $BRLAN up +for i in $LAN1 $LAN2; do + ns="${nsa[$i]}" + ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN +done + +for i in $LAN1 $LAN2; do + set_client $i none +done + +test_paths $LAN1 $LAN2 "unaware bridge, without encaps, " +ret=$(($ret | $?)) + +for i in $LAN1 $LAN2; do + set_client $i q +done + +test_paths $LAN1 $LAN2 "unaware bridge, with single vlan encap, " +ret=$(($ret | $?)) + +for i in $LAN1 $LAN2; do + set_client $i qq +done + +# Skip testing double tagged packets on real hardware +if [ -n "$lan_all_veth" ] || [ -n "$noskip" ]; then + +test_paths $LAN1 $LAN2 "unaware bridge, with double q vlan encaps," +ret=$(($ret | $?)) + +for i in $LAN1 $LAN2; do + set_client $i ad +done + +test_paths $LAN1 $LAN2 "unaware bridge, with 802.1ad vlan encaps, " +ret=$(($ret | $?)) + +fi +# End Skip testing double tagged packets + +if [ -n "$(command -v pppd 2>/dev/null)" ] && + [ -n "$(command -v pppoe-server 2>/dev/null)" ]; then +# Start pppoe + +for i in $LAN1 $LAN2; do + set_client $i none noaddress +done + +if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe encap"; then + test_paths $LAN1 $LAN2 "unaware bridge, with pppoe encap, " + ret=$(($ret | $?)) +fi + +del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" + +for i in $LAN1 $LAN2; do + set_client $i q noaddress +done + +if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe-in-q encaps"; then + test_paths $LAN1 $LAN2 "unaware bridge, with pppoe-in-q encaps, " + ret=$(($ret | $?)) +fi + +del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" + +# End pppoe +fi + +ip -net "$nsrt" link del $BRLAN type bridge + +### End nft family bridge test part 1 ### +fi + +### Setup brlan as vlan aware bridge ### + +ip -net "$nsrt" link 
add $BRLAN type bridge vlan_filtering 1 vlan_default_pvid 0 +ip -net "$nsrt" link set $BRLAN up +bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 pvid untagged self +for i in $LAN1 $LAN2; do + ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged +done + +for i in $LAN1 $LAN2; do + set_client $i none +done + +if [ -z "$skip_family_bridge_part2" ]; then +### Start nft family bridge test part 2 ### + +test_paths $LAN1 $LAN2 "aware bridge, without/without vlan encap," +ret=$(($ret | $?)) + +i=$LAN1 +bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged +bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 +set_client $i q + +test_paths $LAN1 $LAN2 "aware bridge, with/without vlan encap, " +ret=$(($ret | $?)) + +i=$LAN2 +bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged +bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 +set_client $i q + +test_paths $LAN1 $LAN2 "aware bridge, with/with vlan encap, " +ret=$(($ret | $?)) + +i=$LAN1 +bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 +bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged +set_client $i none + +test_paths $LAN1 $LAN2 "aware bridge, without/with vlan encap, " +ret=$(($ret | $?)) + +i=$LAN2 +bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 +bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged +set_client $i none + +fi + +### End nft family bridge test part 2 ### + +### Start nft family inet test ### +family="inet" +if ! 
setup_nftables $WAN $LAN1 2>/dev/null; then + echo "INFO: Cannot add nftables table $family" + exit $ret +fi + +set_client $WAN none +add_addr $ADWAN "$BRWAN" +add_addr $ADLAN "$BRLAN" + +test_paths $LAN1 $WAN "forward, without vlan-device, without vlan encap, client1," +ret=$(($ret | $?)) +if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then +test_paths $LAN2 $WAN "forward, without vlan-device, without vlan encap, client2," +ret=$(($ret | $?)) +fi + +for i in $LAN1 $LAN2; do +bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged +bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 +set_client $i q +done + +test_paths $LAN1 $WAN "forward, without vlan-device, with vlan encap, client1," +ret=$(($ret | $?)) +if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then +test_paths $LAN2 $WAN "forward, without vlan-device, with vlan encap, client2," +ret=$(($ret | $?)) +fi + +# Setup vlan-device linked to brlan master port +del_addr $ADLAN "$BRLAN" +ip -net "$nsrt" link set $BRLAN down +bridge -net "$nsrt" vlan del dev $BRLAN vid $VID1 pvid untagged self +bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 self +ip -net "$nsrt" link add link $BRLAN name $BRLAN.$VID1 type vlan id $VID1 +ip -net "$nsrt" link set $BRLAN up +ip -net "$nsrt" link set "$BRLAN.$VID1" up +add_addr $ADLAN "$BRLAN.$VID1" + +test_paths $LAN1 $WAN "forward, with vlan-device, with vlan encap, client1," +ret=$(($ret | $?)) +if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then +test_paths $LAN2 $WAN "forward, with vlan-device, with vlan encap, client2," +ret=$(($ret | $?)) +fi + +for i in $LAN1 $LAN2; do +bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 +bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged +set_client $i none +done + +test_paths $LAN1 $WAN "forward, with vlan-device, without vlan encap, client1," +ret=$(($ret | $?)) +if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then +test_paths $LAN2 $WAN "forward, with vlan-device, without vlan encap,
client2," +ret=$(($ret | $?)) +fi + +### End nft family inet test ### + +for i in $WAN $LAN1 $LAN2; do + unset_client $i +done +ip -net "$nsrt" link del $BRLAN type bridge +ip -net "$nsrt" link del $BRWAN type bridge + +if [ $ret -eq 0 ]; then + echo "PASS: all tests passed" +else + echo "ERROR: bridge fastpath test has failed" +fi + +exit $ret -- 2.47.1

4 months, 2 weeks


[PATCH v8 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ)

by Nicolin Chen

As part 3 of the vIOMMU infrastructure series, this introduces a new vEVENTQ object.

The existing FAULT object already provides a nice notification pathway to user space with a queue, so let vEVENTQ reuse that. Mimicking the HWPT structure, add a common EVENTQ structure to support its derivatives: IOMMUFD_OBJ_FAULT (existing) and IOMMUFD_OBJ_VEVENTQ (new).

An IOMMUFD_CMD_VEVENTQ_ALLOC is introduced to allocate a vEVENTQ object for vIOMMUs. One vIOMMU can have multiple vEVENTQs of different types but cannot have multiple vEVENTQs of the same type.

The forwarding part is fairly simple but might need to replace a physical device ID with a virtual device ID in a driver-level event data structure. So, this also adds some helpers for drivers to use.

As usual, this series comes with selftest coverage for this new ioctl and with a real-world use case in the ARM SMMUv3 driver.

This is on GitHub:
https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v8
Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v8

Changelog
v8
 * Add Reviewed-by from Jason and Pranjal
 * Fix errno returned in arm_smmu_handle_event()
 * Validate domain->type outside of arm_smmu_attach_prepare_vmaster()
 * Drop unnecessary vmaster comparison in arm_smmu_attach_commit_vmaster()
v7
 https://lore.kernel.org/all/cover.1740238876.git.nicolinc@nvidia.com/
 * Rebase on Jason's for-next tree for latest fault.c
 * Add Reviewed-by
 * Update commit logs
 * Add __reserved field sanity
 * Skip kfree() on the static header
 * Replace "bool on_list" with list_is_last()
 * Use u32 for flags in iommufd_vevent_header
 * Drop casting in iommufd_viommu_get_vdev_id()
 * Update the bounding logic to veventq->sequence
 * Add missing cpu_to_le64() around STRTAB_STE_1_MEV
 * Reuse veventq->common.lock to fence sequence and num_events
 * Rename overflow to lost_events and log it upon kmalloc failure
 * Correct the error handling part in iommufd_veventq_deliver_fetch()
 * Add an arm_smmu_clear_vmaster() to simplify identity/blocked domain attach ops
 * Add four additional event records to forward to user space VM, and update the uAPI doc
 * Reuse the existing smmu->streams_mutex lock to fence master->vmaster pointer, instead of adding a new rwsem
v6
 https://lore.kernel.org/all/cover.1737754129.git.nicolinc@nvidia.com/
 * Drop supports_veventq viommu op
 * Split bug/cosmetics fixes out of the series
 * Drop the blocking mutex around copy_to_user()
 * Add veventq_depth in uAPI to limit vEVENTQ size
 * Revise the documentation for a clear description
 * Fix sparse warnings in arm_vmaster_report_event()
 * Rework iommufd_viommu_get_vdev_id() to return -ENOENT vs. 0
 * Allow Abort/Bypass STEs to allocate vEVENTQ and set STE.MEV for DoS mitigations
v5
 https://lore.kernel.org/all/cover.1736237481.git.nicolinc@nvidia.com/
 * Add Reviewed-by from Baolu
 * Reorder the OBJ list as well
 * Fix alphabetical order after renaming in v4
 * Add supports_veventq viommu op for vEVENTQ type validation
v4
 https://lore.kernel.org/all/cover.1735933254.git.nicolinc@nvidia.com/
 * Rename "vIRQ" to "vEVENTQ"
 * Use flexible array in struct iommufd_vevent
 * Add the new ioctl command to union ucmd_buffer
 * Fix the alphabetical order in union ucmd_buffer too
 * Rename _TYPE_NONE to _TYPE_DEFAULT aligning with vIOMMU naming
v3
 https://lore.kernel.org/all/cover.1734477608.git.nicolinc@nvidia.com/
 * Rebase on Will's for-joerg/arm-smmu/updates for arm_smmu_event series
 * Add "Reviewed-by" lines from Kevin
 * Fix typos in comments, kdocs, and jump tags
 * Add a patch to sort struct iommufd_ioctl_op
 * Update iommufd's userspace-api documentation
 * Update uAPI kdoc to quote SMMUv3 official spec
 * Drop the unused workqueue in struct iommufd_virq
 * Drop might_sleep() in iommufd_viommu_report_irq() helper
 * Add missing "break" in iommufd_viommu_get_vdev_id() helper
 * Shrink the scope of the vmaster's read lock in SMMUv3 driver
 * Pass in two arguments to iommufd_eventq_virq_handler() helper
 * Move "!ops || !ops->read" validation into iommufd_eventq_init()
 * Move "fault->ictx = ictx" closer to iommufd_ctx_get(fault->ictx)
 * Update commit message for arm_smmu_attach_prepare/commit_vmaster()
 * Keep "iommufd_fault" as-is and rename "iommufd_eventq_virq" to just "iommufd_virq"
v2
 https://lore.kernel.org/all/cover.1733263737.git.nicolinc@nvidia.com/
 * Rebase on v6.13-rc1
 * Add IOPF and vIRQ in iommufd.rst (userspace-api)
 * Add proper locking in iommufd_event_virq_destroy
 * Add iommufd_event_virq_abort with a lockdep_assert_held
 * Rename "EVENT_*" to "EVENTQ_*" to describe the objects better
 * Reorganize flows in iommufd_eventq_virq_alloc for abort() to work
 * Add struct arm_smmu_vmaster to store vSID upon attaching to a nested domain, calling a newly added iommufd_viommu_get_vdev_id helper
 * Add an arm_vmaster_report_event helper in arm-smmu-v3-iommufd file to simplify the routine in arm_smmu_handle_evt() of the main driver
v1
 https://lore.kernel.org/all/cover.1724777091.git.nicolinc@nvidia.com/

Thanks!
Nicolin

Nicolin Chen (14):
  iommufd/fault: Move two fault functions out of the header
  iommufd/fault: Add an iommufd_fault_init() helper
  iommufd: Abstract an iommufd_eventq from iommufd_fault
  iommufd: Rename fault.c to eventq.c
  iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
  iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
  iommufd/viommu: Add iommufd_viommu_report_event helper
  iommufd/selftest: Require vdev_id when attaching to a nested domain
  iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage
  iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage
  Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ
  iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster
  iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
  iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations

 drivers/iommu/iommufd/Makefile                |   2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  36 ++
 drivers/iommu/iommufd/iommufd_private.h       | 135 +++-
 drivers/iommu/iommufd/iommufd_test.h          |  10 +
 include/linux/iommufd.h                       |  23 +
 include/uapi/linux/iommufd.h                  | 105 +++
 tools/testing/selftests/iommu/iommufd_utils.h | 115 ++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     |  64 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  82 ++-
 drivers/iommu/iommufd/driver.c                |  72 +++
 drivers/iommu/iommufd/eventq.c                | 597 ++++++++++++++++++
 drivers/iommu/iommufd/fault.c                 | 342 ----------
 drivers/iommu/iommufd/hw_pagetable.c          |   6 +-
 drivers/iommu/iommufd/main.c                  |   7 +
 drivers/iommu/iommufd/selftest.c              |  54 ++
 drivers/iommu/iommufd/viommu.c                |   2 +
 tools/testing/selftests/iommu/iommufd.c       |  36 ++
 .../selftests/iommu/iommufd_fail_nth.c        |   7 +
 Documentation/userspace-api/iommufd.rst       |  17 +
 19 files changed, 1304 insertions(+), 408 deletions(-)
 create mode 100644 drivers/iommu/iommufd/eventq.c
 delete mode 100644 drivers/iommu/iommufd/fault.c

base-commit: 598749522d4254afb33b8a6c1bea614a95896868
-- 
2.43.0

4 months, 2 weeks


[PATCH] selftests: mptcp: add comment for getaddrinfo

by zhenwei pi

mptcp_connect.c serves as a starting tutorial for MPTCP programming, but it lacks an example of ai_protocol (IPPROTO_MPTCP) usage. Add a comment about getaddrinfo() MPTCP support.

Signed-off-by: zhenwei pi <zhenwei.pi(a)linux.dev>
Signed-off-by: zhenwei pi <pizhenwei(a)bytedance.com>
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index c83a8b47bbdf..6b9031273964 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -179,6 +179,18 @@ static void xgetnameinfo(const struct sockaddr *addr, socklen_t addrlen,
 	}
 }
 
+/* glibc lacks MPTCP support, so this code leads to an error:
+ *   struct addrinfo hints = {
+ *     .ai_protocol = IPPROTO_MPTCP,
+ *     ...
+ *   };
+ *   err = getaddrinfo(node, service, &hints, res);
+ *   ...
+ * So use IPPROTO_TCP to resolve, and use TCP/MPTCP to create the socket.
+ *
+ * glibc supports MPTCP starting from v2.42.
+ * Link: https://sourceware.org/git/?p=glibc.git;a=commit;h=a8e9022e0f82
+ */
 static void xgetaddrinfo(const char *node, const char *service,
 			 const struct addrinfo *hints,
 			 struct addrinfo **res)
-- 
2.34.1

4 months, 2 weeks


[PATCH] selftest/mm: Make hugetlb_reparenting_test tolerant to async reparenting

by Li Wang

In cgroup v2, memory and hugetlb usage reparenting is asynchronous. This can cause test flakiness when immediately asserting usage after deleting a child cgroup. To address this, add a helper function `assert_with_retry()` that checks usage values with a timeout-based retry. This improves test stability without relying on fixed sleep delays. Also bump up the tolerance size to 7MB. To avoid False Positives: ... # Assert memory charged correctly for child only use. # actual a = 11 MB # expected a = 0 MB # fail # cleanup # [FAIL] not ok 11 hugetlb_reparenting_test.sh -cgroup-v2 # exit=1 # 0 # SUMMARY: PASS=10 SKIP=0 FAIL=1 Signed-off-by: Li Wang <liwang(a)redhat.com> Cc: Waiman Long <longman(a)redhat.com> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com> Cc: Dev Jain <dev.jain(a)arm.com> Cc: Kirill A. Shuemov <kirill.shutemov(a)linux.intel.com> Cc: Shuah Khan <shuah(a)kernel.org> --- .../selftests/mm/hugetlb_reparenting_test.sh | 96 ++++++++----------- 1 file changed, 41 insertions(+), 55 deletions(-) diff --git a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh index 11f9bbe7dc22..1c172c6999f4 100755 --- a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh +++ b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh @@ -36,7 +36,7 @@ else do_umount=1 fi fi -MNT='/mnt/huge/' +MNT='/mnt/huge' function get_machine_hugepage_size() { hpz=$(grep -i hugepagesize /proc/meminfo) @@ -60,6 +60,41 @@ function cleanup() { set -e } +function assert_with_retry() { + local actual_path="$1" + local expected="$2" + local tolerance=$((7 * 1024 * 1024)) + local timeout=20 + local interval=1 + local start_time + local now + local elapsed + local actual + + start_time=$(date +%s) + + while true; do + actual="$(cat "$actual_path")" + + if [[ $actual -ge $(($expected - $tolerance)) ]] && + [[ $actual -le $(($expected + $tolerance)) ]]; then + return 0 + fi + + now=$(date +%s) + elapsed=$((now - start_time)) + + if [[ 
$elapsed -ge $timeout ]]; then + echo "actual = $((${actual%% *} / 1024 / 1024)) MB" + echo "expected = $((${expected%% *} / 1024 / 1024)) MB" + cleanup + exit 1 + fi + + sleep $interval + done +} + function assert_state() { local expected_a="$1" local expected_a_hugetlb="$2" @@ -70,58 +105,13 @@ function assert_state() { expected_b="$3" expected_b_hugetlb="$4" fi - local tolerance=$((5 * 1024 * 1024)) - - local actual_a - actual_a="$(cat "$CGROUP_ROOT"/a/memory.$usage_file)" - if [[ $actual_a -lt $(($expected_a - $tolerance)) ]] || - [[ $actual_a -gt $(($expected_a + $tolerance)) ]]; then - echo actual a = $((${actual_a%% *} / 1024 / 1024)) MB - echo expected a = $((${expected_a%% *} / 1024 / 1024)) MB - echo fail - - cleanup - exit 1 - fi - - local actual_a_hugetlb - actual_a_hugetlb="$(cat "$CGROUP_ROOT"/a/hugetlb.${MB}MB.$usage_file)" - if [[ $actual_a_hugetlb -lt $(($expected_a_hugetlb - $tolerance)) ]] || - [[ $actual_a_hugetlb -gt $(($expected_a_hugetlb + $tolerance)) ]]; then - echo actual a hugetlb = $((${actual_a_hugetlb%% *} / 1024 / 1024)) MB - echo expected a hugetlb = $((${expected_a_hugetlb%% *} / 1024 / 1024)) MB - echo fail - - cleanup - exit 1 - fi - - if [[ -z "$expected_b" || -z "$expected_b_hugetlb" ]]; then - return - fi - - local actual_b - actual_b="$(cat "$CGROUP_ROOT"/a/b/memory.$usage_file)" - if [[ $actual_b -lt $(($expected_b - $tolerance)) ]] || - [[ $actual_b -gt $(($expected_b + $tolerance)) ]]; then - echo actual b = $((${actual_b%% *} / 1024 / 1024)) MB - echo expected b = $((${expected_b%% *} / 1024 / 1024)) MB - echo fail - - cleanup - exit 1 - fi - local actual_b_hugetlb - actual_b_hugetlb="$(cat "$CGROUP_ROOT"/a/b/hugetlb.${MB}MB.$usage_file)" - if [[ $actual_b_hugetlb -lt $(($expected_b_hugetlb - $tolerance)) ]] || - [[ $actual_b_hugetlb -gt $(($expected_b_hugetlb + $tolerance)) ]]; then - echo actual b hugetlb = $((${actual_b_hugetlb%% *} / 1024 / 1024)) MB - echo expected b hugetlb = $((${expected_b_hugetlb%% *} / 1024 / 
1024)) MB - echo fail + assert_with_retry "$CGROUP_ROOT/a/memory.$usage_file" "$expected_a" + assert_with_retry "$CGROUP_ROOT/a/hugetlb.${MB}MB.$usage_file" "$expected_a_hugetlb" - cleanup - exit 1 + if [[ -n "$expected_b" && -n "$expected_b_hugetlb" ]]; then + assert_with_retry "$CGROUP_ROOT/a/b/memory.$usage_file" "$expected_b" + assert_with_retry "$CGROUP_ROOT/a/b/hugetlb.${MB}MB.$usage_file" "$expected_b_hugetlb" fi } @@ -174,7 +164,6 @@ size=$((${MB} * 1024 * 1024 * 25)) # 50MB = 25 * 2MB hugepages. cleanup -echo echo echo Test charge, rmdir, uncharge setup @@ -195,7 +184,6 @@ cleanup echo done echo -echo if [[ ! $cgroup2 ]]; then echo "Test parent and child hugetlb usage" setup @@ -212,7 +200,6 @@ if [[ ! $cgroup2 ]]; then assert_state 0 $(($size * 2)) 0 $size rmdir "$CGROUP_ROOT"/a/b - sleep 5 echo Assert memory reparent correctly. assert_state 0 $(($size * 2)) @@ -224,7 +211,6 @@ if [[ ! $cgroup2 ]]; then cleanup fi -echo echo echo "Test child only hugetlb usage" echo setup -- 2.48.1
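For readers who want to reuse the polling pattern of assert_with_retry() outside the shell script, here is a minimal C sketch of the same idea: read a value repeatedly until it falls within expected +/- tolerance, and give up once a timeout has elapsed. All names here (assert_with_retry_c(), read_usage_fn, struct fake_usage) are hypothetical and not part of the patch; unlike the shell helper, this version returns an error code instead of calling cleanup and exiting.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback returning the current usage value in bytes. */
typedef long long (*read_usage_fn)(void *ctx);

/*
 * Poll read_usage() until the value is within +/- tolerance of expected,
 * or until timeout_s seconds have elapsed. Returns 0 on success, -1 on
 * timeout. Structured like the shell helper assert_with_retry().
 */
static int assert_with_retry_c(read_usage_fn read_usage, void *ctx,
                               long long expected, long long tolerance,
                               int timeout_s, unsigned int interval_s)
{
    time_t start = time(NULL);

    for (;;) {
        long long actual = read_usage(ctx);

        if (actual >= expected - tolerance && actual <= expected + tolerance)
            return 0;

        if (time(NULL) - start >= timeout_s) {
            printf("actual = %lld MB\n", actual / 1024 / 1024);
            printf("expected = %lld MB\n", expected / 1024 / 1024);
            return -1;
        }

        sleep(interval_s);
    }
}

/* Hypothetical fake reader: usage drains by `step` bytes on each read,
 * loosely imitating a charge that is reparented asynchronously. */
struct fake_usage {
    int calls;
    long long start;  /* value reported on the first read */
    long long step;   /* amount drained per subsequent read */
};

static long long fake_read_usage(void *ctx)
{
    struct fake_usage *f = ctx;
    long long v = f->start - f->step * f->calls;

    f->calls++;
    return v > 0 ? v : 0;
}
```

With a 7MB tolerance and an expected value of 0, a reader that starts at 11MB and drains 4MB per read passes on its second read, while a reader stuck at 11MB fails as soon as the timeout is reached.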

4 months, 2 weeks

[PATCH v3] ublk: improve detection and handling of ublk server exit

by Uday Shankar

There are currently two ways in which ublk server exit is detected by ublk_drv:

1. uring_cmd cancellation. If there are any outstanding uring_cmds which have not been completed to the ublk server when it exits, io_uring calls the uring_cmd callback with a special cancellation flag as the issuing task is exiting.
2. I/O timeout. This is needed in addition to the above to handle the "saturated queue" case, when all I/Os for a given queue are in the ublk server, and therefore there are no outstanding uring_cmds to cancel when the ublk server exits.

There are a couple of issues with this approach:

- It is complex and inelegant to have two methods to detect the same condition.
- The second method detects ublk server exit only after a long delay (~30s, the default timeout assigned by the block layer). This delays both the nosrv behavior and any subsequent recovery of the device.

The second issue is brought to light with the new test_generic_04. It fails before this fix:

  selftests: ublk: test_generic_04.sh
  dev id is 0
  dd: error writing '/dev/ublkb0': Input/output error
  1+0 records in
  0+0 records out
  0 bytes copied, 30.0611 s, 0.0 kB/s
  DEAD
  dd took 31 seconds to exit (>= 5s tolerance)!
  generic_04 : [FAIL]

Fix this by instead detecting and handling ublk server exit in the character file release callback. This has several advantages:

- This one place can handle both saturated and unsaturated queues. Thus, it replaces both preexisting methods of detecting ublk server exit.
- It runs quickly on ublk server exit; there is no 30s delay.
- It starts the process of removing task references in ublk_drv. This is needed if we want to relax restrictions in the driver, like letting only one thread serve each queue.

There is also the disadvantage that the character file release callback can also be triggered by an intentional close of the file, which is a significant behavior change.
Preexisting ublk servers (libublksrv) depend on the ability to open/close the file multiple times. To address this, only transition to a nosrv state if the file is released while the ublk device is live. This allows programs to open/close the file multiple times during setup. It is still a behavior change if a ublk server decides to close/reopen the file while the device is LIVE (i.e. while it is responsible for serving I/O), but that would be highly unusual. This behavior is in line with what is done by FUSE, which is very similar to ublk in that a userspace daemon is providing services traditionally provided by the kernel.

With this change applied, the new test (and all other selftests, and all ublksrv tests) pass:

  selftests: ublk: test_generic_04.sh
  dev id is 0
  dd: error writing '/dev/ublkb0': Input/output error
  1+0 records in
  0+0 records out
  0 bytes copied, 0.0376731 s, 0.0 kB/s
  DEAD
  generic_04 : [PASS]

Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v3:
- Quiesce queue earlier to avoid concurrent cancellation and "normal" completion of io_uring cmds (Ming Lei)
- Fix del_gendisk hang, found by test_stress_02
- Remove unnecessary parameters in fault_inject target (Ming Lei)
- Fix delay implementation to have a separate per-I/O delay instead of blocking the whole thread (Ming Lei)
- Add delay_us to docs
- Link to v2: https://lore.kernel.org/r/20250402-ublk_timeout-v2-1-249bc5523000@purestora…

Changes in v2:
- Leave null ublk selftests target untouched, instead create new fault_inject target for injecting per-I/O delay (Ming Lei)
- Allow multiple open/close of ublk character device with some restrictions
- Drop patches which made it in separately at https://lore.kernel.org/r/20250401-ublk_selftests-v1-1-98129c9bc8bb@puresto…
- Consolidate more nosrv logic in ublk character device release, and associated code cleanup
- Link to v1: https://lore.kernel.org/r/20250325-ublk_timeout-v1-0-262f0121a7bd@purestora…

---
 drivers/block/ublk_drv.c |
228 +++++++++--------------- tools/testing/selftests/ublk/Makefile | 4 +- tools/testing/selftests/ublk/fault_inject.c | 72 ++++++++ tools/testing/selftests/ublk/kublk.c | 6 +- tools/testing/selftests/ublk/kublk.h | 4 + tools/testing/selftests/ublk/test_generic_04.sh | 43 +++++ 6 files changed, 215 insertions(+), 142 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 2fd05c1bd30b03343cb6f357f8c08dd92ff47af9..73baa9d22ccafb00723defa755a0b3aab7238934 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -162,7 +162,6 @@ struct ublk_queue { bool force_abort; bool timeout; - bool canceling; bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */ unsigned short nr_io_ready; /* how many ios setup */ spinlock_t cancel_lock; @@ -199,8 +198,6 @@ struct ublk_device { struct completion completion; unsigned int nr_queues_ready; unsigned int nr_privileged_daemon; - - struct work_struct nosrv_work; }; /* header of ublk_params */ @@ -209,8 +206,9 @@ struct ublk_params_header { __u32 types; }; -static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq); - +static void ublk_stop_dev_unlocked(struct ublk_device *ub); +static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq); +static void __ublk_quiesce_dev(struct ublk_device *ub); static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, struct ublk_queue *ubq, int tag, size_t offset); static inline unsigned int ublk_req_build_flags(struct request *req); @@ -1314,8 +1312,6 @@ static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l) static enum blk_eh_timer_return ublk_timeout(struct request *rq) { struct ublk_queue *ubq = rq->mq_hctx->driver_data; - unsigned int nr_inflight = 0; - int i; if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) { if (!ubq->timeout) { @@ -1326,26 +1322,6 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq) return BLK_EH_DONE; } - if (!ubq_daemon_is_dying(ubq)) - return 
BLK_EH_RESET_TIMER; - - for (i = 0; i < ubq->q_depth; i++) { - struct ublk_io *io = &ubq->ios[i]; - - if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) - nr_inflight++; - } - - /* cancelable uring_cmd can't help us if all commands are in-flight */ - if (nr_inflight == ubq->q_depth) { - struct ublk_device *ub = ubq->dev; - - if (ublk_abort_requests(ub, ubq)) { - schedule_work(&ub->nosrv_work); - } - return BLK_EH_DONE; - } - return BLK_EH_RESET_TIMER; } @@ -1356,19 +1332,16 @@ static blk_status_t ublk_prep_req(struct ublk_queue *ubq, struct request *rq) if (unlikely(ubq->fail_io)) return BLK_STS_TARGET; - /* With recovery feature enabled, force_abort is set in - * ublk_stop_dev() before calling del_gendisk(). We have to - * abort all requeued and new rqs here to let del_gendisk() - * move on. Besides, we cannot not call io_uring_cmd_complete_in_task() - * to avoid UAF on io_uring ctx. + /* + * force_abort is set in ublk_stop_dev() before calling + * del_gendisk(). We have to abort all requeued and new rqs here + * to let del_gendisk() move on. Besides, we cannot not call + * io_uring_cmd_complete_in_task() to avoid UAF on io_uring ctx. * * Note: force_abort is guaranteed to be seen because it is set * before request queue is unqiuesced. 
*/ - if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort)) - return BLK_STS_IOERR; - - if (unlikely(ubq->canceling)) + if (unlikely(ubq->force_abort)) return BLK_STS_IOERR; /* fill iod to slot in io cmd buffer */ @@ -1391,16 +1364,6 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx, if (res != BLK_STS_OK) return res; - /* - * ->canceling has to be handled after ->force_abort and ->fail_io - * is dealt with, otherwise this request may not be failed in case - * of recovery, and cause hang when deleting disk - */ - if (unlikely(ubq->canceling)) { - __ublk_abort_rq(ubq, rq); - return BLK_STS_OK; - } - ublk_queue_cmd(ubq, rq); return BLK_STS_OK; } @@ -1461,8 +1424,71 @@ static int ublk_ch_open(struct inode *inode, struct file *filp) static int ublk_ch_release(struct inode *inode, struct file *filp) { struct ublk_device *ub = filp->private_data; + int i; + + mutex_lock(&ub->mutex); + /* + * If the device is not live, we will not transition to a nosrv + * state. This protects against: + * - accidental poking of the ublk character device + * - some ublk servers which may open/close the ublk character + * device during startup + */ + if (ub->dev_info.state != UBLK_S_DEV_LIVE) + goto out; + + /* + * Since we are releasing the ublk character file descriptor, we + * know that there cannot be any concurrent file-related + * activity (e.g. uring_cmds or reads/writes). However, I/O + * might still be getting dispatched. Quiesce that too so that + * we don't need to worry about anything concurrent. + * + * We may have already quiesced the queue if we canceled any + * uring_cmds, so only quiesce if necessary (quiesce is not + * idempotent, it has an internal counter which we need to + * manage carefully). 
+ */ + if (!blk_queue_quiesced(ub->ub_disk->queue)) + blk_mq_quiesce_queue(ub->ub_disk->queue); + + /* + * Handle any requests outstanding to the ublk server + */ + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) + ublk_abort_queue(ub, ublk_get_queue(ub, i)); + /* + * Transition the device to the nosrv state. What exactly this + * means depends on the recovery flags + */ + if (ublk_nosrv_should_stop_dev(ub)) { + /* + * Allow any pending/future I/O to pass through quickly + * with an error. This is needed because del_gendisk + * waits for all pending I/O to complete + */ + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) + ublk_get_queue(ub, i)->force_abort = true; + blk_mq_unquiesce_queue(ub->ub_disk->queue); + + ublk_stop_dev_unlocked(ub); + } else { + if (ublk_nosrv_dev_should_queue_io(ub)) { + __ublk_quiesce_dev(ub); + } else { + ub->dev_info.state = UBLK_S_DEV_FAIL_IO; + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) + ublk_get_queue(ub, i)->fail_io = true; + } + + /* pair with earlier quiesce */ + blk_mq_unquiesce_queue(ub->ub_disk->queue); + } + +out: clear_bit(UB_STATE_OPEN, &ub->state); + mutex_unlock(&ub->mutex); return 0; } @@ -1556,57 +1582,6 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq) } } -/* Must be called when queue is frozen */ -static bool ublk_mark_queue_canceling(struct ublk_queue *ubq) -{ - bool canceled; - - spin_lock(&ubq->cancel_lock); - canceled = ubq->canceling; - if (!canceled) - ubq->canceling = true; - spin_unlock(&ubq->cancel_lock); - - return canceled; -} - -static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq) -{ - bool was_canceled = ubq->canceling; - struct gendisk *disk; - - if (was_canceled) - return false; - - spin_lock(&ub->lock); - disk = ub->ub_disk; - if (disk) - get_device(disk_to_dev(disk)); - spin_unlock(&ub->lock); - - /* Our disk has been dead */ - if (!disk) - return false; - - /* - * Now we are serialized with ublk_queue_rq() - * - * Make sure that 
ubq->canceling is set when queue is frozen, - * because ublk_queue_rq() has to rely on this flag for avoiding to - * touch completed uring_cmd - */ - blk_mq_quiesce_queue(disk->queue); - was_canceled = ublk_mark_queue_canceling(ubq); - if (!was_canceled) { - /* abort queue is for making forward progress */ - ublk_abort_queue(ub, ubq); - } - blk_mq_unquiesce_queue(disk->queue); - put_device(disk_to_dev(disk)); - - return !was_canceled; -} - static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io, unsigned int issue_flags) { @@ -1634,9 +1609,8 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd, { struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); struct ublk_queue *ubq = pdu->ubq; + struct ublk_device *ub = ubq->dev; struct task_struct *task; - struct ublk_device *ub; - bool need_schedule; struct ublk_io *io; if (WARN_ON_ONCE(!ubq)) @@ -1649,16 +1623,20 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd, if (WARN_ON_ONCE(task && task != ubq->ubq_daemon)) return; - ub = ubq->dev; - need_schedule = ublk_abort_requests(ub, ubq); + /* + * We could be the first to notice that the ublk server is dying + * here. If we are, quiesce the queue to eliminate concurrent + * "normal" io_uring cmd completions in the I/O submission path. 
+ */ + mutex_lock(&ub->mutex); + if (ub->dev_info.state == UBLK_S_DEV_LIVE && + !blk_queue_quiesced(ub->ub_disk->queue)) + blk_mq_quiesce_queue(ub->ub_disk->queue); + mutex_unlock(&ub->mutex); io = &ubq->ios[pdu->tag]; WARN_ON_ONCE(io->cmd != cmd); ublk_cancel_cmd(ubq, io, issue_flags); - - if (need_schedule) { - schedule_work(&ub->nosrv_work); - } } static inline bool ublk_queue_ready(struct ublk_queue *ubq) @@ -1756,13 +1734,13 @@ static struct gendisk *ublk_detach_disk(struct ublk_device *ub) return disk; } -static void ublk_stop_dev(struct ublk_device *ub) +static void ublk_stop_dev_unlocked(struct ublk_device *ub) + __must_hold(&ub->mutex) { struct gendisk *disk; - mutex_lock(&ub->mutex); if (ub->dev_info.state == UBLK_S_DEV_DEAD) - goto unlock; + return; if (ublk_nosrv_dev_should_queue_io(ub)) { if (ub->dev_info.state == UBLK_S_DEV_LIVE) __ublk_quiesce_dev(ub); @@ -1771,38 +1749,12 @@ static void ublk_stop_dev(struct ublk_device *ub) del_gendisk(ub->ub_disk); disk = ublk_detach_disk(ub); put_disk(disk); - unlock: - mutex_unlock(&ub->mutex); - ublk_cancel_dev(ub); } -static void ublk_nosrv_work(struct work_struct *work) +static void ublk_stop_dev(struct ublk_device *ub) { - struct ublk_device *ub = - container_of(work, struct ublk_device, nosrv_work); - int i; - - if (ublk_nosrv_should_stop_dev(ub)) { - ublk_stop_dev(ub); - return; - } - mutex_lock(&ub->mutex); - if (ub->dev_info.state != UBLK_S_DEV_LIVE) - goto unlock; - - if (ublk_nosrv_dev_should_queue_io(ub)) { - __ublk_quiesce_dev(ub); - } else { - blk_mq_quiesce_queue(ub->ub_disk->queue); - ub->dev_info.state = UBLK_S_DEV_FAIL_IO; - for (i = 0; i < ub->dev_info.nr_hw_queues; i++) { - ublk_get_queue(ub, i)->fail_io = true; - } - blk_mq_unquiesce_queue(ub->ub_disk->queue); - } - - unlock: + ublk_stop_dev_unlocked(ub); mutex_unlock(&ub->mutex); ublk_cancel_dev(ub); } @@ -2388,7 +2340,6 @@ static void ublk_remove(struct ublk_device *ub) bool unprivileged; ublk_stop_dev(ub); - 
cancel_work_sync(&ub->nosrv_work); cdev_device_del(&ub->cdev, &ub->cdev_dev); unprivileged = ub->dev_info.flags & UBLK_F_UNPRIVILEGED_DEV; ublk_put_device(ub); @@ -2675,7 +2626,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) goto out_unlock; mutex_init(&ub->mutex); spin_lock_init(&ub->lock); - INIT_WORK(&ub->nosrv_work, ublk_nosrv_work); ret = ublk_alloc_dev_number(ub, header->dev_id); if (ret < 0) @@ -2807,7 +2757,6 @@ static inline void ublk_ctrl_cmd_dump(struct io_uring_cmd *cmd) static int ublk_ctrl_stop_dev(struct ublk_device *ub) { ublk_stop_dev(ub); - cancel_work_sync(&ub->nosrv_work); return 0; } @@ -2927,7 +2876,6 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq) /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */ ubq->ubq_daemon = NULL; ubq->timeout = false; - ubq->canceling = false; for (i = 0; i < ubq->q_depth; i++) { struct ublk_io *io = &ubq->ios[i]; diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile index c7781efea0f33c02f340f90f547d3a37c1d1b8a0..afee027cccdd1b8f13f1cb9a90a3348cd54b18bc 100644 --- a/tools/testing/selftests/ublk/Makefile +++ b/tools/testing/selftests/ublk/Makefile @@ -6,6 +6,7 @@ LDLIBS += -lpthread -lm -luring TEST_PROGS := test_generic_01.sh TEST_PROGS += test_generic_02.sh TEST_PROGS += test_generic_03.sh +TEST_PROGS += test_generic_04.sh TEST_PROGS += test_null_01.sh TEST_PROGS += test_null_02.sh @@ -26,7 +27,8 @@ TEST_GEN_PROGS_EXTENDED = kublk include ../lib.mk -$(TEST_GEN_PROGS_EXTENDED): kublk.c null.c file_backed.c common.c stripe.c +$(TEST_GEN_PROGS_EXTENDED): kublk.c null.c file_backed.c common.c stripe.c \ + fault_inject.c check: shellcheck -x -f gcc *.sh diff --git a/tools/testing/selftests/ublk/fault_inject.c b/tools/testing/selftests/ublk/fault_inject.c new file mode 100644 index 0000000000000000000000000000000000000000..3a8574e6a73767b1f9d0d81c62c7dbf28d2445d0 --- /dev/null +++ 
b/tools/testing/selftests/ublk/fault_inject.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Fault injection ublk target. Hack this up however you like for + * testing specific behaviors of ublk_drv. Currently is a null target + * with a configurable delay before completing each I/O. This delay can + * be used to test ublk_drv's handling of I/O outstanding to the ublk + * server when it dies. + */ + +#include "kublk.h" + +static int ublk_fault_inject_tgt_init(const struct dev_ctx *ctx, + struct ublk_dev *dev) +{ + const struct ublksrv_ctrl_dev_info *info = &dev->dev_info; + unsigned long dev_size = 250UL << 30; + + dev->tgt.dev_size = dev_size; + dev->tgt.params = (struct ublk_params) { + .types = UBLK_PARAM_TYPE_BASIC, + .basic = { + .logical_bs_shift = 9, + .physical_bs_shift = 12, + .io_opt_shift = 12, + .io_min_shift = 9, + .max_sectors = info->max_io_buf_bytes >> 9, + .dev_sectors = dev_size >> 9, + }, + }; + + dev->private_data = (void *)(ctx->delay_us * 1000); + return 0; +} + +static int ublk_fault_inject_queue_io(struct ublk_queue *q, int tag) +{ + const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag); + struct io_uring_sqe *sqe; + struct __kernel_timespec ts = { + .tv_nsec = (long long)q->dev->private_data, + }; + + ublk_queue_alloc_sqes(q, &sqe, 1); + io_uring_prep_timeout(sqe, &ts, 1, 0); + sqe->user_data = build_user_data(tag, ublksrv_get_op(iod), 0, 1); + + ublk_queued_tgt_io(q, tag, 1); + + return 0; +} + +static void ublk_fault_inject_tgt_io_done(struct ublk_queue *q, int tag, + const struct io_uring_cqe *cqe) +{ + const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag); + + if (cqe->res != -ETIME) + ublk_err("%s: unexpected cqe res %d\n", __func__, cqe->res); + + if (ublk_completed_tgt_io(q, tag)) + ublk_complete_io(q, tag, iod->nr_sectors << 9); + else + ublk_err("%s: io not complete after 1 cqe\n", __func__); +} + +const struct ublk_tgt_ops fault_inject_tgt_ops = { + .name = "fault_inject", + .init_tgt = 
ublk_fault_inject_tgt_init, + .queue_io = ublk_fault_inject_queue_io, + .tgt_io_done = ublk_fault_inject_tgt_io_done, +}; diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c index 91c282bc767449a418cce7fc816dc8e9fc732d6a..b741d91b2288b19d450ad22a045b014da18c3f8d 100644 --- a/tools/testing/selftests/ublk/kublk.c +++ b/tools/testing/selftests/ublk/kublk.c @@ -10,6 +10,7 @@ static const struct ublk_tgt_ops *tgt_ops_list[] = { &null_tgt_ops, &loop_tgt_ops, &stripe_tgt_ops, + &fault_inject_tgt_ops, }; static const struct ublk_tgt_ops *ublk_find_tgt(const char *name) @@ -1041,7 +1042,7 @@ static int cmd_dev_get_features(void) static int cmd_dev_help(char *exe) { - printf("%s add -t [null|loop] [-q nr_queues] [-d depth] [-n dev_id] [backfile1] [backfile2] ...\n", exe); + printf("%s add -t [null|loop|stripe|fault_inject] [-q nr_queues] [-d depth] [-n dev_id] [--delay_us delay] [backfile1] [backfile2] ...\n", exe); printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n"); printf("%s del [-n dev_id] -a \n", exe); printf("\t -a delete all devices -n delete specified device\n"); @@ -1064,6 +1065,7 @@ int main(int argc, char *argv[]) { "zero_copy", 0, NULL, 'z' }, { "foreground", 0, NULL, 0 }, { "chunk_size", 1, NULL, 0 }, + { "delay_us", 1, NULL, 0 }, { 0, 0, 0, 0 } }; int option_idx, opt; @@ -1112,6 +1114,8 @@ int main(int argc, char *argv[]) ctx.fg = 1; if (!strcmp(longopts[option_idx].name, "chunk_size")) ctx.chunk_size = strtol(optarg, NULL, 10); + if (!strcmp(longopts[option_idx].name, "delay_us")) + ctx.delay_us = strtoll(optarg, NULL, 10); } } diff --git a/tools/testing/selftests/ublk/kublk.h b/tools/testing/selftests/ublk/kublk.h index 760ff8ffb8107037a19a8fb7ab408818845e010d..a1a8a802fb43f0fe9272f33c8a3161e9316a5507 100644 --- a/tools/testing/selftests/ublk/kublk.h +++ b/tools/testing/selftests/ublk/kublk.h @@ -70,6 +70,9 @@ struct dev_ctx { /* stripe */ unsigned int chunk_size; + /* fault_inject 
*/ + long long delay_us; + int _evtfd; }; @@ -357,6 +360,7 @@ static inline int ublk_queue_use_zc(const struct ublk_queue *q) extern const struct ublk_tgt_ops null_tgt_ops; extern const struct ublk_tgt_ops loop_tgt_ops; extern const struct ublk_tgt_ops stripe_tgt_ops; +extern const struct ublk_tgt_ops fault_inject_tgt_ops; void backing_file_tgt_deinit(struct ublk_dev *dev); int backing_file_tgt_init(struct ublk_dev *dev); diff --git a/tools/testing/selftests/ublk/test_generic_04.sh b/tools/testing/selftests/ublk/test_generic_04.sh new file mode 100755 index 0000000000000000000000000000000000000000..48af48164aa444d8ac6a58fef1743d2a16a56a14 --- /dev/null +++ b/tools/testing/selftests/ublk/test_generic_04.sh @@ -0,0 +1,43 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh + +TID="generic_04" +ERR_CODE=0 + +_prep_test "fault_inject" "fast cleanup when all I/Os of one hctx are in server" + +# configure ublk server to sleep 2s before completing each I/O +dev_id=$(_add_ublk_dev -t fault_inject -q 2 -d 1 --delay_us 2000000) +_check_add_dev $TID $? + +echo "dev id is ${dev_id}" + +STARTTIME=${SECONDS} + +dd if=/dev/urandom of=/dev/ublkb${dev_id} oflag=direct bs=4k count=1 & +dd_pid=$! + +__ublk_kill_daemon ${dev_id} "DEAD" + +wait $dd_pid +dd_exitcode=$? + +ENDTIME=${SECONDS} +ELAPSED=$(($ENDTIME - $STARTTIME)) + +# assert that dd sees an error and exits quickly after ublk server is +# killed. previously this relied on seeing an I/O timeout and so would +# take ~30s +if [ $dd_exitcode -eq 0 ]; then + echo "dd unexpectedly exited successfully!" + ERR_CODE=255 +fi +if [ $ELAPSED -ge 5 ]; then + echo "dd took $ELAPSED seconds to exit (>= 5s tolerance)!" + ERR_CODE=255 +fi + +_cleanup_test "fault_inject" +_show_result $TID $ERR_CODE --- base-commit: 710e2c687a16b28a873a282517a85faf02a8b7cc change-id: 20250325-ublk_timeout-b06b9b51c591 Best regards, -- Uday Shankar <ushankar(a)purestorage.com>

4 months, 2 weeks

[PATCH v2 net 0/2] fix wrong hds-thresh value setting

by Taehee Yoo

The hds-thresh value is not set correctly if the input value is 0. The cause is that ethtool_ringparam_get_cfg(), an internal function that returns ring parameters from both ->get_ringparam() and dev->cfg, can't return a correct hds-thresh value.

The first patch fixes ethtool_ringparam_get_cfg() to set the hds-thresh value correctly. The second patch adds a random-value test for hds-thresh, so that setting hds-thresh to 0 can be tested properly.

v2:
 - Skip the set_hds_thresh_random test when the hds-thresh-max value is too small. (2/2)
 - Change the random range from 1-MAX to 1-(MAX-1). (2/2)

Taehee Yoo (2):
  net: ethtool: fix ethtool_ringparam_get_cfg() returns a hds_thresh value always as 0.
  selftests: drv-net: test random value for hds-thresh

 net/ethtool/common.c                       |  1 +
 tools/testing/selftests/drivers/net/hds.py | 33 +++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

-- 
2.34.1
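To see why a getter that always reports hds_thresh as 0 breaks only the 0 case, consider a "set only if changed" update path. The following is a deliberately simplified, hypothetical model: struct ring_cfg, get_cfg() and set_hds_thresh() are invented and are not the net/ethtool code. It assumes that the set path compares the request against the value returned by the getter and skips the update when the two match.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified ring config (not the ethtool structs). */
struct ring_cfg {
    unsigned int rx_pending;
    unsigned int hds_thresh;
};

/*
 * Getter that merges cached config into the returned parameters.
 * copy_hds == false models the bug: hds_thresh is never copied and so
 * is always reported as 0.
 */
static struct ring_cfg get_cfg(const struct ring_cfg *cached, bool copy_hds)
{
    struct ring_cfg out = { .rx_pending = cached->rx_pending };

    if (copy_hds)
        out.hds_thresh = cached->hds_thresh;
    return out;
}

/* Apply a request only if it differs from the reported baseline.
 * Returns true if the new value was actually applied. */
static bool set_hds_thresh(struct ring_cfg *cached, unsigned int req,
                           bool copy_hds)
{
    struct ring_cfg cur = get_cfg(cached, copy_hds);

    if (cur.hds_thresh == req)
        return false; /* looks like a no-op, so nothing is changed */
    cached->hds_thresh = req;
    return true;
}
```

Under this model, any non-zero request still differs from the bogus baseline of 0 and is applied, which is why a test that first sets a random non-zero value and then sets 0 is needed to expose the problem.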

4 months, 2 weeks

[RFC PATCH 0/6] Deep talk about folio vmap

by Huan Yang

Bingbu reported an issue in [1] where udmabuf vmap failed, and in [2] we discussed the scenario of folio vmap caused by the misuse of vmap_pfn in udmabuf. We reached the conclusion that vmap_pfn must not be used with page-backed PFNs:

Christoph Hellwig: 'No, vmap_pfn is entirely for memory not backed by pages or folios, i.e. PCIe BARs and similar memory. This must not be mixed with proper folio backed memory.'

However, udmabuf still needs to handle vmap of HVO-based folios, and the vmap issue still needs a fix. This RFC code aims to demonstrate the two points I mentioned in [2], and to discuss them in more depth:

Point1. Simply copy the vmap_pfn code instead of touching the common vmap_pfn, use it privately, and remove the pfn_valid check.

Point2. Implement a folio-array-based vmap (vmap_folios) that can be given a range within each folio (offset, nr_pages), so that it can handle vmap of HVO folios.

Patches 1-2 implement Point1 and add a simple test set in the udmabuf driver. Patches 3-5 implement Point2, which can also be tested.

Kasireddy also suggested that 'another option is to just limit udmabuf's vmap() to only shmem folios' (I guess folio_test_hugetlb_vmemmap_optimized could help here).

But I prefer Point2 to solve this issue, and IMO a folio-based vmap is still needed. With a page-based (or PFN-based) vmap, each large folio must be split into individual struct pages, which needs a larger array and a longer iteration. If the tail page structs do not exist (as with HVO), only a PFN-based vmap can be used, but there is no common API for this.

In [2], we discussed that udmabuf can use hugetlb as the memory provider and can map a sub-range of it. So if HVO is used with hugetlb, each folio's tail pages may have been freed, so we cannot use a page-based vmap, only a PFN-based one, as shown in Point1.

Furthermore, a folio-based vmap only needs to record each folio (plus offset and nr_pages if a range is needed). For a 20MB vmap, a page-based approach needs 5120 pages (40KB), while 2MB folios need only 10 folios (80 bytes).

Matthew pointed out that Vishal also proposed a folio-based vmap, vmap_file [3].
This RFC wants a range-based folio vmap, not only a map of full folios (like a file's folios), to resolve problems such as range vmap of HVO folios. Please give me more suggestions.

Test case:

  // enable/disable HVO
  1. echo [1|0] > /proc/sys/vm/hugetlb_optimize_vmemmap
  // prepare HUGETLB
  2. echo 10 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
  3. ./udmabuf_vmap
  4. Check the output, and dmesg for any warnings.

[1] https://lore.kernel.org/all/9172a601-c360-0d5b-ba1b-33deba430455@linux.inte…
[2] https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/
[3] https://lore.kernel.org/linux-mm/20250131001806.92349-1-vishal.moola@gmail.…

Huan Yang (6):
  udmabuf: try fix udmabuf vmap
  udmabuf: try udmabuf vmap test
  mm/vmalloc: try add vmap folios range
  udmabuf: use vmap_range_folios
  udmabuf: vmap test suit for pages and pfns compare
  udmabuf: remove no need code

 drivers/dma-buf/udmabuf.c | 29 +++++++++-----------
 include/linux/vmalloc.h   | 57 +++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c              | 47 ++++++++++++++++++++++++++++++++
 3 files changed, 117 insertions(+), 16 deletions(-)

-- 
2.48.1
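The metadata sizes quoted for a 20MB mapping can be checked with a little arithmetic. This sketch assumes 4 KiB base pages, 2 MiB hugetlb folios and 8-byte pointers, as the numbers in the cover letter imply. The 16-byte range descriptor (a folio pointer plus 32-bit offset and nr_pages) is a hypothetical layout, included only to show that per-folio ranges stay small.

```c
#include <assert.h>

/* Assumptions matching the cover letter's example. */
#define BASE_PAGE_SIZE    (4UL << 10)  /* 4 KiB base pages */
#define FOLIO_SIZE        (2UL << 20)  /* 2 MiB hugetlb folios */
#define POINTER_BYTES     8UL          /* 64-bit pointers */
#define RANGE_ENTRY_BYTES 16UL         /* hypothetical {folio*, offset, nr_pages} */

/* Entries needed when the mapping is described one struct page at a time. */
static unsigned long page_entries(unsigned long map_bytes)
{
    return map_bytes / BASE_PAGE_SIZE;
}

/* Entries needed when the mapping is described one folio at a time. */
static unsigned long folio_entries(unsigned long map_bytes)
{
    return map_bytes / FOLIO_SIZE;
}

/* Size of a plain pointer array (pages or folios) with n entries. */
static unsigned long array_bytes(unsigned long n)
{
    return n * POINTER_BYTES;
}

/* Size of an array of n hypothetical range descriptors. */
static unsigned long range_bytes(unsigned long n)
{
    return n * RANGE_ENTRY_BYTES;
}
```

For 20MB this gives 5120 page entries (40960 bytes) versus 10 folio entries (80 bytes), and even with per-folio offset and nr_pages the hypothetical range array is only 160 bytes.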

4 months, 2 weeks

Exclude cirrus FW tests from KUNIT_ALL_TESTS

by Jakub Kicinski

Hi!

The Cirrus tests keep failing for me when run on x86:

  ./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64

https://netdev-3.bots.linux.dev/kunit/results/60103/stdout

It seems like new cases keep appearing and we have to keep adding them to our local ignore list. Is it possible to get these fixed, or can we exclude the Cirrus tests from KUNIT_ALL_TESTS?

4 months, 2 weeks

[PATCH 0/2] selftests/nolibc: only consider XARCH for CFLAGS when requested

by Thomas Weißschuh

If no explicit XARCH is specified, use the toolchain's default.

Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
      selftests/nolibc: drop dependency from sysroot to defconfig
      selftests/nolibc: only consider XARCH for CFLAGS when requested

 tools/testing/selftests/nolibc/Makefile | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
---
base-commit: 9a9b20007ab833c1aa3791efcfdf67e7e3ea8902
change-id: 20250330-nolibc-nolibc-test-native-6d4d84d764eb

Best regards,
-- 
Thomas Weißschuh <linux(a)weissschuh.net>

4 months, 3 weeks

[PATCH v2 1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0

by Waiman Long

The test_memcontrol selftest consistently fails its test_memcg_low sub-test because two of its test child cgroups, which have a memory.low of 0 or an effective memory.low of 0, still have low events generated for them, since mem_cgroup_below_low() uses the ">=" operator when comparing to elow.

The two failing use cases are as follows:

1) memory.low is set to 0, but low events can still be triggered, and so the cgroup may have a non-zero low event count. I doubt users are looking for that, as they didn't set memory.low at all.

2) memory.low is set to a non-zero value, but the cgroup has no task in it, so it has an effective low value of 0. Again, it may have a non-zero low event count if memory reclaim happens. This is probably not a result expected by users, and it is really doubtful that users will check an empty cgroup with no task in it and expect some non-zero event counts.

The simple and naive fix of changing the operator to ">", however, changes the memory reclaim behavior, which can lead to other failures, as low events are needed to facilitate memory reclaim. So we can't do that without some riskier changes in memory reclaim.

A simpler alternative is to avoid reporting a below_low failure if either memory.low or its effective equivalent is 0, which is what this patch does, specifically for the two failing use cases above.

With this patch applied, the test_memcg_low sub-test finishes successfully without failure in most cases. However, both the test_memcg_low and test_memcg_min sub-tests may still fail occasionally if the memory.current values fall outside of the expected ranges.

To be consistent, a similar change is applied to mem_cgroup_below_min() to avoid the two failing use cases above, with low replaced by min.
Signed-off-by: Waiman Long <longman(a)redhat.com> --- include/linux/memcontrol.h | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 53364526d877..4d4a1f159eaa 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -601,21 +601,31 @@ static inline bool mem_cgroup_unprotected(struct mem_cgroup *target, static inline bool mem_cgroup_below_low(struct mem_cgroup *target, struct mem_cgroup *memcg) { + unsigned long elow; + if (mem_cgroup_unprotected(target, memcg)) return false; - return READ_ONCE(memcg->memory.elow) >= - page_counter_read(&memcg->memory); + elow = READ_ONCE(memcg->memory.elow); + if (!elow || !READ_ONCE(memcg->memory.low)) + return false; + + return page_counter_read(&memcg->memory) <= elow; } static inline bool mem_cgroup_below_min(struct mem_cgroup *target, struct mem_cgroup *memcg) { + unsigned long emin; + if (mem_cgroup_unprotected(target, memcg)) return false; - return READ_ONCE(memcg->memory.emin) >= - page_counter_read(&memcg->memory); + emin = READ_ONCE(memcg->memory.emin); + if (!emin || !READ_ONCE(memcg->memory.min)) + return false; + + return page_counter_read(&memcg->memory) <= emin; } int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp); -- 2.48.1
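A minimal userspace model of the comparison changed by this patch may help when reading the diff. It is a sketch, not kernel code: struct memcg_model and its fields are hypothetical stand-ins for page_counter_read(&memcg->memory), memory.low and memory.elow, and both READ_ONCE() and the mem_cgroup_unprotected() early return are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the values the check reads; in the kernel
 * they come from page_counter_read(&memcg->memory), memory.low and
 * memory.elow (READ_ONCE() is omitted in this model).
 */
struct memcg_model {
	unsigned long usage; /* current usage */
	unsigned long low;   /* memory.low as configured */
	unsigned long elow;  /* effective low protection */
};

/* Old check: a usage of 0 counts as "below" an elow of 0 (0 >= 0). */
static bool below_low_old(const struct memcg_model *m)
{
	return m->elow >= m->usage;
}

/* Patched check: no configured or effective protection means never below. */
static bool below_low_new(const struct memcg_model *m)
{
	if (!m->elow || !m->low)
		return false;
	return m->usage <= m->elow;
}
```

In this model an empty cgroup with an effective low of 0 is reported as below low by the old check, while the patched check returns false for both failing use cases and leaves genuinely protected cgroups unchanged.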

4 months, 3 weeks

Re: [PATCH 6.13 00/23] 6.13.10-rc1 review

by Naresh Kamboju

On Thu, 3 Apr 2025 at 20:56, Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> wrote: > > This is the start of the stable review cycle for the 6.13.10 release. > There are 23 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Sat, 05 Apr 2025 15:16:11 +0000. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.13.10-rc… > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.13.y > and the diffstat can be found below. > > thanks, > > greg k-h Regressions on arm, arm64 and x86_64. 1) The selftests rseq failed across the boards and virtual environments. These test failures were also noticed on Linux mainline and next. We will bisect these lists of regressions and get back to you. * kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_basic_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - rseq_param_test_mm_cid_compare_twice 2) The clang-nightly build issues reported on mainline and next. * S390, powerpc, build - clang-nightly-defconfig - clang-nightly-lkftconfig-hardening - clang-nightly-lkftconfig-lto-full - clang-nightly-lkftconfig-lto-thing clang-nightly: ERROR: modpost: "wcslen" [fs/smb/client/cifs.ko] undefined! - https://lore.kernel.org/all/CA+G9fYuQHeGicnEx1d=XBC0p1LCsndi5q0p86V7pCZ02d8… 3) The clang-nightly boot regressions with no console output have been reported on mainline and next. 
* boot - clang-nightly-lkftconfig-hardening - clang-nightly-lkftconfig-kselftest - clang-nightly-lkftconfig-lto-full - clang-nightly-lkftconfig-lto-thing - gcc-13-lkftconfig-debug v6.14-12245-g91e5bfe317d8: Boot regression: rk3399-rock-pi-4b dragonboard-410c dragonboard-845c no console output - https://lore.kernel.org/all/CA+G9fYve7+nXJNoV48TksXoMeVjgJuP8Gs=+1br+Qur1DP… Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> ## Build * kernel: 6.13.10-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git commit: 8cbfaadfa0ec371208123554d6ad9994433929bb * git describe: v6.13.7-385-g8cbfaadfa0ec * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.13.y/build/v6.13… ## Test Regressions (compared to v6.13.7-362-g3d21aad34dfa) * arm, build - clang-nightly-nhk8815_defconfig * arm64, build - clang-nightly-allyesconfig * dragonboard-410c, boot - clang-nightly-lkftconfig-hardening - clang-nightly-lkftconfig-kselftest - clang-nightly-lkftconfig-lto-full - clang-nightly-lkftconfig-lto-thing * dragonboard-845c, boot - clang-nightly-lkftconfig-hardening - clang-nightly-lkftconfig-kselftest - clang-nightly-lkftconfig-lto-full - clang-nightly-lkftconfig-lto-thing - gcc-13-lkftconfig-debug * e850-96, boot - clang-nightly-lkftconfig-kselftest * fvp-aemva, boot - clang-nightly-lkftconfig-kselftest * juno-r2, boot - clang-nightly-lkftconfig-kselftest * dragonboard-410c, kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_basic_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - rseq_param_test_mm_cid_compare_twice * dragonboard-845c, kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_basic_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - 
rseq_param_test_mm_cid_compare_twice * e850-96, kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_basic_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - rseq_param_test_mm_cid_compare_twice * powerpc, build - clang-nightly-defconfig - clang-nightly-lkftconfig-hardening - clang-nightly-lkftconfig-lto-full - clang-nightly-lkftconfig-lto-thing - clang-nightly-ppc64e_defconfig * qemu-arm64, kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_basic_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - rseq_param_test_mm_cid_compare_twice * qemu-i386, kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - rseq_param_test_mm_cid_compare_twice - shardfile-rseq * qemu-x86_64, kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_basic_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - rseq_param_test_mm_cid_compare_twice * rk3399-rock-pi-4b, boot - clang-nightly-lkftconfig-hardening - clang-nightly-lkftconfig-kselftest - clang-nightly-lkftconfig-lto-full - clang-nightly-lkftconfig-lto-thing * s390, build - clang-nightly-defconfig - clang-nightly-lkftconfig-hardening - clang-nightly-lkftconfig-lto-full - clang-nightly-lkftconfig-lto-thing * x86, kselftest-rseq - rseq_basic_percpu_ops_mm_cid_test - rseq_basic_percpu_ops_test - rseq_basic_test - rseq_param_test - rseq_param_test_benchmark - rseq_param_test_compare_twice - rseq_param_test_mm_cid - rseq_param_test_mm_cid_benchmark - rseq_param_test_mm_cid_compare_twice * x86_64, build - 
clang-nightly-allyesconfig ## Metric Regressions (compared to v6.13.7-362-g3d21aad34dfa) ## Test Fixes (compared to v6.13.7-362-g3d21aad34dfa) ## Metric Fixes (compared to v6.13.7-362-g3d21aad34dfa) ## Test result summary total: 125983, pass: 99795, fail: 7289, skip: 18899, xfail: 0 ## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 139 total, 136 passed, 3 failed * arm64: 57 total, 56 passed, 1 failed * i386: 18 total, 18 passed, 0 failed * mips: 34 total, 33 passed, 1 failed * parisc: 4 total, 3 passed, 1 failed * powerpc: 40 total, 35 passed, 5 failed * riscv: 25 total, 23 passed, 2 failed * s390: 22 total, 18 passed, 4 failed * sh: 5 total, 5 passed, 0 failed * sparc: 4 total, 3 passed, 1 failed * x86_64: 49 total, 48 passed, 1 failed ## Test suites summary * boot * commands * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-exec * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-kcmp * kselftest-kvm * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-mincore * kselftest-mqueue * kselftest-net * kselftest-net-mptcp * kselftest-openat2 * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-rust * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-tc-testing * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user_events * kselftest-vDSO * kselftest-x86 * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-build-clang * log-parser-build-gcc * log-parser-test * ltp-capability * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-smoke * ltp-syscalls * ltp-tracing * perf * 
rcutorture -- Linaro LKFT https://lkft.linaro.org

4 months, 3 weeks

[PATCH] selftests/mm: Add missing gitignore file

by I Hsin Cheng

Add "guard-pages" binary file into .gitignore. Signed-off-by: I Hsin Cheng <richard120310(a)gmail.com> --- tools/testing/selftests/mm/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index c5241b193db8..c9fd69ece95c 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -58,3 +58,4 @@ hugetlb_dio pkey_sighandler_tests_32 pkey_sighandler_tests_64 guard-regions +guard-pages -- 2.43.0

4 months, 3 weeks

[PATCH v16 00/15] PCI: EP: Add RC-to-EP doorbell with platform MSI controller

by Frank Li

Doorbell flow (MSI Controller, PCI Endpoint, PCI Host):
  1. The PCI Endpoint calls platform_msi_domain_alloc_irqs() to allocate an interrupt from the MSI Controller.
  2. The MSI Controller's message reaches the PCI Endpoint through write_msi_msg(), which updates the doorbell register address for BAR<n>, the BAR exposed to the PCI Host.
  3. The PCI Host writes to BAR<n>; the write goes through the PCI Endpoint to the MSI Controller.
  4. The MSI Controller raises the interrupt and the PCI Endpoint handles it (Irq Handle).
This series is based on an older patch: https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/ The original patch only targeted the vntb driver, but the method is actually generic. This series adds a new API to pci-epf-core so that any EP driver can use it. The previous v2 discussion is here: https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/ Changes in v16: - remove "arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file", because better patches are under review. - Add documentation for pcie-ep msi-map usage - For other changes, see each patch's changelog About IMMUTABLE (no change for this part; tglx provided this feedback): > - This IMMUTABLE thing serves no purpose, because you don't randomly > plug this end-point block on any MSI controller. They come as part > of an SoC. "Yes and no. The problem is that the EP implementation is meant to be a generic library and while GIC-ITS guarantees immutability of the address/data pair after setup, there are architectures (x86, loongson, riscv) where the base MSI controller does not and immutability is only achieved when interrupt remapping is enabled. The latter can be disabled at boot-time and then the EP implementation becomes a lottery across affinity changes. That was my concern about this library implementation and that's why I asked for a mechanism to ensure that the underlying irqdomain provides a immutable address/data pair.
So it does not matter for GIC-ITS, but in the larger picture it matters. Thanks, tglx " - Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com Changes in v15: - rebase to v6.14-rc1 - fix build issue found by the kernel test robot - Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com Changes in v14: Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As a result, the approach has been reverted to the v9 method. However, there are several improvements: MSI now supports msi-map in addition to msi-parent. - The struct device: id is used as the endpoint function (EPF) device identity to map to the stream ID (sideband information). - The EPC device tree source (DTS) utilizes msi-map to provide such information. - The EPF device's of_node is set to the EPC controller’s node. This approach is commonly used for multi-function device (MFD) platform child devices, allowing them to inherit properties from the MFD device’s DTS, such as reset-cells and gpio-cells. This method is well-suited for the current case, as the EPF is inherently created/bound to the EPC and should inherit the EPC’s DTS node properties. Additionally: Since the basic IMX95 LUT support has already been merged into the mainline, a DTS and driver increment patch is added to complete the solution. The patch is rebased onto the latest linux-next tree and aligned with the new pcitest framework. - Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com Changes in v13: - Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI - Change request ID to func | vfunc << 3 - Remove IRQ_DOMAIN_MSI_IMMUTABLE Thomas Gleixner: I hope I captured all the points from your review comments. If I missed any, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com Changes in v12: - Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add helper function irq_domain_msi_is_immuatble(). - split "PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check" into 3 patches - Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com Changes in v11: - Change to use MSI_FLAG_MSG_IMMUTABLE - Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com Changes in v10: Thomas Gleixner: There are big changes in pci-ep-msi.c, and I am not sure whether I am going down the correct path. The key improvement is removing the limitation to a single function device. I use a new patch for the immutable check, which is an additional feature relative to the base enablement patch. - Remove patch "Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()" - Add new patch "irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info" - Remove the limitation of supporting only one endpoint function. - Create one MSI domain for each endpoint function device. - Use "msi-map" in the PCI EP controller node instead of msi-parent; the first argument is (func_no << 8 | vfunc_no) - Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com Changes in v9: - Add patch "platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()" - Remove patch "PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering" - Remove API pci_epf_align_inbound_addr_lo_hi - Move doorbell_alloc into the doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com Changes in v8: - update helper function name to pci_epf_align_inbound_addr() - Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com Changes in v7: - Add helper function pci_epf_align_addr(); - Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com Changes in v6: - change doorbell_addr to doorbell_offset - use round_down() - add Niklas's Tested-by tag - rebase to pci/endpoint - Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com Changes in v5: - Move request_irq to the epf test function driver for more flexible use cases - Add fixed size BAR handler - Some minor improvements; see each patch's changelog. - Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com Changes in v4: - Remove patch "genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()" - Use a new method to avoid compatibility problems. Add new commands DOORBELL_ENABLE and DOORBELL_DISABLE. pcitest -B sends DOORBELL_ENABLE first; the EP test function driver then tries to remap one of the BAR_N (except the test register BAR) to the ITS MSI MMIO space. Old drivers don't support the new command, so they return a failure with no side effects. After the test, a DOORBELL_DISABLE command is sent to restore the original mapping, so the pcitest BAR test can pass as normal.
- For other detailed changes, see each patch's changelog - Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com Change from v2 to v3 - Fixed manivannan's comments - Move common part to pci-ep-msi.c and pci-ep-msi.h - rebase to 6.12-rc1 - use RevID to distinguish the old version:
  mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
  echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
  echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
  echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
  echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
  (setting revid to 1 enables platform MSI support)
  ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use a new device ID, which identifies doorbell support, to avoid breaking compatibility. Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices keep the same behavior as before.
  EP side                  RC with old driver    RC with new driver
  PCI_DEVICE_ID_IMX8_DB    no probe              doorbell enabled
  Other device ID          doorbell disabled*    doorbell disabled*
  * Behavior remains unchanged.
Change from v1 to v2 - Add missed patch for endpoint/pci-epf-test.c - Move alloc and free from the epf to the epc driver. - Provide a general helper function for EPC drivers to allocate platform MSI IRQs. - Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com> --- Frank Li (15): platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable() irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS dt-bindings: pci: pci-msi: Add support for PCI Endpoint msi-map irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask PCI: endpoint: Set ID and of_node for function driver PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment PCI: endpoint: pci-epf-test: Add doorbell test support misc: pci_endpoint_test: Add doorbell test case selftests: pci_endpoint: Add doorbell test case pci: imx6: Add helper function imx_pcie_add_lut_by_rid() pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode arm64: dts: imx95: Add msi-map for pci-ep device Documentation/devicetree/bindings/pci/pci-msi.txt | 51 ++++++++ arch/arm64/boot/dts/freescale/imx95.dtsi | 1 + drivers/base/platform-msi.c | 1 + drivers/irqchip/irq-gic-v3-its-msi-parent.c | 8 ++ drivers/irqchip/irq-gic-v3-its.c | 2 +- drivers/misc/pci_endpoint_test.c | 82 ++++++++++++ drivers/pci/controller/dwc/pci-imx6.c | 25 ++-- drivers/pci/endpoint/Makefile | 1 + drivers/pci/endpoint/functions/pci-epf-test.c | 142 +++++++++++++++++++++ drivers/pci/endpoint/pci-ep-msi.c | 90 +++++++++++++ drivers/pci/endpoint/pci-epf-core.c | 48 +++++++ include/linux/irqdomain.h | 7 + include/linux/pci-ep-msi.h | 28 ++++ include/linux/pci-epf.h | 21 +++ include/uapi/linux/pcitest.h | 1 + .../selftests/pci_endpoint/pci_endpoint_test.c | 28 ++++ 16 files changed, 527 insertions(+), 9 deletions(-) --- base-commit: a4949bd40778aa9beac77c89e4c6a1da52875c8b change-id: 20241010-ep-msi-8b4cab33b1be Best regards, --- Frank Li <Frank.Li(a)nxp.com>
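The changelog mentions switching from doorbell_addr to doorbell_offset and using round_down() in a helper that became pci_epf_align_inbound_addr(). The sketch below shows only the underlying arithmetic, assuming the inbound BAR alignment is a power of two; align_doorbell() and struct bar_mapping are hypothetical names, not the real helper's signature, and the addresses in the test are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where BAR<n> is backed, and where to write. */
struct bar_mapping {
	uint64_t base;   /* aligned address that backs BAR<n> */
	uint64_t offset; /* doorbell offset inside BAR<n> */
};

/*
 * Split a doorbell (MSI) address into an aligned BAR base and an offset.
 * 'align' is the inbound alignment required by the controller and is
 * assumed to be a non-zero power of two.
 */
static struct bar_mapping align_doorbell(uint64_t doorbell_addr, uint64_t align)
{
	struct bar_mapping m;

	assert(align != 0 && (align & (align - 1)) == 0);
	m.base = doorbell_addr & ~(align - 1); /* round_down(addr, align) */
	m.offset = doorbell_addr - m.base;
	return m;
}
```

For example, a doorbell at 0x48020040 with a 64 KiB alignment maps BAR<n> at 0x48020000, and the host writes to offset 0x40 within BAR<n>.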

4 months, 3 weeks

[PATCH] selftests/mm: Generate a temporary mountpoint for cgroup filesystem

by Mark Brown

Currently, if the filesystem for the cgroup version that the charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh tests want to use is not mounted, they will attempt to mount it on the hard-coded path /dev/cgroup/memory, deleting that directory when the test finishes. This will fail if there is no preexisting directory at that path, and since the directory is deleted, subsequent runs of the test will fail. Instead of relying on this hard-coded directory name, use mktemp to generate a temporary directory to use as a mountpoint, fixing both the assumption and the disruption caused by deleting a preexisting directory. This means that if the relevant cgroup filesystem is not already mounted, then we rely on having coreutils (which provides mktemp) installed. I suspect that many current users are relying on having things automounted by default, and given that the script relies on bash, it's probably not an unreasonable requirement. Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting") Signed-off-by: Mark Brown <broonie(a)kernel.org> --- tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 4 ++-- tools/testing/selftests/mm/hugetlb_reparenting_test.sh | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh index 67df7b47087f..e1fe16bcbbe8 100755 --- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh +++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh @@ -29,7 +29,7 @@ fi if [[ $cgroup2 ]]; then cgroup_path=$(mount -t cgroup2 | head -1 | awk '{print $3}') if [[ -z "$cgroup_path" ]]; then - cgroup_path=/dev/cgroup/memory + cgroup_path=$(mktemp -d) mount -t cgroup2 none $cgroup_path do_umount=1 fi @@ -37,7 +37,7 @@ if [[ $cgroup2 ]]; then else cgroup_path=$(mount -t cgroup | grep ",hugetlb" | awk '{print $3}') if [[ -z "$cgroup_path" ]]; then - cgroup_path=/dev/cgroup/memory +
cgroup_path=$(mktemp -d) mount -t cgroup memory,hugetlb $cgroup_path do_umount=1 fi diff --git a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh index 11f9bbe7dc22..0b0d4ba1af27 100755 --- a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh +++ b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh @@ -23,7 +23,7 @@ fi if [[ $cgroup2 ]]; then CGROUP_ROOT=$(mount -t cgroup2 | head -1 | awk '{print $3}') if [[ -z "$CGROUP_ROOT" ]]; then - CGROUP_ROOT=/dev/cgroup/memory + CGROUP_ROOT=$(mktemp -d) mount -t cgroup2 none $CGROUP_ROOT do_umount=1 fi --- base-commit: a4cda136f021ad44b8b52286aafd613030a6db5f change-id: 20250403-kselftest-mm-cgroup2-detection-b761fd232f9d Best regards, -- Mark Brown <broonie(a)kernel.org>
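The mktemp -d approach in this patch can be sketched in C with POSIX mkdtemp(), which creates a uniquely named directory from a template ending in XXXXXX. This is an illustration only: the /tmp/cgroup.XXXXXX template and the helper names are hypothetical, and the mount(2)/umount(2) steps are left out because they need root.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * C analogue of `cgroup_path=$(mktemp -d)`: create a fresh, uniquely
 * named directory instead of relying on a hard-coded path. The template
 * is hypothetical; mount(2) is omitted because it needs root.
 */
static int make_temp_mountpoint(char *buf, size_t len)
{
	static const char tmpl[] = "/tmp/cgroup.XXXXXX";

	if (len < sizeof(tmpl))
		return -1;
	memcpy(buf, tmpl, sizeof(tmpl));
	return mkdtemp(buf) ? 0 : -1; /* replaces XXXXXX in place */
}

/* Remove only the directory created above; nothing preexisting is touched. */
static int remove_temp_mountpoint(const char *path)
{
	return rmdir(path);
}
```

Because the directory name is unique and created by the test itself, removing it afterwards cannot delete a directory that existed before the test ran.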

4 months, 3 weeks

[PATCH] selftests/bpf: Convert comma to semicolon

by Chen Ni

Replace commas between expressions with semicolons. Using a ',' in place of a ';' can have unintended side effects. Although that is not the case here, it seems best to use ';' unless ',' is intended. Found by inspection. No functional change intended. Compile tested only. Signed-off-by: Chen Ni <nichen(a)iscas.ac.cn> --- tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c index 3220f1d28697..f38eaf0d35ef 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c @@ -1340,7 +1340,7 @@ static int st_ops_gen_prologue_with_kfunc(struct bpf_insn *insn_buf, bool direct *insn++ = BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_7, offsetof(struct st_ops_args, a)); *insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 2); *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_0); - *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id), + *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id); *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_8); *insn++ = prog->insnsi[0]; @@ -1379,7 +1379,7 @@ static int st_ops_gen_epilogue_with_kfunc(struct bpf_insn *insn_buf, const struc *insn++ = BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, offsetof(struct st_ops_args, a)); *insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 2); *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_0); - *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id), + *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id); *insn++ = BPF_MOV64_REG(BPF_REG_0, BPF_REG_6); *insn++ = BPF_ALU64_IMM(BPF_MUL, BPF_REG_0, 2); *insn++ = BPF_EXIT_INSN(); -- 2.25.1
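To illustrate the pitfall this patch guards against: the comma operator evaluates its left operand, then its right, with a sequence point in between, so in straight-line code such as the BPF instruction emitters above the result is identical to two statements. It only changes behavior when the statement is the unbraced body of an if, where the comma silently pulls both expressions under the condition. The function names below are hypothetical.

```c
#include <assert.h>

/*
 * Straight-line code, as in bpf_testmod.c: the comma operator has a
 * sequence point between its operands, so "*p++ = 1, *p++ = 2;" stores
 * both values in order, exactly like two separate statements.
 */
static int emit_with_comma(int *buf)
{
	int *p = buf;

	*p++ = 1, *p++ = 2;
	*p++ = 3;
	return (int)(p - buf);
}

/* Under an unbraced 'if', the comma makes both assignments conditional. */
static int guarded_with_comma(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1, b = 1;
	return a + b;
}

/* With ';', only the first assignment belongs to the 'if'. */
static int guarded_with_semicolon(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 1;
	return a + b;
}
```

With cond true both variants return 2; with cond false the comma version returns 0 and the semicolon version returns 1, which is the kind of difference that makes ';' the safer default.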

4 months, 3 weeks

[PATCH net] selftests: net: amt: indicate progress in the stress test

by Jakub Kicinski

Our CI expects output from the test at least once every 10 minutes. When running on a debug kernel, the AMT stress test is right at the edge of that limit. Improve the output: - print the name of the test first, before starting it, - output a dot every 10% of the way. Output after this change: TEST: amt discovery [ OK ] TEST: IPv4 amt multicast forwarding [ OK ] TEST: IPv6 amt multicast forwarding [ OK ] TEST: IPv4 amt traffic forwarding torture .......... [ OK ] TEST: IPv6 amt traffic forwarding torture .......... [ OK ] Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- Since net-next is closed I'm sending this for net. We enabled DEBUG_PREEMPT in the debug flavor and the test now times out most of the time. CC: ap420073(a)gmail.com CC: shuah(a)kernel.org CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/net/amt.sh | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/net/amt.sh b/tools/testing/selftests/net/amt.sh index d458b45c775b..3ef209cacb8e 100755 --- a/tools/testing/selftests/net/amt.sh +++ b/tools/testing/selftests/net/amt.sh @@ -194,15 +194,21 @@ test_remote_ip() send_mcast_torture4() { - ip netns exec "${SOURCE}" bash -c \ - 'cat /dev/urandom | head -c 1G | nc -w 1 -u 239.0.0.1 4001' + for i in `seq 10`; do + ip netns exec "${SOURCE}" bash -c \ + 'cat /dev/urandom | head -c 100M | nc -w 1 -u 239.0.0.1 4001' + echo -n "." + done } send_mcast_torture6() { - ip netns exec "${SOURCE}" bash -c \ - 'cat /dev/urandom | head -c 1G | nc -w 1 -u ff0e::5:6 6001' + for i in `seq 10`; do + ip netns exec "${SOURCE}" bash -c \ + 'cat /dev/urandom | head -c 100M | nc -w 1 -u ff0e::5:6 6001' + echo -n "." + done } check_features() @@ -278,10 +284,12 @@ wait $pid || err=$?
if [ $err -eq 1 ]; then ERR=1 fi +printf "TEST: %-50s" "IPv4 amt traffic forwarding torture" send_mcast_torture4 -printf "TEST: %-60s [ OK ]\n" "IPv4 amt traffic forwarding torture" +printf " [ OK ]\n" +printf "TEST: %-50s" "IPv6 amt traffic forwarding torture" send_mcast_torture6 -printf "TEST: %-60s [ OK ]\n" "IPv6 amt traffic forwarding torture" +printf " [ OK ]\n" sleep 5 if [ "${ERR}" -eq 1 ]; then echo "Some tests failed." >&2 -- 2.49.0
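The new output keeps the old column alignment: the test name is padded to 50 columns instead of 60, and exactly ten dots (one per 100M chunk, ten chunks in total) fill the remaining 10 columns. The C sketch below checks that arithmetic with snprintf(); it builds the line in a buffer rather than printing it incrementally, and format_old()/format_new() are hypothetical helpers that mirror the printf formats in the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TEST_LINE_LEN 128

/* Old amt.sh format: whole line printed after the test finished. */
static void format_old(char out[TEST_LINE_LEN], const char *name)
{
	snprintf(out, TEST_LINE_LEN, "TEST: %-60s [ OK ]", name);
}

/*
 * New format: name padded to 50 columns up front, one '.' per finished
 * chunk (the loop in the diff runs 10 times), then the result.
 */
static void format_new(char out[TEST_LINE_LEN], const char *name, int chunks)
{
	size_t n;

	snprintf(out, TEST_LINE_LEN, "TEST: %-50s", name);
	for (int i = 0; i < chunks; i++) {
		n = strlen(out);
		if (n + 1 >= TEST_LINE_LEN)
			break;
		out[n] = '.';       /* like `echo -n "."` after each chunk */
		out[n + 1] = '\0';
	}
	n = strlen(out);
	snprintf(out + n, TEST_LINE_LEN - n, " [ OK ]");
}
```

For any name of up to 50 characters, the old and new lines have the same length (6 + 60 + 7 = 73 characters), so the "[ OK ]" column stays aligned with the other tests.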

4 months, 3 weeks

[PATCH v2 00/16] selftests: vDSO: parse_vdso: Make compatible with nolibc

by Thomas Weißschuh

For testing the functionality of the vDSO, it is necessary to build userspace programs for multiple different architectures. Acquiring matching userspace cross-compilers with full C libraries, and then building root images from them, is additional work. The kernel tree already contains nolibc, a small, header-only C library. By using it, it is possible to build userspace programs without any additional dependencies. For example, the kernel.org crosstools or multi-target clang can be used to build test programs for a multitude of architectures. While nolibc is very limited, it is enough for many selftests. With some minor adjustments it is possible to make parse_vdso.c compatible with nolibc. As an example, vdso_standalone_test_x86 is now built from the same C code as the regular vdso_test_gettimeofday, while still being completely standalone. Also drop the dependency of parse_vdso.c on the elf.h header from libc and only use the one from the kernel's UAPI. While this series is useful on its own now, it will also integrate with the kunit UAPI framework currently under development: https://lore.kernel.org/lkml/20250217-kunit-kselftests-v1-0-42b4524c3b0a@li… Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Changes in v2: - Provide a limits.h header in nolibc - Pick up Reviewed-by tags from Kees - Link to v1: https://lore.kernel.org/r/20250203-parse_vdso-nolibc-v1-0-9cb6268d77be@linu… --- Thomas Weißschuh (16): MAINTAINERS: Add vDSO selftests elf, uapi: Add definition for STN_UNDEF elf, uapi: Add definition for DT_GNU_HASH elf, uapi: Add definitions for VER_FLG_BASE and VER_FLG_WEAK elf, uapi: Add type ElfXX_Versym elf, uapi: Add types ElfXX_Verdef and ElfXX_Veraux tools/include: Add uapi/linux/elf.h selftests: Add headers target tools/nolibc: add limits.h shim header selftests: vDSO: vdso_standalone_test_x86: Use vdso_init_form_sysinfo_ehdr selftests: vDSO: parse_vdso: Drop vdso_init_from_auxv() selftests: vDSO: parse_vdso: Use UAPI headers
instead of libc headers selftests: vDSO: parse_vdso: Test __SIZEOF_LONG__ instead of ULONG_MAX selftests: vDSO: vdso_test_gettimeofday: Clean up includes selftests: vDSO: vdso_test_gettimeofday: Make compatible with nolibc selftests: vDSO: vdso_standalone_test_x86: Switch to nolibc MAINTAINERS | 1 + include/uapi/linux/elf.h | 38 ++ tools/include/nolibc/Makefile | 1 + tools/include/nolibc/limits.h | 7 + tools/include/uapi/linux/elf.h | 524 +++++++++++++++++++++ tools/testing/selftests/lib.mk | 5 +- tools/testing/selftests/vDSO/Makefile | 11 +- tools/testing/selftests/vDSO/parse_vdso.c | 19 +- tools/testing/selftests/vDSO/parse_vdso.h | 1 - .../selftests/vDSO/vdso_standalone_test_x86.c | 143 +----- .../selftests/vDSO/vdso_test_gettimeofday.c | 4 +- 11 files changed, 590 insertions(+), 164 deletions(-) --- base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b change-id: 20241017-parse_vdso-nolibc-e069baa7ff48 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
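One of the patches replaces a ULONG_MAX test with __SIZEOF_LONG__, which GCC and Clang predefine, so no <limits.h> is needed to pick the ELF class. The sketch below shows that selection plus the two-level macro expansion needed to paste the chosen width into a type name. ELF_BITS is assumed to match the name used in parse_vdso.c, and ELF_NAME() is a hypothetical, string-producing variant of a type-pasting macro.

```c
#include <assert.h>
#include <string.h>

/* Pick the ELF class from the compiler's predefined macro, no <limits.h>. */
#if __SIZEOF_LONG__ == 8
#define ELF_BITS 64
#elif __SIZEOF_LONG__ == 4
#define ELF_BITS 32
#else
#error "unexpected sizeof(long)"
#endif

/*
 * Two-level expansion so that ELF_BITS is replaced by 64/32 before it
 * is stringified; ELF_NAME(Ehdr) becomes "Elf64_Ehdr" or "Elf32_Ehdr".
 */
#define ELF_NAME_XFORM2(bits, x) "Elf" #bits "_" #x
#define ELF_NAME_XFORM(bits, x) ELF_NAME_XFORM2(bits, x)
#define ELF_NAME(x) ELF_NAME_XFORM(ELF_BITS, x)

static int elf_bits(void)
{
	return ELF_BITS;
}
```

Without the intermediate ELF_NAME_XFORM() step, the # operator would stringify the literal token ELF_BITS instead of its value.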

4 months, 3 weeks

posix_timers kself test failed in 6.12.y

by ALOK TIWARI

Hi Thomas,
The following posix_timers kselftests fail on 6.12.y:
-------------------------
not ok 7 check_sig_ign SIGEV_SIGNAL
not ok 9 check_rearm
not ok 10 check_delete
-------------------------
Reason for the failure: 6.12.y does not support these kselftests, because the following commits were not backported to 6.12.y:
6017a158beb: "posix-timers: Embed sigqueue in struct k_itimer"
69f032c92cf: "signal: Provide ignored_posix_timers list"
Is it feasible to backport these commits, or the complete patch series ([patch V7 00/21] posix-timers: Cure the SIG_IGN mess), to 6.12.y?
6017a158beb: "posix-timers: Embed sigqueue in struct k_itimer"
69f032c92cf8: "signal: Provide ignored_posix_timers list"
If not, we should revert the following kselftests from 6.12.y:
45c4225c3dcc "selftests/timers/posix_timers: Add SIG_IGN test"
e65bb03e4427 "selftests/timers/posix_timers: Validate signal rules"
Thanks,
Alok

4 months, 3 weeks

[PATCH net 0/2] fix wrong hds-thresh value setting

by Taehee Yoo

The hds-thresh value is not set correctly if the input value is 0. The cause is that ethtool_ringparam_get_cfg(), an internal function that returns ring parameters from both ->get_ringparam() and dev->cfg, can't return a correct hds-thresh value. The first patch fixes ethtool_ringparam_get_cfg() to set the hds-thresh value correctly. The second patch adds a random-value test for hds-thresh, so that a hds-thresh value of 0 can be tested properly. Taehee Yoo (2): net: ethtool: fix ethtool_ringparam_get_cfg() returns a hds_thresh value always as 0. selftests: drv-net: test random value for hds-thresh net/ethtool/common.c | 1 + tools/testing/selftests/drivers/net/hds.py | 28 +++++++++++++++++++++- 2 files changed, 28 insertions(+), 1 deletion(-) -- 2.34.1

4 months, 3 weeks

[PATCH bpf v2 0/2] bpf, xdp: clean adjust_{head,meta} memory when offset < 0

by Jiayuan Chen

This patchset originates from my attempt to resolve a KMSAN warning that has existed for over 3 years:
https://syzkaller.appspot.com/bug?extid=0e6ddb1ef80986bdfe64

Previously, we briefly discussed in this thread whether we can simply perform a memset in adjust_{head,meta}:
https://lore.kernel.org/netdev/20250328043941.085de23b@kernel.org/T/#t

Unfortunately, I couldn't find a similar topic on the mailing list, but I did find a similar security-related commit: commit 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse"). So I am starting a new thread here with a clearer subject, so that we can discuss it here.

Meanwhile, I also discovered a related issue that led to a CVE, specifically the Facebook Katran vulnerability (https://vuldb.com/?id.246309). Currently, even with unprivileged BPF functionality disabled, a user can load a BPF program with CAP_BPF and CAP_NET_ADMIN, so I believe we should now avoid exposing kernel memory directly to users.

Regarding performance, I added corresponding measurements to the selftest, covering common MAC headers and IP headers of various sizes. Compared to not using memset, the execution time increased by 2ns, which I think is negligible considering the entire network stack.

Jiayuan Chen (2):
  bpf, xdp: clean head/meta when expanding it
  selftests/bpf: add perf test for adjust_{head,meta}

 include/uapi/linux/bpf.h                      |  8 +--
 net/core/filter.c                             |  5 +-
 tools/include/uapi/linux/bpf.h                |  6 ++-
 .../selftests/bpf/prog_tests/xdp_perf.c       | 52 ++++++++++++++++---
 tools/testing/selftests/bpf/progs/xdp_dummy.c | 14 +++++
 5 files changed, 72 insertions(+), 13 deletions(-)

-- 
2.47.1
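A minimal sketch of the idea discussed above: when the packet head is expanded (offset < 0), zero the newly exposed headroom so stale bytes from earlier use of the buffer are never handed to the program. The struct and function are hypothetical, simplified stand-ins, not the kernel's `struct xdp_buff` or its `bpf_xdp_adjust_head()` implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an XDP buffer. */
struct toy_xdp_buff {
	unsigned char *data_hard_start; /* start of headroom */
	unsigned char *data;            /* start of packet */
	unsigned char *data_end;        /* end of packet */
};

/*
 * Move the packet start by 'offset' bytes. A negative offset grows the
 * head into the headroom; the exposed bytes are cleared before the new
 * start is published.
 */
static int toy_adjust_head(struct toy_xdp_buff *xdp, int offset)
{
	ptrdiff_t headroom = xdp->data - xdp->data_hard_start;
	ptrdiff_t len = xdp->data_end - xdp->data;

	if (offset < 0 && -(ptrdiff_t)offset > headroom)
		return -EINVAL;
	if (offset > 0 && (ptrdiff_t)offset > len)
		return -EINVAL;

	if (offset < 0)
		memset(xdp->data + offset, 0, (size_t)(-(ptrdiff_t)offset));
	xdp->data += offset;
	return 0;
}
```

The clearing is proportional to the number of bytes exposed, which is consistent with the small per-call overhead reported above for typical MAC and IP header sizes.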

4 months, 3 weeks

[PATCH AUTOSEL 6.12 01/20] selftests/bpf: Fix stdout race condition in traffic monitor

by Sasha Levin

From: Amery Hung <ameryhung(a)gmail.com>

[ Upstream commit b99f27e90268b1a814c13f8bd72ea1db448ea257 ]

Fix a race condition between the main test_progs thread and the traffic monitoring thread. The traffic monitor thread tries to print a line using multiple printf() calls and uses flockfile() to prevent the line from being torn apart. Meanwhile, the main thread, doing I/O redirection, can reassign or close stdout when going through tests. A deadlock as shown below can happen.

       main                                      traffic_monitor_thread
       ====                                      ======================
                                                 show_transport()
                                                 -> flockfile(stdout)

stdio_hijack_init()
-> stdout = open_memstream(log_buf, log_cnt);
     ...
   env.subtest_state->stdout_saved = stdout;

                                                    ...
                                                    funlockfile(stdout)
stdio_restore_cleanup()
-> fclose(env.subtest_state->stdout_saved);

After the traffic monitor thread locks stdout, a new memstream can be assigned to stdout by the main thread. Therefore, the traffic monitor thread will later be unable to unlock the original stdout. When the main thread tries to access the old stdout, it hangs indefinitely because the stream is still locked by the traffic monitor thread.

The deadlock can be reproduced by running test_progs repeatedly with traffic monitor enabled:

for ((i=1;i<=100;i++)); do
   ./test_progs -a flow_dissector_skb* -m '*'
done

Fix this by calling printf() only once and removing flockfile()/funlockfile().

Signed-off-by: Amery Hung <ameryhung(a)gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Link: https://patch.msgid.link/20250213233217.553258-1-ameryhung@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 tools/testing/selftests/bpf/network_helpers.c | 33 ++++++++-----------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 27784946b01b8..af0ee70a53f9f 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -771,12 +771,13 @@ static const char *pkt_type_str(u16 pkt_type)
 	return "Unknown";
 }
 
+#define MAX_FLAGS_STRLEN 21
 /* Show the information of the transport layer in the packet */
 static void show_transport(const u_char *packet, u16 len, u32 ifindex,
 			   const char *src_addr, const char *dst_addr,
 			   u16 proto, bool ipv6, u8 pkt_type)
 {
-	char *ifname, _ifname[IF_NAMESIZE];
+	char *ifname, _ifname[IF_NAMESIZE], flags[MAX_FLAGS_STRLEN] = "";
 	const char *transport_str;
 	u16 src_port, dst_port;
 	struct udphdr *udp;
@@ -817,29 +818,21 @@ static void show_transport(const u_char *packet, u16 len, u32 ifindex,
 
 	/* TCP or UDP*/
 
-	flockfile(stdout);
+	if (proto == IPPROTO_TCP)
+		snprintf(flags, MAX_FLAGS_STRLEN, "%s%s%s%s",
+			 tcp->fin ? ", FIN" : "",
+			 tcp->syn ? ", SYN" : "",
+			 tcp->rst ? ", RST" : "",
+			 tcp->ack ? ", ACK" : "");
+
 	if (ipv6)
-		printf("%-7s %-3s IPv6 %s.%d > %s.%d: %s, length %d",
+		printf("%-7s %-3s IPv6 %s.%d > %s.%d: %s, length %d%s\n",
 		       ifname, pkt_type_str(pkt_type), src_addr, src_port,
-		       dst_addr, dst_port, transport_str, len);
+		       dst_addr, dst_port, transport_str, len, flags);
 	else
-		printf("%-7s %-3s IPv4 %s:%d > %s:%d: %s, length %d",
+		printf("%-7s %-3s IPv4 %s:%d > %s:%d: %s, length %d%s\n",
 		       ifname, pkt_type_str(pkt_type), src_addr, src_port,
-		       dst_addr, dst_port, transport_str, len);
-
-	if (proto == IPPROTO_TCP) {
-		if (tcp->fin)
-			printf(", FIN");
-		if (tcp->syn)
-			printf(", SYN");
-		if (tcp->rst)
-			printf(", RST");
-		if (tcp->ack)
-			printf(", ACK");
-	}
-
-	printf("\n");
-	funlockfile(stdout);
+		       dst_addr, dst_port, transport_str, len, flags);
 }
 
 static void show_ipv6_packet(const u_char *packet, u32 ifindex, u8 pkt_type)
-- 
2.39.5
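A small sketch of the pattern this fix relies on: format the optional TCP flags into a bounded buffer first, then emit the whole line with a single printf(). `format_tcp_flags()` is a hypothetical helper; the 21-byte size mirrors MAX_FLAGS_STRLEN from the patch, since the longest suffix, `", FIN, SYN, RST, ACK"`, is 20 characters plus the terminating NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* ", FIN, SYN, RST, ACK" is 4 * 5 = 20 characters, +1 for the NUL. */
#define MAX_FLAGS_STRLEN 21

/*
 * Build the flag suffix in a caller-provided buffer so the complete
 * output line can be written with one printf() call, without holding
 * flockfile() across several calls. Returns snprintf()'s result, i.e.
 * the length of the full suffix.
 */
static int format_tcp_flags(char *buf, size_t size,
			    bool fin, bool syn, bool rst, bool ack)
{
	return snprintf(buf, size, "%s%s%s%s",
			fin ? ", FIN" : "",
			syn ? ", SYN" : "",
			rst ? ", RST" : "",
			ack ? ", ACK" : "");
}
```

A caller would then print the line with one `printf("... length %d%s\n", len, flags)`. POSIX stdio functions lock the stream internally for the duration of each call, so a single call keeps the line intact without an explicit flockfile().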

4 months, 3 weeks

[PATCH AUTOSEL 6.13 16/22] selftests/bpf: Fix cap_enable_effective() return code

by Sasha Levin

From: Feng Yang <yangfeng(a)kylinos.cn>

[ Upstream commit 339c1f8ea11cc042c30c315c1a8f61e4b8a90117 ]

The caller of cap_enable_effective() expects a negative error code. Fix it.

Before:
  failed to restore CAP_SYS_ADMIN: -1, Unknown error -1

After:
  failed to restore CAP_SYS_ADMIN: -3, No such process
  failed to restore CAP_SYS_ADMIN: -22, Invalid argument

Signed-off-by: Feng Yang <yangfeng(a)kylinos.cn>
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20250305022234.44932-1-yangfeng59949@163.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 tools/testing/selftests/bpf/cap_helpers.c         | 8 ++++----
 tools/testing/selftests/bpf/cap_helpers.h         | 1 +
 tools/testing/selftests/bpf/prog_tests/verifier.c | 4 ++--
 tools/testing/selftests/bpf/test_loader.c         | 6 +++---
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/cap_helpers.c b/tools/testing/selftests/bpf/cap_helpers.c
index d5ac507401d7c..98f840c3a38f7 100644
--- a/tools/testing/selftests/bpf/cap_helpers.c
+++ b/tools/testing/selftests/bpf/cap_helpers.c
@@ -19,7 +19,7 @@ int cap_enable_effective(__u64 caps, __u64 *old_caps)
 
 	err = capget(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	if (old_caps)
 		*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
@@ -32,7 +32,7 @@ int cap_enable_effective(__u64 caps, __u64 *old_caps)
 	data[1].effective |= cap1;
 	err = capset(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	return 0;
 }
@@ -49,7 +49,7 @@ int cap_disable_effective(__u64 caps, __u64 *old_caps)
 
 	err = capget(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	if (old_caps)
 		*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
@@ -61,7 +61,7 @@ int cap_disable_effective(__u64 caps, __u64 *old_caps)
 	data[1].effective &= ~cap1;
 	err = capset(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	return 0;
 }
diff --git a/tools/testing/selftests/bpf/cap_helpers.h b/tools/testing/selftests/bpf/cap_helpers.h
index 6d163530cb0fd..8dcb28557f762 100644
--- a/tools/testing/selftests/bpf/cap_helpers.h
+++ b/tools/testing/selftests/bpf/cap_helpers.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/capability.h>
+#include <errno.h>
 
 #ifndef CAP_PERFMON
 #define CAP_PERFMON 38
diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 3ee40ee9413a9..88cb75b65cecd 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -118,7 +118,7 @@ static void run_tests_aux(const char *skel_name,
 	/* test_verifier tests are executed w/o CAP_SYS_ADMIN, do the same here */
 	err = cap_disable_effective(1ULL << CAP_SYS_ADMIN, &old_caps);
 	if (err) {
-		PRINT_FAIL("failed to drop CAP_SYS_ADMIN: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to drop CAP_SYS_ADMIN: %i, %s\n", err, strerror(-err));
 		return;
 	}
 
@@ -128,7 +128,7 @@ static void run_tests_aux(const char *skel_name,
 
 	err = cap_enable_effective(old_caps, NULL);
 	if (err)
-		PRINT_FAIL("failed to restore CAP_SYS_ADMIN: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to restore CAP_SYS_ADMIN: %i, %s\n", err, strerror(-err));
 }
 
 #define RUN(skel) run_tests_aux(#skel, skel##__elf_bytes, NULL)
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 53b06647cf57d..8a403e5aa3145 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -773,7 +773,7 @@ static int drop_capabilities(struct cap_state *caps)
 
 	err = cap_disable_effective(caps_to_drop, &caps->old_caps);
 	if (err) {
-		PRINT_FAIL("failed to drop capabilities: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to drop capabilities: %i, %s\n", err, strerror(-err));
 		return err;
 	}
 
@@ -790,7 +790,7 @@ static int restore_capabilities(struct cap_state *caps)
 
 	err = cap_enable_effective(caps->old_caps, NULL);
 	if (err)
-		PRINT_FAIL("failed to restore capabilities: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to restore capabilities: %i, %s\n", err, strerror(-err));
 	caps->initialized = false;
 	return err;
 }
@@ -959,7 +959,7 @@ void run_subtest(struct test_loader *tester,
 	if (subspec->caps) {
 		err = cap_enable_effective(subspec->caps, NULL);
 		if (err) {
-			PRINT_FAIL("failed to set capabilities: %i, %s\n", err, strerror(err));
+			PRINT_FAIL("failed to set capabilities: %i, %s\n", err, strerror(-err));
 			goto subtest_cleanup;
 		}
 	}
-- 
2.39.5
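A minimal sketch of the return-code convention this patch adopts: libc wrappers such as capget() and capset() return -1 and set errno, so the helper returns `-errno` and the caller passes `-err` to strerror(). `close_checked()` and `report()` are hypothetical helpers; they call close() on an invalid descriptor only because that reliably fails with EBADF, not because the patch touches close().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: convert the libc "-1 plus errno" convention into
 * a negative errno return value, as the patched cap helpers do.
 */
static int close_checked(int fd)
{
	if (close(fd))
		return -errno;
	return 0;
}

/* The caller negates err again to get a valid strerror() argument. */
static void report(int err)
{
	if (err)
		fprintf(stderr, "close failed: %i, %s\n", err, strerror(-err));
}
```

With the old convention (returning the raw -1), the message degenerates to "Unknown error -1", which is exactly the "Before" output quoted in the commit message.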

4 months, 3 weeks

[PATCH AUTOSEL 6.13 02/22] selftests/bpf: Fix stdout race condition in traffic monitor

by Sasha Levin

From: Amery Hung <ameryhung(a)gmail.com>

[ Upstream commit b99f27e90268b1a814c13f8bd72ea1db448ea257 ]

Fix a race condition between the main test_progs thread and the traffic monitoring thread. The traffic monitor thread tries to print a line using multiple printf() calls and uses flockfile() to prevent the line from being torn apart. Meanwhile, the main thread, doing I/O redirection, can reassign or close stdout when going through tests. A deadlock as shown below can happen.

       main                                      traffic_monitor_thread
       ====                                      ======================
                                                 show_transport()
                                                 -> flockfile(stdout)

stdio_hijack_init()
-> stdout = open_memstream(log_buf, log_cnt);
     ...
   env.subtest_state->stdout_saved = stdout;

                                                    ...
                                                    funlockfile(stdout)
stdio_restore_cleanup()
-> fclose(env.subtest_state->stdout_saved);

After the traffic monitor thread locks stdout, a new memstream can be assigned to stdout by the main thread. Therefore, the traffic monitor thread will later be unable to unlock the original stdout. When the main thread tries to access the old stdout, it hangs indefinitely because the stream is still locked by the traffic monitor thread.

The deadlock can be reproduced by running test_progs repeatedly with traffic monitor enabled:

for ((i=1;i<=100;i++)); do
   ./test_progs -a flow_dissector_skb* -m '*'
done

Fix this by calling printf() only once and removing flockfile()/funlockfile().

Signed-off-by: Amery Hung <ameryhung(a)gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Link: https://patch.msgid.link/20250213233217.553258-1-ameryhung@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 tools/testing/selftests/bpf/network_helpers.c | 33 ++++++++-----------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 27784946b01b8..af0ee70a53f9f 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -771,12 +771,13 @@ static const char *pkt_type_str(u16 pkt_type)
 	return "Unknown";
 }
 
+#define MAX_FLAGS_STRLEN 21
 /* Show the information of the transport layer in the packet */
 static void show_transport(const u_char *packet, u16 len, u32 ifindex,
 			   const char *src_addr, const char *dst_addr,
 			   u16 proto, bool ipv6, u8 pkt_type)
 {
-	char *ifname, _ifname[IF_NAMESIZE];
+	char *ifname, _ifname[IF_NAMESIZE], flags[MAX_FLAGS_STRLEN] = "";
 	const char *transport_str;
 	u16 src_port, dst_port;
 	struct udphdr *udp;
@@ -817,29 +818,21 @@ static void show_transport(const u_char *packet, u16 len, u32 ifindex,
 
 	/* TCP or UDP*/
 
-	flockfile(stdout);
+	if (proto == IPPROTO_TCP)
+		snprintf(flags, MAX_FLAGS_STRLEN, "%s%s%s%s",
+			 tcp->fin ? ", FIN" : "",
+			 tcp->syn ? ", SYN" : "",
+			 tcp->rst ? ", RST" : "",
+			 tcp->ack ? ", ACK" : "");
+
 	if (ipv6)
-		printf("%-7s %-3s IPv6 %s.%d > %s.%d: %s, length %d",
+		printf("%-7s %-3s IPv6 %s.%d > %s.%d: %s, length %d%s\n",
 		       ifname, pkt_type_str(pkt_type), src_addr, src_port,
-		       dst_addr, dst_port, transport_str, len);
+		       dst_addr, dst_port, transport_str, len, flags);
 	else
-		printf("%-7s %-3s IPv4 %s:%d > %s:%d: %s, length %d",
+		printf("%-7s %-3s IPv4 %s:%d > %s:%d: %s, length %d%s\n",
 		       ifname, pkt_type_str(pkt_type), src_addr, src_port,
-		       dst_addr, dst_port, transport_str, len);
-
-	if (proto == IPPROTO_TCP) {
-		if (tcp->fin)
-			printf(", FIN");
-		if (tcp->syn)
-			printf(", SYN");
-		if (tcp->rst)
-			printf(", RST");
-		if (tcp->ack)
-			printf(", ACK");
-	}
-
-	printf("\n");
-	funlockfile(stdout);
+		       dst_addr, dst_port, transport_str, len, flags);
 }
 
 static void show_ipv6_packet(const u_char *packet, u32 ifindex, u8 pkt_type)
-- 
2.39.5

4 months, 3 weeks

[PATCH AUTOSEL 6.14 17/23] selftests/bpf: Fix cap_enable_effective() return code

by Sasha Levin

From: Feng Yang <yangfeng(a)kylinos.cn>

[ Upstream commit 339c1f8ea11cc042c30c315c1a8f61e4b8a90117 ]

The caller of cap_enable_effective() expects a negative error code. Fix it.

Before:
  failed to restore CAP_SYS_ADMIN: -1, Unknown error -1

After:
  failed to restore CAP_SYS_ADMIN: -3, No such process
  failed to restore CAP_SYS_ADMIN: -22, Invalid argument

Signed-off-by: Feng Yang <yangfeng(a)kylinos.cn>
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20250305022234.44932-1-yangfeng59949@163.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 tools/testing/selftests/bpf/cap_helpers.c         | 8 ++++----
 tools/testing/selftests/bpf/cap_helpers.h         | 1 +
 tools/testing/selftests/bpf/prog_tests/verifier.c | 4 ++--
 tools/testing/selftests/bpf/test_loader.c         | 6 +++---
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/cap_helpers.c b/tools/testing/selftests/bpf/cap_helpers.c
index d5ac507401d7c..98f840c3a38f7 100644
--- a/tools/testing/selftests/bpf/cap_helpers.c
+++ b/tools/testing/selftests/bpf/cap_helpers.c
@@ -19,7 +19,7 @@ int cap_enable_effective(__u64 caps, __u64 *old_caps)
 
 	err = capget(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	if (old_caps)
 		*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
@@ -32,7 +32,7 @@ int cap_enable_effective(__u64 caps, __u64 *old_caps)
 	data[1].effective |= cap1;
 	err = capset(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	return 0;
 }
@@ -49,7 +49,7 @@ int cap_disable_effective(__u64 caps, __u64 *old_caps)
 
 	err = capget(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	if (old_caps)
 		*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
@@ -61,7 +61,7 @@ int cap_disable_effective(__u64 caps, __u64 *old_caps)
 	data[1].effective &= ~cap1;
 	err = capset(&hdr, data);
 	if (err)
-		return err;
+		return -errno;
 
 	return 0;
 }
diff --git a/tools/testing/selftests/bpf/cap_helpers.h b/tools/testing/selftests/bpf/cap_helpers.h
index 6d163530cb0fd..8dcb28557f762 100644
--- a/tools/testing/selftests/bpf/cap_helpers.h
+++ b/tools/testing/selftests/bpf/cap_helpers.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/capability.h>
+#include <errno.h>
 
 #ifndef CAP_PERFMON
 #define CAP_PERFMON 38
diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 8a0e1ff8a2dc6..ecc320e045513 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -121,7 +121,7 @@ static void run_tests_aux(const char *skel_name,
 	/* test_verifier tests are executed w/o CAP_SYS_ADMIN, do the same here */
 	err = cap_disable_effective(1ULL << CAP_SYS_ADMIN, &old_caps);
 	if (err) {
-		PRINT_FAIL("failed to drop CAP_SYS_ADMIN: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to drop CAP_SYS_ADMIN: %i, %s\n", err, strerror(-err));
 		return;
 	}
 
@@ -131,7 +131,7 @@ static void run_tests_aux(const char *skel_name,
 
 	err = cap_enable_effective(old_caps, NULL);
 	if (err)
-		PRINT_FAIL("failed to restore CAP_SYS_ADMIN: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to restore CAP_SYS_ADMIN: %i, %s\n", err, strerror(-err));
 }
 
 #define RUN(skel) run_tests_aux(#skel, skel##__elf_bytes, NULL)
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 53b06647cf57d..8a403e5aa3145 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -773,7 +773,7 @@ static int drop_capabilities(struct cap_state *caps)
 
 	err = cap_disable_effective(caps_to_drop, &caps->old_caps);
 	if (err) {
-		PRINT_FAIL("failed to drop capabilities: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to drop capabilities: %i, %s\n", err, strerror(-err));
 		return err;
 	}
 
@@ -790,7 +790,7 @@ static int restore_capabilities(struct cap_state *caps)
 
 	err = cap_enable_effective(caps->old_caps, NULL);
 	if (err)
-		PRINT_FAIL("failed to restore capabilities: %i, %s\n", err, strerror(err));
+		PRINT_FAIL("failed to restore capabilities: %i, %s\n", err, strerror(-err));
 	caps->initialized = false;
 	return err;
 }
@@ -959,7 +959,7 @@ void run_subtest(struct test_loader *tester,
 	if (subspec->caps) {
 		err = cap_enable_effective(subspec->caps, NULL);
 		if (err) {
-			PRINT_FAIL("failed to set capabilities: %i, %s\n", err, strerror(err));
+			PRINT_FAIL("failed to set capabilities: %i, %s\n", err, strerror(-err));
 			goto subtest_cleanup;
 		}
 	}
-- 
2.39.5

4 months, 3 weeks

[PATCH AUTOSEL 6.14 02/23] selftests/bpf: Fix stdout race condition in traffic monitor

by Sasha Levin

From: Amery Hung <ameryhung(a)gmail.com>

[ Upstream commit b99f27e90268b1a814c13f8bd72ea1db448ea257 ]

Fix a race condition between the main test_progs thread and the traffic monitoring thread. The traffic monitor thread tries to print a line using multiple printf() calls and uses flockfile() to prevent the line from being torn apart. Meanwhile, the main thread, doing I/O redirection, can reassign or close stdout when going through tests. A deadlock as shown below can happen.

       main                                      traffic_monitor_thread
       ====                                      ======================
                                                 show_transport()
                                                 -> flockfile(stdout)

stdio_hijack_init()
-> stdout = open_memstream(log_buf, log_cnt);
     ...
   env.subtest_state->stdout_saved = stdout;

                                                    ...
                                                    funlockfile(stdout)
stdio_restore_cleanup()
-> fclose(env.subtest_state->stdout_saved);

After the traffic monitor thread locks stdout, a new memstream can be assigned to stdout by the main thread. Therefore, the traffic monitor thread will later be unable to unlock the original stdout. When the main thread tries to access the old stdout, it hangs indefinitely because the stream is still locked by the traffic monitor thread.

The deadlock can be reproduced by running test_progs repeatedly with traffic monitor enabled:

for ((i=1;i<=100;i++)); do
   ./test_progs -a flow_dissector_skb* -m '*'
done

Fix this by calling printf() only once and removing flockfile()/funlockfile().

Signed-off-by: Amery Hung <ameryhung(a)gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Link: https://patch.msgid.link/20250213233217.553258-1-ameryhung@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 tools/testing/selftests/bpf/network_helpers.c | 33 ++++++++-----------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 80844a5fb1fee..95e943270f359 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -771,12 +771,13 @@ static const char *pkt_type_str(u16 pkt_type)
 	return "Unknown";
 }
 
+#define MAX_FLAGS_STRLEN 21
 /* Show the information of the transport layer in the packet */
 static void show_transport(const u_char *packet, u16 len, u32 ifindex,
 			   const char *src_addr, const char *dst_addr,
 			   u16 proto, bool ipv6, u8 pkt_type)
 {
-	char *ifname, _ifname[IF_NAMESIZE];
+	char *ifname, _ifname[IF_NAMESIZE], flags[MAX_FLAGS_STRLEN] = "";
 	const char *transport_str;
 	u16 src_port, dst_port;
 	struct udphdr *udp;
@@ -817,29 +818,21 @@ static void show_transport(const u_char *packet, u16 len, u32 ifindex,
 
 	/* TCP or UDP*/
 
-	flockfile(stdout);
+	if (proto == IPPROTO_TCP)
+		snprintf(flags, MAX_FLAGS_STRLEN, "%s%s%s%s",
+			 tcp->fin ? ", FIN" : "",
+			 tcp->syn ? ", SYN" : "",
+			 tcp->rst ? ", RST" : "",
+			 tcp->ack ? ", ACK" : "");
+
 	if (ipv6)
-		printf("%-7s %-3s IPv6 %s.%d > %s.%d: %s, length %d",
+		printf("%-7s %-3s IPv6 %s.%d > %s.%d: %s, length %d%s\n",
 		       ifname, pkt_type_str(pkt_type), src_addr, src_port,
-		       dst_addr, dst_port, transport_str, len);
+		       dst_addr, dst_port, transport_str, len, flags);
 	else
-		printf("%-7s %-3s IPv4 %s:%d > %s:%d: %s, length %d",
+		printf("%-7s %-3s IPv4 %s:%d > %s:%d: %s, length %d%s\n",
 		       ifname, pkt_type_str(pkt_type), src_addr, src_port,
-		       dst_addr, dst_port, transport_str, len);
-
-	if (proto == IPPROTO_TCP) {
-		if (tcp->fin)
-			printf(", FIN");
-		if (tcp->syn)
-			printf(", SYN");
-		if (tcp->rst)
-			printf(", RST");
-		if (tcp->ack)
-			printf(", ACK");
-	}
-
-	printf("\n");
-	funlockfile(stdout);
+		       dst_addr, dst_port, transport_str, len, flags);
 }
 
 static void show_ipv6_packet(const u_char *packet, u32 ifindex, u8 pkt_type)
-- 
2.39.5

4 months, 3 weeks

[PATCH net-next] net/selftests: Add loopback link local route for self-connect

by Dmitry Safonov via B4 Relay

From: Dmitry Safonov <0x7f454c46(a)gmail.com>

self-connect-ipv6 has become slightly flaky on netdev:

> # timeout set to 120
> # selftests: net/tcp_ao: self-connect_ipv6
> # 1..5
> # # 708[lib/setup.c:250] rand seed 1742872572
> # TAP version 13
> # # 708[lib/proc.c:213] Snmp6 Ip6OutNoRoutes: 0 => 1
> # not ok 1 # error 708[self-connect.c:70] failed to connect()
> # ok 2 No unexpected trace events during the test run
> # # Planned tests != run tests (5 != 2)
> # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:1
> ok 1 selftests: net/tcp_ao: self-connect_ipv6

I cannot reproduce it on my machines, but judging by "Ip6OutNoRoutes", there is no route to the local_addr (::1). Looking at the kernel code, I see that the kernel does add the link-local address automatically in init_loopback(), but that is called from the ipv6 notifier block. So, in turn, the userspace that brought up the loopback interface may see the rtnetlink ACK before addrconf_notify() has done its job (at least on a slow VM such as netdev's). The same probably applies to ipv4, judging by inetdev_event().

The fix is quite simple: set the link-local route straight after bringing up the loopback interface. That makes it synchronous.

Signed-off-by: Dmitry Safonov <0x7f454c46(a)gmail.com>
---
Sorry to send this during the merge window; it's a test stability fix. The netdev build bot seems to have hit the issue a couple of times, but it does not appear to be hitting it constantly at the moment:
https://netdev.bots.linux.dev/flakes.html?br-cnt=150&tn-needle=tcp-ao

I'm marking it net-next so that the build bot carries it until the merge window closes. If that's not fine, I can re-send it after the merge window.
---
 tools/testing/selftests/net/tcp_ao/self-connect.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/net/tcp_ao/self-connect.c b/tools/testing/selftests/net/tcp_ao/self-connect.c
index 73b2f2276f3f5410aaa74bede7f366f81761bd6e..2c73bea698a677f9aedd7bec28f6e7fee7845d2e 100644
--- a/tools/testing/selftests/net/tcp_ao/self-connect.c
+++ b/tools/testing/selftests/net/tcp_ao/self-connect.c
@@ -16,6 +16,9 @@ static void __setup_lo_intf(const char *lo_intf,
 
 	if (link_set_up(lo_intf))
 		test_error("Failed to bring %s up", lo_intf);
+
+	if (ip_route_add(lo_intf, TEST_FAMILY, local_addr, local_addr))
+		test_error("Failed to add a local route %s", lo_intf);
 }
 
 static void setup_lo_intf(const char *lo_intf)
---
base-commit: 1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95
change-id: 20250402-tcp-ao-selfconnect-flake-e0aabc03c076

Best regards,
-- 
Dmitry Safonov <0x7f454c46(a)gmail.com>

4 months, 3 weeks

[PATCH bpf-next 00/11] bpf: Mitigate Spectre v1 using barriers

by Luis Gerhorst

This improves the expressiveness of unprivileged BPF by inserting speculation barriers instead of rejecting the programs. The approach was previously presented at LPC'24 [1] and RAID'24 [2].

To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects potentially dangerous unprivileged BPF programs as of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"). In [2], we analyzed 364 object files from open source projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf Examples, Parca, and Prevail) and found that this affects 31% to 54% of programs.

To resolve this in the majority of cases, this patchset adds a fallback for mitigating Spectre v1 using speculation barriers. The kernel still optimistically attempts to verify all speculative paths, but uses speculation barriers against v1 when unsafe behavior is detected. This allows more programs to be accepted without disabling the BPF Spectre mitigations (e.g., by setting cpu_mitigations_off()).

In [1], we measured the overhead of this approach relative to having mitigations off and including the upstream Spectre v4 mitigations. For event tracing and stack-sampling profilers, we found that mitigations increase BPF program execution time by 0% to 62%. For the Loxilb network load balancer, we measured a 14% slowdown in SCTP performance but no significant slowdown for TCP. This overhead only applies to programs that were previously rejected.

I reran the expressiveness evaluation with v6.14 and made sure the main results still match those from [1] and [2] (which used v6.5).

Main design decisions are:

* Do not use separate bytecode insns for v1 and v4 barriers. This simplifies the verifier significantly; the only downside is that performance on PowerPC is not as high as it could be.

* Allow archs to still disable v1/v4 mitigations separately by setting bpf_jit_bypass_spec_v1/v4(). This lets archs that are not vulnerable benefit from improved BPF expressiveness / performance (e.g., ARM64 for v4 in the kernel).

* Do not remove the empty BPF_NOSPEC implementation for backends for which it is unknown whether they are vulnerable to Spectre v1.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions")

Changes:
* RFC -> v1:
  - rebase to bpf-next-250313
  - tests: mark expected successes/new errors
  - add bpf_jit_bypass_spec_v1/v4() to avoid #ifdef in bpf_bypass_spec_v1/v4()
  - ensure that nospec with v1-support is implemented for archs for which GCC supports speculation barriers, except for MIPS
  - arm64: emit speculation barrier
  - powerpc: change nospec to include v1 barrier
  - discuss potential security (archs that do not impl. BPF nospec) and performance (only PowerPC) regressions

RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/

Luis Gerhorst (11):
  bpf: Move insn if/else into do_check_insn()
  bpf: Return -EFAULT on misconfigurations
  bpf: Return -EFAULT on internal errors
  bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()
  bpf, arm64, powerpc: Change nospec to include v1 barrier
  bpf: Rename sanitize_stack_spill to nospec_result
  bpf: Fall back to nospec for Spectre v1
  bpf: Allow nospec-protected var-offset stack access
  bpf: Return PTR_ERR from push_stack()
  bpf: Fall back to nospec for sanitization-failures
  bpf: Fall back to nospec for spec path verification

 arch/arm64/net/bpf_jit.h                      |   5 +
 arch/arm64/net/bpf_jit_comp.c                 |  28 +-
 arch/powerpc/net/bpf_jit_comp64.c             |  79 +-
 include/linux/bpf.h                           |  11 +-
 include/linux/bpf_verifier.h                  |   3 +-
 include/linux/filter.h                        |   2 +-
 kernel/bpf/core.c                             |  32 +-
 kernel/bpf/verifier.c                         | 723 ++++++++++--------
 .../selftests/bpf/progs/verifier_and.c        |   3 +-
 .../selftests/bpf/progs/verifier_bounds.c     |  35 +-
 .../bpf/progs/verifier_bounds_deduction.c     |  43 +-
 .../selftests/bpf/progs/verifier_map_ptr.c    |  12 +-
 .../selftests/bpf/progs/verifier_movsx.c      |   6 +-
 .../selftests/bpf/progs/verifier_unpriv.c     |   3 +-
 .../bpf/progs/verifier_value_ptr_arith.c      |  50 +-
 .../selftests/bpf/verifier/dead_code.c        |   3 +-
 tools/testing/selftests/bpf/verifier/jmp32.c  |  33 +-
 tools/testing/selftests/bpf/verifier/jset.c   |  10 +-
 18 files changed, 630 insertions(+), 451 deletions(-)

base-commit: 46d38f489ef02175dcff1e03a849c226eb0729a6
-- 
2.48.1
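For readers unfamiliar with the Spectre v1 (bounds check bypass) pattern, the following userspace C sketch shows where a speculation barrier sits relative to a bounds check. It is only an analogy: the series works at the BPF instruction level, with the JIT backends emitting each architecture's barrier for BPF_NOSPEC. The `spec_barrier()` macro and `load_checked()` are hypothetical; the macro uses x86 LFENCE where available and otherwise falls back to a compiler-only barrier, which does NOT stop speculation and is just a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative barrier. On x86, LFENCE is commonly used to keep later
 * loads from executing speculatively past a bounds check. Elsewhere this
 * sketch only prevents compiler reordering (placeholder, not a mitigation).
 */
#if defined(__x86_64__) || defined(__i386__)
#define spec_barrier() __asm__ __volatile__("lfence" ::: "memory")
#else
#define spec_barrier() __asm__ __volatile__("" ::: "memory")
#endif

/*
 * Classic v1 gadget shape: a mispredicted branch could let table[idx]
 * be loaded speculatively for an out-of-bounds idx. The barrier after
 * the check keeps the load from running until the check has resolved,
 * so the code can be accepted instead of rejected.
 */
static unsigned char load_checked(const unsigned char *table, size_t len,
				  size_t idx)
{
	if (idx >= len)
		return 0;
	spec_barrier();
	return table[idx];
}
```

The trade-off mirrors the one described in the cover letter: the barrier costs some execution time on the protected path, but only code that would otherwise have been rejected needs it.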

4 months, 3 weeks
