From: angquan yu <angquan21(a)gmail.com>
This commit addresses compiler warnings in lam.c related to the usage
of non-literal format strings without format arguments in the
'run_test' function.
Warnings fixed:
- Resolved warnings indicating that 'ksft_test_result_skip' and
'ksft_test_result' were called with 't->msg' as a format string without
accompanying format arguments.
Changes made:
- Modified the calls to 'ksft_test_result_skip' and 'ksft_test_result'
to explicitly include a format specifier ("%s") for 't->msg'.
- This ensures that the string is safely treated as a format argument,
adhering to safer coding practices and resolving the compiler warnings.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/x86/lam.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 8f9b06d9c..215b8150b 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -817,7 +817,7 @@ static void run_test(struct testcases *test, int count)
/* return 3 is not support LA57, the case should be skipped */
if (ret == 3) {
- ksft_test_result_skip(t->msg);
+ ksft_test_result_skip("%s", t->msg);
continue;
}
@@ -826,7 +826,7 @@ static void run_test(struct testcases *test, int count)
else
ret = !(t->expected);
- ksft_test_result(ret, t->msg);
+ ksft_test_result(ret, "%s", t->msg);
}
}
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
This commit resolves a compiler warning regardingthe
use of non-literal format strings in breakpoint_test.c.
The functions `ksft_test_result_pass` and `ksft_test_result_fail`
were previously called with a variable `msg` directly, which could
potentially lead to format string vulnerabilities.
Changes made:
- Modified the calls to `ksft_test_result_pass` and `ksft_test_result_fail`
by adding a "%s" format specifier. This explicitly declares `msg` as a
string argument, adhering to safer coding practices and resolving
the compiler warning.
This change does not affect the functional behavior of the code but ensures
better code safety and compliance with recommended C programming standards.
The previous warning is "breakpoint_test.c:287:17:
warning: format not a string literal and no format arguments
[-Wformat-security]
287 | ksft_test_result_pass(msg);
| ^~~~~~~~~~~~~~~~~~~~~
breakpoint_test.c:289:17: warning: format not a string literal
and no format arguments [-Wformat-security]
289 | ksft_test_result_fail(msg);
| "
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test.c b/tools/testing/selftests/breakpoints/breakpoint_test.c
index 3266cc929..d46962a24 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test.c
@@ -284,9 +284,9 @@ static void check_success(const char *msg)
nr_tests++;
if (ret)
- ksft_test_result_pass(msg);
+ ksft_test_result_pass("%s", msg);
else
- ksft_test_result_fail(msg);
+ ksft_test_result_fail("%s", msg);
}
static void launch_instruction_breakpoints(char *buf, int local, int global)
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
In tools/testing/selftests/proc/proc-empty->because the return value
of a write call was being ignored. This call was partof a conditional
debugging block (if (0) { ... }), which meant it would neveractually
execute.
This patch removes the unused debug write call. This cleanup resolves
the compi>warning about ignoring the result of write declared with
the warn_unused_resultattribute.
Removing this code also improves the clarity and maintainability of
the function, as it eliminates a non-functional block of code.
This is original warning: proc-empty-vm.c: In function
‘test_proc_pid_statm’ :proc-empty-vm.c:385:17:
warning: ignoring return value of ‘write’
declared with>385 | write(1, buf, rv);|
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/proc/proc-empty-vm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 5e7020630..d231e61e4 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -383,8 +383,10 @@ static int test_proc_pid_statm(pid_t pid)
assert(rv <= sizeof(buf));
if (0) {
ssize_t written = write(1, buf, rv);
+
if (written == -1) {
perror("write failed to /proc/${pid}");
+ return EXIT_FAILURE;
}
}
--
2.39.2
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit f40bfd1679446b22d321e64a1fa98b7d07d2be08 ]
This is a preparatory change. A follow-up patch "bpf: verify callbacks
as if they are called unknown number of times" changes logic for
callbacks handling. While previously callbacks were verified as a
single function call, new scheme takes into account that callbacks
could be executed unknown number of times.
This has dire implications for bpf_loop_bench:
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
for (int i = 0; i < 1000; i++) {
bpf_loop(nr_loops, empty_callback, NULL, 0);
__sync_add_and_fetch(&hits, nr_loops);
}
return 0;
}
W/o callbacks change verifier sees it as a 1000 calls to
empty_callback(). However, with callbacks change things become
exponential:
- i=0: state exploring empty_callback is scheduled with i=0 (a);
- i=1: state exploring empty_callback is scheduled with i=1;
...
- i=999: state exploring empty_callback is scheduled with i=999;
- state (a) is popped from stack;
- i=1: state exploring empty_callback is scheduled with i=1;
...
Avoid this issue by rewriting outer loop as bpf_loop().
Unfortunately, this adds a function call to a loop at runtime, which
negatively affects performance:
throughput latency
before: 149.919 ± 0.168 M ops/s, 6.670 ns/op
after : 137.040 ± 0.187 M ops/s, 7.297 ns/op
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20231121020701.26440-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/bpf_loop_bench.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
index 4ce76eb064c41..d461746fd3c1e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
+++ b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
@@ -15,13 +15,16 @@ static int empty_callback(__u32 index, void *data)
return 0;
}
+static int outer_loop(__u32 index, void *data)
+{
+ bpf_loop(nr_loops, empty_callback, NULL, 0);
+ __sync_add_and_fetch(&hits, nr_loops);
+ return 0;
+}
+
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
- for (int i = 0; i < 1000; i++) {
- bpf_loop(nr_loops, empty_callback, NULL, 0);
-
- __sync_add_and_fetch(&hits, nr_loops);
- }
+ bpf_loop(1000, outer_loop, NULL, 0);
return 0;
}
--
2.42.0
Hi,
v1 [1] was discussed during Plumbers [2], where a lot of feedback was given. I
hope to justify the changes in v2 and address the feedback here.
One feedback from Shuah was that keeping per-platform files with the USB/PCI
devices to test as part of the kselftest tree wasn't maintainable. One proposed
alternative was to generate a list of probed devices on a known-good kernel and
use that as a reference. However you need someone to look at that generated
reference to be able to say it is a good one, and you need to save it to ensure
it will be reproducible later anyway, so that wouldn't actually solve the
problem. It is a matter of hand-crafting vs generating the test definitions, but
they will need to be vouched by someone and stored somewhere in both cases.
So for this v2, in patch 2 I just have a sample test definition, and the
per-platform test definitions would be added to a separate repository.
The other feedback received was that the BIOS might reconfigure the PCI
topology (at least on x86), meaning that relying on a sequence of device and
function numbers (eg 1d.0/02.0/0.0) as a stable description of a device on the
platform is not possible. I couldn't verify whether this is really the case (if
you have any more insight into this, please let me know), but with that in mind,
here in v2 I have taken a different approach. Here I'm using the device's
properties which are used for driver matching (the same that show on modalias)
to identify a device in a stable way.
This approach has some drawbacks compared to the one on v1. For one it doesn't
uniquely identify a device, so if there are multiple of the same device on a
platform they have to be checked as a group. Also the test definition isn't as
human-readable.
I'm adding in CC the people I recognized at the Plumbers session that were
interested in this work. Feel free to add anyone missing.
Thanks,
Nícolas
[1] https://lore.kernel.org/all/20231024211818.365844-1-nfraprado@collabora.com
[2] https://www.youtube.com/watch?v=oE73eVSyFXQ&t=9377s
Original cover letter:
This is part of an effort to improve detection of regressions impacting
device probe on all platforms. The recently merged DT kselftest [3]
detects probe issues for all devices described statically in the DT.
That leaves out devices discovered at run-time from discoverable busses.
This is where this test comes in. All of the devices that are connected
through discoverable busses (ie USB and PCI), and which are internal and
therefore always present, can be described in a per-platform file so
they can be checked for. The test will check that the device has been
instantiated and bound to a driver.
Patch 1 introduces the test. Patch 2 adds the test definitions for the
google,spherion machine (Acer Chromebook 514) as an example.
This is the sample output from the test running on Spherion:
TAP version 13
Using board file: boards/google,spherion
1..3
ok 1 usb.camera
ok 2 usb.bluetooth
ok 3 pci.wifi
Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
[3] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
Changes in v2:
- Changed approach of encoding stable device reference in test file from
HW topology to device match fields (the ones from modalias)
- Better documented test format
Nícolas F. R. A. Prado (2):
kselftest: Add test to verify probe of devices from discoverable
busses
kselftest: devices: Add sample board file for google,spherion
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/devices/.gitignore | 1 +
tools/testing/selftests/devices/Makefile | 8 +
.../selftests/devices/boards/google,spherion | 12 ++
.../devices/test_discoverable_devices.sh | 160 ++++++++++++++++++
5 files changed, 182 insertions(+)
create mode 100644 tools/testing/selftests/devices/.gitignore
create mode 100644 tools/testing/selftests/devices/Makefile
create mode 100644 tools/testing/selftests/devices/boards/google,spherion
create mode 100755 tools/testing/selftests/devices/test_discoverable_devices.sh
--
2.42.1
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit f40bfd1679446b22d321e64a1fa98b7d07d2be08 ]
This is a preparatory change. A follow-up patch "bpf: verify callbacks
as if they are called unknown number of times" changes logic for
callbacks handling. While previously callbacks were verified as a
single function call, new scheme takes into account that callbacks
could be executed unknown number of times.
This has dire implications for bpf_loop_bench:
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
for (int i = 0; i < 1000; i++) {
bpf_loop(nr_loops, empty_callback, NULL, 0);
__sync_add_and_fetch(&hits, nr_loops);
}
return 0;
}
W/o callbacks change verifier sees it as a 1000 calls to
empty_callback(). However, with callbacks change things become
exponential:
- i=0: state exploring empty_callback is scheduled with i=0 (a);
- i=1: state exploring empty_callback is scheduled with i=1;
...
- i=999: state exploring empty_callback is scheduled with i=999;
- state (a) is popped from stack;
- i=1: state exploring empty_callback is scheduled with i=1;
...
Avoid this issue by rewriting outer loop as bpf_loop().
Unfortunately, this adds a function call to a loop at runtime, which
negatively affects performance:
throughput latency
before: 149.919 ± 0.168 M ops/s, 6.670 ns/op
after : 137.040 ± 0.187 M ops/s, 7.297 ns/op
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20231121020701.26440-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/bpf_loop_bench.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
index 4ce76eb064c41..d461746fd3c1e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
+++ b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
@@ -15,13 +15,16 @@ static int empty_callback(__u32 index, void *data)
return 0;
}
+static int outer_loop(__u32 index, void *data)
+{
+ bpf_loop(nr_loops, empty_callback, NULL, 0);
+ __sync_add_and_fetch(&hits, nr_loops);
+ return 0;
+}
+
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
- for (int i = 0; i < 1000; i++) {
- bpf_loop(nr_loops, empty_callback, NULL, 0);
-
- __sync_add_and_fetch(&hits, nr_loops);
- }
+ bpf_loop(1000, outer_loop, NULL, 0);
return 0;
}
--
2.42.0
From: angquan yu <angquan21(a)gmail.com>
This commit resolves a compiler warning regardingthe
use of non-literal format strings in breakpoint_test.c.
The functions `ksft_test_result_pass` and `ksft_test_result_fail`
were previously called with a variable `msg` directly, which could
potentially lead to format string vulnerabilities.
Changes made:
- Modified the calls to `ksft_test_result_pass` and `ksft_test_result_fail`
by adding a "%s" format specifier. This explicitly declares `msg` as a
string argument, adhering to safer coding practices and resolving
the compiler warning.
This change does not affect the functional behavior of the code but ensures
better code safety and compliance with recommended C programming standards.
The previous warning is "breakpoint_test.c:287:17:
warning: format not a string literal and no format arguments
[-Wformat-security]
287 | ksft_test_result_pass(msg);
| ^~~~~~~~~~~~~~~~~~~~~
breakpoint_test.c:289:17: warning: format not a string literal
and no format arguments [-Wformat-security]
289 | ksft_test_result_fail(msg);
| "
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test.c b/tools/testing/selftests/breakpoints/breakpoint_test.c
index 3266cc929..d46962a24 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test.c
@@ -284,9 +284,9 @@ static void check_success(const char *msg)
nr_tests++;
if (ret)
- ksft_test_result_pass(msg);
+ ksft_test_result_pass("%s", msg);
else
- ksft_test_result_fail(msg);
+ ksft_test_result_fail("%s", msg);
}
static void launch_instruction_breakpoints(char *buf, int local, int global)
--
2.39.2
The root-only cpuset.cpus.isolated control file shows the current set
of isolated CPUs in isolated partitions. This control file is currently
exposed only with the cgroup_debug boot command line option which also
adds the ".__DEBUG__." prefix. This is actually a useful control file if
users want to find out which CPUs are currently in an isolated state by
the cpuset controller. Remove CFTYPE_DEBUG flag for this control file and
make it available by default without any prefix.
The test_cpuset_prs.sh test script and the cgroup-v2.rst documentation
file are also updated accordingly. Minor code change is also made in
test_cpuset_prs.sh to avoid false test failure when running on debug
kernel.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 7 ++++
kernel/cgroup/cpuset.c | 2 +-
.../selftests/cgroup/test_cpuset_prs.sh | 32 +++++++++++--------
3 files changed, 26 insertions(+), 15 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index cf5651a11df8..30f6ff2eba47 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2316,6 +2316,13 @@ Cpuset Interface Files
treated to have an implicit value of "cpuset.cpus" in the
formation of local partition.
+ cpuset.cpus.isolated
+ A read-only and root cgroup only multiple values file.
+
+ This file shows the set of all isolated CPUs used in existing
+ isolated partitions. It will be empty if no isolated partition
+ is created.
+
cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1bad4007ff4b..2a16df86c55c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3974,7 +3974,7 @@ static struct cftype dfl_files[] = {
.name = "cpus.isolated",
.seq_show = cpuset_common_seq_show,
.private = FILE_ISOLATED_CPULIST,
- .flags = CFTYPE_ONLY_ON_ROOT | CFTYPE_DEBUG,
+ .flags = CFTYPE_ONLY_ON_ROOT,
},
{ } /* terminate */
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 7b7c4c2b6d85..b5eb1be2248c 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -508,7 +508,7 @@ dump_states()
XECPUS=$DIR/cpuset.cpus.exclusive.effective
PRS=$DIR/cpuset.cpus.partition
PCPUS=$DIR/.__DEBUG__.cpuset.cpus.subpartitions
- ISCPUS=$DIR/.__DEBUG__.cpuset.cpus.isolated
+ ISCPUS=$DIR/cpuset.cpus.isolated
[[ -e $CPUS ]] && echo "$CPUS: $(cat $CPUS)"
[[ -e $XCPUS ]] && echo "$XCPUS: $(cat $XCPUS)"
[[ -e $ECPUS ]] && echo "$ECPUS: $(cat $ECPUS)"
@@ -593,17 +593,17 @@ check_cgroup_states()
#
# Get isolated (including offline) CPUs by looking at
-# /sys/kernel/debug/sched/domains and *cpuset.cpus.isolated control file,
+# /sys/kernel/debug/sched/domains and cpuset.cpus.isolated control file,
# if available, and compare that with the expected value.
#
# Note that isolated CPUs from the sched/domains context include offline
# CPUs as well as CPUs in non-isolated 1-CPU partition. Those CPUs may
-# not be included in the *cpuset.cpus.isolated control file which contains
+# not be included in the cpuset.cpus.isolated control file which contains
# only CPUs in isolated partitions.
#
# $1 - expected isolated cpu list(s) <isolcpus1>{,<isolcpus2>}
# <isolcpus1> - expected sched/domains value
-# <isolcpus2> - *cpuset.cpus.isolated value = <isolcpus1> if not defined
+# <isolcpus2> - cpuset.cpus.isolated value = <isolcpus1> if not defined
#
check_isolcpus()
{
@@ -611,7 +611,7 @@ check_isolcpus()
ISOLCPUS=
LASTISOLCPU=
SCHED_DOMAINS=/sys/kernel/debug/sched/domains
- ISCPUS=${CGROUP2}/.__DEBUG__.cpuset.cpus.isolated
+ ISCPUS=${CGROUP2}/cpuset.cpus.isolated
if [[ $EXPECT_VAL = . ]]
then
EXPECT_VAL=
@@ -692,14 +692,18 @@ test_fail()
null_isolcpus_check()
{
[[ $VERBOSE -gt 0 ]] || return 0
- pause 0.02
- check_isolcpus "."
- if [[ $? -ne 0 ]]
- then
- echo "Unexpected isolated CPUs: $ISOLCPUS"
- dump_states
- exit 1
- fi
+ # Retry a few times before printing error
+ RETRY=0
+ while [[ $RETRY -lt 5 ]]
+ do
+ pause 0.01
+ check_isolcpus "."
+ [[ $? -eq 0 ]] && return 0
+ ((RETRY++))
+ done
+ echo "Unexpected isolated CPUs: $ISOLCPUS"
+ dump_states
+ exit 1
}
#
@@ -776,7 +780,7 @@ run_state_test()
#
NEWLIST=$(cat cpuset.cpus.effective)
RETRY=0
- while [[ $NEWLIST != $CPULIST && $RETRY -lt 5 ]]
+ while [[ $NEWLIST != $CPULIST && $RETRY -lt 8 ]]
do
# Wait a bit longer & recheck a few times
pause 0.01
--
2.39.3
This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed
on (hopefully) reproducible AWS testbeds indicate that single tunnel
pcpu ipsec can reach line rate on 100G ENA nics.
Note this patchset only tests/shows generic xfrm_state access. The
"secret sauce" (if you can really even call it that) involves accessing
a soon-to-be-upstreamed pcpu_num field in xfrm_state. Early example is
available here [1].
[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce286…
Changes from v1:
* Move xfrm tunnel tests to test_progs
* Fix writing to opts->error when opts is invalid
* Use __bpf_kfunc_start_defs()
* Remove unused vxlanhdr definition
* Add and use BPF_CORE_WRITE_BITFIELD() macro
* Make series bisect clean
Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak
Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics
Daniel Xu (6):
bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
bpf: selftests: test_tunnel: Use vmlinux.h declarations
bpf: selftests: Move xfrm tunnel test to test_progs
bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()
include/net/xfrm.h | 9 +
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_policy.c | 2 +
net/xfrm/xfrm_state_bpf.c | 128 +++++++++++++++
tools/lib/bpf/bpf_core_read.h | 36 ++++
.../selftests/bpf/prog_tests/test_tunnel.c | 155 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/test_tunnel_kern.c | 138 +++++++++-------
tools/testing/selftests/bpf/test_tunnel.sh | 92 -----------
9 files changed, 412 insertions(+), 150 deletions(-)
create mode 100644 net/xfrm/xfrm_state_bpf.c
--
2.42.1
This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed
on (hopefully) reproducible AWS testbeds indicate that single tunnel
pcpu ipsec can reach line rate on 100G ENA nics.
Note this patchset only tests/shows generic xfrm_state access. The
"secret sauce" (if you can really even call it that) involves accessing
a soon-to-be-upstreamed pcpu_num field in xfrm_state. Early example is
available here [1].
[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce286…
Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak
Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics
Daniel Xu (7):
bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
bpf: selftests: test_tunnel: Use ping -6 over ping6
bpf: selftests: test_tunnel: Mount bpffs if necessary
bpf: selftests: test_tunnel: Use vmlinux.h declarations
bpf: selftests: test_tunnel: Disable CO-RE relocations
bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()
include/net/xfrm.h | 9 ++
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_policy.c | 2 +
net/xfrm/xfrm_state_bpf.c | 127 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/test_tunnel_kern.c | 98 ++++++++------
tools/testing/selftests/bpf/test_tunnel.sh | 43 ++++--
7 files changed, 227 insertions(+), 54 deletions(-)
create mode 100644 net/xfrm/xfrm_state_bpf.c
--
2.42.1
We have started printing more and more intentional stack traces. Whether
it's testing KASAN is able to detect use after frees or it's part of a
kunit test.
These stack traces can be problematic. They suddenly show up as a new
failure. Now the test team has to contact the developers. A bunch of
people have to investigate the bug. We finally decide that it's
intentional so now the test team has to update their filter scripts to
mark it as intentional. These filters are ad-hoc because there is no
standard format for warnings.
A better way would be to mark it as intentional from the start.
Here, I have marked the beginning and the end of the trace. It's more
tricky for things like lkdtm_FORTIFY_MEM_MEMBER() where the flow doesn't
reach the end of the function. I guess I would print a different
warning for stack traces that can't have a
"Intentional warning finished\n" message at the end.
I haven't actually tested this patch... Daniel, do you have a
list of intentional stack traces we could annotate?
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
drivers/gpu/drm/tests/drm_rect_test.c | 2 ++
include/kunit/test.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/tests/drm_rect_test.c b/drivers/gpu/drm/tests/drm_rect_test.c
index 76332cd2ead8..367738254493 100644
--- a/drivers/gpu/drm/tests/drm_rect_test.c
+++ b/drivers/gpu/drm/tests/drm_rect_test.c
@@ -409,8 +409,10 @@ static void drm_test_rect_calc_hscale(struct kunit *test)
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
+ START_INTENTIONAL_WARNING();
scaling_factor = drm_rect_calc_hscale(¶ms->src, ¶ms->dst,
params->min_range, params->max_range);
+ END_INTENTIONAL_WARNING();
KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);
}
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 20ed9f9275c9..1f01d4c81055 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -337,6 +337,9 @@ void __kunit_test_suites_exit(struct kunit_suite **suites, int num_suites);
void kunit_exec_run_tests(struct kunit_suite_set *suite_set, bool builtin);
void kunit_exec_list_tests(struct kunit_suite_set *suite_set, bool include_attr);
+#define START_INTENTIONAL_WARNING() pr_info("Triggering a stack trace\n")
+#define END_INTENTIONAL_WARNING() pr_info("Intentional warning finished\n")
+
#if IS_BUILTIN(CONFIG_KUNIT)
int kunit_run_all_tests(void);
#else
--
2.42.0
KUnit's deferred action API accepts a void(*)(void *) function pointer
which is called when the test is exited. However, we very frequently
want to use existing functions which accept a single pointer, but which
may not be of type void*. While this is probably dodgy enough to be on
the wrong side of the C standard, it's been often used for similar
callbacks, and gcc's -Wcast-function-type seems to ignore cases where
the only difference is the type of the argument, assuming it's
compatible (i.e., they're both pointers to data).
However, clang 16 has introduced -Wcast-function-type-strict, which no
longer permits any deviation in function pointer type. This seems to be
because it'd break CFI, which validates the type of function calls.
This rather ruins our attempts to cast functions to defer them, and
leaves us with a few options. The one we've chosen is to implement a
macro which will generate a wrapper function which accepts a void*, and
casts the argument to the appropriate type.
For example, if you were trying to wrap:
void foo_close(struct foo *handle);
you could use:
KUNIT_DEFINE_ACTION_WRAPPER(kunit_action_foo_close,
foo_close,
struct foo *);
This would create a new kunit_action_foo_close() function, of type
kunit_action_t, which could be passed into kunit_add_action() and
similar functions.
In addition to defining this macro, update KUnit and its tests to use
it.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Tested-by: Nathan Chancellor <nathan(a)kernel.org>
Acked-by: Daniel Vetter <daniel(a)ffwll.ch>
Reviewed-by: Maxime Ripard <mripard(a)kernel.org>
Signed-off-by: David Gow <davidgow(a)google.com>
---
Thanks everyone for testing v1 of this: this update only changes
documentation.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20231110200830.1832556-1-davidgow@g…
- Update the usage.rst documentation (Thanks, Nathan)
- Add a better doc comment for KUNIT_DEFINE_ACTION_WRAPPER()
---
Documentation/dev-tools/kunit/usage.rst | 10 +++++++---
include/kunit/resource.h | 21 +++++++++++++++++++++
lib/kunit/kunit-test.c | 5 +----
lib/kunit/test.c | 6 ++++--
4 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index c27e1646ecd9..9db12e91668e 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -651,12 +651,16 @@ For example:
}
Note that, for functions like device_unregister which only accept a single
-pointer-sized argument, it's possible to directly cast that function to
-a ``kunit_action_t`` rather than writing a wrapper function, for example:
+pointer-sized argument, it's possible to automatically generate a wrapper
+with the ``KUNIT_DEFINE_ACTION_WRAPPER()`` macro, for example:
.. code-block:: C
- kunit_add_action(test, (kunit_action_t *)&device_unregister, &dev);
+ KUNIT_DEFINE_ACTION_WRAPPER(device_unregister, device_unregister_wrapper, struct device *);
+ kunit_add_action(test, &device_unregister_wrapper, &dev);
+
+You should do this in preference to manually casting to the ``kunit_action_t`` type,
+as casting function pointers will break Control Flow Integrity (CFI).
``kunit_add_action`` can fail if, for example, the system is out of memory.
You can use ``kunit_add_action_or_reset`` instead which runs the action
diff --git a/include/kunit/resource.h b/include/kunit/resource.h
index c7383e90f5c9..4ad69a2642a5 100644
--- a/include/kunit/resource.h
+++ b/include/kunit/resource.h
@@ -390,6 +390,27 @@ void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
/* A 'deferred action' function to be used with kunit_add_action. */
typedef void (kunit_action_t)(void *);
+/**
+ * KUNIT_DEFINE_ACTION_WRAPPER() - Wrap a function for use as a deferred action.
+ *
+ * @wrapper: The name of the new wrapper function define.
+ * @orig: The original function to wrap.
+ * @arg_type: The type of the argument accepted by @orig.
+ *
+ * Defines a wrapper for a function which accepts a single, pointer-sized
+ * argument. This wrapper can then be passed to kunit_add_action() and
+ * similar. This should be used in preference to casting a function
+ * directly to kunit_action_t, as casting function pointers will break
+ * control flow integrity (CFI), leading to crashes.
+ */
+#define KUNIT_DEFINE_ACTION_WRAPPER(wrapper, orig, arg_type) \
+ static void wrapper(void *in) \
+ { \
+ arg_type arg = (arg_type)in; \
+ orig(arg); \
+ }
+
+
/**
* kunit_add_action() - Call a function when the test ends.
* @test: Test case to associate the action with.
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index 99d2a3a528e1..3e9c5192d095 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -538,10 +538,7 @@ static struct kunit_suite kunit_resource_test_suite = {
#if IS_BUILTIN(CONFIG_KUNIT_TEST)
/* This avoids a cast warning if kfree() is passed direct to kunit_add_action(). */
-static void kfree_wrapper(void *p)
-{
- kfree(p);
-}
+KUNIT_DEFINE_ACTION_WRAPPER(kfree_wrapper, kfree, const void *);
static void kunit_log_test(struct kunit *test)
{
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index f2eb71f1a66c..0308865194bb 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -772,6 +772,8 @@ static struct notifier_block kunit_mod_nb = {
};
#endif
+KUNIT_DEFINE_ACTION_WRAPPER(kfree_action_wrapper, kfree, const void *)
+
void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
{
void *data;
@@ -781,7 +783,7 @@ void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
if (!data)
return NULL;
- if (kunit_add_action_or_reset(test, (kunit_action_t *)kfree, data) != 0)
+ if (kunit_add_action_or_reset(test, kfree_action_wrapper, data) != 0)
return NULL;
return data;
@@ -793,7 +795,7 @@ void kunit_kfree(struct kunit *test, const void *ptr)
if (!ptr)
return;
- kunit_release_action(test, (kunit_action_t *)kfree, (void *)ptr);
+ kunit_release_action(test, kfree_action_wrapper, (void *)ptr);
}
EXPORT_SYMBOL_GPL(kunit_kfree);
--
2.43.0.rc1.413.gea7ed67945-goog
From: Willem de Bruijn <willemb(a)google.com>
Observed a clang warning when backporting cmsg_sender.
Ran the same build against all the .c files under selftests/net.
This is clang-14 with -Wall
Which is what tools/testing/selftests/net/Makefile also enables.
Willem de Bruijn (4):
selftests/net: ipsec: fix constant out of range
selftests/net: fix a char signedness issue
selftests/net: unix: fix unused variable compiler warning
selftests/net: mptcp: fix uninitialized variable warnings
tools/testing/selftests/net/af_unix/diag_uid.c | 1 -
tools/testing/selftests/net/cmsg_sender.c | 2 +-
tools/testing/selftests/net/ipsec.c | 4 ++--
tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 ++++-------
tools/testing/selftests/net/mptcp/mptcp_inq.c | 11 ++++-------
5 files changed, 11 insertions(+), 18 deletions(-)
--
2.43.0.rc1.413.gea7ed67945-goog
From: Christoph Müllner <christoph.muellner(a)vrull.eu>
The upcoming RISC-V Ssdtso specification introduces a bit in the senvcfg
CSR to switch the memory consistency model at run-time from RVWMO to TSO
(and back). The active consistency model can therefore be switched on a
per-hart base and managed by the kernel on a per-process/thread base.
This patch implements basic Ssdtso support and adds a prctl API on top
so that user-space processes can switch to a stronger memory consistency
model (than the kernel was written for) at run-time.
I am not sure if other architectures support switching the memory
consistency model at run-time, but designing the prctl API in an
arch-independent way allows reusing it in the future.
The patchset also comes with a short documentation of the prctl API.
This series is based on the second draft of the Ssdtso specification
which was published recently on an RVI list:
https://lists.riscv.org/g/tech-arch-review/message/183
Note, that the Ssdtso specification is in development state
(i.e., not frozen or even ratified) which is also the reason
why I marked the series as RFC.
One aspect that is not covered in this patchset is virtualization.
It is planned to add virtualization support in a later version.
Hints/suggestions on how to implement this part are very much
appreciated.
Christoph Müllner (5):
RISC-V: Add basic Ssdtso support
RISC-V: Expose Ssdtso via hwprobe API
uapi: prctl: Add new prctl call to set/get the memory consistency
model
RISC-V: Implement prctl call to set/get the memory consistency model
RISC-V: selftests: Add DTSO tests
Documentation/arch/riscv/hwprobe.rst | 3 +
.../mm/dynamic-memory-consistency-model.rst | 76 ++++++++++++++++++
arch/riscv/Kconfig | 10 +++
arch/riscv/include/asm/csr.h | 1 +
arch/riscv/include/asm/dtso.h | 74 ++++++++++++++++++
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/processor.h | 8 ++
arch/riscv/include/asm/switch_to.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/dtso.c | 33 ++++++++
arch/riscv/kernel/process.c | 4 +
arch/riscv/kernel/sys_riscv.c | 1 +
include/uapi/linux/prctl.h | 5 ++
kernel/sys.c | 12 +++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/dtso/.gitignore | 1 +
tools/testing/selftests/riscv/dtso/Makefile | 11 +++
tools/testing/selftests/riscv/dtso/dtso.c | 77 +++++++++++++++++++
20 files changed, 324 insertions(+), 1 deletion(-)
create mode 100644 Documentation/mm/dynamic-memory-consistency-model.rst
create mode 100644 arch/riscv/include/asm/dtso.h
create mode 100644 arch/riscv/kernel/dtso.c
create mode 100644 tools/testing/selftests/riscv/dtso/.gitignore
create mode 100644 tools/testing/selftests/riscv/dtso/Makefile
create mode 100644 tools/testing/selftests/riscv/dtso/dtso.c
--
2.41.0
As Guillaume pointed, many selftests create namespaces with very common
names (like "client" or "server") or even (partially) run directly in init_net.
This makes these tests prone to failure if another namespace with the same
name already exists. It also makes it impossible to run several instances
of these tests in parallel.
This patch set conver all the net selftests to run in unique namespace,
so we can update the selftest freamwork to run all tests in it's own namespace
in parallel. After update, we only need to wait for the test which need
longest time.
]# per_test_logging=1 time ./run_kselftest.sh -n -c net
TAP version 13
# selftests: net: reuseport_bpf_numa
not ok 3 selftests: net: reuseport_bpf_numa # exit=1
# selftests: net: reuseport_bpf_cpu
not ok 2 selftests: net: reuseport_bpf_cpu # exit=1
# selftests: net: reuseport_dualstack
not ok 4 selftests: net: reuseport_dualstack # exit=1
# selftests: net: reuseaddr_conflict
ok 5 selftests: net: reuseaddr_conflict
...
# selftests: net: test_vxlan_mdb.sh
ok 90 selftests: net: test_vxlan_mdb.sh
# selftests: net: fib_nexthops.sh
not ok 41 selftests: net: fib_nexthops.sh # exit=1
# selftests: net: fcnal-test.sh
not ok 36 selftests: net: fcnal-test.sh # exit=1
real 55m1.238s
user 12m10.350s
sys 22m17.432s
Hangbin Liu (38):
selftests/net: add lib.sh
selftests/net: arp_ndisc_evict_nocarrier.sh convert to run test in
unique namespace
selftest: arp_ndisc_untracked_subnets.sh convert to run test in unique
namespace
selftests/net: convert cmsg tests to make them run in unique namespace
selftests/net: convert drop_monitor_tests.sh to run it in unique
namespace
selftests/net: convert fcnal-test.sh to run it in unique namespace
selftests/net: convert fib_nexthop_multiprefix to run it in unique
namespace
selftests/net: convert fib_nexthop_nongw.sh to run it in unique
namespace
selftests/net: convert fib_nexthops.sh to run it in unique namespace
selftests/net: convert fib-onlink-tests.sh to run it in unique
namespace
selftests/net: convert fib_rule_tests.sh to run it in unique namespace
selftests/net: convert fib_tests.sh to run it in unique namespace
selftests/net: convert gre_gso.sh to run it in unique namespace
selftests/net: convert icmp_redirect.sh to run it in unique namespace
sleftests/net: convert icmp.sh to run it in unique namespace
selftests/net: convert ioam6.sh to run it in unique namespace
selftests/net: convert l2tp.sh to run it in unique namespace
selftests/net: convert ndisc_unsolicited_na_test.sh to run it in
unique namespace
selftests/net: convert netns-name.sh to run it in unique namespace
selftests/net: convert fdb_flush.sh to run it in unique namespace
selftests/net: convert rtnetlink.sh to run it in unique namespace
selftests/net: convert sctp_vrf.sh to run it in unique namespace
selftests/net: use unique netns name for setup_loopback.sh
setup_veth.sh
selftests/net: convert stress_reuseport_listen.sh to run it in unique
namespace
selftests/net: convert test_bridge_backup_port.sh to run it in unique
namespace
selftests/net: convert test_bridge_neigh_suppress.sh to run it in
unique namespace
selftests/net: convert test_vxlan_mdb.sh to run it in unique namespace
selftests/net: convert test_vxlan_nolocalbypass.sh to run it in unique
namespace
selftests/net: convert test_vxlan_under_vrf.sh to run it in unique
namespace
selftests/net: convert test_vxlan_vnifiltering.sh to run it in unique
namespace
selftests/net: convert toeplitz.sh to run it in unique namespace
selftests/net: convert unicast_extensions.sh to run it in unique
namespace
selftests/net: convert vrf_route_leaking.sh to run it in unique
namespace
selftests/net: convert vrf_strict_mode_test.sh to run it in unique
namespace
selftests/net: convert vrf-xfrm-tests.sh to run it in unique namespace
selftests/net: convert traceroute.sh to run it in unique namespace
selftests/net: convert xfrm_policy.sh to run it in unique namespace
kselftest/runner.sh: add netns support
tools/testing/selftests/kselftest/runner.sh | 26 +-
tools/testing/selftests/net/Makefile | 2 +-
.../net/arp_ndisc_evict_nocarrier.sh | 46 +--
.../net/arp_ndisc_untracked_subnets.sh | 18 +-
tools/testing/selftests/net/cmsg_ipv6.sh | 10 +-
tools/testing/selftests/net/cmsg_so_mark.sh | 7 +-
tools/testing/selftests/net/cmsg_time.sh | 7 +-
.../selftests/net/drop_monitor_tests.sh | 21 +-
tools/testing/selftests/net/fcnal-test.sh | 30 +-
tools/testing/selftests/net/fdb_flush.sh | 11 +-
.../testing/selftests/net/fib-onlink-tests.sh | 7 +-
.../selftests/net/fib_nexthop_multiprefix.sh | 104 +++--
.../selftests/net/fib_nexthop_nongw.sh | 34 +-
tools/testing/selftests/net/fib_nexthops.sh | 142 ++++---
tools/testing/selftests/net/fib_rule_tests.sh | 36 +-
tools/testing/selftests/net/fib_tests.sh | 184 +++++----
tools/testing/selftests/net/gre_gso.sh | 18 +-
tools/testing/selftests/net/icmp.sh | 10 +-
tools/testing/selftests/net/icmp_redirect.sh | 182 +++++----
tools/testing/selftests/net/ioam6.sh | 247 ++++++------
tools/testing/selftests/net/l2tp.sh | 130 +++----
tools/testing/selftests/net/lib.sh | 98 +++++
.../net/ndisc_unsolicited_na_test.sh | 19 +-
tools/testing/selftests/net/netns-name.sh | 44 +--
tools/testing/selftests/net/rtnetlink.sh | 21 +-
tools/testing/selftests/net/sctp_vrf.sh | 12 +-
tools/testing/selftests/net/settings | 2 +-
tools/testing/selftests/net/setup_loopback.sh | 8 +-
tools/testing/selftests/net/setup_veth.sh | 9 +-
.../selftests/net/stress_reuseport_listen.sh | 6 +-
.../selftests/net/test_bridge_backup_port.sh | 368 +++++++++---------
.../net/test_bridge_neigh_suppress.sh | 333 ++++++++--------
tools/testing/selftests/net/test_vxlan_mdb.sh | 202 +++++-----
.../selftests/net/test_vxlan_nolocalbypass.sh | 48 ++-
.../selftests/net/test_vxlan_under_vrf.sh | 70 ++--
.../selftests/net/test_vxlan_vnifiltering.sh | 154 +++++---
tools/testing/selftests/net/toeplitz.sh | 16 +-
tools/testing/selftests/net/traceroute.sh | 82 ++--
.../selftests/net/unicast_extensions.sh | 99 +++--
tools/testing/selftests/net/vrf-xfrm-tests.sh | 77 ++--
.../selftests/net/vrf_route_leaking.sh | 201 +++++-----
.../selftests/net/vrf_strict_mode_test.sh | 47 ++-
tools/testing/selftests/net/xfrm_policy.sh | 138 +++----
tools/testing/selftests/run_kselftest.sh | 4 +
44 files changed, 1676 insertions(+), 1654 deletions(-)
create mode 100644 tools/testing/selftests/net/lib.sh
--
2.41.0
Hi,
On Mon, Nov 27, 2023 at 11:49:16AM +0000, Felix Huettner wrote:
> conntrack zones are heavily used by tools like openvswitch to run
> multiple virtual "routers" on a single machine. In this context each
> conntrack zone matches to a single router, thereby preventing
> overlapping IPs from becoming issues.
> In these systems it is common to operate on all conntrack entries of a
> given zone, e.g. to delete them when a router is deleted. Previously this
> required these tools to dump the full conntrack table and filter out the
> relevant entries in userspace potentially causing performance issues.
>
> To do this we reuse the existing CTA_ZONE attribute. This was previous
> parsed but not used during dump and flush requests. Now if CTA_ZONE is
> set we filter these operations based on the provided zone.
> However this means that users that previously passed CTA_ZONE will
> experience a difference in functionality.
>
> Alternatively CTA_FILTER could have been used for the same
> functionality. However it is not yet supported during flush requests and
> is only available when using AF_INET or AF_INET6.
You mean, AF_UNSPEC cannot be specified in CTA_FILTER?
Please, extend libnetfilter_conntrack to support for this feature,
there is a filter API that can be used for this purpose.
Thanks.
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
> > I think ARM64 approached this problem by adding the
> > load-acquire/store-release instructions and for TSO based code,
> > translate into those (eg. x86 -> arm64 transpilers).
>
>
> Although those instructions have a bit more ordering constraints.
>
> I have heard rumors that the apple chips also have a register that can be
> set at runtime.
Oh, I thought they made do with the load-acquire/store-release thingies.
But to be fair, I haven't been paying *that* much attention to the apple
stuff.
I did read about how they fudged some of the x86 flags thing.
> And there are some IBM machines that have a setting, but not sure how it is
> controlled.
Cute, I'm assuming this is the Power series (s390 already being TSO)? I
wasn't aware they had this.
> > IIRC Risc-V actually has such instructions as well, so *why* are you
> > doing this?!?!
>
>
> Unfortunately, at least last time I checked RISC-V still hadn't gotten such
> instructions.
> What they have is the *semantics* of the instructions, but no actual opcodes
> to encode them.
Well, that sucks..
> I argued for them in the RISC-V memory group, but it was considered to be
> outside the scope of that group.
>
> Transpiling with sufficient DMB ISH to get the desired ordering is really
> bad for performance.
Ha!, quite dreadful I would imagine.
> That is not to say that linux should support this. Perhaps linux should
> pressure RISC-V into supporting implicit barriers instead.
I'm not sure I count for much in this regard, but yeah, that sounds like
a plan :-)
The series adds support for setrlimit/getrlimit.
Mainly to avoid spurious coredumps when running the tests under
qemu-user.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
tools/nolibc: drop custom definition of struct rusage
tools/nolibc: add support for getrlimit/setrlimit
selftests/nolibc: disable coredump via setrlimit
tools/include/nolibc/sys.h | 38 ++++++++++++++++++++++++++++
tools/include/nolibc/types.h | 21 +--------------
tools/testing/selftests/nolibc/nolibc-test.c | 31 +++++++++++++++++++++++
3 files changed, 70 insertions(+), 20 deletions(-)
---
base-commit: 0dbd4651f3f80151910a36416fa0df28a10c3b0a
change-id: 20231122-nolibc-rlimit-bb5b1f264fc4
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
In public cloud scenario, if kdump service works abnormally,
users cannot get vmcore. Without vmcore, user has no idea why the
kernel crashed. Meanwhile, there is no additional information
to find the reason why the kdump service is abnormal.
One way is to obtain console messages through VNC. The drawback
is that VNC is real-time, if user missed the timing to get the VNC
output, the crash needs to be retriggered.
Another way is to enable the console frontend of pstore and record the
console messages to the pstore backend. On the one hand, the console
logs only contain kernel printk logs and does not cover
user-mode print logs. Although we can redirect user-mode logs to the
pmsg frontend provided by pstore, user-mode information related to
booting and kdump service vary from systemd, kdump.sh, and so on which
makes redirection troublesome. So we added a tty frontend and save all
logs of tty driver to the pstore backend.
Another problem is that currently pstore only supports a single backend.
For debugging kdump problems, we hope to save the console logs and tty
logs to the ramoops backend of pstore, as it will not be lost after
rebooting. If the user has enabled another backend, the ramoops backend
will not be registered. To this end, we add the multi-backend function
to support simultaneous registration of multiple backends.
Based on the above changes, we can enable pstore in the crashdump kernel
and save the console logs and tty logs to the ramoops backend of pstore.
After rebooting, we can view the relevant logs by mounting the pstore
file system.
Furthermore, we also modified kexec-tools referring to crash-utils for
reading memory, so that pstore ramoops information can be read without
enabling pstore in first kernel. As we set the address and size of ramoops,
as well as the sizes of console and tty, we can infer the physical address
of console logs and tty logs in memory. Referring to the read method of
crash-utils, the console logs and tty logs are read from the memory,
user can get pstore debug information without affecting the first kernel
at all.
kexec-tools modification can be seen at
https://github.com/shuyuanmen/kexec-tools/blob/main/Add-pstore-segment.patch
Yuanhe Shu (5):
pstore: add tty frontend
pstore: add multi-backends support
pstore: add subdirs for multi-backends
pstore: remove the module parameter "backend"
tools/pstore: update pstore selftests
drivers/tty/n_tty.c | 1 +
fs/pstore/Kconfig | 23 ++
fs/pstore/Makefile | 2 +
fs/pstore/blk.c | 10 +
fs/pstore/ftrace.c | 22 +-
fs/pstore/inode.c | 86 ++++++-
fs/pstore/internal.h | 16 +-
fs/pstore/platform.c | 238 ++++++++++++--------
fs/pstore/pmsg.c | 23 +-
fs/pstore/ram.c | 40 +++-
fs/pstore/tty.c | 56 +++++
fs/pstore/zone.c | 42 +++-
include/linux/pstore.h | 33 +++
include/linux/pstore_blk.h | 3 +
include/linux/pstore_ram.h | 1 +
include/linux/pstore_zone.h | 2 +
include/linux/tty.h | 14 ++
tools/testing/selftests/pstore/common_tests | 4 -
18 files changed, 500 insertions(+), 116 deletions(-)
create mode 100644 fs/pstore/tty.c
--
2.39.3
Regressions that prevent a driver from probing a device can significantly
affect the functionality of a platform.
A kselftest to verify if devices on a DT-based platform are probed
correctly was recently introduced [1], but no such generic test is
available for ACPI platforms yet. bootrr [2] provides device probe
testing, but relies on a pre-defined list of the peripherals present on
each DUT.
On ACPI based hardware, a complete description of the platform is
provided to the OS by the system firmware. ACPI namespace objects are
mapped by the Linux ACPI subsystem into a device tree in
/sys/devices/LNXSYSTEM:00; the information in this subtree can be parsed
to build a list of the hw peripherals present on the DUT dynamically.
This series adds a test to verify if the devices declared in the ACPI
namespace and supported by the kernel are probed correctly.
This work follows a similar approach to [1], adapted for the ACPI use
case.
The first patch introduces a script that builds a list of all ACPI device
IDs supported by the kernel, by inspecting the acpi_device_id structs in
the sources. This list can be used to avoid testing ACPI-enumerated
devices that don't have a matching driver in the kernel. This script was
highly inspired by the dt-extract-compatibles script [3].
In the second patch, a new kselftest is added. It parses the
/sys/devices/LNXSYSTEM:00 tree to obtain a list of all platform
peripherals and verifies which of those, if supported, are correctly
bound to a driver.
Feedback is much appreciated,
Thank you,
Laura
[1] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
[2] https://github.com/kernelci/bootr
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scr…
Laura Nao (2):
acpi: Add script to extract ACPI device ids in the kernel
kselftest: Add test to detect unprobed devices on ACPI platforms
MAINTAINERS | 2 +
scripts/acpi/acpi-extract-ids | 60 +++++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/acpi/.gitignore | 2 +
tools/testing/selftests/acpi/Makefile | 23 ++++++
.../selftests/acpi/test_unprobed_devices.sh | 75 +++++++++++++++++++
6 files changed, 163 insertions(+)
create mode 100755 scripts/acpi/acpi-extract-ids
create mode 100644 tools/testing/selftests/acpi/.gitignore
create mode 100644 tools/testing/selftests/acpi/Makefile
create mode 100755 tools/testing/selftests/acpi/test_unprobed_devices.sh
--
2.30.2
The za-fork test does not output a newline when reporting the result of
the one test it runs, causing the counts printed by kselftest to be
included in the test name. Add the newline.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/za-fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/za-fork.c b/tools/testing/selftests/arm64/fp/za-fork.c
index b86cb1049497..587b94648222 100644
--- a/tools/testing/selftests/arm64/fp/za-fork.c
+++ b/tools/testing/selftests/arm64/fp/za-fork.c
@@ -85,7 +85,7 @@ int main(int argc, char **argv)
*/
ret = open("/proc/sys/abi/sme_default_vector_length", O_RDONLY, 0);
if (ret >= 0) {
- ksft_test_result(fork_test(), "fork_test");
+ ksft_test_result(fork_test(), "fork_test\n");
} else {
ksft_print_msg("SME not supported\n");
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231115-arm64-fix-za-fork-output-21cdd7a7195c
Best regards,
--
Mark Brown <broonie(a)kernel.org>
commit 05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
fixed the inconsistancy caused by tests being defined as TEST_GEN_PROGS.
This issue was leading to tests not being executed via run_vmtests.sh and
furthermore some tests running twice due to the kselftests wrapper also
executing them.
Fix the definition of two tests (soft-dirty and pagemap_ioctl)
that are still incorrectly defined.
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 78dfec8bc676..dede0bcf97a3 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -60,7 +60,7 @@ TEST_GEN_FILES += mrelease_test
TEST_GEN_FILES += mremap_dontunmap
TEST_GEN_FILES += mremap_test
TEST_GEN_FILES += on-fault-limit
-TEST_GEN_PROGS += pagemap_ioctl
+TEST_GEN_FILES += pagemap_ioctl
TEST_GEN_FILES += thuge-gen
TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += uffd-stress
@@ -72,7 +72,7 @@ TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
ifneq ($(ARCH),arm64)
-TEST_GEN_PROGS += soft-dirty
+TEST_GEN_FILES += soft-dirty
endif
ifeq ($(ARCH),x86_64)
--
2.41.0
Intel SIOV allows creating virtual devices of which the vRID is
represented by a pasid of a physical device. It is called as SIOV
virtual device in this series. Such devices can be bound to an iommufd
as physical device does and then later be attached to an IOAS/hwpt
using that pasid. Such PASIDs are called as default pasid.
iommufd has already supported pasid attach[1]. So a simple way to
support SIOV virtual device attachment is to let device driver call
the iommufd_device_pasid_attach() and pass in the default pasid for
the virtual device. This should work for now, but it may have problem
if iommufd core wants to differentiate the default pasids with other
kind of pasids (e.g. pasid given by userspace). In the later forwarding
page request to userspace, the default pasids are not supposed to send
to userspace as default pasids are mainly used by the SIOV device driver.
With above reason, this series chooses to have a new API to bind the
default pasid to iommufd, and extends the iommufd_device_attach() to
convert the attachment to be pasid attach with the default pasid. Device
drivers (e.g. VFIO) that support SIOV shall call the below APIs to
interact with iommufd:
- iommufd_device_bind_pasid(): Bind virtual device (a pasid of a device)
to iommufd;
- iommufd_device_attach(): Attach a SIOV virtual device to IOAS/HWPT;
- iommufd_device_replace(): Replace IOAS/HWPT of a SIOV virtual device;
- iommufd_device_detach(): Detach IOAS/HWPT of a SIOV virtual device;
- iommufd_device_unbind(): Unbind virtual device from iommufd;
For vfio devices, the device drivers that support SIOV should:
- use below API to register vdev for SIOV virtual device
vfio_register_pasid_iommu_dev()
- use below API to bind vdev to iommufd in .bind_iommufd() callback
iommufd_device_bind_pasid()
- allocate pasid by itself before calling iommufd_device_bind_pasid()
Complete code can be found at[2]
[1] https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid_siov
Regards,
Yi Liu
Kevin Tian (5):
iommufd: Handle unsafe interrupts in a separate function
iommufd: Introduce iommufd_alloc_device()
iommufd: Add iommufd_device_bind_pasid()
iommufd: Support attach/replace for SIOV virtual device {dev, pasid}
vfio: Add vfio_register_pasid_iommu_dev()
Yi Liu (2):
iommufd/selftest: Extend IOMMU_TEST_OP_MOCK_DOMAIN to pass in pasid
iommufd/selftest: Add test coverage for SIOV virtual device
drivers/iommu/iommufd/device.c | 163 ++++++++++++++----
drivers/iommu/iommufd/iommufd_private.h | 7 +
drivers/iommu/iommufd/iommufd_test.h | 2 +
drivers/iommu/iommufd/selftest.c | 10 +-
drivers/vfio/group.c | 18 ++
drivers/vfio/vfio.h | 8 +
drivers/vfio/vfio_main.c | 10 ++
include/linux/iommufd.h | 3 +
include/linux/vfio.h | 1 +
tools/testing/selftests/iommu/iommufd.c | 75 ++++++--
.../selftests/iommu/iommufd_fail_nth.c | 42 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 21 ++-
12 files changed, 296 insertions(+), 64 deletions(-)
--
2.34.1
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As it was already pointed out by Yonghong Song [1] in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patchset replaces the usage of `CHECK(` macros to the equivalent `ASSERT_`
family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Changes in v3:
- Addressed the following points mentioned by Yonghong Song
- Improved `bpf_map_lookup_elem` assertion in bpf_tcp_ca.
- Replaced assertion introduced in v2 with one that checks `thread_ret`
instead of `pthread_join`. This ensures that `server`'s return value
(thread_ret) is the one being checked, as oposed to `pthread_join`'s
return value, since the latter one is less likely to fail.
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v2 - https://lore.kernel.org/lkml/GV1PR10MB6563AECF8E94798A1E5B36A4E8B6A@GV1PR10…
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 48 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 105 insertions(+), 169 deletions(-)
--
2.25.1
Changes from v1:
* Dropped some changes that were independently fixed[1]
* No longer separate the f strings to their own patch
* Use r strings when the value is a regular expression
* Updated verification script
In retrospect a script to find the instances and apply fixes isn't that
useful for review, so the attached script this time just looks for
differences in the AST. Apply the series and run the script, with
the two references to compare as arguments.
There are some intentional changes to the AST now though, as the r strings
turn '\t' from a single character tab into a backslash and 't' character
pair (similar for '\n'). This does not affect the correctness of the
regular expression though.
v1: https://lore.kernel.org/all/20230814060704.79655-1-bgray@linux.ibm.com/
[1]: https://lore.kernel.org/all/20230816122133.1231599-1-vishalc@linux.ibm.com/
---
#!/usr/bin/env python3
"""
Verify Python syntax trees are equivalent between two references
"""
import argparse
import ast
from pathlib import Path
import subprocess as sp
def read_file(path: Path, ref: str) -> str:
return sp.run(f"git show {ref}:{path}", stdout=sp.PIPE, shell=True, encoding="utf-8", check=True).stdout
parser = argparse.ArgumentParser("Compare Python ASTs between revisions")
parser.add_argument("ref1", type=str, help="First revision to use")
parser.add_argument("ref2", type=str, help="Second revision to use")
args = parser.parse_args()
for pyfile in Path(".").glob("**/*.py"):
try:
ref1_content = read_file(pyfile, args.ref1)
ref2_content = read_file(pyfile, args.ref2)
except Exception as e:
print(f"ERROR:{pyfile}: Failed to read ({e})")
continue
try:
ref1_syntax = ast.parse(ref1_content, filename=pyfile)
ref2_syntax = ast.parse(ref2_content, filename=pyfile)
except SyntaxError as e:
print(f"ERROR:{pyfile}: Failed to parse, is it Python3? ({e})")
continue
if ast.dump(ref1_syntax) != ast.dump(ref2_syntax):
print(f"ERROR:{pyfile}: Revisions have different AST")
cmd = f"diff <(git show {args.ref1}:{pyfile} | python -m ast) <(git show {args.ref2}:{pyfile} | python -m ast)"
print(cmd)
sp.run(cmd, shell=True)
continue
Benjamin Gray (7):
ia64: fix Python string escapes
Documentation/sphinx: fix Python string escapes
drivers/comedi: fix Python string escapes
scripts: fix Python string escapes
tools/perf: fix Python string escapes
tools/power: fix Python string escapes
selftests/bpf: fix Python string escapes
Documentation/sphinx/cdomain.py | 2 +-
Documentation/sphinx/kernel_abi.py | 2 +-
Documentation/sphinx/kernel_feat.py | 2 +-
Documentation/sphinx/kerneldoc.py | 2 +-
Documentation/sphinx/maintainers_include.py | 8 +++---
arch/ia64/scripts/unwcheck.py | 2 +-
.../ni_routing/tools/convert_csv_to_c.py | 2 +-
scripts/clang-tools/gen_compile_commands.py | 2 +-
scripts/gdb/linux/symbols.py | 2 +-
tools/perf/pmu-events/jevents.py | 2 +-
.../scripts/python/arm-cs-trace-disasm.py | 4 +--
tools/perf/scripts/python/compaction-times.py | 2 +-
.../scripts/python/exported-sql-viewer.py | 4 +--
tools/power/pm-graph/bootgraph.py | 12 ++++-----
.../selftests/bpf/test_bpftool_synctypes.py | 26 +++++++++----------
tools/testing/selftests/bpf/test_offload.py | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
--
2.41.0
From: Willem de Bruijn <willemb(a)google.com>
Commit 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR
scheduling") introduces multiple traffic bands, and per-band maximum
packet count.
Per-band limits ensures that packets in one class cannot fill the
entire qdisc and so cause DoS to the traffic in the other classes.
Verify this behavior:
1. set the limit to 10 per band
2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
Packets must remain queued for a period to trigger this behavior.
Use SO_TXTIME to store packets for 100 msec.
The test reuses existing upstream test infra. The script is a fork of
cmsg_time.sh. The scripts call cmsg_sender.
The test extends cmsg_sender with two arguments:
* '-P' SO_PRIORITY
There is a subtle difference between IPv4 and IPv6 stack behavior:
PF_INET/IP_TOS sets IP header bits and sk_priority
PF_INET6/IPV6_TCLASS sets IP header bits BUT NOT sk_priority
* '-n' num pkts
Send multiple packets in quick succession.
I first attempted a for loop in the script, but this is too slow in
virtualized environments, causing flakiness as the 100ms timeout is
reached and packets are dequeued.
Also do not wait for timestamps to be queued unless timestamps are
requested.
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/cmsg_sender.c | 50 ++++++++++------
.../testing/selftests/net/fq_band_pktlimit.sh | 57 +++++++++++++++++++
3 files changed, 91 insertions(+), 17 deletions(-)
create mode 100755 tools/testing/selftests/net/fq_band_pktlimit.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 5b2aca4c5f10..9274edfb76ff 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -91,6 +91,7 @@ TEST_PROGS += test_bridge_neigh_suppress.sh
TEST_PROGS += test_vxlan_nolocalbypass.sh
TEST_PROGS += test_bridge_backup_port.sh
TEST_PROGS += fdb_flush.sh
+TEST_PROGS += fq_band_pktlimit.sh
TEST_FILES := settings
diff --git a/tools/testing/selftests/net/cmsg_sender.c b/tools/testing/selftests/net/cmsg_sender.c
index 24b21b15ed3f..8d7575389f58 100644
--- a/tools/testing/selftests/net/cmsg_sender.c
+++ b/tools/testing/selftests/net/cmsg_sender.c
@@ -45,11 +45,13 @@ struct options {
const char *host;
const char *service;
unsigned int size;
+ unsigned int num_pkt;
struct {
unsigned int mark;
unsigned int dontfrag;
unsigned int tclass;
unsigned int hlimit;
+ unsigned int priority;
} sockopt;
struct {
unsigned int family;
@@ -72,6 +74,7 @@ struct options {
} v6;
} opt = {
.size = 13,
+ .num_pkt = 1,
.sock = {
.family = AF_UNSPEC,
.type = SOCK_DGRAM,
@@ -112,7 +115,7 @@ static void cs_parse_args(int argc, char *argv[])
{
int o;
- while ((o = getopt(argc, argv, "46sS:p:m:M:d:tf:F:c:C:l:L:H:")) != -1) {
+ while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:")) != -1) {
switch (o) {
case 's':
opt.silent_send = true;
@@ -138,7 +141,9 @@ static void cs_parse_args(int argc, char *argv[])
cs_usage(argv[0]);
}
break;
-
+ case 'P':
+ opt.sockopt.priority = atoi(optarg);
+ break;
case 'm':
opt.mark.ena = true;
opt.mark.val = atoi(optarg);
@@ -146,6 +151,9 @@ static void cs_parse_args(int argc, char *argv[])
case 'M':
opt.sockopt.mark = atoi(optarg);
break;
+ case 'n':
+ opt.num_pkt = atoi(optarg);
+ break;
case 'd':
opt.txtime.ena = true;
opt.txtime.delay = atoi(optarg);
@@ -410,6 +418,10 @@ static void ca_set_sockopts(int fd)
setsockopt(fd, SOL_IPV6, IPV6_UNICAST_HOPS,
&opt.sockopt.hlimit, sizeof(opt.sockopt.hlimit)))
error(ERN_SOCKOPT, errno, "setsockopt IPV6_HOPLIMIT");
+ if (opt.sockopt.priority &&
+ setsockopt(fd, SOL_SOCKET, SO_PRIORITY,
+ &opt.sockopt.priority, sizeof(opt.sockopt.priority)))
+ error(ERN_SOCKOPT, errno, "setsockopt SO_PRIORITY");
}
int main(int argc, char *argv[])
@@ -421,6 +433,7 @@ int main(int argc, char *argv[])
char *buf;
int err;
int fd;
+ int i;
cs_parse_args(argc, argv);
@@ -480,24 +493,27 @@ int main(int argc, char *argv[])
cs_write_cmsg(fd, &msg, cbuf, sizeof(cbuf));
- err = sendmsg(fd, &msg, 0);
- if (err < 0) {
- if (!opt.silent_send)
- fprintf(stderr, "send failed: %s\n", strerror(errno));
- err = ERN_SEND;
- goto err_out;
- } else if (err != (int)opt.size) {
- fprintf(stderr, "short send\n");
- err = ERN_SEND_SHORT;
- goto err_out;
- } else {
- err = ERN_SUCCESS;
+ for (i = 0; i < opt.num_pkt; i++) {
+ err = sendmsg(fd, &msg, 0);
+ if (err < 0) {
+ if (!opt.silent_send)
+ fprintf(stderr, "send failed: %s\n", strerror(errno));
+ err = ERN_SEND;
+ goto err_out;
+ } else if (err != (int)opt.size) {
+ fprintf(stderr, "short send\n");
+ err = ERN_SEND_SHORT;
+ goto err_out;
+ }
}
+ err = ERN_SUCCESS;
- /* Make sure all timestamps have time to loop back */
- usleep(opt.txtime.delay);
+ if (opt.ts.ena) {
+ /* Make sure all timestamps have time to loop back */
+ usleep(opt.txtime.delay);
- cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ }
err_out:
close(fd);
diff --git a/tools/testing/selftests/net/fq_band_pktlimit.sh b/tools/testing/selftests/net/fq_band_pktlimit.sh
new file mode 100755
index 000000000000..24b77bdf41ff
--- /dev/null
+++ b/tools/testing/selftests/net/fq_band_pktlimit.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify that FQ has a packet limit per band:
+#
+# 1. set the limit to 10 per band
+# 2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
+# 3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
+# 4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
+#
+# Send packets with a 100ms delay to ensure that previously sent
+# packets are still queued when later ones are sent.
+# Use SO_TXTIME for this.
+
+die() {
+ echo "$1"
+ exit 1
+}
+
+# run inside private netns
+if [[ $# -eq 0 ]]; then
+ ./in_netns.sh "$0" __subprocess
+ exit
+fi
+
+ip link add type dummy
+ip link set dev dummy0 up
+ip -6 addr add fdaa::1/128 dev dummy0
+ip -6 route add fdaa::/64 dev dummy0
+tc qdisc replace dev dummy0 root handle 1: fq quantum 1514 initial_quantum 1514 limit 10
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT1="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT2="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 -P 7 fdaa::2 8000
+OUT3="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Initial stats will report zero sent, as all packets are still
+# queued in FQ. Sleep for the delay period (100ms) and see that
+# twenty are now sent.
+sleep 0.1
+OUT4="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Log the output after the test
+echo "${OUT1}"
+echo "${OUT2}"
+echo "${OUT3}"
+echo "${OUT4}"
+
+# Test the output for expected values
+echo "${OUT1}" | grep -q '0\ pkt\ (dropped\ 10' || die "unexpected drop count at 1"
+echo "${OUT2}" | grep -q '0\ pkt\ (dropped\ 30' || die "unexpected drop count at 2"
+echo "${OUT3}" | grep -q '0\ pkt\ (dropped\ 40' || die "unexpected drop count at 3"
+echo "${OUT4}" | grep -q '20\ pkt\ (dropped\ 40' || die "unexpected accept count at 4"
--
2.43.0.rc1.413.gea7ed67945-goog
On Wed, Nov 15, 2023 at 01:17:06PM +0800, Liu, Jing2 wrote:
> This is the right way to approach it,
>
> I learned that there was discussion about using io_uring to get the
> page fault without
>
> eventfd notification in [1], and I am new at io_uring and studying the
> man page of
>
> liburing, but there're questions in my mind on how can QEMU get the
> coming page fault
>
> with a good performance.
>
> Since both QEMU and Kernel don't know when comes faults, after QEMU
> submits one
>
> read task to io_uring, we want kernel pending until fault comes. While
> based on
>
> hwpt_fault_fops_read() in [patch v2 4/6], it just returns 0 since
> there's now no fault,
>
> thus this round of read completes to CQ but it's not what we want. So
> I'm wondering
>
> how kernel pending on the read until fault comes. Does fops callback
> need special work to
Implement a fops with poll support that triggers when a new event is
pushed and everything will be fine. There are many examples in the
kernel. The ones in the mlx5 vfio driver spring to mind as a scheme I
recently looked at.
Jason
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As it was already pointed out by Yonghong Song [1] in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patchset replaces the usage of `CHECK(` macros to the equivalent `ASSERT_`
family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 50 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 106 insertions(+), 170 deletions(-)
--
2.25.1
v4:
- Update patch 1 to move apply_wqattrs_lock() and apply_wqattrs_unlock()
down into CONFIG_SYSFS block to avoid compilation warnings.
v3:
- Break out a separate patch to make workqueue_set_unbound_cpumask()
static and move it down to the CONFIG_SYSFS section.
- Remove the "__DEBUG__." prefix and the CFTYPE_DEBUG flag from the
new root only cpuset.cpus.isolated control files and update the
test accordingly.
v2:
- Add 2 read-only workqueue sysfs files to expose the user requested
cpumask as well as the isolated CPUs to be excluded from
wq_unbound_cpumask.
- Ensure that caller of the new workqueue_unbound_exclude_cpumask()
hold cpus_read_lock.
- Update the cpuset code to make sure the cpus_read_lock is held
whenever workqueue_unbound_exclude_cpumask() may be called.
Isolated cpuset partition can currently be created to contain an
exclusive set of CPUs not used in other cgroups and with load balancing
disabled to reduce interference from the scheduler.
The main purpose of this isolated partition type is to dynamically
emulate what can be done via the "isolcpus" boot command line option,
specifically the default domain flag. One effect of the "isolcpus" option
is to remove the isolated CPUs from the cpumasks of unbound workqueues
since running work functions in an isolated CPU can be a major source
of interference. Changing the unbound workqueue cpumasks can be done at
run time by writing an appropriate cpumask without the isolated CPUs to
/sys/devices/virtual/workqueue/cpumask. So one can set up an isolated
cpuset partition and then write to the cpumask sysfs file to achieve
similar level of CPU isolation. However, this manual process can be
error prone.
This patch series implements automatic exclusion of isolated CPUs from
unbound workqueue cpumasks when an isolated cpuset partition is created
and then adds those CPUs back when the isolated partition is destroyed.
There are also other places in the kernel that look at the HK_FLAG_DOMAIN
cpumask or other HK_FLAG_* cpumasks and exclude the isolated CPUs from
certain actions to further reduce interference. CPUs in an isolated
cpuset partition will not be able to avoid those interferences yet. That
may change in the future as the need arises.
Waiman Long (5):
workqueue: Make workqueue_set_unbound_cpumask() static
workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs
from wq_unbound_cpumask
selftests/cgroup: Minor code cleanup and reorganization of
test_cpuset_prs.sh
cgroup/cpuset: Keep track of CPUs in isolated partitions
cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
Documentation/admin-guide/cgroup-v2.rst | 10 +-
include/linux/workqueue.h | 2 +-
kernel/cgroup/cpuset.c | 286 +++++++++++++-----
kernel/workqueue.c | 165 +++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 216 ++++++++-----
5 files changed, 475 insertions(+), 204 deletions(-)
--
2.39.3
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zisslpcfi respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). Unlike normal stacks only the shadow stack
size is specified, similar issues to those that lead to the creation of
map_shadow_stack() apply.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET avaible to me, I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.
A new architecture feature Kconfig option for shadow stacks is added as
here, this was suggested as part of the review comments for the arm64
GCS series and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (5):
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
kselftest/clone3: Test shadow stack support
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 30 ++++-
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 2 +
include/uapi/linux/sched.h | 4 +
kernel/fork.c | 24 +++-
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 151 ++++++++++++++++------
tools/testing/selftests/clone3/clone3_selftests.h | 7 +
12 files changed, 188 insertions(+), 54 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Changelog:
v5:
* Replace reference getting with an rcu_read_lock() section for
zswap lru modifications (suggested by Yosry)
* Add a new prep patch that allows mem_cgroup_iter() to return
online cgroup.
* Add a callback that updates pool->next_shrink when the cgroup is
offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
* Rename list_lru_add to list_lru_add_obj and __list_lru_add to
list_lru_add (patch 1) (suggested by Johannes Weiner and
Yosry Ahmed)
* Some cleanups on the memcg aware LRU patch (patch 2)
(suggested by Yosry Ahmed)
* Use event interface for the new per-cgroup writeback counters.
(patch 3) (suggested by Yosry Ahmed)
* Abstract zswap's lruvec states and handling into
zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
* Add a patch to export per-cgroup zswap writeback counters
* Add a patch to update zswap's kselftest
* Separate the new list_lru functions into its own prep patch
* Do not start from the top of the hierarchy when encounter a memcg
that is not online for the global limit zswap writeback (patch 2)
(suggested by Yosry Ahmed)
* Do not remove the swap entry from list_lru in
__read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
* Removed a redundant zswap pool getting (patch 2)
(reported by Ryan Roberts)
* Use atomic for the nr_zswap_protected (instead of lruvec's lock)
(patch 5) (suggested by Yosry Ahmed)
* Remove the per-cgroup zswap shrinker knob (patch 5)
(suggested by Yosry Ahmed)
v2:
* Fix loongarch compiler errors
* Use pool stats instead of memcg stats when !CONFIG_MEMCG_KEM
There are currently several issues with zswap writeback:
1. There is only a single global LRU for zswap, making it impossible to
perform worload-specific shrinking - an memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.
2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).
This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performs workload-specific
(i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.
As a proof of concept, we ran the following synthetic benchmark:
build the linux kernel in a memory-limited cgroup, and allocate some
cold data in tmpfs to see if the shrinker could write them out and
improved the overall performance. Depending on the amount of cold data
generated, we observe from 14% to 35% reduction in kernel CPU time used
in the kernel builds.
Domenico Cerasuolo (3):
zswap: make shrinking memcg-aware
mm: memcg: add per-memcg zswap writeback stat
selftests: cgroup: update per-memcg zswap writeback selftest
Nhat Pham (3):
list_lru: allows explicit memcg and NUMA node selection
memcontrol: allows mem_cgroup_iter() to check for onlineness
zswap: shrinks zswap pool based on memory pressure
Documentation/admin-guide/mm/zswap.rst | 7 +
drivers/android/binder_alloc.c | 5 +-
fs/dcache.c | 8 +-
fs/gfs2/quota.c | 6 +-
fs/inode.c | 4 +-
fs/nfs/nfs42xattr.c | 8 +-
fs/nfsd/filecache.c | 4 +-
fs/xfs/xfs_buf.c | 6 +-
fs/xfs/xfs_dquot.c | 2 +-
fs/xfs/xfs_qm.c | 2 +-
include/linux/list_lru.h | 46 ++-
include/linux/memcontrol.h | 9 +-
include/linux/mmzone.h | 2 +
include/linux/vm_event_item.h | 1 +
include/linux/zswap.h | 27 +-
mm/list_lru.c | 48 ++-
mm/memcontrol.c | 20 +-
mm/mmzone.c | 1 +
mm/shrinker.c | 4 +-
mm/swap.h | 3 +-
mm/swap_state.c | 26 +-
mm/vmscan.c | 26 +-
mm/vmstat.c | 1 +
mm/workingset.c | 4 +-
mm/zswap.c | 430 +++++++++++++++++---
tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
26 files changed, 625 insertions(+), 149 deletions(-)
--
2.34.1
From: angquan yu <angquan21(a)gmail.com>
In tools/testing/selftests/proc/proc-empty->because the return value of a write call was being ignored. This call was partof a conditional debugging block (if (0) { ... }), which meant it would neveractually execute.
This patch removes the unused debug write call. This cleanup resolves the compi>warning about ignoring the result of write declared with the warn_unused_resultattribute.
Removing this code also improves the clarity and maintainability ofthe function, as it eliminates a non-functional block of code.
This is original warning: proc-empty-vm.c: In function ‘test_proc_pid_statm’:proc-empty-vm.c:385:17: warning: ignoring return value of ‘write’ declared with>385 | write(1, buf, rv);|
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/proc/proc-empty-vm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 56198d4ca..74ef8627f 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -382,7 +382,10 @@ static int test_proc_pid_statm(pid_t pid)
assert(rv >= 0);
assert(rv <= sizeof(buf));
if (0) {
- write(1, buf, rv);
+ ssize_t written = write(1, buf, rv);
+ if (written == -1) {
+ perror("write failed /proc/${pid}");
+ }
}
const char *p = buf;
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
In tools/testing/selftests/proc/proc-empty->because the return value of a write call was being ignored. This call was partof a conditional debugging block (if (0) { ... }), which meant it would neveractually execute.
This patch removes the unused debug write call. This cleanup resolves the compi>warning about ignoring the result of write declared with the warn_unused_resultattribute.
Removing this code also improves the clarity and maintainability ofthe function, as it eliminates a non-functional block of code.
This is original warning: proc-empty-vm.c: In function ‘test_proc_pid_statm’:proc-empty-vm.c:385:17: warning: ignoring return value of ‘write’ declared with>385 | write(1, buf, rv);|
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/proc/proc-empty-vm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 56198d4ca..74ef8627f 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -382,7 +382,10 @@ static int test_proc_pid_statm(pid_t pid)
assert(rv >= 0);
assert(rv <= sizeof(buf));
if (0) {
- write(1, buf, rv);
+ ssize_t written = write(1, buf, rv);
+ if (written == -1) {
+ perror("write failed /proc/${pid}");
+ }
}
const char *p = buf;
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
In the function 'tools/testing/selftests/breakpoints/run_test' within step_after_suspend_test.c, the ksft_print_msg function call incorrectly used '$s' as a format specifier. This commit corrects this typo to use the proper '%s' format specifier, ensuring the error message from waitpid() is correctly displayed.
The issue manifested as a compilation warning (too many arguments for format [-Wformat-extra-args]), potentially obscuring actual runtime errors and complicating debugging processes.
This fix enhances the clarity of error messages during test failures and ensures compliance with standard C format string conventions.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/step_after_suspend_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index 2cf6f10ab..b8703c499 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -89,7 +89,7 @@ int run_test(int cpu)
wpid = waitpid(pid, &status, __WALL);
if (wpid != pid) {
- ksft_print_msg("waitpid() failed: $s\n", strerror(errno));
+ ksft_print_msg("waitpid() failed: %s\n", strerror(errno));
return KSFT_FAIL;
}
if (WIFEXITED(status)) {
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
In the function 'tools/testing/selftests/breakpoints/run_test' within step_after_suspend_test.c, the ksft_print_msg function call incorrectly used '$s' as a format specifier. This commit corrects this typo to use the proper '%s' format specifier, ensuring the error message from waitpid() is correctly displayed.
The issue manifested as a compilation warning (too many arguments for format [-Wformat-extra-args]), potentially obscuring actual runtime errors and complicating debugging processes.
This fix enhances the clarity of error messages during test failures and ensures compliance with standard C format string conventions.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/step_after_suspend_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index 2cf6f10ab..b8703c499 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -89,7 +89,7 @@ int run_test(int cpu)
wpid = waitpid(pid, &status, __WALL);
if (wpid != pid) {
- ksft_print_msg("waitpid() failed: $s\n", strerror(errno));
+ ksft_print_msg("waitpid() failed: %s\n", strerror(errno));
return KSFT_FAIL;
}
if (WIFEXITED(status)) {
--
2.39.2
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As it was already pointed out by Yonghong Song [1] in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patchset replaces the usage of `CHECK(` macros to the equivalent `ASSERT_`
family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 48 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 105 insertions(+), 169 deletions(-)
--
2.25.1
On 2023/11/16 13:12, Cao, Yahui wrote:
>
> On 10/9/2023 4:51 PM, Yi Liu wrote:
>> From: Kevin Tian<kevin.tian(a)intel.com>
>>
>> This adds vfio_register_pasid_iommu_dev() for device driver to register
>> virtual devices which are isolated per PASID in physical IOMMU. The major
>> usage is for the SIOV devices which allows device driver to tag the DMAs
>> out of virtual devices within it with different PASIDs.
>>
>> For a given vfio device, VFIO core creates both group user interface and
>> device user interface (device cdev) if configured. However, for the virtual
>> devices backed by PASID of the device, VFIO core shall only create device
>> user interface as there is no plan to support such devices in the legacy
>> vfio_iommu drivers which is a must if creating group user interface for
>> such virtual devices. This introduces a VFIO_PASID_IOMMU group type for
>> the device driver to register PASID virtual devices, and provides a wrapper
>> API for it. In particular no iommu group (neither fake group or real group)
>> exists per PASID, hence no group interface for this type.
>>
>> Signed-off-by: Kevin Tian<kevin.tian(a)intel.com>
>> Signed-off-by: Yi Liu<yi.l.liu(a)intel.com>
>> ---
>> drivers/vfio/group.c | 18 ++++++++++++++++++
>> drivers/vfio/vfio.h | 8 ++++++++
>> drivers/vfio/vfio_main.c | 10 ++++++++++
>> include/linux/vfio.h | 1 +
>> 4 files changed, 37 insertions(+)
>>
>> ...
>> ...
>> +/*
>> + * Register a virtual device with IOMMU pasid protection. The user of
>> + * this device can trigger DMA as long as all of its outgoing DMAs are
>> + * always tagged with a pasid.
>> + */
>> +int vfio_register_pasid_iommu_dev(struct vfio_device *device)
>> +{
>> + return __vfio_register_dev(device, VFIO_PASID_IOMMU);
>> +}
>> +
>
>
> Missing symbol export here.
>
fixed.
--
Regards,
Yi Liu
When execute the dirty_log_test on some aarch64 machine, it sometimes
trigger the ASSERT:
==== Test Assertion Failure ====
dirty_log_test.c:384: dirty_ring_vcpu_ring_full
pid=14854 tid=14854 errno=22 - Invalid argument
1 0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
2 0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
3 (inlined by) run_test at dirty_log_test.c:802
4 0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
5 0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
6 0x0000ffff9be173c7: ?? ??:0
7 0x0000ffff9be1749f: ?? ??:0
8 0x000000000040206f: _start at ??:?
Didn't continue vcpu even without ring full
The dirty_log_test fails when execute the dirty-ring test, this is
because the sem_vcpu_cont and the sem_vcpu_stop is non-zero value when
execute the dirty_ring_collect_dirty_pages() function. When those two
sem_t variables are non-zero, the dirty_ring_wait_vcpu() at the
beginning of the dirty_ring_collect_dirty_pages() will not wait for the
vcpu to stop, but continue to execute the following code. In this case,
before vcpu stop, if the dirty_ring_vcpu_ring_full is true, and the
dirty_ring_collect_dirty_pages() has passed the check for the
dirty_ring_vcpu_ring_full but hasn't execute the check for the
continued_vcpu, the vcpu stop, and set the dirty_ring_vcpu_ring_full to
false. Then dirty_ring_collect_dirty_pages() will trigger the ASSERT.
Why sem_vcpu_cont and sem_vcpu_stop can be non-zero value? It's because
the dirty_ring_before_vcpu_join() execute the sem_post(&sem_vcpu_cont)
at the end of each dirty-ring test. It can cause two cases:
1. sem_vcpu_cont be non-zero. When we set the host_quit to be true,
the vcpu_worker directly see the host_quit to be true, it quit. So
the log_mode_before_vcpu_join() function will set the sem_vcpu_cont
to 1, since the vcpu_worker has quit, it won't consume it.
2. sem_vcpu_stop be non-zero. When we set the host_quit to be true,
the vcpu_worker has entered the guest state, the next time it exit
from guest state, it will set the sem_vcpu_stop to 1, and then see
the host_quit, no one will consume the sem_vcpu_stop.
When execute more and more dirty-ring tests, the sem_vcpu_cont and
sem_vcpu_stop can be larger and larger, which makes many code paths
don't wait for the sem_t. Thus finally cause the problem.
Fix this problem is easy, simply initialize the sem_t before every test.
Thus whatever the state previous test left, it won't interfere the next
test.
Signed-off-by: Shaoqin Huang <shahuang(a)redhat.com>
---
tools/testing/selftests/kvm/dirty_log_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 936f3a8d1b83..23b179534c0b 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -726,6 +726,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
return;
}
+ sem_init(&sem_vcpu_stop, 0, 0);
+ sem_init(&sem_vcpu_cont, 0, 0);
+
/*
* We reserve page table for 2 times of extra dirty mem which
* will definitely cover the original (1G+) test range. Here
@@ -871,9 +874,6 @@ int main(int argc, char *argv[])
int opt, i;
sigset_t sigset;
- sem_init(&sem_vcpu_stop, 0, 0);
- sem_init(&sem_vcpu_cont, 0, 0);
-
guest_modes_append_default();
while ((opt = getopt(argc, argv, "c:hi:I:p:m:M:")) != -1) {
--
2.40.1