From: angquan yu <angquan21(a)gmail.com>
This commit addresses compiler warnings in lam.c related to the usage
of non-literal format strings without format arguments in the
'run_test' function.
Warnings fixed:
- Resolved warnings indicating that 'ksft_test_result_skip' and
'ksft_test_result' were called with 't->msg' as a format string without
accompanying format arguments.
Changes made:
- Modified the calls to 'ksft_test_result_skip' and 'ksft_test_result'
to explicitly include a format specifier ("%s") for 't->msg'.
- This ensures that the string is safely treated as a format argument,
adhering to safer coding practices and resolving the compiler warnings.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/x86/lam.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 8f9b06d9c..215b8150b 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -817,7 +817,7 @@ static void run_test(struct testcases *test, int count)
/* return 3 is not support LA57, the case should be skipped */
if (ret == 3) {
- ksft_test_result_skip(t->msg);
+ ksft_test_result_skip("%s", t->msg);
continue;
}
@@ -826,7 +826,7 @@ static void run_test(struct testcases *test, int count)
else
ret = !(t->expected);
- ksft_test_result(ret, t->msg);
+ ksft_test_result(ret, "%s", t->msg);
}
}
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
This commit resolves a compiler warning regarding the
use of non-literal format strings in breakpoint_test.c.
The functions `ksft_test_result_pass` and `ksft_test_result_fail`
were previously called with a variable `msg` directly, which could
potentially lead to format string vulnerabilities.
Changes made:
- Modified the calls to `ksft_test_result_pass` and `ksft_test_result_fail`
by adding a "%s" format specifier. This explicitly declares `msg` as a
string argument, adhering to safer coding practices and resolving
the compiler warning.
This change does not affect the functional behavior of the code but ensures
better code safety and compliance with recommended C programming standards.
The previous warnings were:
breakpoint_test.c:287:17: warning: format not a string literal and no format arguments [-Wformat-security]
  287 | ksft_test_result_pass(msg);
      | ^~~~~~~~~~~~~~~~~~~~~
breakpoint_test.c:289:17: warning: format not a string literal and no format arguments [-Wformat-security]
  289 | ksft_test_result_fail(msg);
      |
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test.c b/tools/testing/selftests/breakpoints/breakpoint_test.c
index 3266cc929..d46962a24 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test.c
@@ -284,9 +284,9 @@ static void check_success(const char *msg)
nr_tests++;
if (ret)
- ksft_test_result_pass(msg);
+ ksft_test_result_pass("%s", msg);
else
- ksft_test_result_fail(msg);
+ ksft_test_result_fail("%s", msg);
}
static void launch_instruction_breakpoints(char *buf, int local, int global)
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
In tools/testing/selftests/proc/proc-empty-vm.c, the compiler emitted a
warning because the return value of a write call was being ignored. This
call was part of a conditional debugging block (if (0) { ... }), which
meant it would never actually execute.
This patch removes the unused debug write call. This cleanup resolves
the compiler warning about ignoring the result of write declared with
the warn_unused_result attribute.
Removing this code also improves the clarity and maintainability of
the function, as it eliminates a non-functional block of code.
This is the original warning:
proc-empty-vm.c: In function ‘test_proc_pid_statm’:
proc-empty-vm.c:385:17: warning: ignoring return value of ‘write’
declared with attribute ‘warn_unused_result’
  385 | write(1, buf, rv);
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/proc/proc-empty-vm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 5e7020630..d231e61e4 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -383,8 +383,10 @@ static int test_proc_pid_statm(pid_t pid)
assert(rv <= sizeof(buf));
if (0) {
ssize_t written = write(1, buf, rv);
+
if (written == -1) {
perror("write failed to /proc/${pid}");
+ return EXIT_FAILURE;
}
}
--
2.39.2
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit f40bfd1679446b22d321e64a1fa98b7d07d2be08 ]
This is a preparatory change. A follow-up patch "bpf: verify callbacks
as if they are called unknown number of times" changes logic for
callbacks handling. While previously callbacks were verified as a
single function call, the new scheme takes into account that callbacks
could be executed an unknown number of times.
This has dire implications for bpf_loop_bench:
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
for (int i = 0; i < 1000; i++) {
bpf_loop(nr_loops, empty_callback, NULL, 0);
__sync_add_and_fetch(&hits, nr_loops);
}
return 0;
}
Without the callbacks change, the verifier sees it as 1000 calls to
empty_callback(). However, with the callbacks change, things become
exponential:
- i=0: state exploring empty_callback is scheduled with i=0 (a);
- i=1: state exploring empty_callback is scheduled with i=1;
...
- i=999: state exploring empty_callback is scheduled with i=999;
- state (a) is popped from stack;
- i=1: state exploring empty_callback is scheduled with i=1;
...
Avoid this issue by rewriting the outer loop as a bpf_loop() call.
Unfortunately, this adds a function call to a loop at runtime, which
negatively affects performance:
        throughput                 latency
before: 149.919 ± 0.168 M ops/s,   6.670 ns/op
after : 137.040 ± 0.187 M ops/s,   7.297 ns/op
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20231121020701.26440-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/bpf_loop_bench.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
index 4ce76eb064c41..d461746fd3c1e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
+++ b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
@@ -15,13 +15,16 @@ static int empty_callback(__u32 index, void *data)
return 0;
}
+static int outer_loop(__u32 index, void *data)
+{
+ bpf_loop(nr_loops, empty_callback, NULL, 0);
+ __sync_add_and_fetch(&hits, nr_loops);
+ return 0;
+}
+
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
- for (int i = 0; i < 1000; i++) {
- bpf_loop(nr_loops, empty_callback, NULL, 0);
-
- __sync_add_and_fetch(&hits, nr_loops);
- }
+ bpf_loop(1000, outer_loop, NULL, 0);
return 0;
}
--
2.42.0
Hi,
v1 [1] was discussed during Plumbers [2], where a lot of feedback was given. I
hope to justify the changes in v2 and address the feedback here.
One piece of feedback from Shuah was that keeping per-platform files with the USB/PCI
devices to test as part of the kselftest tree wasn't maintainable. One proposed
alternative was to generate a list of probed devices on a known-good kernel and
use that as a reference. However, you need someone to look at that generated
reference to be able to say it is a good one, and you need to save it to ensure
it will be reproducible later anyway, so that wouldn't actually solve the
problem. It is a matter of hand-crafting vs generating the test definitions, but
they will need to be vouched for by someone and stored somewhere in both cases.
So for this v2, in patch 2 I just have a sample test definition, and the
per-platform test definitions would be added to a separate repository.
The other feedback received was that the BIOS might reconfigure the PCI
topology (at least on x86), meaning that relying on a sequence of device and
function numbers (e.g. 1d.0/02.0/0.0) as a stable description of a device on the
platform is not possible. I couldn't verify whether this is really the case (if
you have any more insight into this, please let me know), but with that in mind,
here in v2 I have taken a different approach. Here I'm using the device's
properties which are used for driver matching (the same ones that show up in modalias)
to identify a device in a stable way.
This approach has some drawbacks compared to the one in v1. For one, it doesn't
uniquely identify a device, so if there are multiple identical devices on a
platform they have to be checked as a group. Also, the test definition isn't as
human-readable.
I'm CCing the people I recognized at the Plumbers session who were
interested in this work. Feel free to add anyone missing.
Thanks,
Nícolas
[1] https://lore.kernel.org/all/20231024211818.365844-1-nfraprado@collabora.com
[2] https://www.youtube.com/watch?v=oE73eVSyFXQ&t=9377s
Original cover letter:
This is part of an effort to improve detection of regressions impacting
device probe on all platforms. The recently merged DT kselftest [3]
detects probe issues for all devices described statically in the DT.
That leaves out devices discovered at run-time from discoverable busses.
This is where this test comes in. All of the devices that are connected
through discoverable busses (i.e. USB and PCI), and which are internal and
therefore always present, can be described in a per-platform file so
they can be checked for. The test will check that the device has been
instantiated and bound to a driver.
Patch 1 introduces the test. Patch 2 adds the test definitions for the
google,spherion machine (Acer Chromebook 514) as an example.
This is the sample output from the test running on Spherion:
TAP version 13
Using board file: boards/google,spherion
1..3
ok 1 usb.camera
ok 2 usb.bluetooth
ok 3 pci.wifi
Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
[3] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
Changes in v2:
- Changed approach of encoding stable device reference in test file from
HW topology to device match fields (the ones from modalias)
- Better documented test format
Nícolas F. R. A. Prado (2):
kselftest: Add test to verify probe of devices from discoverable
busses
kselftest: devices: Add sample board file for google,spherion
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/devices/.gitignore | 1 +
tools/testing/selftests/devices/Makefile | 8 +
.../selftests/devices/boards/google,spherion | 12 ++
.../devices/test_discoverable_devices.sh | 160 ++++++++++++++++++
5 files changed, 182 insertions(+)
create mode 100644 tools/testing/selftests/devices/.gitignore
create mode 100644 tools/testing/selftests/devices/Makefile
create mode 100644 tools/testing/selftests/devices/boards/google,spherion
create mode 100755 tools/testing/selftests/devices/test_discoverable_devices.sh
--
2.42.1
The root-only cpuset.cpus.isolated control file shows the current set
of isolated CPUs in isolated partitions. This control file is currently
exposed only with the cgroup_debug boot command line option which also
adds the ".__DEBUG__." prefix. This is actually a useful control file if
users want to find out which CPUs are currently in an isolated state by
the cpuset controller. Remove the CFTYPE_DEBUG flag for this control file and
make it available by default without any prefix.
The test_cpuset_prs.sh test script and the cgroup-v2.rst documentation
file are also updated accordingly. A minor code change is also made in
test_cpuset_prs.sh to avoid false test failures when running on a debug
kernel.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 7 ++++
kernel/cgroup/cpuset.c | 2 +-
.../selftests/cgroup/test_cpuset_prs.sh | 32 +++++++++++--------
3 files changed, 26 insertions(+), 15 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index cf5651a11df8..30f6ff2eba47 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2316,6 +2316,13 @@ Cpuset Interface Files
treated to have an implicit value of "cpuset.cpus" in the
formation of local partition.
+ cpuset.cpus.isolated
+ A read-only and root cgroup only multiple values file.
+
+ This file shows the set of all isolated CPUs used in existing
+ isolated partitions. It will be empty if no isolated partition
+ is created.
+
cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1bad4007ff4b..2a16df86c55c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3974,7 +3974,7 @@ static struct cftype dfl_files[] = {
.name = "cpus.isolated",
.seq_show = cpuset_common_seq_show,
.private = FILE_ISOLATED_CPULIST,
- .flags = CFTYPE_ONLY_ON_ROOT | CFTYPE_DEBUG,
+ .flags = CFTYPE_ONLY_ON_ROOT,
},
{ } /* terminate */
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 7b7c4c2b6d85..b5eb1be2248c 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -508,7 +508,7 @@ dump_states()
XECPUS=$DIR/cpuset.cpus.exclusive.effective
PRS=$DIR/cpuset.cpus.partition
PCPUS=$DIR/.__DEBUG__.cpuset.cpus.subpartitions
- ISCPUS=$DIR/.__DEBUG__.cpuset.cpus.isolated
+ ISCPUS=$DIR/cpuset.cpus.isolated
[[ -e $CPUS ]] && echo "$CPUS: $(cat $CPUS)"
[[ -e $XCPUS ]] && echo "$XCPUS: $(cat $XCPUS)"
[[ -e $ECPUS ]] && echo "$ECPUS: $(cat $ECPUS)"
@@ -593,17 +593,17 @@ check_cgroup_states()
#
# Get isolated (including offline) CPUs by looking at
-# /sys/kernel/debug/sched/domains and *cpuset.cpus.isolated control file,
+# /sys/kernel/debug/sched/domains and cpuset.cpus.isolated control file,
# if available, and compare that with the expected value.
#
# Note that isolated CPUs from the sched/domains context include offline
# CPUs as well as CPUs in non-isolated 1-CPU partition. Those CPUs may
-# not be included in the *cpuset.cpus.isolated control file which contains
+# not be included in the cpuset.cpus.isolated control file which contains
# only CPUs in isolated partitions.
#
# $1 - expected isolated cpu list(s) <isolcpus1>{,<isolcpus2>}
# <isolcpus1> - expected sched/domains value
-# <isolcpus2> - *cpuset.cpus.isolated value = <isolcpus1> if not defined
+# <isolcpus2> - cpuset.cpus.isolated value = <isolcpus1> if not defined
#
check_isolcpus()
{
@@ -611,7 +611,7 @@ check_isolcpus()
ISOLCPUS=
LASTISOLCPU=
SCHED_DOMAINS=/sys/kernel/debug/sched/domains
- ISCPUS=${CGROUP2}/.__DEBUG__.cpuset.cpus.isolated
+ ISCPUS=${CGROUP2}/cpuset.cpus.isolated
if [[ $EXPECT_VAL = . ]]
then
EXPECT_VAL=
@@ -692,14 +692,18 @@ test_fail()
null_isolcpus_check()
{
[[ $VERBOSE -gt 0 ]] || return 0
- pause 0.02
- check_isolcpus "."
- if [[ $? -ne 0 ]]
- then
- echo "Unexpected isolated CPUs: $ISOLCPUS"
- dump_states
- exit 1
- fi
+ # Retry a few times before printing error
+ RETRY=0
+ while [[ $RETRY -lt 5 ]]
+ do
+ pause 0.01
+ check_isolcpus "."
+ [[ $? -eq 0 ]] && return 0
+ ((RETRY++))
+ done
+ echo "Unexpected isolated CPUs: $ISOLCPUS"
+ dump_states
+ exit 1
}
#
@@ -776,7 +780,7 @@ run_state_test()
#
NEWLIST=$(cat cpuset.cpus.effective)
RETRY=0
- while [[ $NEWLIST != $CPULIST && $RETRY -lt 5 ]]
+ while [[ $NEWLIST != $CPULIST && $RETRY -lt 8 ]]
do
# Wait a bit longer & recheck a few times
pause 0.01
--
2.39.3
This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed
on (hopefully) reproducible AWS testbeds indicate that single tunnel
pcpu ipsec can reach line rate on 100G ENA NICs.
Note this patchset only tests/shows generic xfrm_state access. The
"secret sauce" (if you can really even call it that) involves accessing
a soon-to-be-upstreamed pcpu_num field in xfrm_state. An early example is
available here [1].
[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce286…
Changes from v1:
* Move xfrm tunnel tests to test_progs
* Fix writing to opts->error when opts is invalid
* Use __bpf_kfunc_start_defs()
* Remove unused vxlanhdr definition
* Add and use BPF_CORE_WRITE_BITFIELD() macro
* Make series bisect clean
Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak
Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics
Daniel Xu (6):
bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
bpf: selftests: test_tunnel: Use vmlinux.h declarations
bpf: selftests: Move xfrm tunnel test to test_progs
bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()
include/net/xfrm.h | 9 +
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_policy.c | 2 +
net/xfrm/xfrm_state_bpf.c | 128 +++++++++++++++
tools/lib/bpf/bpf_core_read.h | 36 ++++
.../selftests/bpf/prog_tests/test_tunnel.c | 155 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/test_tunnel_kern.c | 138 +++++++++-------
tools/testing/selftests/bpf/test_tunnel.sh | 92 -----------
9 files changed, 412 insertions(+), 150 deletions(-)
create mode 100644 net/xfrm/xfrm_state_bpf.c
--
2.42.1
This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed
on (hopefully) reproducible AWS testbeds indicate that single tunnel
pcpu ipsec can reach line rate on 100G ENA NICs.
Note this patchset only tests/shows generic xfrm_state access. The
"secret sauce" (if you can really even call it that) involves accessing
a soon-to-be-upstreamed pcpu_num field in xfrm_state. An early example is
available here [1].
[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce286…
Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak
Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics
Daniel Xu (7):
bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
bpf: selftests: test_tunnel: Use ping -6 over ping6
bpf: selftests: test_tunnel: Mount bpffs if necessary
bpf: selftests: test_tunnel: Use vmlinux.h declarations
bpf: selftests: test_tunnel: Disable CO-RE relocations
bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()
include/net/xfrm.h | 9 ++
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_policy.c | 2 +
net/xfrm/xfrm_state_bpf.c | 127 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/test_tunnel_kern.c | 98 ++++++++------
tools/testing/selftests/bpf/test_tunnel.sh | 43 ++++--
7 files changed, 227 insertions(+), 54 deletions(-)
create mode 100644 net/xfrm/xfrm_state_bpf.c
--
2.42.1