Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As it was already pointed out by Yonghong Song [1] in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patchset replaces the usage of `CHECK(` macros to the equivalent `ASSERT_`
family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Changes in v3:
- Addressed the following points mentioned by Yonghong Song
- Improved `bpf_map_lookup_elem` assertion in bpf_tcp_ca.
- Replaced assertion introduced in v2 with one that checks `thread_ret`
instead of `pthread_join`. This ensures that `server`'s return value
(thread_ret) is the one being checked, as oposed to `pthread_join`'s
return value, since the latter one is less likely to fail.
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v2 - https://lore.kernel.org/lkml/GV1PR10MB6563AECF8E94798A1E5B36A4E8B6A@GV1PR10…
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 48 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 105 insertions(+), 169 deletions(-)
--
2.25.1
Changes from v1:
* Dropped some changes that were independently fixed[1]
* No longer separate the f strings to their own patch
* Use r strings when the value is a regular expression
* Updated verification script
In retrospect a script to find the instances and apply fixes isn't that
useful for review, so the attached script this time just looks for
differences in the AST. Apply the series and run the script, with
the two references to compare as arguments.
There are some intentional changes to the AST now though, as the r strings
turn '\t' from a single character tab into a backslash and 't' character
pair (similar for '\n'). This does not affect the correctness of the
regular expression though.
v1: https://lore.kernel.org/all/20230814060704.79655-1-bgray@linux.ibm.com/
[1]: https://lore.kernel.org/all/20230816122133.1231599-1-vishalc@linux.ibm.com/
---
#!/usr/bin/env python3
"""
Verify Python syntax trees are equivalent between two references
"""
import argparse
import ast
from pathlib import Path
import subprocess as sp
def read_file(path: Path, ref: str) -> str:
return sp.run(f"git show {ref}:{path}", stdout=sp.PIPE, shell=True, encoding="utf-8", check=True).stdout
parser = argparse.ArgumentParser("Compare Python ASTs between revisions")
parser.add_argument("ref1", type=str, help="First revision to use")
parser.add_argument("ref2", type=str, help="Second revision to use")
args = parser.parse_args()
for pyfile in Path(".").glob("**/*.py"):
try:
ref1_content = read_file(pyfile, args.ref1)
ref2_content = read_file(pyfile, args.ref2)
except Exception as e:
print(f"ERROR:{pyfile}: Failed to read ({e})")
continue
try:
ref1_syntax = ast.parse(ref1_content, filename=pyfile)
ref2_syntax = ast.parse(ref2_content, filename=pyfile)
except SyntaxError as e:
print(f"ERROR:{pyfile}: Failed to parse, is it Python3? ({e})")
continue
if ast.dump(ref1_syntax) != ast.dump(ref2_syntax):
print(f"ERROR:{pyfile}: Revisions have different AST")
cmd = f"diff <(git show {args.ref1}:{pyfile} | python -m ast) <(git show {args.ref2}:{pyfile} | python -m ast)"
print(cmd)
sp.run(cmd, shell=True)
continue
Benjamin Gray (7):
ia64: fix Python string escapes
Documentation/sphinx: fix Python string escapes
drivers/comedi: fix Python string escapes
scripts: fix Python string escapes
tools/perf: fix Python string escapes
tools/power: fix Python string escapes
selftests/bpf: fix Python string escapes
Documentation/sphinx/cdomain.py | 2 +-
Documentation/sphinx/kernel_abi.py | 2 +-
Documentation/sphinx/kernel_feat.py | 2 +-
Documentation/sphinx/kerneldoc.py | 2 +-
Documentation/sphinx/maintainers_include.py | 8 +++---
arch/ia64/scripts/unwcheck.py | 2 +-
.../ni_routing/tools/convert_csv_to_c.py | 2 +-
scripts/clang-tools/gen_compile_commands.py | 2 +-
scripts/gdb/linux/symbols.py | 2 +-
tools/perf/pmu-events/jevents.py | 2 +-
.../scripts/python/arm-cs-trace-disasm.py | 4 +--
tools/perf/scripts/python/compaction-times.py | 2 +-
.../scripts/python/exported-sql-viewer.py | 4 +--
tools/power/pm-graph/bootgraph.py | 12 ++++-----
.../selftests/bpf/test_bpftool_synctypes.py | 26 +++++++++----------
tools/testing/selftests/bpf/test_offload.py | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
--
2.41.0
From: Willem de Bruijn <willemb(a)google.com>
Commit 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR
scheduling") introduces multiple traffic bands, and per-band maximum
packet count.
Per-band limits ensures that packets in one class cannot fill the
entire qdisc and so cause DoS to the traffic in the other classes.
Verify this behavior:
1. set the limit to 10 per band
2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
Packets must remain queued for a period to trigger this behavior.
Use SO_TXTIME to store packets for 100 msec.
The test reuses existing upstream test infra. The script is a fork of
cmsg_time.sh. The scripts call cmsg_sender.
The test extends cmsg_sender with two arguments:
* '-P' SO_PRIORITY
There is a subtle difference between IPv4 and IPv6 stack behavior:
PF_INET/IP_TOS sets IP header bits and sk_priority
PF_INET6/IPV6_TCLASS sets IP header bits BUT NOT sk_priority
* '-n' num pkts
Send multiple packets in quick succession.
I first attempted a for loop in the script, but this is too slow in
virtualized environments, causing flakiness as the 100ms timeout is
reached and packets are dequeued.
Also do not wait for timestamps to be queued unless timestamps are
requested.
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/cmsg_sender.c | 50 ++++++++++------
.../testing/selftests/net/fq_band_pktlimit.sh | 57 +++++++++++++++++++
3 files changed, 91 insertions(+), 17 deletions(-)
create mode 100755 tools/testing/selftests/net/fq_band_pktlimit.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 5b2aca4c5f10..9274edfb76ff 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -91,6 +91,7 @@ TEST_PROGS += test_bridge_neigh_suppress.sh
TEST_PROGS += test_vxlan_nolocalbypass.sh
TEST_PROGS += test_bridge_backup_port.sh
TEST_PROGS += fdb_flush.sh
+TEST_PROGS += fq_band_pktlimit.sh
TEST_FILES := settings
diff --git a/tools/testing/selftests/net/cmsg_sender.c b/tools/testing/selftests/net/cmsg_sender.c
index 24b21b15ed3f..8d7575389f58 100644
--- a/tools/testing/selftests/net/cmsg_sender.c
+++ b/tools/testing/selftests/net/cmsg_sender.c
@@ -45,11 +45,13 @@ struct options {
const char *host;
const char *service;
unsigned int size;
+ unsigned int num_pkt;
struct {
unsigned int mark;
unsigned int dontfrag;
unsigned int tclass;
unsigned int hlimit;
+ unsigned int priority;
} sockopt;
struct {
unsigned int family;
@@ -72,6 +74,7 @@ struct options {
} v6;
} opt = {
.size = 13,
+ .num_pkt = 1,
.sock = {
.family = AF_UNSPEC,
.type = SOCK_DGRAM,
@@ -112,7 +115,7 @@ static void cs_parse_args(int argc, char *argv[])
{
int o;
- while ((o = getopt(argc, argv, "46sS:p:m:M:d:tf:F:c:C:l:L:H:")) != -1) {
+ while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:")) != -1) {
switch (o) {
case 's':
opt.silent_send = true;
@@ -138,7 +141,9 @@ static void cs_parse_args(int argc, char *argv[])
cs_usage(argv[0]);
}
break;
-
+ case 'P':
+ opt.sockopt.priority = atoi(optarg);
+ break;
case 'm':
opt.mark.ena = true;
opt.mark.val = atoi(optarg);
@@ -146,6 +151,9 @@ static void cs_parse_args(int argc, char *argv[])
case 'M':
opt.sockopt.mark = atoi(optarg);
break;
+ case 'n':
+ opt.num_pkt = atoi(optarg);
+ break;
case 'd':
opt.txtime.ena = true;
opt.txtime.delay = atoi(optarg);
@@ -410,6 +418,10 @@ static void ca_set_sockopts(int fd)
setsockopt(fd, SOL_IPV6, IPV6_UNICAST_HOPS,
&opt.sockopt.hlimit, sizeof(opt.sockopt.hlimit)))
error(ERN_SOCKOPT, errno, "setsockopt IPV6_HOPLIMIT");
+ if (opt.sockopt.priority &&
+ setsockopt(fd, SOL_SOCKET, SO_PRIORITY,
+ &opt.sockopt.priority, sizeof(opt.sockopt.priority)))
+ error(ERN_SOCKOPT, errno, "setsockopt SO_PRIORITY");
}
int main(int argc, char *argv[])
@@ -421,6 +433,7 @@ int main(int argc, char *argv[])
char *buf;
int err;
int fd;
+ int i;
cs_parse_args(argc, argv);
@@ -480,24 +493,27 @@ int main(int argc, char *argv[])
cs_write_cmsg(fd, &msg, cbuf, sizeof(cbuf));
- err = sendmsg(fd, &msg, 0);
- if (err < 0) {
- if (!opt.silent_send)
- fprintf(stderr, "send failed: %s\n", strerror(errno));
- err = ERN_SEND;
- goto err_out;
- } else if (err != (int)opt.size) {
- fprintf(stderr, "short send\n");
- err = ERN_SEND_SHORT;
- goto err_out;
- } else {
- err = ERN_SUCCESS;
+ for (i = 0; i < opt.num_pkt; i++) {
+ err = sendmsg(fd, &msg, 0);
+ if (err < 0) {
+ if (!opt.silent_send)
+ fprintf(stderr, "send failed: %s\n", strerror(errno));
+ err = ERN_SEND;
+ goto err_out;
+ } else if (err != (int)opt.size) {
+ fprintf(stderr, "short send\n");
+ err = ERN_SEND_SHORT;
+ goto err_out;
+ }
}
+ err = ERN_SUCCESS;
- /* Make sure all timestamps have time to loop back */
- usleep(opt.txtime.delay);
+ if (opt.ts.ena) {
+ /* Make sure all timestamps have time to loop back */
+ usleep(opt.txtime.delay);
- cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ }
err_out:
close(fd);
diff --git a/tools/testing/selftests/net/fq_band_pktlimit.sh b/tools/testing/selftests/net/fq_band_pktlimit.sh
new file mode 100755
index 000000000000..24b77bdf41ff
--- /dev/null
+++ b/tools/testing/selftests/net/fq_band_pktlimit.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify that FQ has a packet limit per band:
+#
+# 1. set the limit to 10 per band
+# 2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
+# 3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
+# 4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
+#
+# Send packets with a 100ms delay to ensure that previously sent
+# packets are still queued when later ones are sent.
+# Use SO_TXTIME for this.
+
+die() {
+ echo "$1"
+ exit 1
+}
+
+# run inside private netns
+if [[ $# -eq 0 ]]; then
+ ./in_netns.sh "$0" __subprocess
+ exit
+fi
+
+ip link add type dummy
+ip link set dev dummy0 up
+ip -6 addr add fdaa::1/128 dev dummy0
+ip -6 route add fdaa::/64 dev dummy0
+tc qdisc replace dev dummy0 root handle 1: fq quantum 1514 initial_quantum 1514 limit 10
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT1="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT2="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 -P 7 fdaa::2 8000
+OUT3="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Initial stats will report zero sent, as all packets are still
+# queued in FQ. Sleep for the delay period (100ms) and see that
+# twenty are now sent.
+sleep 0.1
+OUT4="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Log the output after the test
+echo "${OUT1}"
+echo "${OUT2}"
+echo "${OUT3}"
+echo "${OUT4}"
+
+# Test the output for expected values
+echo "${OUT1}" | grep -q '0\ pkt\ (dropped\ 10' || die "unexpected drop count at 1"
+echo "${OUT2}" | grep -q '0\ pkt\ (dropped\ 30' || die "unexpected drop count at 2"
+echo "${OUT3}" | grep -q '0\ pkt\ (dropped\ 40' || die "unexpected drop count at 3"
+echo "${OUT4}" | grep -q '20\ pkt\ (dropped\ 40' || die "unexpected accept count at 4"
--
2.43.0.rc1.413.gea7ed67945-goog
On Wed, Nov 15, 2023 at 01:17:06PM +0800, Liu, Jing2 wrote:
> This is the right way to approach it,
>
> I learned that there was discussion about using io_uring to get the
> page fault without
>
> eventfd notification in [1], and I am new at io_uring and studying the
> man page of
>
> liburing, but there're questions in my mind on how can QEMU get the
> coming page fault
>
> with a good performance.
>
> Since both QEMU and Kernel don't know when comes faults, after QEMU
> submits one
>
> read task to io_uring, we want kernel pending until fault comes. While
> based on
>
> hwpt_fault_fops_read() in [patch v2 4/6], it just returns 0 since
> there's now no fault,
>
> thus this round of read completes to CQ but it's not what we want. So
> I'm wondering
>
> how kernel pending on the read until fault comes. Does fops callback
> need special work to
Implement a fops with poll support that triggers when a new event is
pushed and everything will be fine. There are many examples in the
kernel. The ones in the mlx5 vfio driver spring to mind as a scheme I
recently looked at.
Jason
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As it was already pointed out by Yonghong Song [1] in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patchset replaces the usage of `CHECK(` macros to the equivalent `ASSERT_`
family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 50 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 106 insertions(+), 170 deletions(-)
--
2.25.1
v4:
- Update patch 1 to move apply_wqattrs_lock() and apply_wqattrs_unlock()
down into CONFIG_SYSFS block to avoid compilation warnings.
v3:
- Break out a separate patch to make workqueue_set_unbound_cpumask()
static and move it down to the CONFIG_SYSFS section.
- Remove the "__DEBUG__." prefix and the CFTYPE_DEBUG flag from the
new root only cpuset.cpus.isolated control files and update the
test accordingly.
v2:
- Add 2 read-only workqueue sysfs files to expose the user requested
cpumask as well as the isolated CPUs to be excluded from
wq_unbound_cpumask.
- Ensure that caller of the new workqueue_unbound_exclude_cpumask()
hold cpus_read_lock.
- Update the cpuset code to make sure the cpus_read_lock is held
whenever workqueue_unbound_exclude_cpumask() may be called.
Isolated cpuset partition can currently be created to contain an
exclusive set of CPUs not used in other cgroups and with load balancing
disabled to reduce interference from the scheduler.
The main purpose of this isolated partition type is to dynamically
emulate what can be done via the "isolcpus" boot command line option,
specifically the default domain flag. One effect of the "isolcpus" option
is to remove the isolated CPUs from the cpumasks of unbound workqueues
since running work functions in an isolated CPU can be a major source
of interference. Changing the unbound workqueue cpumasks can be done at
run time by writing an appropriate cpumask without the isolated CPUs to
/sys/devices/virtual/workqueue/cpumask. So one can set up an isolated
cpuset partition and then write to the cpumask sysfs file to achieve
similar level of CPU isolation. However, this manual process can be
error prone.
This patch series implements automatic exclusion of isolated CPUs from
unbound workqueue cpumasks when an isolated cpuset partition is created
and then adds those CPUs back when the isolated partition is destroyed.
There are also other places in the kernel that look at the HK_FLAG_DOMAIN
cpumask or other HK_FLAG_* cpumasks and exclude the isolated CPUs from
certain actions to further reduce interference. CPUs in an isolated
cpuset partition will not be able to avoid those interferences yet. That
may change in the future as the need arises.
Waiman Long (5):
workqueue: Make workqueue_set_unbound_cpumask() static
workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs
from wq_unbound_cpumask
selftests/cgroup: Minor code cleanup and reorganization of
test_cpuset_prs.sh
cgroup/cpuset: Keep track of CPUs in isolated partitions
cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
Documentation/admin-guide/cgroup-v2.rst | 10 +-
include/linux/workqueue.h | 2 +-
kernel/cgroup/cpuset.c | 286 +++++++++++++-----
kernel/workqueue.c | 165 +++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 216 ++++++++-----
5 files changed, 475 insertions(+), 204 deletions(-)
--
2.39.3
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zisslpcfi respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). Unlike normal stacks only the shadow stack
size is specified, similar issues to those that lead to the creation of
map_shadow_stack() apply.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET avaible to me, I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.
A new architecture feature Kconfig option for shadow stacks is added as
here, this was suggested as part of the review comments for the arm64
GCS series and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (5):
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
kselftest/clone3: Test shadow stack support
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 30 ++++-
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 2 +
include/uapi/linux/sched.h | 4 +
kernel/fork.c | 24 +++-
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 151 ++++++++++++++++------
tools/testing/selftests/clone3/clone3_selftests.h | 7 +
12 files changed, 188 insertions(+), 54 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>