There were several attempts to resolve circular include dependency
after the addition of percpu.h: 1c9df907da83 ("random: fix circular
include dependency on arm64 after addition of percpu.h"), c0842fbc1b18
("random32: move the pseudo-random 32-bit definitions to prandom.h") and
finally d9f29deb7fe8 ("prandom: Remove unused include") that completely
removes inclusion of <linux/percpu.h>.
Due to legacy reasons, <linux/random.h> includes <linux/prandom.h>, but
with the commit entry remark:
--quote--
A further cleanup step would be to remove this from <linux/random.h>
entirely, and make people who use the prandom infrastructure include
just the new header file. That's a bit of a churn patch, but grepping
for "prandom_" and "next_pseudo_random32" "struct rnd_state" should
catch most users.
But it turns out that that nice cleanup step is fairly painful, because
a _lot_ of code currently seems to depend on the implicit include of
<linux/random.h>, which can currently come in a lot of ways, including
such fairly core headfers as <linux/net.h>.
So the "nice cleanup" part may or may never happen.
--/quote--
__percpu tag is currently defined in include/linux/compiler_types.h,
so there is no direct need for the inclusion of <linux/percpu.h>.
However, in [1] we would like to repurpose __percpu tag as a named
address space qualifier, where __percpu macro uses defines from
<linux/percpu.h>.
This patch series is the "nice cleanup" part, and allows us to finally
include <linux/percpu.h> in prandom.h.
The whole series was tested by compiling the kernel for x86_64 allconfig
and some popular architectures, namely arm64 defconfig, powerpc defconfig
and loongarch defconfig.
[1] https://lore.kernel.org/lkml/20240812115945.484051-4-ubizjak@gmail.com/
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: x86(a)kernel.org
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tursulin(a)ursulin.net>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Hans Verkuil <hverkuil(a)xs4all.nl>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Miquel Raynal <miquel.raynal(a)bootlin.com>
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Vignesh Raghavendra <vigneshr(a)ti.com>
Cc: Eric Biggers <ebiggers(a)kernel.org>
Cc: "Theodore Y. Ts'o" <tytso(a)mit.edu>
Cc: Jaegeuk Kim <jaegeuk(a)kernel.org>
Cc: "Jason A. Donenfeld" <Jason(a)zx2c4.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Hannes Reinecke <hare(a)suse.de>
Cc: "James E.J. Bottomley" <James.Bottomley(a)HansenPartnership.com>
Cc: "Martin K. Petersen" <martin.petersen(a)oracle.com>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: John Fastabend <john.fastabend(a)gmail.com>
Cc: Andrii Nakryiko <andrii(a)kernel.org>
Cc: Martin KaFai Lau <martin.lau(a)linux.dev>
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Cc: Song Liu <song(a)kernel.org>
Cc: Yonghong Song <yonghong.song(a)linux.dev>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Stanislav Fomichev <sdf(a)fomichev.me>
Cc: Hao Luo <haoluo(a)google.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Jiri Pirko <jiri(a)resnulli.us>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Stephen Hemminger <stephen(a)networkplumber.org>
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Cong Wang <xiyou.wangcong(a)gmail.com>
Cc: Uros Bizjak <ubizjak(a)gmail.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linux-mtd(a)lists.infradead.org
Cc: linux-fscrypt(a)vger.kernel.org
Cc: linux-scsi(a)vger.kernel.org
Cc: bpf(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: kunit-dev(a)googlegroups.com
Uros Bizjak (18):
x86/kaslr: Include <linux/prandom.h> instead of <linux/random.h>
drm/i915/selftests: Include <linux/prandom.h> instead of
<linux/random.h>
drm/lib: Include <linux/prandom.h> instead of <linux/random.h>
media: vivid: Include <linux/prandom.h> in vivid-vid-cap.c
mtd: tests: Include <linux/prandom.h> instead of <linux/random.h>
fscrypt: Include <linux/prandom.h> instead of <linux/random.h>
scsi: libfcoe: Include <linux/prandom.h> instead of <linux/random.h>
bpf: Include <linux/prandom.h> instead of <linux/random.h>
lib/interval_tree_test.c: Include <linux/prandom.h> instead of
<linux/random.h>
kunit: string-stream-test: Include <linux/prandom.h> instead of
<linux/random.h>
random32: Include <linux/prandom.h> instead of <linux/random.h>
lib/rbtree-test: Include <linux/prandom.h> instead of <linux/random.h>
bpf/tests: Include <linux/prandom.h> instead of <linux/random.h>
lib/test_parman: Include <linux/prandom.h> instead of <linux/random.h>
lib/test_scanf: Include <linux/prandom.h> instead of <linux/random.h>
netem: Include <linux/prandom.h> in sch_netem.c
random: Do not include <linux/prandom.h>
prandom: Include <linux/percpu.h>
arch/x86/mm/kaslr.c | 2 +-
drivers/gpu/drm/i915/selftests/i915_gem.c | 2 +-
drivers/gpu/drm/i915/selftests/i915_random.h | 2 +-
drivers/gpu/drm/i915/selftests/scatterlist.c | 2 +-
drivers/gpu/drm/lib/drm_random.h | 2 +-
drivers/media/test-drivers/vivid/vivid-vid-cap.c | 1 +
drivers/mtd/tests/oobtest.c | 2 +-
drivers/mtd/tests/pagetest.c | 2 +-
drivers/mtd/tests/subpagetest.c | 2 +-
fs/crypto/keyring.c | 2 +-
include/linux/prandom.h | 1 +
include/linux/random.h | 7 -------
include/scsi/libfcoe.h | 2 +-
kernel/bpf/core.c | 2 +-
lib/interval_tree_test.c | 2 +-
lib/kunit/string-stream-test.c | 1 +
lib/random32.c | 2 +-
lib/rbtree_test.c | 2 +-
lib/test_bpf.c | 2 +-
lib/test_parman.c | 2 +-
lib/test_scanf.c | 2 +-
net/sched/sch_netem.c | 1 +
22 files changed, 21 insertions(+), 24 deletions(-)
--
2.46.0
This is something that I've been thinking about for a while. We had a
discussion at LPC 2020 about this[1] but the proposals suggested there
never materialised.
In short, it is quite difficult for userspace to detect the feature
capability of syscalls at runtime. This is something a lot of programs
want to do, but they are forced to create elaborate scenarios to try to
figure out if a feature is supported without causing damage to the
system. For the vast majority of cases, each individual feature also
needs to be tested individually (because syscall results are
all-or-nothing), so testing even a single syscall's feature set can
easily inflate the startup time of programs.
This patchset implements the fairly minimal design I proposed in this
talk[2] and in some old LKML threads (though I can't find the exact
references ATM). The general flow looks like:
1. Userspace will indicate to the kernel that a syscall should a be
no-op by setting the top bit of the extensible struct size argument.
We will almost certainly never support exabyte sized structs, so the
top bits are free for us to use as makeshift flag bits. This is
preferable to using the per-syscall flag field inside the structure
because seccomp can easily detect the bit in the flag and allow the
probe or forcefully return -EEXTSYS_NOOP.
2. The kernel will then fill the provided structure with every valid
bit pattern that the current kernel understands.
For flags or other bitflag-like fields, this is the set of valid
flags or bits. For pointer fields or fields that take an arbitrary
value, the field has every bit set (0xFF... to fill the field) to
indicate that any value is valid in the field.
3. The syscall then returns -EEXTSYS_NOOP which is an errno that will
only ever be used for this purpose (so userspace can be sure that
the request succeeded).
On older kernels, the syscall will return a different error (usually
-E2BIG or -EFAULT) and userspace can do their old-fashioned checks.
4. Userspace can then check which flags and fields are supported by
looking at the fields in the returned structure. Flags are checked
by doing an AND with the flags field, and field support can checked
by comparing to 0. In principle you could just AND the entire
structure if you wanted to do this check generically without caring
about the structure contents (this is what libraries might consider
doing).
Userspace can even find out the internal kernel structure size by
passing a PAGE_SIZE buffer and seeing how many bytes are non-zero.
As with copy_struct_from_user(), this is designed to be forward- and
backwards- compatible.
This allows programas to get a one-shot understanding of what features a
syscall supports without having to do any elaborate setups or tricks to
detect support for destructive features. Flags can simply be ANDed to
check if they are in the supported set, and fields can just be checked
to see if they are non-zero.
This patchset is IMHO the simplest way we can add the ability to
introspect the feature set of extensible struct (copy_struct_from_user)
syscalls. It doesn't preclude the chance of a more generic mechanism
being added later.
The intended way of using this interface to get feature information
looks something like the following (imagine that openat2 has gained a
new field and a new flag in the future):
static bool openat2_no_automount_supported;
static bool openat2_cwd_fd_supported;
int check_openat2_support(void)
{
int err;
struct open_how how = {};
err = openat2(AT_FDCWD, ".", &how, CHECK_FIELDS | sizeof(how));
assert(err < 0);
switch (errno) {
case EFAULT: case E2BIG:
/* Old kernel... */
check_support_the_old_way();
break;
case EEXTSYS_NOOP:
openat2_no_automount_supported = (how.flags & RESOLVE_NO_AUTOMOUNT);
openat2_cwd_fd_supported = (how.cwd_fd != 0);
break;
}
}
This series adds CHECK_FIELDS support for the following extensible
struct syscalls, as they are quite likely to grow flags in the near
future:
* openat2
* clone3
* mount_setattr
[1]: https://lwn.net/Articles/830666/
[2]: https://youtu.be/ggD-eb3yPVs
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v2:
- Add CHECK_FIELDS support to mount_setattr(2).
- Fix build failure on architectures with custom errno values.
- Rework selftests to use the tools/ uAPI headers rather than custom
defining EEXTSYS_NOOP.
- Make sure we return -EINVAL and -E2BIG for invalid sizes even if
CHECK_FIELDS is set, and add some tests for that.
- v1: <https://lore.kernel.org/r/20240902-extensible-structs-check_fields-v1-0-545…>
---
Aleksa Sarai (10):
uaccess: add copy_struct_to_user helper
sched_getattr: port to copy_struct_to_user
openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
openat2: add CHECK_FIELDS flag to usize argument
selftests: openat2: add 0xFF poisoned data after misaligned struct
selftests: openat2: add CHECK_FIELDS selftests
clone3: add CHECK_FIELDS flag to usize argument
selftests: clone3: add CHECK_FIELDS selftests
mount_setattr: add CHECK_FIELDS flag to usize argument
selftests: mount_setattr: add CHECK_FIELDS selftest
arch/alpha/include/uapi/asm/errno.h | 3 +
arch/mips/include/uapi/asm/errno.h | 3 +
arch/parisc/include/uapi/asm/errno.h | 3 +
arch/sparc/include/uapi/asm/errno.h | 3 +
fs/namespace.c | 17 ++
fs/open.c | 18 ++
include/linux/uaccess.h | 98 ++++++++
include/uapi/asm-generic/errno.h | 3 +
include/uapi/linux/openat2.h | 2 +
kernel/fork.c | 30 ++-
kernel/sched/syscalls.c | 42 +---
tools/arch/alpha/include/uapi/asm/errno.h | 3 +
tools/arch/mips/include/uapi/asm/errno.h | 3 +
tools/arch/parisc/include/uapi/asm/errno.h | 3 +
tools/arch/sparc/include/uapi/asm/errno.h | 3 +
tools/include/uapi/asm-generic/errno.h | 3 +
tools/include/uapi/asm-generic/posix_types.h | 101 ++++++++
tools/testing/selftests/clone3/.gitignore | 1 +
tools/testing/selftests/clone3/Makefile | 4 +-
.../testing/selftests/clone3/clone3_check_fields.c | 264 +++++++++++++++++++++
tools/testing/selftests/mount_setattr/Makefile | 2 +-
.../selftests/mount_setattr/mount_setattr_test.c | 53 ++++-
tools/testing/selftests/openat2/Makefile | 2 +
tools/testing/selftests/openat2/openat2_test.c | 165 ++++++++++++-
24 files changed, 778 insertions(+), 51 deletions(-)
---
base-commit: 431c1646e1f86b949fa3685efc50b660a364c2b6
change-id: 20240803-extensible-structs-check_fields-a47e94cef691
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
From: Jason Xing <kernelxing(a)tencent.com>
When I was trying to modify the tx timestamping feature, I found that
running "./txtimestamp -4 -C -L 127.0.0.1" didn't reflect the fact
properly.
In this selftest file, we respectively test three tx generation flags.
With the generation and report flag enabled, we expect that the timestamp
must be returned to the userspace unless 1) generating the timestamp
fails, 2) reporting the timestamp fails. So we should test if the
timestamps can be read and parsed succuessfuly in txtimestamp.c, or
else there is a bug in the kernel.
After adding the check so that running ./txtimestamp will reflect the
result correctly like this if there is an error in kernel:
protocol: TCP
payload: 10
server port: 9000
family: INET
test SND
USR: 1725458477 s 667997 us (seq=0, len=0)
Failed to parse timestamps
USR: 1725458477 s 718128 us (seq=0, len=0)
Failed to parse timestamps
USR: 1725458477 s 768273 us (seq=0, len=0)
Failed to parse timestamps
USR: 1725458477 s 818416 us (seq=0, len=0)
Failed to parse timestamps
...
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
I'm not sure if I should also check if the cur->tv_sec or cur->tv_nsec
is zero in __print_timestamp(). Could it be valid when either of
them is zero?
---
tools/testing/selftests/net/txtimestamp.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c
index ec60a16c9307..b69aae840a67 100644
--- a/tools/testing/selftests/net/txtimestamp.c
+++ b/tools/testing/selftests/net/txtimestamp.c
@@ -358,6 +358,10 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
if (batch > 1)
fprintf(stderr, "batched %d timestamps\n", batch);
+ else if (!batch) {
+ fprintf(stderr, "Failed to parse timestamps\n");
+ test_failed = true;
+ }
}
static int recv_errmsg(int fd)
--
2.37.3
From: Willem de Bruijn <willemb(a)google.com>
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.
Florian recently added support for packetdrill tests in nf_conntrack,
in commit a8a388c2aae49 ("selftests: netfilter: add packetdrill based
conntrack tests").
This patch takes a slightly different implementation and reuses the
ksft python library for its KTAP, ksft, NetNS and other such tooling.
It also anticipates the large number of testcases, by creating a
separate kselftest for each feature (directory). It does this by
copying the template script packetdrill_ksft.py for each directory,
and putting those in TEST_CUSTOM_PROGS so that kselftests runs each.
To demonstrate the code with minimal patch size, initially import only
two features/directories from github. One with a single script, and
one with two. This was the only reason to pick tcp/inq and tcp/md5.
Any future imports of packetdrill tests should require no additional
coding. Just add the tcp/$FEATURE directory with *.pkt files.
Implementation notes:
- restore alphabetical order when adding the new directory to
tools/testing/selftests/Makefile
- copied *.pkt files and support verbatim from the github project,
except for
- update common/defaults.sh path (there are two paths on github)
- add SPDX headers
- remove one author statement
- Acknowledgment: drop an e (checkpatch)
Tested:
make -C tools/testing/selftests/ \
TARGETS=net/packetdrill \
install INSTALL_PATH=$KSFT_INSTALL_PATH
# in virtme-ng
sudo ./run_kselftest.sh -c net/packetdrill
sudo ./run_kselftest.sh -t net/packetdrill:tcp_inq.py
Result:
kselftest: Running tests in net/packetdrill
TAP version 13
1..2
# timeout set to 45
# selftests: net/packetdrill: tcp_inq.py
# KTAP version 1
# 1..4
# ok 1 tcp_inq.client-v4
# ok 2 tcp_inq.client-v6
# ok 3 tcp_inq.server-v4
# ok 4 tcp_inq.server-v6
# # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 1 selftests: net/packetdrill: tcp_inq.py
# timeout set to 45
# selftests: net/packetdrill: tcp_md5.py
# KTAP version 1
# 1..2
# ok 1 tcp_md5.md5-only-on-client-ack-v4
# ok 2 tcp_md5.md5-only-on-client-ack-v6
# # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 2 selftests: net/packetdrill: tcp_md5.py
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
RFC points for discussion
ksft: the choice for this python framework introduces a dependency on
the YNL scripts, and some non-obvious code:
- to include the net/lib dep in tools/testing/selftests/Makefile
- a boilerplate lib/py/__init__.py that each user of ksft will need
It seems preferable to me to use ksft.py over reinventing the wheel,
e.g., to print KTAP output. But perhaps we can make it more obvious
for future ksft users, and make the dependency on YNL optional.
kselftest-per-directory: copying packetdrill_ksft.py to create a
separate script per dir is a bit of a hack. A single script is much
simpler, optionally with nested KTAP (not supported yet by ksft). But,
I'm afraid that running time without intermediate output will be very
long when we integrate all packetdrill scripts.
nf_conntrack: we can dedup the common.sh.
*pkt files: which of the 150+ scripts on github are candidates for
kselftests, all or a subset? To avoid change detector tests. And what
is the best way to eventually send up to 150 files, 7K LoC.
---
tools/testing/selftests/Makefile | 7 +-
.../selftests/net/packetdrill/.gitignore | 1 +
.../selftests/net/packetdrill/Makefile | 28 ++++++
.../net/packetdrill/lib/py/__init__.py | 15 ++++
.../net/packetdrill/packetdrill_ksft.py | 90 +++++++++++++++++++
.../net/packetdrill/tcp/common/defaults.sh | 63 +++++++++++++
.../net/packetdrill/tcp/common/set_sysctls.py | 38 ++++++++
.../net/packetdrill/tcp/inq/client.pkt | 51 +++++++++++
.../net/packetdrill/tcp/inq/server.pkt | 51 +++++++++++
.../tcp/md5/md5-only-on-client-ack.pkt | 28 ++++++
10 files changed, 369 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/net/packetdrill/.gitignore
create mode 100644 tools/testing/selftests/net/packetdrill/Makefile
create mode 100644 tools/testing/selftests/net/packetdrill/lib/py/__init__.py
create mode 100755 tools/testing/selftests/net/packetdrill/packetdrill_ksft.py
create mode 100755 tools/testing/selftests/net/packetdrill/tcp/common/defaults.sh
create mode 100755 tools/testing/selftests/net/packetdrill/tcp/common/set_sysctls.py
create mode 100644 tools/testing/selftests/net/packetdrill/tcp/inq/client.pkt
create mode 100644 tools/testing/selftests/net/packetdrill/tcp/inq/server.pkt
create mode 100644 tools/testing/selftests/net/packetdrill/tcp/md5/md5-only-on-client-ack.pkt
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index a5f1c0c27dff9..f03d6fee7ac54 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -65,10 +65,11 @@ TARGETS += net/af_unix
TARGETS += net/forwarding
TARGETS += net/hsr
TARGETS += net/mptcp
-TARGETS += net/openvswitch
-TARGETS += net/tcp_ao
TARGETS += net/netfilter
+TARGETS += net/openvswitch
+TARGETS += net/packetdrill
TARGETS += net/rds
+TARGETS += net/tcp_ao
TARGETS += nsfs
TARGETS += perf_events
TARGETS += pidfd
@@ -122,7 +123,7 @@ TARGETS_HOTPLUG = cpu-hotplug
TARGETS_HOTPLUG += memory-hotplug
# Networking tests want the net/lib target, include it automatically
-ifneq ($(filter net drivers/net drivers/net/hw,$(TARGETS)),)
+ifneq ($(filter net net/packetdrill drivers/net drivers/net/hw,$(TARGETS)),)
ifeq ($(filter net/lib,$(TARGETS)),)
INSTALL_DEP_TARGETS := net/lib
endif
diff --git a/tools/testing/selftests/net/packetdrill/.gitignore b/tools/testing/selftests/net/packetdrill/.gitignore
new file mode 100644
index 0000000000000..a40f1a600eb94
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/.gitignore
@@ -0,0 +1 @@
+tcp*sh
diff --git a/tools/testing/selftests/net/packetdrill/Makefile b/tools/testing/selftests/net/packetdrill/Makefile
new file mode 100644
index 0000000000000..d94c51098d1f0
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/Makefile
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# KSFT includes
+TEST_INCLUDES := $(wildcard lib/py/*.py ../lib/py/*.py)
+
+# Packetdrill support file(s)
+TEST_INCLUDES += tcp/common/defaults.sh
+TEST_INCLUDES += tcp/common/set_sysctls.py
+
+# Packetdrill scripts: all .pkt in subdirectories
+TEST_INCLUDES += $(wildcard tcp/**/*.pkt)
+
+# Create a separate ksft test for each subdirectory
+# Running all packetdrill tests in one go will take too long
+#
+# For each tcp/$subdir, create a test script tcp_$subdir.py
+# Exclude tcp/common, which is a helper directory
+TEST_DIRS := $(wildcard tcp/*)
+TEST_DIRS := $(filter-out tcp/common, $(TEST_DIRS))
+TEST_CUSTOM_PROGS := $(foreach dir,$(TEST_DIRS),$(subst /,_,$(dir)).py)
+
+$(TEST_CUSTOM_PROGS) : packetdrill_ksft.py
+ cp $< $@
+
+# Needed to generate all TEST_CUSTOM_PROGS
+all: $(TEST_CUSTOM_PROGS)
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/net/packetdrill/lib/py/__init__.py b/tools/testing/selftests/net/packetdrill/lib/py/__init__.py
new file mode 100644
index 0000000000000..51bb6dda43d65
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/lib/py/__init__.py
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import pathlib
+import sys
+
+KSFT_DIR = (pathlib.Path(__file__).parent / "../../../..").resolve()
+
+try:
+ sys.path.append(KSFT_DIR.as_posix())
+ from net.lib.py import *
+except ModuleNotFoundError as e:
+ ksft_pr("Failed importing `net` library from kernel sources")
+ ksft_pr(str(e))
+ ktap_result(True, comment="SKIP")
+ sys.exit(4)
diff --git a/tools/testing/selftests/net/packetdrill/packetdrill_ksft.py b/tools/testing/selftests/net/packetdrill/packetdrill_ksft.py
new file mode 100755
index 0000000000000..62572a5b8331c
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/packetdrill_ksft.py
@@ -0,0 +1,90 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""Run packetdrill tests in the ksft harness.
+
+ Run all packetdrill tests in a subdirectory.
+ Detect the relevant subdirectory from this script name.
+ (Because the script cannot be given arguments.)
+
+ Run each test, for both IPv4 and IPv6.
+ Return a separate ksft result for each test case.
+"""
+
+import glob
+import os
+import pathlib
+import shutil
+
+from lib.py import cmd, ksft_exit, ksft_run, KsftSkipEx, NetNS
+
+
+def test_func_builder(pktfile_path, ipv4):
+ """Create a function that can be passed to ksft_run."""
+
+ def f():
+ if ipv4:
+ args = ("--ip_version=ipv4 "
+ "--local_ip=192.168.0.1 "
+ "--gateway_ip=192.168.0.1 "
+ "--netmask_ip=255.255.0.0 "
+ "--remote_ip=192.0.2.1 "
+ "-D CMSG_LEVEL_IP=SOL_IP "
+ "-D CMSG_TYPE_RECVERR=IP_RECVERR "
+ )
+ else:
+ args = ("--ip_version=ipv6 --mtu=1520 "
+ "--local_ip=fd3d:0a0b:17d6::1 "
+ "--gateway_ip=fd3d:0a0b:17d6:8888::1 "
+ "--remote_ip=fd3d:fa7b:d17d::1 "
+ "-D CMSG_LEVEL_IP=SOL_IPV6 "
+ "-D CMSG_TYPE_RECVERR=IPV6_RECVERR"
+ )
+
+ if not shutil.which("packetdrill"):
+ raise KsftSkipEx("Cannot find packetdrill")
+
+ netns = NetNS()
+
+ # Call packetdrill from the directory hosting the .pkt script,
+ # because scripts can have relative includes.
+ savedir = os.getcwd()
+ os.chdir(os.path.dirname(pktfile_path))
+ basename = os.path.basename(pktfile_path)
+ cmd(f"packetdrill {args} {basename}", ns=netns)
+ os.chdir(savedir)
+
+ if ipv4:
+ f.__name__ = pathlib.Path(pktfile_path).stem + "-v4"
+ else:
+ f.__name__ = pathlib.Path(pktfile_path).stem + "-v6"
+
+ return f
+
+
+def scriptname_to_testdir(filepath):
+ """Extract the directory to run from this filename."""
+
+ suffix = ".sh"
+
+ subdir = os.path.basename(filepath)
+ subdir = subdir[:-len(suffix)]
+ subdir = subdir.replace("_", "/")
+ return subdir
+
+
+def main() -> None:
+ subdir = scriptname_to_testdir(__file__)
+ files = glob.glob(f"{subdir}/**/*.pkt", recursive=True)
+
+ cases = []
+ for file in files:
+ for ipv4 in [True, False]:
+ cases.append(test_func_builder(file, ipv4=ipv4))
+
+ ksft_run(cases=cases)
+ ksft_exit()
+
+
+if __name__ == "__main__":
+ main()
diff --git a/tools/testing/selftests/net/packetdrill/tcp/common/defaults.sh b/tools/testing/selftests/net/packetdrill/tcp/common/defaults.sh
new file mode 100755
index 0000000000000..1095a7b22f44d
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp/common/defaults.sh
@@ -0,0 +1,63 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Set standard production config values that relate to TCP behavior.
+
+# Flush old cached data (fastopen cookies).
+ip tcp_metrics flush all > /dev/null 2>&1
+
+# TCP min, default, and max receive and send buffer sizes.
+sysctl -q net.ipv4.tcp_rmem="4096 540000 $((15*1024*1024))"
+sysctl -q net.ipv4.tcp_wmem="4096 $((256*1024)) 4194304"
+
+# TCP timestamps.
+sysctl -q net.ipv4.tcp_timestamps=1
+
+# TCP SYN(ACK) retry thresholds
+sysctl -q net.ipv4.tcp_syn_retries=5
+sysctl -q net.ipv4.tcp_synack_retries=5
+
+# TCP Forward RTO-Recovery, RFC 5682.
+sysctl -q net.ipv4.tcp_frto=2
+
+# TCP Selective Acknowledgements (SACK)
+sysctl -q net.ipv4.tcp_sack=1
+
+# TCP Duplicate Selective Acknowledgements (DSACK)
+sysctl -q net.ipv4.tcp_dsack=1
+
+# TCP FACK (Forward Acknowldgement)
+sysctl -q net.ipv4.tcp_fack=0
+
+# TCP reordering degree ("dupthresh" threshold for entering Fast Recovery).
+sysctl -q net.ipv4.tcp_reordering=3
+
+# TCP congestion control.
+sysctl -q net.ipv4.tcp_congestion_control=cubic
+
+# TCP slow start after idle.
+sysctl -q net.ipv4.tcp_slow_start_after_idle=0
+
+# TCP RACK and TLP.
+sysctl -q net.ipv4.tcp_early_retrans=4 net.ipv4.tcp_recovery=1
+
+# TCP method for deciding when to defer sending to accumulate big TSO packets.
+sysctl -q net.ipv4.tcp_tso_win_divisor=3
+
+# TCP Explicit Congestion Notification (ECN)
+sysctl -q net.ipv4.tcp_ecn=0
+
+sysctl -q net.ipv4.tcp_pacing_ss_ratio=200
+sysctl -q net.ipv4.tcp_pacing_ca_ratio=120
+sysctl -q net.ipv4.tcp_notsent_lowat=4294967295 > /dev/null 2>&1
+
+sysctl -q net.ipv4.tcp_fastopen=0x70403
+sysctl -q net.ipv4.tcp_fastopen_key=a1a1a1a1-b2b2b2b2-c3c3c3c3-d4d4d4d4
+
+sysctl -q net.ipv4.tcp_syncookies=1
+
+# Override the default qdisc on the tun device.
+# Many tests fail with timing errors if the default
+# is FQ and that paces their flows.
+tc qdisc add dev tun0 root pfifo
+
diff --git a/tools/testing/selftests/net/packetdrill/tcp/common/set_sysctls.py b/tools/testing/selftests/net/packetdrill/tcp/common/set_sysctls.py
new file mode 100755
index 0000000000000..5ddf456ae973a
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp/common/set_sysctls.py
@@ -0,0 +1,38 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""Sets sysctl values and writes a file that restores them.
+
+The arguments are of the form "<proc-file>=<val>" separated by spaces.
+The program first reads the current value of the proc-file and creates
+a shell script named "/tmp/sysctl_restore_${PACKETDRILL_PID}.sh" which
+restores the values when executed. It then sets the new values.
+
+PACKETDRILL_PID is set by packetdrill to the pid of itself, so a .pkt
+file could restore sysctls by running `/tmp/sysctl_restore_${PPID}.sh`
+at the end.
+"""
+
+import os
+import subprocess
+import sys
+
+filename = '/tmp/sysctl_restore_%s.sh' % os.environ['PACKETDRILL_PID']
+
+# Open file for restoring sysctl values
+restore_file = open(filename, 'w')
+print('#!/bin/bash', file=restore_file)
+
+for a in sys.argv[1:]:
+ sysctl = a.split('=')
+ # sysctl[0] contains the proc-file name, sysctl[1] the new value
+
+ # read current value and add restore command to file
+ cur_val = subprocess.check_output(['cat', sysctl[0]], universal_newlines=True)
+ print('echo "%s" > %s' % (cur_val.strip(), sysctl[0]), file=restore_file)
+
+ # set new value
+ cmd = 'echo "%s" > %s' % (sysctl[1], sysctl[0])
+ os.system(cmd)
+
+os.system('chmod u+x %s' % filename)
diff --git a/tools/testing/selftests/net/packetdrill/tcp/inq/client.pkt b/tools/testing/selftests/net/packetdrill/tcp/inq/client.pkt
new file mode 100644
index 0000000000000..8cc7798c7808f
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp/inq/client.pkt
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+// Test TCP_INQ and TCP_CM_INQ on the client side.
+`../common/defaults.sh
+`
+
+// Create a socket and set it to non-blocking.
+ 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+ +0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
+ +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+
+// Connect to the server and enable TCP_INQ.
+ +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+ +0 setsockopt(3, SOL_TCP, TCP_INQ, [1], 4) = 0
+
+ +0 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8>
+ +.01 < S. 0:0(0) ack 1 win 5792 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscale 7>
+ +0 > . 1:1(0) ack 1 <nop,nop,TS val 200 ecr 700>
+
+// Now we have 10K of data ready on the socket.
+ +0 < . 1:10001(10000) ack 1 win 514
+ +0 > . 1:1(0) ack 10001 <nop,nop,TS val 200 ecr 700>
+
+// We read 1K and we should have 9K ready to read.
+ +0 recvmsg(3, {msg_name(...)=...,
+ msg_iov(1)=[{..., 1000}],
+ msg_flags=0,
+ msg_control=[{cmsg_level=SOL_TCP,
+ cmsg_type=TCP_CM_INQ,
+ cmsg_data=9000}]}, 0) = 1000
+// We read 9K and we should have no further data ready to read.
+ +0 recvmsg(3, {msg_name(...)=...,
+ msg_iov(1)=[{..., 9000}],
+ msg_flags=0,
+ msg_control=[{cmsg_level=SOL_TCP,
+ cmsg_type=TCP_CM_INQ,
+ cmsg_data=0}]}, 0) = 9000
+
+// Server sends more data and closes the connections.
+ +0 < F. 10001:20001(10000) ack 1 win 514
+ +0 > . 1:1(0) ack 20002 <nop,nop,TS val 200 ecr 700>
+
+// We read 10K and we should have one "fake" byte because the connection is
+// closed.
+ +0 recvmsg(3, {msg_name(...)=...,
+ msg_iov(1)=[{..., 10000}],
+ msg_flags=0,
+ msg_control=[{cmsg_level=SOL_TCP,
+ cmsg_type=TCP_CM_INQ,
+ cmsg_data=1}]}, 0) = 10000
+// Now, receive EOF.
+ +0 read(3, ..., 2000) = 0
diff --git a/tools/testing/selftests/net/packetdrill/tcp/inq/server.pkt b/tools/testing/selftests/net/packetdrill/tcp/inq/server.pkt
new file mode 100644
index 0000000000000..fd78609087b91
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp/inq/server.pkt
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+// Test TCP_INQ and TCP_CM_INQ on the server side.
+`../common/defaults.sh
+`
+
+// Initialize connection
+ 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+ +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+ +0 bind(3, ..., ...) = 0
+ +0 listen(3, 1) = 0
+
+ +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 10>
+ +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+ +.01 < . 1:1(0) ack 1 win 514
+
+// Accept the connection and enable TCP_INQ.
+ +0 accept(3, ..., ...) = 4
+ +0 setsockopt(4, SOL_TCP, TCP_INQ, [1], 4) = 0
+
+// Now we have 10K of data ready on the socket.
+ +0 < . 1:10001(10000) ack 1 win 514
+ +0 > . 1:1(0) ack 10001
+
+// We read 2K and we should have 8K ready to read.
+ +0 recvmsg(4, {msg_name(...)=...,
+ msg_iov(1)=[{..., 2000}],
+ msg_flags=0,
+ msg_control=[{cmsg_level=SOL_TCP,
+ cmsg_type=TCP_CM_INQ,
+ cmsg_data=8000}]}, 0) = 2000
+// We read 8K and we should have no further data ready to read.
+ +0 recvmsg(4, {msg_name(...)=...,
+ msg_iov(1)=[{..., 8000}],
+ msg_flags=0,
+ msg_control=[{cmsg_level=SOL_TCP,
+ cmsg_type=TCP_CM_INQ,
+ cmsg_data=0}]}, 0) = 8000
+// Client sends more data and closes the connections.
+ +0 < F. 10001:20001(10000) ack 1 win 514
+ +0 > . 1:1(0) ack 20002
+
+// We read 10K and we should have one "fake" byte because the connection is
+// closed.
+ +0 recvmsg(4, {msg_name(...)=...,
+ msg_iov(1)=[{..., 10000}],
+ msg_flags=0,
+ msg_control=[{cmsg_level=SOL_TCP,
+ cmsg_type=TCP_CM_INQ,
+ cmsg_data=1}]}, 0) = 10000
+// Now, receive error.
+ +0 read(3, ..., 2000) = -1 ENOTCONN (Transport endpoint is not connected)
diff --git a/tools/testing/selftests/net/packetdrill/tcp/md5/md5-only-on-client-ack.pkt b/tools/testing/selftests/net/packetdrill/tcp/md5/md5-only-on-client-ack.pkt
new file mode 100644
index 0000000000000..42b712e14e562
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp/md5/md5-only-on-client-ack.pkt
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+// Test what happens when client does not provide MD5 on SYN,
+// but then does on the ACK that completes the three-way handshake.
+
+`../common/defaults.sh`
+
+// Establish a connection.
+ 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+ +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+ +0 bind(3, ..., ...) = 0
+ +0 listen(3, 1) = 0
+
+ +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 10>
+ +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+// Ooh, weird: client provides MD5 option on the ACK:
+ +.01 < . 1:1(0) ack 1 win 514 <md5 000102030405060708090a0b0c0d0e0f,nop,nop>
+ +.01 < . 1:1(0) ack 1 win 514 <md5 000102030405060708090a0b0c0d0e0f,nop,nop>
+
+// The TCP listener refcount should be 2, but on buggy kernels it can be 0:
+ +0 `grep " 0A " /proc/net/tcp /proc/net/tcp6 | grep ":1F90"`
+
+// Now here comes the legit ACK:
+ +.01 < . 1:1(0) ack 1 win 514
+
+// Make sure the connection is OK:
+ +0 accept(3, ..., ...) = 4
+
+ +.01 write(4, ..., 1000) = 1000
--
2.46.0.469.g59c65b2a67-goog
damon_test_three_regions_in_vmas() initializes a maple tree with
MM_MT_FLAGS. The flags contains MT_FLAGS_LOCK_EXTERN, which means
mt_lock of the maple tree will not be used. And therefore the maple
tree initialization code skips initialization of the mt_lock. However,
__link_vmas(), which adds vmas for test to the maple tree, uses the
mt_lock. In other words, the uninitialized spinlock is used. The
problem becomes celar when spinlock debugging is turned on, since it
reports spinlock bad magic bug. Fix the issue by not using the mt_lock
as promised.
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Closes: https://lore.kernel.org/1453b2b2-6119-4082-ad9e-f3c5239bf87e@roeck-us.net
Fixes: d0cf3dd47f0d ("damon: convert __damon_va_three_regions to use the VMA iterator")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/tests/vaddr-kunit.h | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/mm/damon/tests/vaddr-kunit.h b/mm/damon/tests/vaddr-kunit.h
index 83626483f82b..c6c7e0e0ab07 100644
--- a/mm/damon/tests/vaddr-kunit.h
+++ b/mm/damon/tests/vaddr-kunit.h
@@ -17,23 +17,19 @@
static int __link_vmas(struct maple_tree *mt, struct vm_area_struct *vmas,
ssize_t nr_vmas)
{
- int i, ret = -ENOMEM;
+ int i;
MA_STATE(mas, mt, 0, 0);
if (!nr_vmas)
return 0;
- mas_lock(&mas);
for (i = 0; i < nr_vmas; i++) {
mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
if (mas_store_gfp(&mas, &vmas[i], GFP_KERNEL))
- goto failed;
+ return -ENOMEM;
}
- ret = 0;
-failed:
- mas_unlock(&mas);
- return ret;
+ return 0;
}
/*
--
2.39.2