In this patch series, we try to make more register fields writable like
ID_AA64PFR1_EL1.BT since this can benifit the migration between some of the
machines which have different BT values.
Changelog:
----------
RFCv1 -> v1:
* Fix the compilation error.
* Delete the machine specific information and make the description more
generable.
RFCv1: https://lore.kernel.org/all/20240612023553.127813-1-shahuang@redhat.com/
Shaoqin Huang (2):
KVM: arm64: Allow BT field in ID_AA64PFR1_EL1 writable
KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1
arch/arm64/kvm/sys_regs.c | 2 +-
tools/testing/selftests/kvm/aarch64/set_id_regs.c | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
--
2.40.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v4:
- fix errors reported by CI.
v3:
- rename start_client to client_socket
- Use connect_to_addr in connect_to_fd_opt
v2:
- update patch 2, extract a new helper start_client.
- drop patch 3, keep must_fail in network_helper_opts.
Drop type and noconnect from network_helper_opts. And use start_server_str
in mptcp and test_tcp_check_syncookie_user.
Patches 1-4 address Martin's comments in the previous series.
Geliang Tang (6):
selftests/bpf: Drop type from network_helper_opts
selftests/bpf: Use connect_to_addr in connect_to_fd_opt
selftests/bpf: Add client_socket helper
selftests/bpf: Drop noconnect from network_helper_opts
selftests/bpf: Use start_server_str in mptcp
selftests/bpf: Use start_server_str in test_tcp_check_syncookie_user
tools/testing/selftests/bpf/network_helpers.c | 94 +++++++++----------
tools/testing/selftests/bpf/network_helpers.h | 6 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 2 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 4 +-
.../bpf/prog_tests/ip_check_defrag.c | 10 +-
.../testing/selftests/bpf/prog_tests/mptcp.c | 7 +-
.../bpf/test_tcp_check_syncookie_user.c | 29 +-----
7 files changed, 56 insertions(+), 96 deletions(-)
--
2.43.0
Currently, if a user wants to run pmtu.sh and cover all the provided test
cases, they need to install the Open vSwitch userspace utilities. This
dependency is difficult for users as well as CI environments, because the
userspace build and setup may require lots of support and devel packages
to be installed, system setup to be correct, and things like permissions
and selinux policies to be properly configured.
The kernel selftest suite includes an ovs-dpctl.py utility which can
interact with the openvswitch module directly. This lets developers and
CI environments run without needing too many extra dependencies - just
the pyroute2 python package.
This series enhances the ovs-dpctl utility to provide support for set()
and tunnel() flow specifiers, better ipv6 handling support, and the
ability to add tunnel vports, and LWT interfaces. Finally, it modifies
the pmtu.sh script to call the ovs-dpctl.py utility rather than the
typical OVS userspace utilities.
NOTE: This could also be applied as-is. I'm trying to get the vng test
working in my environment, so I submitted as RFC because I didn't
get to test with the config change in 7/7.
Aaron Conole (6):
selftests: openvswitch: Support explicit tunnel port creation.
selftests: openvswitch: Refactor actions parsing.
selftests: openvswitch: Add set() and set_masked() support.
selftests: openvswitch: Add support for tunnel() key.
selftests: openvswitch: Support implicit ipv6 arguments.
selftests: net: Use the provided dpctl rather than the vswitchd for
tests.
selftests: net: add config for openvswitch
.../selftests/net/openvswitch/ovs-dpctl.py | 370 +++++++++++++++---
tools/testing/selftests/net/config | 5 ++++
tools/testing/selftests/net/pmtu.sh | 87 +++-
3 files changed, 394 insertions(+), 68 deletions(-)
--
2.45.1
On 6/14/24 12:06 PM, Jason A. Donenfeld wrote:
> Hook up the generic vDSO implementation to the x86 vDSO data page. Since
> the existing vDSO infrastructure is heavily based on the timekeeping
> functionality, which works over arrays of bases, a new macro is
> introduced for vvars that are not arrays.
>
> The vDSO function requires a ChaCha20 implementation that does not write
> to the stack, yet can still do an entire ChaCha20 permutation, so
> provide this using SSE2, since this is userland code that must work on
> all x86-64 processors. There's a simple test for this code as well.
>
> Reviewed-by: Samuel Neves <sneves(a)dei.uc.pt> # for vgetrandom-chacha.S
> Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/entry/vdso/Makefile | 3 +-
> arch/x86/entry/vdso/vdso.lds.S | 2 +
> arch/x86/entry/vdso/vgetrandom-chacha.S | 178 ++++++++++++++++++
> arch/x86/entry/vdso/vgetrandom.c | 17 ++
> arch/x86/include/asm/vdso/getrandom.h | 55 ++++++
> arch/x86/include/asm/vdso/vsyscall.h | 2 +
> arch/x86/include/asm/vvar.h | 16 ++
> tools/testing/selftests/vDSO/.gitignore | 1 +
> tools/testing/selftests/vDSO/Makefile | 13 ++
> .../testing/selftests/vDSO/vdso_test_chacha.c | 43 +++++
Hi Jason,
This is a large patch, so it might be helpful to split out the selftests
into their own patch. In fact, my comments here are only about those.
I'm adding linux-kselftest to Cc for visibility, and I've also Cc'd you
on a related selftests/vDSO series I just now posted [1].
In fact, I think it might work well if you insert patches 2/3 and 3/3
from that series, and build on top of those for the
selftests/vDSO/Makefile. See below for details.
...
> diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
> index a33b4d200a32..8b87ebea1630 100644
> --- a/tools/testing/selftests/vDSO/Makefile
> +++ b/tools/testing/selftests/vDSO/Makefile
> @@ -3,6 +3,7 @@ include ../lib.mk
>
> uname_M := $(shell uname -m 2>/dev/null || echo not)
> ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
> +SODIUM := $(shell pkg-config --libs libsodium 2>/dev/null)
>
> TEST_GEN_PROGS := $(OUTPUT)/vdso_test_gettimeofday $(OUTPUT)/vdso_test_getcpu
> TEST_GEN_PROGS += $(OUTPUT)/vdso_test_abi
> @@ -12,9 +13,15 @@ TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
> endif
> TEST_GEN_PROGS += $(OUTPUT)/vdso_test_correctness
> TEST_GEN_PROGS += $(OUTPUT)/vdso_test_getrandom
> +ifeq ($(uname_M),x86_64)
> +ifneq ($(SODIUM),)
> +TEST_GEN_PROGS += $(OUTPUT)/vdso_test_chacha
Unfortunately, this is "pre-existing wrong". :) That is, it is following
a pre-existing pattern that is broken: the $(OUTPUT) is not supposed to
be part of TEST_GEN_PROGS. Fixing it requires other changes, though, as
I've done in [2].
[1] https://lore.kernel.org/all/20240614233105.265009-1-jhubbard@nvidia.com/
[2] https://lore.kernel.org/all/20240614233105.265009-3-jhubbard@nvidia.com/
[3] https://lore.kernel.org/all/20240614233105.265009-4-jhubbard@nvidia.com/
> +endif
> +endif
>
> CFLAGS := -std=gnu99
> CFLAGS_vdso_standalone_test_x86 := -nostdlib -fno-asynchronous-unwind-tables -fno-stack-protector
> +CFLAGS_vdso_test_chacha := $(SODIUM) -idirafter $(top_srcdir)/include -idirafter $(top_srcdir)/arch/$(ARCH)/include -idirafter include -D__ASSEMBLY__ -DBULID_VDSO -DCONFIG_FUNCTION_ALIGNMENT=0 -Wa,--noexecstack
Line breaks via "\" are allowed in Makefiles. Might need two or three here.
> LDFLAGS_vdso_test_correctness := -ldl
> ifeq ($(CONFIG_X86_32),y)
> LDLIBS += -lgcc_s
> @@ -35,3 +42,9 @@ $(OUTPUT)/vdso_test_correctness: vdso_test_correctness.c
> -o $@ \
> $(LDFLAGS_vdso_test_correctness)
> $(OUTPUT)/vdso_test_getrandom: parse_vdso.c
> +$(OUTPUT)/vdso_test_chacha: CFLAGS += $(CFLAGS_vdso_test_chacha)
This also follows an unfortunate pattern, which I've also fixed just today
in a patch [3]. Please just add to CFLAGS directly, rather than creating
these name-mangled intermediate variables. See [3] for how that looks.
> +$(OUTPUT)/vdso_test_chacha: $(top_srcdir)/arch/$(ARCH)/entry/vdso/vgetrandom-chacha.S
> +$(OUTPUT)/vdso_test_chacha: include/asm/rwonce.h
> +include/asm/rwonce.h:
> + mkdir -p include/asm
> + touch $@
thanks,
--
John Hubbard
NVIDIA
Correctable memory errors are very common on servers with large
amount of memory, and are corrected by ECC, but with two
pain points to users:
1. Correction usually happens on the fly and adds latency overhead
2. Not-fully-proved theory states excessive correctable memory
errors can develop into uncorrectable memory error.
Soft offline is kernel's additional solution for memory pages
having (excessive) corrected memory errors. Impacted page is migrated
to healthy page if it is in use, then the original page is discarded
for any future use.
The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page.
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when later mmap hugepages MAP_FAILED due to lack of hugepages.
In case of a transparent hugepage, it will be split into 4K pages
as well; userspace will stop enjoying the transparent performance.
In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not
doing under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed
by kernel's memory failure recovery.
This patch series give userspace the control of softofflining any page:
kernel only soft offlines raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to. The interface to userspace is a
new sysctl called enable_soft_offline under /proc/sys/vm. By default
enable_soft_line is 1 to preserve existing behavior in kernel.
Changelog
v1 => v2:
* incorporate feedbacks from both Miaohe Lin <linmiaohe(a)huawei.com> and
Jane Chu <jane.chu(a)oracle.com>.
* make the switch to control all pages, instead of HugeTLB specific.
* change the API from
/sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (3):
mm/memory-failure: userspace controls soft-offlining pages
selftest/mm: test enable_soft_offline behaviors
docs: mm: add enable_soft_offline sysctl
Documentation/admin-guide/sysctl/vm.rst | 15 +
mm/memory-failure.c | 16 ++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 258 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
6 files changed, 295 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c
--
2.45.2.505.gda0bf45e8d-goog
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit cb708ab9f584f159798b60853edcf0c8b67ce295 ]
It's slightly better to set _GNU_SOURCE in the source code, but if one
must do it via the compiler invocation, then the best way to do so is
this:
$(CC) -D_GNU_SOURCE=
...because otherwise, if this form is used:
$(CC) -D_GNU_SOURCE
...then that leads the compiler to set a value, as if you had passed in:
$(CC) -D_GNU_SOURCE=1
That, in turn, leads to warnings under both gcc and clang, like this:
futex_requeue_pi.c:20: warning: "_GNU_SOURCE" redefined
Fix this by using the "-D_GNU_SOURCE=" form.
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/futex/functional/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index a392d0917b4e5..994fa3468f170 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
INCLUDES := -I../include -I../../ $(KHDR_INCLUDES)
-CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES) $(KHDR_INCLUDES)
+CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE= -pthread $(INCLUDES) $(KHDR_INCLUDES)
LDLIBS := -lpthread -lrt
LOCAL_HDRS := \
--
2.43.0
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit cb708ab9f584f159798b60853edcf0c8b67ce295 ]
It's slightly better to set _GNU_SOURCE in the source code, but if one
must do it via the compiler invocation, then the best way to do so is
this:
$(CC) -D_GNU_SOURCE=
...because otherwise, if this form is used:
$(CC) -D_GNU_SOURCE
...then that leads the compiler to set a value, as if you had passed in:
$(CC) -D_GNU_SOURCE=1
That, in turn, leads to warnings under both gcc and clang, like this:
futex_requeue_pi.c:20: warning: "_GNU_SOURCE" redefined
Fix this by using the "-D_GNU_SOURCE=" form.
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/futex/functional/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index a392d0917b4e5..994fa3468f170 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
INCLUDES := -I../include -I../../ $(KHDR_INCLUDES)
-CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES) $(KHDR_INCLUDES)
+CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE= -pthread $(INCLUDES) $(KHDR_INCLUDES)
LDLIBS := -lpthread -lrt
LOCAL_HDRS := \
--
2.43.0
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit cb708ab9f584f159798b60853edcf0c8b67ce295 ]
It's slightly better to set _GNU_SOURCE in the source code, but if one
must do it via the compiler invocation, then the best way to do so is
this:
$(CC) -D_GNU_SOURCE=
...because otherwise, if this form is used:
$(CC) -D_GNU_SOURCE
...then that leads the compiler to set a value, as if you had passed in:
$(CC) -D_GNU_SOURCE=1
That, in turn, leads to warnings under both gcc and clang, like this:
futex_requeue_pi.c:20: warning: "_GNU_SOURCE" redefined
Fix this by using the "-D_GNU_SOURCE=" form.
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/futex/functional/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index a392d0917b4e5..994fa3468f170 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
INCLUDES := -I../include -I../../ $(KHDR_INCLUDES)
-CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES) $(KHDR_INCLUDES)
+CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE= -pthread $(INCLUDES) $(KHDR_INCLUDES)
LDLIBS := -lpthread -lrt
LOCAL_HDRS := \
--
2.43.0
From: Michael Ellerman <mpe(a)ellerman.id.au>
[ Upstream commit e8b8c5264d4ebd248f60a5cef077fe615806e7a0 ]
Fix build error on ppc64:
dev_in_maps.c: In function ‘get_file_dev_and_inode’:
dev_in_maps.c:60:59: error: format ‘%llu’ expects argument of type
‘long long unsigned int *’, but argument 7 has type ‘__u64 *’ {aka ‘long
unsigned int *’} [-Werror=format=]
By switching to unsigned long long for u64 for ppc64 builds.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
index 759f86e7d263e..2862aae58b79a 100644
--- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
+++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <inttypes.h>
#include <unistd.h>
--
2.43.0
Hi guys,
I'm trying to enable migration from MtCollins(Ampere Altra, ARMv8.2+) to
AmpereOne(AmpereOne, ARMv8.6+), the migration always fails when migration from
MtCollins to AmpereOne due to some register fields differing between the
two machines.
In this patch series, we try to make more register fields writable like
ID_AA64PFR1_EL1.BT. This is first step towards making the migration possible.
Some other hurdles need to be overcome. This is not sufficient to make the
migration successful from MtCollins to AmpereOne.
Shaoqin Huang (2):
KVM: arm64: Allow BT field in ID_AA64PFR1_EL1 writable
KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1
arch/arm64/kvm/sys_regs.c | 2 +-
tools/testing/selftests/kvm/aarch64/set_id_regs.c | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
--
2.40.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- rename start_client to client_socket
- Use connect_to_addr in connect_to_fd_opt
v2:
- update patch 2, extract a new helper start_client.
- drop patch 3, keep must_fail in network_helper_opts.
Drop type and noconnect from network_helper_opts. And use start_server_str
in mptcp and test_tcp_check_syncookie_user.
Patches 1-4 address Martin's comments in the previous series.
Geliang Tang (6):
selftests/bpf: Drop type from network_helper_opts
selftests/bpf: Use connect_to_addr in connect_to_fd_opt
selftests/bpf: Add client_socket helper
selftests/bpf: Drop noconnect from network_helper_opts
selftests/bpf: Use start_server_str in mptcp
selftests/bpf: Use start_server_str in test_tcp_check_syncookie_user
tools/testing/selftests/bpf/network_helpers.c | 100 ++++++++----------
tools/testing/selftests/bpf/network_helpers.h | 6 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 2 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 4 +-
.../bpf/prog_tests/ip_check_defrag.c | 10 +-
.../testing/selftests/bpf/prog_tests/mptcp.c | 7 +-
.../bpf/test_tcp_check_syncookie_user.c | 29 +----
7 files changed, 56 insertions(+), 102 deletions(-)
--
2.43.0
Hi,
this is a v3 patch set as a follow up of the thread about the errors
reported by kselftest mixer-test. It changes HD-audio and vmaster
control behavior to return -EINVAL for invalid input values.
There is a change in kselftest itself to skip the verification after
write tests for volatile controls, too. It's for the channel map
controls that can't hold the stable values.
v2->v3:
* Replace with Mark's patch for kselftest
* Apply the validation for user controls in put callback instead
v1->v2:
* Skip only verification after write in kselftest
* Add sanity check to HDMI chmap write, too
v2: https://lore.kernel.org/r/20240614153717.30143-1-tiwai@suse.de
v1: https://lore.kernel.org/r/20240614124728.27901-1-tiwai@suse.de
Takashi
===
Mark Brown (1):
kselftest/alsa: Fix validation of writes to volatile controls
Takashi Iwai (5):
ALSA: vmaster: Return error for invalid input values
ALSA: hda: Return -EINVAL for invalid volume/switch inputs
ALSA: control: Apply sanity check of input values for user elements
ALSA: chmap: Mark Channel Map controls as volatile
ALSA: hda: Add input value sanity checks to HDMI channel map controls
sound/core/control.c | 6 ++-
sound/core/pcm_lib.c | 1 +
sound/core/vmaster.c | 8 ++++
sound/hda/hdmi_chmap.c | 18 +++++++++
sound/pci/hda/hda_codec.c | 23 +++++++++---
tools/testing/selftests/alsa/mixer-test.c | 45 +++++++++++++++--------
6 files changed, 79 insertions(+), 22 deletions(-)
--
2.43.0
When validating writes to controls we check that whatever value we wrote
actually appears in the control. For volatile controls we cannot assume
that this will be the case, the value may be changed at any time
including between our write and read. Handle this by moving the check
for volatile controls that we currently do for events to a separate
block and just verifying that whatever value we read is valid for the
control.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/alsa/mixer-test.c | 45 ++++++++++++++++++++-----------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/alsa/mixer-test.c b/tools/testing/selftests/alsa/mixer-test.c
index 1c04e5f638a0..dd74f8cc7ece 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -625,6 +625,21 @@ static int write_and_verify(struct ctl_data *ctl,
return err;
}
+ /*
+ * We can't verify any specific value for volatile controls
+ * but we should still check that whatever we read is a valid
+ * vale for the control.
+ */
+ if (snd_ctl_elem_info_is_volatile(ctl->info)) {
+ if (!ctl_value_valid(ctl, read_val)) {
+ ksft_print_msg("Volatile control %s has invalid value\n",
+ ctl->name);
+ return -EINVAL;
+ }
+
+ return 0;
+ }
+
/*
* Check for an event if the value changed, or confirm that
* there was none if it didn't. We rely on the kernel
@@ -632,22 +647,20 @@ static int write_and_verify(struct ctl_data *ctl,
* write, this is currently true, should that ever change this
* will most likely break and need updating.
*/
- if (!snd_ctl_elem_info_is_volatile(ctl->info)) {
- err = wait_for_event(ctl, 0);
- if (snd_ctl_elem_value_compare(initial_val, read_val)) {
- if (err < 1) {
- ksft_print_msg("No event generated for %s\n",
- ctl->name);
- show_values(ctl, initial_val, read_val);
- ctl->event_missing++;
- }
- } else {
- if (err != 0) {
- ksft_print_msg("Spurious event generated for %s\n",
- ctl->name);
- show_values(ctl, initial_val, read_val);
- ctl->event_spurious++;
- }
+ err = wait_for_event(ctl, 0);
+ if (snd_ctl_elem_value_compare(initial_val, read_val)) {
+ if (err < 1) {
+ ksft_print_msg("No event generated for %s\n",
+ ctl->name);
+ show_values(ctl, initial_val, read_val);
+ ctl->event_missing++;
+ }
+ } else {
+ if (err != 0) {
+ ksft_print_msg("Spurious event generated for %s\n",
+ ctl->name);
+ show_values(ctl, initial_val, read_val);
+ ctl->event_spurious++;
}
}
---
base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
change-id: 20240614-alsa-selftest-volatile-d6f3e8e28c08
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi,
this is a revised patch set as a follow up of the thread about the
errors reported by kselftest mixer-test. It changes HD-audio and
vmaster control behavior to return -EINVAL for invalid input values.
There is a change in kselftest itself to skip the verification after
write tests for volatile controls, too. It's for the channel map
controls that can't hold the stable values.
v1->v2:
* Skip only verification after write in kselftest
* Add sanity check to HDMI chmap write, too
Takashi
===
Takashi Iwai (6):
ALSA: vmaster: Return error for invalid input values
ALSA: hda: Return -EINVAL for invalid volume/switch inputs
ALSA: control: Apply sanity check of input values for user elements
kselftest/alsa: mixer-test: Skip write verification for volatile
controls
ALSA: chmap: Mark Channel Map controls as volatile
ALSA: hda: Add input value sanity checks to HDMI channel map controls
sound/core/control.c | 3 ++-
sound/core/pcm_lib.c | 1 +
sound/core/vmaster.c | 8 ++++++++
sound/hda/hdmi_chmap.c | 18 ++++++++++++++++++
sound/pci/hda/hda_codec.c | 23 ++++++++++++++++++-----
tools/testing/selftests/alsa/mixer-test.c | 4 ++++
6 files changed, 51 insertions(+), 6 deletions(-)
--
2.43.0
The goal of this series is to use helpers from net/lib.sh with MPTCP
selftests.
- Patches 1 to 4 are some clean-ups and preparation in net/lib.sh:
- Patch 1 simplifies the code handling errexit by ignoring possible
errors instead of disabling errexit temporary.
- Patch 2 removes the netns from the list after having cleaned it, not
to try to clean it twice.
- Patch 3 removes the 'readonly' attribute for the netns variable, to
allow using the same name in local variables.
- Patch 4 removes the local 'ns' var, not to conflict with the global
one it needs to setup.
- Patch 5 uses helpers from net/lib.sh to create and delete netns in
MPTCP selftests.
- Patch 6 uses wait_local_port_listen helper from net/net_helper.sh.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (3):
selftests: net: lib: remove 'ns' var in setup_ns
selftests: mptcp: lib: use setup/cleanup_ns helpers
selftests: mptcp: lib: use wait_local_port_listen helper
Matthieu Baerts (NGI0) (3):
selftests: net: lib: ignore possible errors
selftests: net: lib: remove ns from list after clean-up
selftests: net: lib: do not set ns var as readonly
tools/testing/selftests/net/lib.sh | 55 +++++++++++++++-----------
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 33 +++++-----------
2 files changed, 42 insertions(+), 46 deletions(-)
---
base-commit: a999973236543f0b8f6daeaa7ecba7488c3a593b
change-id: 20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-365e43e2e1ca
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This patch addresses the TODO (add non fixed feature on/off check).
I have tested it manually on my system after making changes as suggested
in v1 and v2 linked below for reference.
Patch now restores the features being tested to their initial state.
Signed-off-by: Abhinav Jain <jain.abhinav177(a)gmail.com>
---
PATCH v2:
https://lore.kernel.org/all/20240609132124.51683-1-jain.abhinav177@gmail.co…
Changes since v2:
- Added a check for netdev if it exists.
- If netdev doesn't exist, create a veth pair for testing.
- Restore the feature being tested to its intial state.
PATCH v1:
https://lore.kernel.org/all/20240606212714.27472-1-jain.abhinav177@gmail.co…
Changes since v1:
- Removed the addition of tail command as it was not required after
below change.
- Used read to parse the temp features file rather than using for loop
and took out awk/grep/sed from v1.
---
tools/testing/selftests/net/netdevice.sh | 55 +++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
index e3afcb424710..d937d39dda6a 100755
--- a/tools/testing/selftests/net/netdevice.sh
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -104,6 +104,20 @@ kci_netdev_ethtool()
{
netdev=$1
+ #check if netdev is provided as an argument
+ if [ -z "$netdev" ]; then
+ echo "No network device provided, creating a veth pair"
+ ip link add veth0 type veth peer name veth1
+ netdev="veth0"
+ veth_created=1
+ else
+ #check if the provided netdev exists
+ if ! ip link show "$netdev" > /dev/null 2>&1; then
+ echo "Network device $netdev does not exist."
+ return 1
+ fi
+ fi
+
#check presence of ethtool
ethtool --version 2>/dev/null >/dev/null
if [ $? -ne 0 ];then
@@ -124,11 +138,50 @@ kci_netdev_ethtool()
return 1
fi
echo "PASS: $netdev: ethtool list features"
- #TODO for each non fixed features, try to turn them on/off
+
+ while read -r FEATURE VALUE FIXED; do
+ [ "$FEATURE" != "Features" ] || continue # Skip "Features" line
+ [ "$FIXED" != "[fixed]" ] || continue # Skip fixed features
+ feature = "${FEATURE%:*}"
+
+ initial_state=$(ethtool -k "$netdev" | grep "$feature:" | awk '{print $2}')
+ ethtool --offload "$netdev" "$feature" off
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned off feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn off feature: $feature"
+ fi
+
+ ethtool --offload "$netdev" "$feature" on
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned on feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn on feature: $feature"
+ fi
+
+ #restore the feature to its initial state
+ ethtool --offload "$netdev" "$feature" "$initial_state"
+ if [$? -eq 0]; then
+ echo "PASS: $netdev: Restore feature $feature to" \
+ " initial state $initial_state"
+ else
+ echo "FAIL: $netdev: Failed to restore feature " \
+ "$feature to initial state $initial_state"
+ fi
+
+ done < "$TMP_ETHTOOL_FEATURES"
+
rm "$TMP_ETHTOOL_FEATURES"
kci_netdev_ethtool_test 74 'dump' "ethtool -d $netdev"
kci_netdev_ethtool_test 94 'stats' "ethtool -S $netdev"
+
+ #clean up veth interface pair if it was created
+ if ["$veth_created" ]; then
+ ip link delete veth0
+ echo "Removed veth pair"
+ fi
+
return 0
}
--
2.34.1
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...there are several warnings, and an error. This fixes all of those and
allows these tests to run and pass.
1. Fix linker error (undefined reference to memcpy) by providing a local
version of memcpy.
2. clang complains about using this form:
if (g = h & 0xf0000000)
...so factor out the assignment into a separate step.
3. The code is passing a signed const char* to elf_hash(), which expects
a const unsigned char *. There are several callers, so fix this at
the source by allowing the function to accept a signed argument, and
then converting to unsigned operations, once inside the function.
4. clang doesn't have __attribute__((externally_visible)) and generates
a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem
to require that attribute in order to build, run and pass tests here,
so remove it.
[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1…
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
thanks,
John Hubbard
tools/testing/selftests/vDSO/parse_vdso.c | 16 +++++++++++-----
.../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++++++++--
2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/vDSO/parse_vdso.c b/tools/testing/selftests/vDSO/parse_vdso.c
index 413f75620a35..4ae417372e9e 100644
--- a/tools/testing/selftests/vDSO/parse_vdso.c
+++ b/tools/testing/selftests/vDSO/parse_vdso.c
@@ -55,14 +55,20 @@ static struct vdso_info
ELF(Verdef) *verdef;
} vdso_info;
-/* Straight from the ELF specification. */
-static unsigned long elf_hash(const unsigned char *name)
+/*
+ * Straight from the ELF specification...and then tweaked slightly, in order to
+ * avoid a few clang warnings.
+ */
+static unsigned long elf_hash(const char *name)
{
unsigned long h = 0, g;
- while (*name)
+ const unsigned char *uch_name = (const unsigned char *)name;
+
+ while (*uch_name)
{
- h = (h << 4) + *name++;
- if (g = h & 0xf0000000)
+ h = (h << 4) + *uch_name++;
+ g = h & 0xf0000000;
+ if (g)
h ^= g >> 24;
h &= ~g;
}
diff --git a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
index 8a44ff973ee1..27f6fdf11969 100644
--- a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
+++ b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c
@@ -18,7 +18,7 @@
#include "parse_vdso.h"
-/* We need a libc functions... */
+/* We need some libc functions... */
int strcmp(const char *a, const char *b)
{
/* This implementation is buggy: it never returns -1. */
@@ -34,6 +34,20 @@ int strcmp(const char *a, const char *b)
return 0;
}
+/*
+ * The clang build needs this, although gcc does not.
+ * Stolen from lib/string.c.
+ */
+void *memcpy(void *dest, const void *src, size_t count)
+{
+ char *tmp = dest;
+ const char *s = src;
+
+ while (count--)
+ *tmp++ = *s++;
+ return dest;
+}
+
/* ...and two syscalls. This is x86-specific. */
static inline long x86_syscall3(long nr, long a0, long a1, long a2)
{
@@ -70,7 +84,7 @@ void to_base10(char *lastdig, time_t n)
}
}
-__attribute__((externally_visible)) void c_main(void **stack)
+void c_main(void **stack)
{
/* Parse the stack */
long argc = (long)*stack;
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
--
2.45.1
The following selftests: arm64 tests failed on FVP-aemva test and kernel
built with gcc-13 but pass with clang.
arm64_fp-stress_KERNEL-1-0/3-0/4-0/6-0 - gcc-13 - Failed
arm64_fp-stress_KERNEL-1-0/3-0/4-0/6-0 - clang-18 - Pass
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Test log:
---------
# timeout set to 45
# selftests: arm64: fp-stress
[ 269.882519] NET: Registered PF_ALG protocol family
# TAP version 13
# 1..96
# # 8 CPUs, 3 SVE VLs, 3 SME VLs, SME2 present
# # Will run for 10s
# # Started FPSIMD-0-0
# # Started KERNEL-0-0
# # Started SVE-VL-64-0
...
# ok 13 FPSIMD-1-0
# # KERNEL-1-0 exited with error code 1
# not ok 14 KERNEL-1-0
# ok 15 SVE-VL-64-1
...
# ok 37 FPSIMD-3-0
# # KERNEL-3-0 exited with error code 1
# not ok 38 KERNEL-3-0
# ok 39 SVE-VL-64-3
...
# ok 49 FPSIMD-4-0
# # KERNEL-4-0 exited with error code 1
# not ok 50 KERNEL-4-0
# ok 51 SVE-VL-64-4
...
# ok 73 FPSIMD-6-0
# # KERNEL-6-0 exited with error code 1
# not ok 74 KERNEL-6-0
# ok 75 SVE-VL-64-6
...
# # Totals: pass:92 fail:4 xfail:0 xpass:0 skip:0 error:0
ok 16 selftests: arm64: fp-stress
metadata:
------
git_describe: next-20240613
git_ref: master
git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git_sha: 6906a84c482f098d31486df8dc98cead21cce2d0
git_short_log: 6906a84c482f ("Add linux-next specific files for 20240613")
arch: arm64
test-environment: fvp-aemva
toolchain: gcc-13
Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240613/te…
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240613/te…
--
Linaro LKFT
https://lkft.linaro.org
Hi,
this is a patch set as a follow up of the thread about the errors
reported by kselftest mixer-test. It changes HD-audio and vmaster
control behavior to return -EINVAL for invalid input values.
There is a change in kselftest itself to skip the write tests for
volatile controls, too. It's for the channel map controls that can't
hold the stable values.
Takashi
===
Takashi Iwai (5):
ALSA: vmaster: Return error for invalid input values
ALSA: hda: Return -EINVAL for invalid volume/switch inputs
ALSA: control: Apply sanity check of input values for user elements
kselftest/alsa: mixer-test: Skip write tests for volatile controls
ALSA: chmap: Mark Channel Map controls as volatile
sound/core/control.c | 3 ++-
sound/core/pcm_lib.c | 1 +
sound/core/vmaster.c | 8 ++++++++
sound/pci/hda/hda_codec.c | 23 ++++++++++++++++++-----
tools/testing/selftests/alsa/mixer-test.c | 21 +++++++++++++++++++++
5 files changed, 50 insertions(+), 6 deletions(-)
--
2.43.0
Define a maximum allowable number of pids that can be livepatched in
test-syscall.sh as with extremely large machines the output from a
large number of processes overflows the dev/kmsg "expect" buffer in
the "check_result" function and causes a false error.
Reported-by: CKI Project <cki-project(a)redhat.com>
Signed-off-by: Ryan Sullivan <rysulliv(a)redhat.com>
---
tools/testing/selftests/livepatch/test-syscall.sh | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/livepatch/test-syscall.sh b/tools/testing/selftests/livepatch/test-syscall.sh
index b76a881d4013..289eb7d4c4b3 100755
--- a/tools/testing/selftests/livepatch/test-syscall.sh
+++ b/tools/testing/selftests/livepatch/test-syscall.sh
@@ -15,7 +15,10 @@ setup_config
start_test "patch getpid syscall while being heavily hammered"
-for i in $(seq 1 $(getconf _NPROCESSORS_ONLN)); do
+NPROC=$(getconf _NPROCESSORS_ONLN)
+MAXPROC=128
+
+for i in $(seq 1 $(($NPROC < $MAXPROC ? $NPROC : $MAXPROC))); do
./test_klp-call_getpid &
pids[$i]="$!"
done
--
2.44.0
Dear Linux folks,
Running the ALSA kselftests with Linux 6.10-rc1, `mixer-test` shows ten
failures:
# Totals: pass:24 fail:0 xfail:0 xpass:0 skip:11 error:0
These are:
not ok 5 write_invalid.0.40
not ok 11 write_valid.0.39
not ok 18 write_valid.0.38
not ok 25 write_valid.0.37
not ok 201 write_invalid.0.12
not ok 208 write_invalid.0.11
not ok 264 write_invalid.0.3
not ok 271 write_invalid.0.2
not ok 278 write_invalid.0.1
not ok 285 write_invalid.0.0
`./pcm-test` finishes with 5 out of 5 passes.
The output `sudo alsa-info` is uploaded to the database [1], and the
full output of `mixer-test` is attached.
Kind regards,
Paul
[1]: https://alsa-project.org/db/?f=cf8e1e0f37775448b5e3361e5c3cd4d4e6037d4f
Eventually, once the build succeeds on a sufficiently old distro, the
idea is to delete $(KHDR_INCLUDES) from the selftests/mm build, and then
after that, from selftests/lib.mk and all of the other selftest builds.
For now, this series merely achieves a clean build of selftests/mm on a
not-so-old distro: Ubuntu 23.04:
1. Add __NR_mseal.
2. Add fs.h, taken as usual from a snapshot of ./usr/include/linux/fs.h
after running "make headers". This is how we have agreed to do this sort
of thing, see [1].
3. Add a few selected prctl.h values that the ksm and mdwe tests require.
[1] commit e076eaca5906 ("selftests: break the dependency upon local
header files")
John Hubbard (5):
selftests/mm: mseal, self_elf: fix missing __NR_mseal
selftests/mm: fix vm_util.c build failures: add snapshot of fs.h
mm/selftests: kvm, mdwe fixes to avoid requiring "make headers"
selftests/mm: mseal, self_elf: factor out test macros and other
duplicated items
selftests/mm: mseal, self_elf: rename TEST_END_CHECK to
REPORT_TEST_PASS
tools/include/uapi/linux/fs.h | 392 +++++++++++++++++++++
tools/testing/selftests/mm/mdwe_test.c | 1 +
tools/testing/selftests/mm/mseal_helpers.h | 45 +++
tools/testing/selftests/mm/mseal_test.c | 141 +++-----
tools/testing/selftests/mm/seal_elf.c | 35 +-
tools/testing/selftests/mm/vm_util.h | 15 +
6 files changed, 502 insertions(+), 127 deletions(-)
create mode 100644 tools/include/uapi/linux/fs.h
create mode 100644 tools/testing/selftests/mm/mseal_helpers.h
base-commit: 8a92980606e3585d72d510a03b59906e96755b8a
--
2.45.2
This patch addresses the TODO (add non fixed feature on/off check).
I have tested it manually on my system and made changes as suggested in v1
Signed-off-by: Abhinav Jain <jain.abhinav177(a)gmail.com>
---
PATCH v1:
https://lore.kernel.org/all/20240606212714.27472-1-jain.abhinav177@gmail.co…
Changes since v1:
- Removed the addition of tail command as it was not required after
below change
- Used read to parse the temp features file rather than using for loop
and took out awk/grep/sed from v1
---
tools/testing/selftests/net/netdevice.sh | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
index e3afcb424710..196b7f985db4 100755
--- a/tools/testing/selftests/net/netdevice.sh
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -124,7 +124,27 @@ kci_netdev_ethtool()
return 1
fi
echo "PASS: $netdev: ethtool list features"
- #TODO for each non fixed features, try to turn them on/off
+
+ while read -r FEATURE VALUE FIXED; do
+ [ "$FEATURE" != "Features" ] || continue # Skip "Features" line
+ [ "$FIXED" != "[fixed]" ] || continue # Skip fixed features
+ feature = "${FEATURE%:*}"
+
+ ethtool --offload "$netdev" "$feature" off
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned off feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn off feature: $feature"
+ fi
+
+ ethtool --offload "$netdev" "$feature" on
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned on feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn on feature: $feature"
+ fi
+ done < "$TMP_ETHTOOL_FEATURES"
+
rm "$TMP_ETHTOOL_FEATURES"
kci_netdev_ethtool_test 74 'dump' "ethtool -d $netdev"
--
2.34.1
The purpose of this series is to rethink how HID-BPF is invoked.
Currently it implies a jmp table, a prog fd bpf_map, a preloaded tracing
bpf program and a lot of manual work for handling the bpf program
lifetime and addition/removal.
OTOH, bpf_struct_ops take care of most of the bpf handling leaving us
with a simple list of ops pointers, and we can directly call the
struct_ops program from the kernel as a regular function.
The net gain right now is in term of code simplicity and lines of code
removal (though is an API breakage), but udev-hid-bpf is able to handle
such breakages.
In the near future, we will be able to extend the HID-BPF struct_ops
with entrypoints for hid_hw_raw_request() and hid_hw_output_report(),
allowing for covering all of the initial use cases:
- firewalling a HID device
- fixing all of the HID device interactions (not just device events as
it is right now).
The matching user-space loader (udev-hid-bpf) MR is at
https://gitlab.freedesktop.org/libevdev/udev-hid-bpf/-/merge_requests/86
I'll put it out of draft once this is merged.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v3:
- took Alexei's review into account
- Link to v2: https://lore.kernel.org/r/20240607-hid_bpf_struct_ops-v2-0-3f95f4d02292@ker…
Changes in v2:
- drop HID_BPF_FLAGS enum and use BPF_F_BEFORE instead
- fix .init_members to not open code member->offset
- allow struct hid_device to be writeable from HID-BPF for its name,
uniq and phys
- Link to v1: https://lore.kernel.org/r/20240528-hid_bpf_struct_ops-v1-0-8c6663df27d8@ker…
---
Benjamin Tissoires (16):
HID: rename struct hid_bpf_ops into hid_ops
HID: bpf: add hid_get/put_device() helpers
HID: bpf: implement HID-BPF through bpf_struct_ops
selftests/hid: convert the hid_bpf selftests with struct_ops
HID: samples: convert the 2 HID-BPF samples into struct_ops
HID: bpf: add defines for HID-BPF SEC in in-tree bpf fixes
HID: bpf: convert in-tree fixes into struct_ops
HID: bpf: remove tracing HID-BPF capability
selftests/hid: add subprog call test
Documentation: HID: amend HID-BPF for struct_ops
Documentation: HID: add a small blurb on udev-hid-bpf
HID: bpf: Artist24: remove unused variable
HID: bpf: error on warnings when compiling bpf objects
bpf: allow bpf helpers to be used into HID-BPF struct_ops
HID: bpf: rework hid_bpf_ops_btf_struct_access
HID: bpf: make part of struct hid_device writable
Documentation/hid/hid-bpf.rst | 173 ++++---
drivers/hid/bpf/Makefile | 2 +-
drivers/hid/bpf/entrypoints/Makefile | 93 ----
drivers/hid/bpf/entrypoints/README | 4 -
drivers/hid/bpf/entrypoints/entrypoints.bpf.c | 25 -
drivers/hid/bpf/entrypoints/entrypoints.lskel.h | 248 ---------
drivers/hid/bpf/hid_bpf_dispatch.c | 266 +++-------
drivers/hid/bpf/hid_bpf_dispatch.h | 12 +-
drivers/hid/bpf/hid_bpf_jmp_table.c | 565 ---------------------
drivers/hid/bpf/hid_bpf_struct_ops.c | 298 +++++++++++
drivers/hid/bpf/progs/FR-TEC__Raptor-Mach-2.bpf.c | 9 +-
drivers/hid/bpf/progs/HP__Elite-Presenter.bpf.c | 6 +-
drivers/hid/bpf/progs/Huion__Kamvas-Pro-19.bpf.c | 9 +-
.../hid/bpf/progs/IOGEAR__Kaliber-MMOmentum.bpf.c | 6 +-
drivers/hid/bpf/progs/Makefile | 2 +-
.../hid/bpf/progs/Microsoft__XBox-Elite-2.bpf.c | 6 +-
drivers/hid/bpf/progs/Wacom__ArtPen.bpf.c | 6 +-
drivers/hid/bpf/progs/XPPen__Artist24.bpf.c | 10 +-
drivers/hid/bpf/progs/XPPen__ArtistPro16Gen2.bpf.c | 24 +-
drivers/hid/bpf/progs/hid_bpf.h | 5 +
drivers/hid/hid-core.c | 6 +-
include/linux/hid_bpf.h | 119 +++--
samples/hid/Makefile | 5 +-
samples/hid/hid_bpf_attach.bpf.c | 18 -
samples/hid/hid_bpf_attach.h | 14 -
samples/hid/hid_mouse.bpf.c | 26 +-
samples/hid/hid_mouse.c | 39 +-
samples/hid/hid_surface_dial.bpf.c | 10 +-
samples/hid/hid_surface_dial.c | 53 +-
tools/testing/selftests/hid/hid_bpf.c | 100 +++-
tools/testing/selftests/hid/progs/hid.c | 100 +++-
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 19 +-
32 files changed, 800 insertions(+), 1478 deletions(-)
---
base-commit: 70ec81c2e2b4005465ad0d042e90b36087c36104
change-id: 20240513-hid_bpf_struct_ops-e3212a224555
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- update patch 2, extract a new helper start_client.
- drop patch 3, keep must_fail in network_helper_opts.
Drop type and noconnect from network_helper_opts. And use start_server_str
in mptcp and test_tcp_check_syncookie_user.
Patches 1-2 address Martin's comments in the previous series.
Geliang Tang (4):
selftests/bpf: Drop type from network_helper_opts
selftests/bpf: Drop noconnect from network_helper_opts
selftests/bpf: Use start_server_str in mptcp
selftests/bpf: Use start_server_str in test_tcp_check_syncookie_user
tools/testing/selftests/bpf/network_helpers.c | 45 +++++++++++++++----
tools/testing/selftests/bpf/network_helpers.h | 7 +--
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 2 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 4 +-
.../bpf/prog_tests/ip_check_defrag.c | 7 +--
.../testing/selftests/bpf/prog_tests/mptcp.c | 7 +--
.../bpf/test_tcp_check_syncookie_user.c | 29 ++----------
7 files changed, 49 insertions(+), 52 deletions(-)
--
2.43.0
This series adds two new features required to be able to run the test on
more platforms on KernelCI.
The first patch adds a parameter to allow overriding the directory in
which the board files will be looked for. Since the board files are
hosted in a separate repository [1], this parameter allows overlaying
those files on the filesystem and passing the location to the test.
The second is needed for one platform in particular, MT8195, in which
the usb controllers are instanced from a two-level deep DT node that
doesn't allow unique matching based on the existing properties.
[1] https://github.com/kernelci/platform-test-parameters
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Nícolas F. R. A. Prado (2):
kselftest: devices: Allow specifying boards directory through parameter
kselftest: devices: Add of-fullname-regex property
.../selftests/devices/boards/google,spherion.yaml | 4 +++
.../selftests/devices/test_discoverable_devices.py | 37 +++++++++++++++++++++-
2 files changed, 40 insertions(+), 1 deletion(-)
---
base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
change-id: 20240612-kselftest-discoverable-probe-mt8195-kci-ca21742776d0
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/linux-riscv/20240609-support_vendor_extensions-v2-0…
---
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 13 +
arch/riscv/include/asm/hwprobe.h | 4 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 249 +++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 51 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 25 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 67 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 910 insertions(+), 271 deletions(-)
---
base-commit: 11cc01d4d2af304b7288251aad7e03315db8dffc
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
This patch series is motivated by the following observation:
Raise a signal, jump to signal handler. The ucontext_t structure dumped
by kernel to userspace has a uc_sigmask field having the mask of blocked
signals. If you run a fresh minimalistic program doing this, this field
is empty, even if you block some signals while registering the handler
with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for Aarch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
¤t->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, ¤t->blocked is copied to ucontext_t
exposed to userspace. Returning back to handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() are responsible for using information from
struct ksignal ksig, which was populated through the sigaction()
system call in kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update ¤t->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
Assuming that the above is indeed the intended behaviour, because it
semantically makes sense, since the signals blocked using sigaction()
remain blocked only till the execution of the handler, and not in the
context present before jumping to the handler (but nothing can be
confirmed from the man-pages), the series introduces a test for
mangling with uc_sigmask. I will send a separate series to fix the
man-pages.
The proposed selftest has been tested out on Aarch32, Aarch64 and x86_64.
Changes in v2:
- Replace all occurrences of SIGPIPE with SIGSEGV
- Fixed a mismatch between code comment and ksft log
- Add a testcase: Raise the same signal again; it must not be queued
- Remove unneeded <assert.h>, <unistd.h>
- Give a detailed test description in the comments; also describe the
exact meaning of delivered and blocked
- Handle errors for all libc functions/syscalls
- Mention tests in Makefile and .gitignore in alphabetical order
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 194 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 199 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
helpers (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
preprocessor macro defined and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 258 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 11 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/mp_dmabuf_devmem.h | 46 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 208 ++++++++-
include/net/page_pool/helpers.h | 153 +++++--
include/net/page_pool/types.h | 22 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 375 ++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 103 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 360 +++++++++-------
net/core/skbuff.c | 83 +++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 10 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 542 ++++++++++++++++++++++++
47 files changed, 2759 insertions(+), 266 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.45.2.505.gda0bf45e8d-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v7:
- a better fix for tls_sw_recvmsg.
v6:
- add a fix for tls_sw_recvmsg().
v5:
- add a new patch "Check recv lengths in test_sockmap" instead of using
"continue" in msg_loop.
v4:
- address Martin's comments for v3. (thanks.)
- add Yonghong's "Acked-by" tags. (thanks.)
- update subject-prefix from "bpf-next" to "bpf".
Patch 1, v3 of "selftests/bpf: Add F_SETFL for fcntl":
- detect nonblock flag automatically, then test_sockmap can run in both
block and nonblock modes.
- use continue instead of again in v2.
Patch 2, fix for umount cgroup2 error.
Geliang Tang (2):
tls: receive msg again for sk_redirect
selftests/bpf: Add F_SETFL for fcntl in test_sockmap
net/tls/tls_sw.c | 3 +++
tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)
--
2.43.0
Hi,
This builds on the proposal[1] from Mark and lets me convert the
existing usercopy selftest to KUnit. Besides adding this basic test to
the KUnit collection, it also opens the door for execve testing (which
depends on having a functional current->mm), and should provide the
basic infrastructure for adding Mark's much more complete usercopy tests.
v2:
- dropped "initial VMA", turns out it wasn't needed (Mark)
- various cleanups in the test itself (Vitor)
- moved new kunit resource to a separate file (David)
v1: https://lore.kernel.org/all/20240519190422.work.715-kees@kernel.org/
-Kees
[1] https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/
Kees Cook (2):
kunit: test: Add vm_mmap() allocation resource manager
usercopy: Convert test_user_copy to KUnit test
MAINTAINERS | 1 +
include/kunit/test.h | 17 ++
lib/Kconfig.debug | 21 +-
lib/Makefile | 2 +-
lib/kunit/Makefile | 1 +
lib/kunit/user_alloc.c | 111 +++++++++
lib/{test_user_copy.c => usercopy_kunit.c} | 273 ++++++++++-----------
7 files changed, 271 insertions(+), 155 deletions(-)
create mode 100644 lib/kunit/user_alloc.c
rename lib/{test_user_copy.c => usercopy_kunit.c} (47%)
--
2.34.1
This patchset enables both detecting as well as dumping compilable
prototypes for kfuncs.
The first commit instructs pahole to DECL_TAG kfuncs when available.
This requires v1.27 which was released on 6/11/24. With it, users will
be able to look at BTF inside vmlinux (or modules) and check if the
kfunc they want is available.
The final commit teaches bpftool how to dump kfunc prototypes. This
is done for developer convenience.
The rest of the commits are fixups to enable selftests to use the
newly dumped kfunc prototypes. With these, selftests will regularly
exercise the newly added codepaths.
Tested with and without the required pahole changes:
* https://github.com/kernel-patches/bpf/pull/7186
* https://github.com/kernel-patches/bpf/pull/7187
=== Changelog ===
From v4:
* Change bpf_session_cookie() return type
* Only fixup used fentry test kfunc prototypes
* Extract out projection detection into shared btf_is_projection_of()
* Fix kernel test robot build warnings about doc comments
From v3:
* Teach selftests to use dumped prototypes
From v2:
* Update Makefile.btf with pahole flag
* More error checking
* Output formatting changes
* Drop already-merged commit
From v1:
* Add __weak annotation
* Use btf_dump for kfunc prototypes
* Update kernel bpf_rdonly_cast() signature
Daniel Xu (12):
kbuild: bpf: Tell pahole to DECL_TAG kfuncs
bpf: selftests: Fix bpf_iter_task_vma_new() prototype
bpf: selftests: Fix fentry test kfunc prototypes
bpf: selftests: Fix bpf_cpumask_first_zero() kfunc prototype
bpf: selftests: Fix bpf_map_sum_elem_count() kfunc prototype
bpf: Make bpf_session_cookie() kfunc return long *
bpf: selftests: Namespace struct_opt callbacks in bpf_dctcp
bpf: verifier: Relax caller requirements for kfunc projection type
args
bpf: treewide: Align kfunc signatures to prog point-of-view
bpf: selftests: nf: Opt out of using generated kfunc prototypes
bpf: selftests: xfrm: Opt out of using generated kfunc prototypes
bpftool: Support dumping kfunc prototypes from BTF
fs/verity/measure.c | 5 +-
include/linux/bpf.h | 8 +--
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 13 ++++-
kernel/bpf/crypto.c | 24 +++++---
kernel/bpf/helpers.c | 39 +++++++++----
kernel/bpf/verifier.c | 12 +++-
kernel/trace/bpf_trace.c | 17 +++---
net/core/filter.c | 32 +++++++----
scripts/Makefile.btf | 2 +-
tools/bpf/bpftool/btf.c | 55 +++++++++++++++++++
.../testing/selftests/bpf/bpf_experimental.h | 2 +-
tools/testing/selftests/bpf/progs/bpf_dctcp.c | 36 ++++++------
.../selftests/bpf/progs/get_func_ip_test.c | 7 +--
.../selftests/bpf/progs/ip_check_defrag.c | 10 ++--
.../selftests/bpf/progs/map_percpu_stats.c | 2 +-
.../selftests/bpf/progs/nested_trust_common.h | 2 +-
.../testing/selftests/bpf/progs/test_bpf_nf.c | 1 +
.../selftests/bpf/progs/test_bpf_nf_fail.c | 1 +
.../bpf/progs/verifier_netfilter_ctx.c | 6 +-
.../selftests/bpf/progs/xdp_synproxy_kern.c | 1 +
tools/testing/selftests/bpf/progs/xfrm_info.c | 1 +
22 files changed, 193 insertions(+), 84 deletions(-)
--
2.44.0
KTAP parsers interpret the output of ksft_test_result_*() as being the
name of the test. The map_fixed_noreplace test uses a dynamically
allocated base address for the mmap()s that it tests and currently
includes this in the test names that it logs so the test names that are
logged are not stable between runs. It also uses multiples of PAGE_SIZE
which mean that runs for kernels with different PAGE_SIZE configurations
can't be directly compared. Both these factors cause issues for CI
systems when interpreting and displaying results.
Fix this by replacing the current test names with fixed strings
describing the intent of the mappings that are logged, the existing
messages with the actual addresses and sizes are retained as diagnostic
prints to aid in debugging.
Fixes: 4838cf70e539 ("selftests/mm: map_fixed_noreplace: conform test to TAP format output")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/map_fixed_noreplace.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/mm/map_fixed_noreplace.c b/tools/testing/selftests/mm/map_fixed_noreplace.c
index b74813fdc951..d53de2486080 100644
--- a/tools/testing/selftests/mm/map_fixed_noreplace.c
+++ b/tools/testing/selftests/mm/map_fixed_noreplace.c
@@ -67,7 +67,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error: munmap failed!?\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 5*PAGE_SIZE at base\n");
addr = base_addr + page_size;
size = 3 * page_size;
@@ -76,7 +77,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error: first mmap() failed unexpectedly\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 3*PAGE_SIZE at base+PAGE_SIZE\n");
/*
* Exact same mapping again:
@@ -93,7 +95,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:1: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 5*PAGE_SIZE at base\n");
/*
* Second mapping contained within first:
@@ -111,7 +114,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:2: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 2*PAGE_SIZE at base+PAGE_SIZE\n");
/*
* Overlap end of existing mapping:
@@ -128,7 +132,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:3: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 2*PAGE_SIZE at base+(3*PAGE_SIZE)\n");
/*
* Overlap start of existing mapping:
@@ -145,7 +150,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:4: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 2*PAGE_SIZE bytes at base\n");
/*
* Adjacent to start of existing mapping:
@@ -162,7 +168,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:5: mmap() failed when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() PAGE_SIZE at base\n");
/*
* Adjacent to end of existing mapping:
@@ -179,7 +186,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:6: mmap() failed when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() PAGE_SIZE at base+(4*PAGE_SIZE)\n");
addr = base_addr;
size = 5 * page_size;
---
base-commit: c3f38fa61af77b49866b006939479069cd451173
change-id: 20240605-kselftest-mm-fixed-noreplace-44e7e55c861a
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hello,
This v2 addresses some issues observed when running the ACPI probe
kselftest proposed in v1[1] across various devices and improves the overall
reliability of the test.
The acpi-extract-ids script has been improved to:
- Parse both .c and .h files
- Add an option to print only IDs matched by a driver (i.e. defined in an
ACPI match tables or in lists of IDs provided by the drivers)
The test_unprobed_devices.sh script relies on sysfs information to
determine if a device was successfully bound to a driver. Not all devices
listed in /sys/devices are expected to have a driver folder, so the script
has been adjusted to handle these cases and avoid generating false
negatives.
The test_unprobed_devices.sh test script logic has been modified to:
- Check the status attribute (when available) to exclusively test hardware
devices that are physically present, enabled and operational
- Traverse only ACPI objects with a physical_node* link, to ensure testing
of correctly enumerated devices
- Skip devices whose HID or CID are not matched by any driver, as
determined by the list generated through the acpi-extract-ids script
- Skip devices with HID or CID listed in the ignored IDs list. This list
has been added to contain IDs of devices that don't require a driver or
cannot be represented as platform devices (e.g. ACPI container and module
devices).
- Skip devices that are natively enumerated and don't need a driver, such
as certain PCI bridges
- Skip devices unassigned to any subsystem, devices linked to other devices
and class devices
Some of the heuristics used by the script are suboptimal and might require
adjustments over time. This kind of tests would greatly benefit from a
dedicated interface that exposes information about devices expected to be
matched by drivers and their probe status. Discussion regarding this matter
was initiated in v1.
As of now, I have not identified a suitable method for exposing this
information; I plan on submitting a separate RFC to propose some options
and engage in discussion. Meanwhile, this v2 focuses on utilizing already
available information to provide an ACPI equivalent of the existing DT
kselftest [2].
Adding in CC the people involved in the discussion at Plumbers [3], feel
free to add anyone that might be interested in this.
This series depends on:
- https://lore.kernel.org/all/20240102141528.169947-1-laura.nao@collabora.com…
- https://lore.kernel.org/all/20240131-ktap-sh-helpers-extend-v1-0-98ffb46871…
Thanks,
Laura
[1] https://lore.kernel.org/all/20230925155806.1812249-2-laura.nao@collabora.co…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
[3] https://www.youtube.com/watch?v=oE73eVSyFXQ&t=9377s
Original cover letter:
Regressions that prevent a driver from probing a device can significantly
affect the functionality of a platform.
A kselftest to verify if devices on a DT-based platform are probed
correctly was recently introduced [4], but no such generic test is
available for ACPI platforms yet. bootrr [5] provides device probe
testing, but relies on a pre-defined list of the peripherals present on
each DUT.
On ACPI based hardware, a complete description of the platform is
provided to the OS by the system firmware. ACPI namespace objects are
mapped by the Linux ACPI subsystem into a device tree in
/sys/devices/LNXSYSTEM:00; the information in this subtree can be parsed
to build a list of the hw peripherals present on the DUT dynamically.
This series adds a test to verify if the devices declared in the ACPI
namespace and supported by the kernel are probed correctly.
This work follows a similar approach to [4], adapted for the ACPI use
case.
The first patch introduces a script that builds a list of all ACPI device
IDs supported by the kernel, by inspecting the acpi_device_id structs in
the sources. This list can be used to avoid testing ACPI-enumerated
devices that don't have a matching driver in the kernel. This script was
highly inspired by the dt-extract-compatibles script [6].
In the second patch, a new kselftest is added. It parses the
/sys/devices/LNXSYSTEM:00 tree to obtain a list of all platform
peripherals and verifies which of those, if supported, are correctly
bound to a driver.
Feedback is much appreciated,
Thank you,
Laura
[4] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
[5] https://github.com/kernelci/bootr
[6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scr…
Laura Nao (2):
acpi: Add script to extract ACPI device ids in the kernel
kselftest: Add test to detect unprobed devices on ACPI platforms
MAINTAINERS | 2 +
scripts/acpi/acpi-extract-ids | 99 +++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/acpi/.gitignore | 1 +
tools/testing/selftests/acpi/Makefile | 21 +++
tools/testing/selftests/acpi/id_ignore_list | 3 +
.../selftests/acpi/test_unprobed_devices.sh | 138 ++++++++++++++++++
7 files changed, 265 insertions(+)
create mode 100755 scripts/acpi/acpi-extract-ids
create mode 100644 tools/testing/selftests/acpi/.gitignore
create mode 100644 tools/testing/selftests/acpi/Makefile
create mode 100644 tools/testing/selftests/acpi/id_ignore_list
create mode 100755 tools/testing/selftests/acpi/test_unprobed_devices.sh
--
2.30.2
Although "TAP" word is being used already in documentation, but it hasn't
been defined in informative way for developers that how to write TAP
conformant tests and what are the benefits. Write a short brief about it.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Documentation/dev-tools/kselftest.rst | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index dcf634e411bd9..b579f491f3e97 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -228,6 +228,14 @@ In general, the rules for selftests are
* Don't cause the top-level "make run_tests" to fail if your feature is
unconfigured.
+ * The output of tests must conform to the TAP standard to ensure high
+ testing quality and to capture failures/errors with specific details.
+ The kselftest.h and kselftest_harness.h headers provide wrappers for
+ outputting test results such as pass, fail, or skip etc. These wrappers
+ should be used instead of reinventing the wheel or using raw printf and
+ exit statements. CI systems can easily parse TAP output messages to
+ detect test failures.
+
Contributing new tests (details)
================================
--
2.39.2
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Drop type, noconnect and must_fail from network_helper_opts. And use
start_server_str in mptcp and test_tcp_check_syncookie_user.
Patches 1-3 address Martin's comments in the previous series.
Geliang Tang (5):
selftests/bpf: Drop type from network_helper_opts
selftests/bpf: Drop noconnect from network_helper_opts
selftests/bpf: Drop must_fail from network_helper_opts
selftests/bpf: Use start_server_str in mptcp
selftests/bpf: Use start_server_str in test_tcp_check_syncookie_user
tools/testing/selftests/bpf/network_helpers.c | 20 ++++++++-----
tools/testing/selftests/bpf/network_helpers.h | 5 ++--
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 5 +---
.../bpf/prog_tests/ip_check_defrag.c | 7 ++---
.../testing/selftests/bpf/prog_tests/mptcp.c | 7 +----
.../bpf/test_tcp_check_syncookie_user.c | 29 ++-----------------
6 files changed, 21 insertions(+), 52 deletions(-)
--
2.43.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v6:
- add a fix for tls_sw_recvmsg().
v5:
- add a new patch "Check recv lengths in test_sockmap" instead of using
"continue" in msg_loop.
v4:
- address Martin's comments for v3. (thanks.)
- add Yonghong's "Acked-by" tags. (thanks.)
- update subject-prefix from "bpf-next" to "bpf".
Patch 1, v3 of "selftests/bpf: Add F_SETFL for fcntl":
- detect nonblock flag automatically, then test_sockmap can run in both
block and nonblock modes.
- use continue instead of again in v2.
Patch 2, fix for umount cgroup2 error.
Geliang Tang (2):
tls: wait for receiving next skb for sk_redirect
selftests/bpf: Add F_SETFL for fcntl in test_sockmap
net/tls/tls_sw.c | 2 ++
tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
--
2.43.0
PMU event filter test fails on zen4 architecture because of the
unavailability of family and model check for zen4 in use_amd_pmu().
use_amd_pmu() is added to detect architectures that supports event
select 0xc2 umask 0 as "retired branch instructions".
Model ranges in is_zen1(), is_zen2() and is_zen3() are used only for
sever SOCs, so they might not cover all the model ranges which supports
retired branch instructions.
X86_FEATURE_ZEN is a synthetic feature flag specifically added to
recognize all Zen generations by commit 232afb557835d ("x86/CPU/AMD: Add
X86_FEATURE_ZEN1"). init_amd_zen_common() uses family >= 0x17 check to
enable X86_FEATURE_ZEN.
Family 17h+ is where Zen and its successors start and that event 0xc2,0
is supported on all currently released F17h+ processors as branch
instruction retired and it is true going forward to maintain the
backward compatibility for the branch instruction retired.
Since X86_FEATURE_ZEN is not recognized in selftest framework, instead
of checking family and model value for all zen architecture, "family >=
0x17" check is added in use_amd_pmu().
Fixes: bef9a701f3eb ("selftests: kvm/x86: Add test for KVM_SET_PMU_EVENT_FILTER")
Suggested-by: Sandipan Das <sandipan.das(a)amd.com>
Signed-off-by: Manali Shukla <manali.shukla(a)amd.com>
---
.../kvm/x86_64/pmu_event_filter_test.c | 32 +++----------------
1 file changed, 5 insertions(+), 27 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 26b3e7efe5dd..f65033fab0c0 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -353,38 +353,16 @@ static bool use_intel_pmu(void)
kvm_pmu_has(X86_PMU_FEATURE_BRANCH_INSNS_RETIRED);
}
-static bool is_zen1(uint32_t family, uint32_t model)
-{
- return family == 0x17 && model <= 0x0f;
-}
-
-static bool is_zen2(uint32_t family, uint32_t model)
-{
- return family == 0x17 && model >= 0x30 && model <= 0x3f;
-}
-
-static bool is_zen3(uint32_t family, uint32_t model)
-{
- return family == 0x19 && model <= 0x0f;
-}
-
/*
- * Determining AMD support for a PMU event requires consulting the AMD
- * PPR for the CPU or reference material derived therefrom. The AMD
- * test code herein has been verified to work on Zen1, Zen2, and Zen3.
- *
- * Feel free to add more AMD CPUs that are documented to support event
- * select 0xc2 umask 0 as "retired branch instructions."
+ * Family 17h+ is where Zen and its successors start and that event
+ * 0xc2,0 is supported on all currently released F17h+ processors as
+ * branch instruction retired and it is true going forward to maintain
+ * the backward compatibility for the branch instruction retired.
*/
static bool use_amd_pmu(void)
{
uint32_t family = kvm_cpu_family();
- uint32_t model = kvm_cpu_model();
-
- return host_cpu_is_amd &&
- (is_zen1(family, model) ||
- is_zen2(family, model) ||
- is_zen3(family, model));
+ return family >= 0x17;
}
/*
--
2.34.1
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.noexec to help existing applications move to a more secure way of
using memfd.
Proposals have been made to put MFD_NOEXEC_SEAL non-sealable, unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2],
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, therefore a combination of multiple functionalities into one
bit.
Furthermore, the MFD_NOEXEC_SEAL has been added for more than one
year, and multiple applications and distributions have backported and
utilized it. Altering ABI now presents a degree of risk and may lead
to disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get MFD_NOEXEC_SEAL memfd. And
old-application might break, that is by-design, in such a system
vm.noexec = 0 shall be used. Also no backward compatibility problem.
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersede previous patch which is trying different
direction [3], and please remove [2] from mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
v3:
Additional Randy Dunlap' comments.
v2:
Update according to Randy Dunlap' comments.
https://lore.kernel.org/linux-mm/20240611034903.3456796-1-jeffxu@chromium.o…
v1:
https://lore.kernel.org/linux-mm/20240607203543.2151433-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.noexec to help existing applications move to a more secure way of
using memfd.
Proposals have been made to put MFD_NOEXEC_SEAL non-sealable, unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2],
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, therefore a combination of multiple functionalities into one
bit.
Furthermore, the MFD_NOEXEC_SEAL has been added for more than one
year, and multiple applications and distributions have backported and
utilized it. Altering ABI now presents a degree of risk and may lead
to disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get MFD_NOEXEC_SEAL memfd. And
old-application might break, that is by-design, in such a system
vm.noexec = 0 shall be used. Also no backward compatibility problem.
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersede previous patch which is trying different
direction [3], and please remove [2] from mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
v2:
Update according to Randy Dunlap' comments.
v1:
https://lore.kernel.org/linux-mm/20240607203543.2151433-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
These two subsystems require very similar fixes, so I'm sending them
out together.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1.
2) Added a Reviewed-by tag from Ryan Roberts. See [1] for that.
Related work: I've sent a separate fix that allows "make CC=clang" to
work in addition to "make LLVM=1" [2].
[1] https://lore.kernel.org/518dd1e3-e31a-41c3-b488-9b75a64b6c8a@arm.com
[2] https://lore.kernel.org/20240531183751.100541-2-jhubbard@nvidia.com
John Hubbard (2):
selftests/openat2: fix clang build failures: -static-libasan,
LOCAL_HDRS
selftests/fchmodat2: fix clang build failure due to -static-libasan
tools/testing/selftests/fchmodat2/Makefile | 11 ++++++++++-
tools/testing/selftests/openat2/Makefile | 14 ++++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)
base-commit: cc8ed4d0a8486c7472cd72ec3c19957e509dc68c
--
2.45.1
Correctable memory errors are very common on servers with large
amount of memory, and are corrected by ECC, but with two
pain points to users:
1. Correction usually happens on the fly and adds latency overhead
2. Not-fully-proved theory states excessive correctable memory
errors can develop into uncorrectable memory error.
Soft offline is kernel's additional solution for memory pages
having (excessive) corrected memory errors. Impacted page is migrated
to healthy page if it is in use, then the original page is discarded
for any future use.
The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of HugeTLB hugepages.
Soft-offline dissolves a hugepage, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when later mmap hugepages MAP_FAILED due to lack of hugepages.
In addition, discarding the entire 1G memory page only because of
corrected memory errors sounds very costly and kernel better not
doing under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed
by kernel's memory failure recovery.
This patch series give userspace the control of soft-offlining
HugeTLB pages: kernel only soft offlines hugepage if userspace has
opt-ed in for that specific hugepage size, and exposed to userspace
by a new sysfs entry called softoffline_corrected_errors under
/sys/kernel/mm/hugepages/hugepages-${size}kB directory:
* When softoffline_corrected_errors=0, skip soft offlining for all
hugepages of size ${size}kB.
* When softoffline_corrected_errors=1, soft offline as before this
patch series.
By default softoffline_corrected_errors is 1.
This patch set is based at
commit a52b4f11a2e1 ("selftest mm/mseal read-only elf memory segment").
Jiaqi Yan (3):
mm/memory-failure: userspace controls soft-offlining hugetlb pages
selftest/mm: test softoffline_corrected_errors behaviors
docs: hugetlbpage.rst: add softoffline_corrected_errors
Documentation/admin-guide/mm/hugetlbpage.rst | 15 +-
include/linux/hugetlb.h | 17 ++
mm/hugetlb.c | 34 +++
mm/memory-failure.c | 7 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 262 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
8 files changed, 340 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c
--
2.45.1.288.g0e0cd299f1-goog
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/linux-riscv/20240609-support_vendor_extensions-v2-0…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: thead: add a vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
.../devicetree/bindings/riscv/extensions.yaml | 10 +
Documentation/devicetree/bindings/riscv/thead.yaml | 7 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 13 +
arch/riscv/include/asm/hwprobe.h | 4 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 249 +++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 51 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 25 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 67 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 898 insertions(+), 271 deletions(-)
---
base-commit: 11cc01d4d2af304b7288251aad7e03315db8dffc
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
Currently, we can run string-stream and assertion tests only when they
are built into the kernel (with config options = y), since some of the
symbols (string-stream functions and functions from assert.c) are not
exported into any of the namespaces, therefore they are not accessible
for the modules.
This patch series exports the required symbols into the KUnit namespace.
Also, it makes the string-stream test a separate module and removes the
log test stub from kunit-test since now we can access the string-stream
symbols even if the test which uses it is built as a module.
Additionally, this patch series merges the assertion test suite into the
kunit-test, since assert.c (and all of the assertion formatting
functions in it) is a part of the KUnit core.
Ivan Orlov (5):
kunit: string-stream: export non-static functions
kunit: kunit-test: Remove stub for log tests
kunit: string-stream-test: Make it a separate module
kunit: assert: export non-static functions
kunit: Merge assertion test into kunit-test.c
lib/kunit/Kconfig | 8 +
lib/kunit/Makefile | 7 +-
lib/kunit/assert.c | 4 +
lib/kunit/assert_test.c | 388 --------------------------------
lib/kunit/kunit-test.c | 397 +++++++++++++++++++++++++++++++--
lib/kunit/string-stream-test.c | 2 +
lib/kunit/string-stream.c | 12 +-
7 files changed, 405 insertions(+), 413 deletions(-)
delete mode 100644 lib/kunit/assert_test.c
--
2.34.1
The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.
For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.
To better support process-wide synchronous self-monitoring, without
events propagating to children that do not share the current process's
shared environment, two pre-requisite patches are added to optionally
restrict inheritance to CLONE_THREAD, and remove events on exec (without
affecting the parent).
Examples how to use these features can be found in the tests added at
the end of the series. In addition to the tests added, the series has
also been subjected to syzkaller fuzzing (focus on 'kernel/events/'
coverage).
Motivation and Example Uses
---------------------------
1. Our immediate motivation is low-overhead sampling-based race
detection for user space [1]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).
2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Other ideas for use-cases we found interesting, but should only
illustrate the range of potential to further motivate the utility (we're
sure there are more):
3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint to entry to the patched routine, then
send signals to threads and check that they are not in the
routine, but without stopping them further. If any of the
threads will enter the routine, it will receive SIGTRAP and
pause.
4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This could be replaced with
a watchpoint, which does not require a whole page nor DTLB
shootdowns.
5. Threads receiving signals on performance events to
throttle/unthrottle themselves.
6. Tracking data flow globally.
Changelog
---------
v4:
* Fix for parent and child racing to exit in sync_child_event().
* Fix race between irq_work running and task's sighand being released by
release_task().
* Generalize setting si_perf and si_addr independent of event type;
introduces perf_event_attr::sig_data, which can be set by user space
to be propagated to si_perf.
* Warning in perf_sigtrap() if ctx->task and current mismatch; we expect
this on architectures that do not properly implement
arch_irq_work_raise().
* Require events that want sigtrap to be associated with a task.
* Dropped "perf: Add breakpoint information to siginfo on SIGTRAP"
in favor of more generic solution (perf_event_attr::sig_data).
v3:
* Add patch "perf: Rework perf_event_exit_event()" to beginning of
series, courtesy of Peter Zijlstra.
* Rework "perf: Add support for event removal on exec" based on
the added "perf: Rework perf_event_exit_event()".
* Fix kselftests to work with more recent libc, due to the way it forces
using the kernel's own siginfo_t.
* Add basic perf-tool built-in test.
v2/RFC: https://lkml.kernel.org/r/20210310104139.679618-1-elver@google.com
* Patch "Support only inheriting events if cloned with CLONE_THREAD"
added to series.
* Patch "Add support for event removal on exec" added to series.
* Patch "Add kselftest for process-wide sigtrap handling" added to
series.
* Patch "Add kselftest for remove_on_exec" added to series.
* Implicitly restrict inheriting events if sigtrap, but the child was
cloned with CLONE_CLEAR_SIGHAND, because it is not generally safe if
the child cleared all signal handlers to continue sending SIGTRAP.
* Various minor fixes (see details in patches).
v1/RFC: https://lkml.kernel.org/r/20210223143426.2412737-1-elver@google.com
Pre-series: The discussion at [2] led to the changes in this series. The
approach taken in "Add support for SIGTRAP on perf events" to trigger
the signal was suggested by Peter Zijlstra in [3].
[2] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZ…
[3] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.n…
Marco Elver (9):
perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
perf: Support only inheriting events if cloned with CLONE_THREAD
perf: Add support for event removal on exec
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf: Add support for SIGTRAP on perf events
selftests/perf_events: Add kselftest for process-wide sigtrap handling
selftests/perf_events: Add kselftest for remove_on_exec
tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
perf test: Add basic stress test for sigtrap handling
Peter Zijlstra (1):
perf: Rework perf_event_exit_event()
arch/m68k/kernel/signal.c | 3 +
arch/x86/kernel/signal_compat.c | 5 +-
fs/signalfd.c | 4 +
include/linux/compat.h | 2 +
include/linux/perf_event.h | 9 +-
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +-
include/uapi/linux/perf_event.h | 12 +-
include/uapi/linux/signalfd.h | 4 +-
kernel/events/core.c | 302 +++++++++++++-----
kernel/fork.c | 2 +-
kernel/signal.c | 11 +
tools/include/uapi/linux/perf_event.h | 12 +-
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 5 +
tools/perf/tests/sigtrap.c | 150 +++++++++
tools/perf/tests/tests.h | 1 +
.../testing/selftests/perf_events/.gitignore | 3 +
tools/testing/selftests/perf_events/Makefile | 6 +
tools/testing/selftests/perf_events/config | 1 +
.../selftests/perf_events/remove_on_exec.c | 260 +++++++++++++++
tools/testing/selftests/perf_events/settings | 1 +
.../selftests/perf_events/sigtrap_threads.c | 210 ++++++++++++
23 files changed, 924 insertions(+), 87 deletions(-)
create mode 100644 tools/perf/tests/sigtrap.c
create mode 100644 tools/testing/selftests/perf_events/.gitignore
create mode 100644 tools/testing/selftests/perf_events/Makefile
create mode 100644 tools/testing/selftests/perf_events/config
create mode 100644 tools/testing/selftests/perf_events/remove_on_exec.c
create mode 100644 tools/testing/selftests/perf_events/settings
create mode 100644 tools/testing/selftests/perf_events/sigtrap_threads.c
--
2.31.0.208.g409f899ff0-goog
This patch series is motivated by the following observation:
Raise a signal, jump to signal handler. The ucontext_t structure dumped
by kernel to userspace has a uc_sigmask field having the mask of blocked
signals. If you run a fresh minimalistic program doing this, this field
is empty, even if you block some signals while registering the handler
with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for Aarch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
¤t->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, ¤t->blocked is copied to ucontext_t
exposed to userspace. Returning back to handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() are responsible for using information from
struct ksignal ksig, which was populated through the sigaction()
system call in kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update ¤t->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
Assuming that the above is indeed the intended behaviour, because it
semantically makes sense, since the signals blocked using sigaction()
remain blocked only till the execution of the handler, and not in the
context present before jumping to the handler (but nothing can be
confirmed from the man-pages), the series introduces a test for
mangling with uc_sigmask. I will send a separate series to fix the
man-pages.
The proposed selftest has been tested out on Aarch32, Aarch64 and x86_64.
Changes in v2:
- Replace all occurrences of SIGPIPE with SIGSEGV
- Add a testcase: Raise the same signal again; it must not be queued
- Remove unneeded <assert.h>, <unistd.h>
- Give a detailed test description in the comments; also describe the
exact meaning of delivered and blocked
- Handle errors for all libc functions/syscalls
- Mention tests in Makefile and .gitignore in alphabetical order
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 194 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 199 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.noexec to help existing applications move to a more secure way of
using memfd.
Proposals have been made to put MFD_NOEXEC_SEAL non-sealable, unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2],
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, therefore a combination of multiple functionalities into one
bit.
Furthermore, the MFD_NOEXEC_SEAL has been added for more than one
year, and multiple applications and distributions have backported and
utilized it. Altering ABI now presents a degree of risk and may lead
to disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get MFD_NOEXEC_SEAL memfd. And
old-application might break, that is by-design, in such a system
vm.noexec = 0 shall be used. Also no backward compatibility problem.
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersede previous patch which is trying different
direction [3], and please remove [2] from mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
The different patches here are some unrelated fixes for MPTCP:
- Patch 1 ensures 'snd_una' is initialised on connect in case of MPTCP
fallback to TCP followed by retransmissions before the processing of
any other incoming packets. A fix for v5.9+.
- Patch 2 makes sure the RmAddr MIB counter is incremented, and only
once per ID, upon the reception of a RM_ADDR. A fix for v5.10+.
- Patch 3 doesn't update 'add addr' related counters if the connect()
was not possible. A fix for v5.7+.
- Patch 4 updates the mailmap file to add Geliang's new email address.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (1):
mailmap: map Geliang's new email address
Paolo Abeni (1):
mptcp: ensure snd_una is properly initialized on connect
YonglongLi (2):
mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
mptcp: pm: update add_addr counters after connect
.mailmap | 1 +
net/mptcp/pm_netlink.c | 21 ++++++++++++++-------
net/mptcp/protocol.c | 1 +
tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++--
4 files changed, 19 insertions(+), 9 deletions(-)
---
base-commit: c44711b78608c98a3e6b49ce91678cd0917d5349
change-id: 20240607-upstream-net-20240607-misc-fixes-024007171d60
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Hi,
This builds on the proposal[1] from Mark and lets me convert the
existing usercopy selftest to KUnit. Besides adding this basic test to
the KUnit collection, it also opens the door for execve testing (which
depends on having a functional current->mm), and should provide the
basic infrastructure for adding Mark's much more complete usercopy tests.
-Kees
[1] https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/
Kees Cook (2):
kunit: test: Add vm_mmap() allocation resource manager
usercopy: Convert test_user_copy to KUnit test
MAINTAINERS | 1 +
include/kunit/test.h | 17 ++
lib/Kconfig.debug | 21 +-
lib/Makefile | 2 +-
lib/kunit/test.c | 139 +++++++++++-
lib/{test_user_copy.c => usercopy_kunit.c} | 252 ++++++++++-----------
6 files changed, 288 insertions(+), 144 deletions(-)
rename lib/{test_user_copy.c => usercopy_kunit.c} (52%)
--
2.34.1
Hi all,
This series does a number of cleanups into resctrl_val() and
generalizes it by removing test name specific handling from the
function.
v6:
- Adjust closing/rollback of the IMC perf
- Move the comment in measure_vals() to function level
- Capitalize MBM
- binded to -> bound to
- Language tweak into kerneldoc
- Removed stale paragraph from commit message
v5:
- Open mem bw file only once and use rewind().
- Add \n to mem bw file read to allow reading fresh values from the file.
- Return 0 if create_grp() is given NULL grp_name (matches the original
behavior). Mention this in function's kerneldoc.
- Cast pid_t to int before printing with %d.
- Caps/typo fixes to kerneldoc and commit messages.
- Use imperative tone in commit messages and improve them based on points
that came up during review.
v4:
- Merged close fix into IMC READ+WRITE rework patch
- Add loop to reset imc_counters_config fds to -1 to be able know which
need closing
- Introduce perf_close_imc_mem_bw() to close fds
- Open resctrl mem bw file (twice) beforehand to avoid opening it during
the test
- Remove MBM .mongrp setup
- Remove mongrp from CMT test
v3:
- Rename init functions to <testname>_init()
- Replace for loops with READ+WRITE statements for clarity
- Don't drop Return: entry from perf_open_imc_mem_bw() func comment
- New patch: Fix closing of IMC fds in case of error
- New patch: Make "bandwidth" consistent in comments & prints
- New patch: Simplify mem bandwidth file code
- Remove wrong comment
- Changed grp_name check to return -1 on fail (internal sanity check)
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W
instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM
tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 10 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 371 ++++++++----------
tools/testing/selftests/resctrl/resctrlfs.c | 67 ++--
8 files changed, 291 insertions(+), 285 deletions(-)
--
2.39.2
This patch addresses the present TODO in the file.
I have tested it manually on my system and added relevant filtering to
ensure that the correct feature list is being checked.
Signed-off-by: Abhinav Jain <jain.abhinav177(a)gmail.com>
---
tools/testing/selftests/net/netdevice.sh | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
index e3afcb424710..cbe2573c3827 100755
--- a/tools/testing/selftests/net/netdevice.sh
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -117,14 +117,31 @@ kci_netdev_ethtool()
return 1
fi
- ethtool -k "$netdev" > "$TMP_ETHTOOL_FEATURES"
+ ethtool -k "$netdev" | tail -n +2 > "$TMP_ETHTOOL_FEATURES"
if [ $? -ne 0 ];then
echo "FAIL: $netdev: ethtool list features"
rm "$TMP_ETHTOOL_FEATURES"
return 1
fi
echo "PASS: $netdev: ethtool list features"
- #TODO for each non fixed features, try to turn them on/off
+
+ for feature in $(grep -v fixed "$TMP_ETHTOOL_FEATURES" | \
+ awk '{print $1}' | sed 's/://'); do
+ ethtool --offload "$netdev" "$feature" off
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned off feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn off feature: $feature"
+ fi
+
+ ethtool --offload "$netdev" "$feature" on
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned on feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn on feature: $feature"
+ fi
+ done
+
rm "$TMP_ETHTOOL_FEATURES"
kci_netdev_ethtool_test 74 'dump' "ethtool -d $netdev"
--
2.34.1
From: Pankaj Raghav <p.raghav(a)samsung.com>
create_pagecache_thp_and_fd() in split_huge_page_test.c used the
variable dummy to perform mmap read.
However, this test was skipped even on XFS which has large folio
support. The issue was compiler (gcc 13.2.0) was optimizing out the
dummy variable, therefore, not creating huge page in the page cache.
Use asm volatile() trick to force the compiler not to optimize out
the loop where we read from the mmaped addr. This is similar to what is
being done in other tests (cow.c, etc)
As the variable is now used in the asm statement, remove the unused
attribute.
Signed-off-by: Pankaj Raghav <p.raghav(a)samsung.com>
---
Changes since v2:
- Use the asm volatile trick to force the compiler to not optimize the
read into dummy variable. (David)
tools/testing/selftests/mm/split_huge_page_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index d3c7f5fb3e7b..e5e8dafc9d94 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -300,7 +300,7 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
char **addr)
{
size_t i;
- int __attribute__((unused)) dummy = 0;
+ int dummy = 0;
srand(time(NULL));
@@ -341,6 +341,7 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
for (size_t i = 0; i < fd_size; i++)
dummy += *(*addr + i);
+ asm volatile("" : "+r" (dummy));
if (!check_huge_file(*addr, fd_size / pmd_pagesize, pmd_pagesize)) {
ksft_print_msg("No large pagecache folio generated, please provide a filesystem supporting large folio\n");
base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
--
2.44.1
The purpose of this series is to rethink how HID-BPF is invoked.
Currently it implies a jmp table, a prog fd bpf_map, a preloaded tracing
bpf program and a lot of manual work for handling the bpf program
lifetime and addition/removal.
OTOH, bpf_struct_ops take care of most of the bpf handling leaving us
with a simple list of ops pointers, and we can directly call the
struct_ops program from the kernel as a regular function.
The net gain right now is in term of code simplicity and lines of code
removal (though is an API breakage), but udev-hid-bpf is able to handle
such breakages.
In the near future, we will be able to extend the HID-BPF struct_ops
with entrypoints for hid_hw_raw_request() and hid_hw_output_report(),
allowing for covering all of the initial use cases:
- firewalling a HID device
- fixing all of the HID device interactions (not just device events as
it is right now).
The matching user-space loader (udev-hid-bpf) MR is at
https://gitlab.freedesktop.org/libevdev/udev-hid-bpf/-/merge_requests/86
I'll put it out of draft once this is merged.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v2:
- drop HID_BPF_FLAGS enum and use BPF_F_BEFORE instead
- fix .init_members to not open code member->offset
- allow struct hid_device to be writeable from HID-BPF for its name,
uniq and phys
- Link to v1: https://lore.kernel.org/r/20240528-hid_bpf_struct_ops-v1-0-8c6663df27d8@ker…
---
Benjamin Tissoires (16):
HID: rename struct hid_bpf_ops into hid_ops
HID: bpf: add hid_get/put_device() helpers
HID: bpf: implement HID-BPF through bpf_struct_ops
selftests/hid: convert the hid_bpf selftests with struct_ops
HID: samples: convert the 2 HID-BPF samples into struct_ops
HID: bpf: add defines for HID-BPF SEC in in-tree bpf fixes
HID: bpf: convert in-tree fixes into struct_ops
HID: bpf: remove tracing HID-BPF capability
selftests/hid: add subprog call test
Documentation: HID: amend HID-BPF for struct_ops
Documentation: HID: add a small blurb on udev-hid-bpf
HID: bpf: Artist24: remove unused variable
HID: bpf: error on warnings when compiling bpf objects
bpf: allow bpf helpers to be used into HID-BPF struct_ops
HID: bpf: rework hid_bpf_ops_btf_struct_access
HID: bpf: make part of struct hid_device writable
Documentation/hid/hid-bpf.rst | 173 ++++---
drivers/hid/bpf/Makefile | 2 +-
drivers/hid/bpf/entrypoints/Makefile | 93 ----
drivers/hid/bpf/entrypoints/README | 4 -
drivers/hid/bpf/entrypoints/entrypoints.bpf.c | 25 -
drivers/hid/bpf/entrypoints/entrypoints.lskel.h | 248 ---------
drivers/hid/bpf/hid_bpf_dispatch.c | 266 +++-------
drivers/hid/bpf/hid_bpf_dispatch.h | 12 +-
drivers/hid/bpf/hid_bpf_jmp_table.c | 565 ---------------------
drivers/hid/bpf/hid_bpf_struct_ops.c | 298 +++++++++++
drivers/hid/bpf/progs/FR-TEC__Raptor-Mach-2.bpf.c | 9 +-
drivers/hid/bpf/progs/HP__Elite-Presenter.bpf.c | 6 +-
drivers/hid/bpf/progs/Huion__Kamvas-Pro-19.bpf.c | 9 +-
.../hid/bpf/progs/IOGEAR__Kaliber-MMOmentum.bpf.c | 6 +-
drivers/hid/bpf/progs/Makefile | 2 +-
.../hid/bpf/progs/Microsoft__XBox-Elite-2.bpf.c | 6 +-
drivers/hid/bpf/progs/Wacom__ArtPen.bpf.c | 6 +-
drivers/hid/bpf/progs/XPPen__Artist24.bpf.c | 10 +-
drivers/hid/bpf/progs/XPPen__ArtistPro16Gen2.bpf.c | 24 +-
drivers/hid/bpf/progs/hid_bpf.h | 5 +
drivers/hid/hid-core.c | 6 +-
include/linux/hid_bpf.h | 125 ++---
samples/hid/Makefile | 5 +-
samples/hid/hid_bpf_attach.bpf.c | 18 -
samples/hid/hid_bpf_attach.h | 14 -
samples/hid/hid_mouse.bpf.c | 26 +-
samples/hid/hid_mouse.c | 39 +-
samples/hid/hid_surface_dial.bpf.c | 10 +-
samples/hid/hid_surface_dial.c | 53 +-
tools/testing/selftests/hid/hid_bpf.c | 100 +++-
tools/testing/selftests/hid/progs/hid.c | 100 +++-
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 19 +-
32 files changed, 805 insertions(+), 1479 deletions(-)
---
base-commit: 70ec81c2e2b4005465ad0d042e90b36087c36104
change-id: 20240513-hid_bpf_struct_ops-e3212a224555
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
This series fixes build errors found by clang to allow the x86 suite to
get built with the clang.
Unfortunately, there is one bug [1] in the clang becuase of which
extended asm isn't handled correctly by it and build fails for
sysret_rip.c. Hence even after this series the build of this test would
fail with clang. Should we disable this test for now when clang is used
until the bug is fixed in clang? Not sure. Any opinions?
[1] https://github.com/llvm/llvm-project/issues/53728
Muhammad Usama Anjum (8):
selftests: x86: Remove dependence of headers file
selftests: x86: check_initial_reg_state: remove -no-pie while using
-static
selftests: x86: test_vsyscall: remove unused function
selftests: x86: fsgsbase_restore: fix asm directive from =rm to =r
selftests: x86: syscall_arg_fault_32: remove unused variable
selftests: x86: test_FISTTP: use fisttps instead of ambigous fisttp
selftests: x86: fsgsbase: Remove unused function and variable
selftests: x86: amx: Remove unused functions
tools/testing/selftests/x86/Makefile | 9 +++++----
tools/testing/selftests/x86/amx.c | 16 ----------------
tools/testing/selftests/x86/fsgsbase.c | 6 ------
tools/testing/selftests/x86/fsgsbase_restore.c | 2 +-
tools/testing/selftests/x86/syscall_arg_fault.c | 1 -
tools/testing/selftests/x86/test_FISTTP.c | 8 ++++----
tools/testing/selftests/x86/test_vsyscall.c | 5 -----
7 files changed, 10 insertions(+), 37 deletions(-)
--
2.39.2
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
V4:
- Added this test to run_vmtests.sh.
V3:
- Fixed the build error when it is compiled with _FORTIFY_SOURCE.
V2:
- Addressed all review commets from Muhammad Usama Anjum
https://lore.kernel.org/all/20240604132801.23377-1-donettom@linux.ibm.com/
V1:
https://lore.kernel.org/all/20240523063905.3173-1-donettom@linux.ibm.com/#t
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 ++++++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 1 +
3 files changed, 120 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 3b49bc3d0a3b..a1748a4c7df1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..986f3b6c7f7b
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+ const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
+ const int mmap_prot = PROT_READ | PROT_WRITE;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
+ if (fd < 0)
+ ksft_exit_fail_perror("Error opening file\n");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_perror("Error mapping memory\n");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_perror("Error writing to file\n");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+}
+
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 3157204b9047..5698d519170d 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -265,6 +265,7 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
CATEGORY="hugetlb" run_test ./hugepage-mremap
CATEGORY="hugetlb" run_test ./hugepage-vmemmap
CATEGORY="hugetlb" run_test ./hugetlb-madvise
+CATEGORY="hugetlb" run_test ./hugetlb_dio
nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
# For this test, we need one and just one huge page
--
2.43.0
in the main function of vdso_restorer.c,there is a dlopen function,
but there is no dlclose function to close the file
Signed-off-by: liujing <liujing(a)cmss.chinamobile.com>
---
tools/testing/selftests/x86/vdso_restorer.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/x86/vdso_restorer.c b/tools/testing/selftests/x86/vdso_restorer.c
index fe99f2434155..a0b1155dee31 100644
--- a/tools/testing/selftests/x86/vdso_restorer.c
+++ b/tools/testing/selftests/x86/vdso_restorer.c
@@ -57,6 +57,8 @@ int main()
return 0;
}
+ dlclose(vdso);
+
memset(&sa, 0, sizeof(sa));
sa.handler = handler_with_siginfo;
sa.flags = SA_SIGINFO;
--
2.18.2
Conform individual tests to TAP output. One patch conform one test. With
this series, all vDSO tests become TAP conformant.
Muhammad Usama Anjum (4):
kselftests: vdso: vdso_test_clock_getres: conform test to TAP output
kselftests: vdso: vdso_test_correctness: conform test to TAP output
kselftests: vdso: vdso_test_getcpu: conform test to TAP output
kselftests: vdso: vdso_test_gettimeofday: conform test to TAP output
.../selftests/vDSO/vdso_test_clock_getres.c | 68 ++++----
.../selftests/vDSO/vdso_test_correctness.c | 146 +++++++++---------
.../testing/selftests/vDSO/vdso_test_getcpu.c | 16 +-
.../selftests/vDSO/vdso_test_gettimeofday.c | 23 +--
4 files changed, 126 insertions(+), 127 deletions(-)
--
2.39.2
The kselftests may be built in a couple different ways:
make LLVM=1
make CC=clang
In order to handle both cases, set LLVM=1 if CC=clang. That way,the rest
of lib.mk, and any Makefiles that include lib.mk, can base decisions
solely on whether or not LLVM is set.
Then, build upon that to disable a pair of clang warnings that are
already silenced on gcc.
Doing it this way is much better than the piecemeal approach that I
started with in [1] and [2]. Thanks to Nathan Chancellor for the patch
reviews that led to this approach.
Changes since the first version:
1) Wrote a detailed explanation for suppressing two clang warnings, in
both a lib.mk comment, and the commit description.
2) Added a Reviewed-by tag to the first patch.
[1] https://lore.kernel.org/20240527214704.300444-1-jhubbard@nvidia.com
[2] https://lore.kernel.org/20240527213641.299458-1-jhubbard@nvidia.com
John Hubbard (2):
selftests/lib.mk: handle both LLVM=1 and CC=clang builds
selftests/lib.mk: silence some clang warnings that gcc already ignores
tools/testing/selftests/lib.mk | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
V3:
- Fixed the build error when it is compiled with _FORTIFY_SOURCE.
V2:
- Addressed all review commets from Muhammad Usama Anjum
https://lore.kernel.org/all/20240604132801.23377-1-donettom@linux.ibm.com/
V1:
https://lore.kernel.org/all/20240523063905.3173-1-donettom@linux.ibm.com/#t
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 3b49bc3d0a3b..a1748a4c7df1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..986f3b6c7f7b
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+ const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
+ const int mmap_prot = PROT_READ | PROT_WRITE;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
+ if (fd < 0)
+ ksft_exit_fail_perror("Error opening file\n");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_perror("Error mapping memory\n");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_perror("Error writing to file\n");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+}
+
--
2.43.0
From: Jeff Xu <jeffxu(a)google.com>
By default, memfd_create() creates a non-sealable MFD, unless the
MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created
with that flag is sealable, even though MFD_ALLOW_SEALING is not set.
This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
is new, we expect not many applications will rely on the nature of
MFD_NOEXEC_SEAL being sealable. In most cases, the application already
sets MFD_ALLOW_SEALING if they need a sealable MFD.
Additionally, this enhances the useability of pid namespace sysctl
vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or
MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
to be sealable. This means, any application that does not desire this
behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
migrate/enforce non-executable MFD. This adjustment ensures that
applications can anticipate that the sealable characteristic will
remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás
used Debian Code Search and GitHub to try to find potential breakages
and could only find a single one. Dbus-broker's memfd_create() wrapper
is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to
work around it [1]. This workaround will break. Luckily, this only
affects the test suite, it does not affect
the normal operations of dbus-broker. There is a PR with a fix[2]. In
addition, David Rheinsberg also raised similar fix in [3]
[1]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a…
[2]: https://github.com/bus1/dbus-broker/pull/366
[3]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
History
======
V2:
update commit message.
add testcase for vm.memfd_noexec
add documentation.
V1:
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
Jeff Xu (2):
memfd: fix MFD_NOEXEC_SEAL to be non-sealable by default
memfd:add MEMFD_NOEXEC_SEAL documentation
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++
mm/memfd.c | 9 +--
tools/testing/selftests/memfd/memfd_test.c | 26 ++++++-
4 files changed, 120 insertions(+), 6 deletions(-)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.1.288.g0e0cd299f1-goog
`MFD_NOEXEC_SEAL` should remove the executable bits and set
`F_SEAL_EXEC` to prevent further modifications to the executable
bits as per the comment in the uapi header file:
not executable and sealed to prevent changing to executable
However, currently, it also unsets `F_SEAL_SEAL`, essentially
acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
that it should be so, and indeed up until the second version
of the of the patchset[0] that introduced `MFD_EXEC` and
`MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it
was changed in the third revision of the patchset[1] without
a clear explanation.
This behaviour is suprising for application developers,
there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
has the additional effect of `MFD_ALLOW_SEALING`.
So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
This is technically an ABI break, but it seems very unlikely that an
application would depend on this behaviour (unless by accident).
[0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
[1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Barnabás Pőcze <pobrn(a)protonmail.com>
---
Or did I miss the explanation as to why MFD_NOEXEC_SEAL should
imply MFD_ALLOW_SEALING? If so, please direct me to it and
sorry for the noise.
---
mm/memfd.c | 9 ++++-----
tools/testing/selftests/memfd/memfd_test.c | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/mm/memfd.c b/mm/memfd.c
index 7d8d3ab3fa37..8b7f6afee21d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- if (file_seals) {
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
*file_seals |= F_SEAL_EXEC;
- }
- } else if (flags & MFD_ALLOW_SEALING) {
- /* MFD_EXEC and MFD_ALLOW_SEALING are set */
+ }
+
+ if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
if (file_seals)
*file_seals &= ~F_SEAL_SEAL;
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 18f585684e20..b6a7ad68c3c1 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1151,7 +1151,7 @@ static void test_noexec_seal(void)
mfd_def_size,
MFD_CLOEXEC | MFD_NOEXEC_SEAL);
mfd_assert_mode(fd, 0666);
- mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
}
--
2.45.0
In order to be able to save the current value of a sysctl without changing
it, split the relevant bit out of sysctl_set() into a new helper.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
---
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Notes:
v2:
- New patch.
tools/testing/selftests/net/forwarding/lib.sh | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index eabbdf00d8ca..9086d2015296 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1134,12 +1134,19 @@ bridge_ageing_time_get()
}
declare -A SYSCTL_ORIG
+sysctl_save()
+{
+ local key=$1; shift
+
+ SYSCTL_ORIG[$key]=$(sysctl -n $key)
+}
+
sysctl_set()
{
local key=$1; shift
local value=$1; shift
- SYSCTL_ORIG[$key]=$(sysctl -n $key)
+ sysctl_save "$key"
sysctl -qw $key="$value"
}
--
2.45.0
This patch series is motivated by the following observation:
Raise a signal, jump to signal handler. The ucontext_t structure dumped
by kernel to userspace has a uc_sigmask field having the mask of blocked
signals. If you run a fresh minimalistic program doing this, this field
is empty, even if you block some signals while registering the handler
with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for Aarch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
¤t->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, ¤t->blocked is copied to ucontext_t
exposed to userspace. Returning back to handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() are responsible for using information from
struct ksignal ksig, which was populated through the sigaction()
system call in kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update ¤t->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
Assuming that the above is indeed the intended behaviour, because it
semantically makes sense, since the signals blocked using sigaction()
remain blocked only till the execution of the handler, and not in the
context present before jumping to the handler (but nothing can be
confirmed from the man-pages), the series introduces a test for
mangling with uc_sigmask. I will send a separate series to fix the
man-pages.
The proposed selftest has been tested out on Aarch32, Aarch64 and x86_64.
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 141 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 146 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
Hello,
We're pleased to announce the return of the Kernel Testing &
Dependability Micro-Conference at Linux Plumbers 2024:
https://lpc.events/event/18/contributions/1665/
You can already submit proposals by selecting the micro-conf in
the Track drop-down list:
https://lpc.events/login/?next=/event/18/abstracts/%23submit-abstract
Please note that the deadline for submissions is *Sunday 16th June*
The event description contains a list of suggested topics
inherited from past editions. Is there anything in particular
you would like to see discussed this year?
Knowing people's interests helps with triaging proposals and
making the micro-conf as relevant as possible. See you there!
Thanks,
Guillaume & Shuah & Sasha
When compiling with Android bionic, the MAP_HUGE_* and SHM_HUGE_* macros
are redefined because they are included from the uapi by sys/mman.h and
sys/shm.h:
INFO: From Compiling common/tools/testing/selftests/mm/thuge-gen.c:
common/tools/testing/selftests/mm/thuge-gen.c:32:9: warning: 'MAP_HUGE_2MB' macro redefined [-Wmacro-redefined]
32 | #define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
| ^
external/_main~_repo_rules~prebuilt_ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/linux/mman.h:38:9: note: previous definition is here
38 | #define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
| ^
common/tools/testing/selftests/mm/thuge-gen.c:33:9: warning: 'MAP_HUGE_1GB' macro redefined [-Wmacro-redefined]
33 | #define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
| ^
external/_main~_repo_rules~prebuilt_ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/linux/mman.h:44:9: note: previous definition is here
This test should probably use the uapi definitions instead of redefining
them. However, glibc gets struct redefinitions when including sys/shm.h
and linux/shm.h together. So, add guards for the SHM_HUGE_* macros
instead.
Edward Liaw (2):
selftests/mm: Include linux/mman.h
selftests/mm: Guard defines from shm
tools/testing/selftests/mm/thuge-gen.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--
2.45.1.467.gbab1589fc0-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
For moving dctcp test dedicated code out of do_test() into test_dctcp().
This patchset adds a new helper start_test() in bpf_tcp_ca.c to refactor
do_test().
Address Martin's comments for the previous series.
Geliang Tang (5):
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
selftests/bpf: Add start_test helper in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 140 +++++++++++-------
1 file changed, 85 insertions(+), 55 deletions(-)
--
2.43.0
From: Pankaj Raghav <p.raghav(a)samsung.com>
create_pagecache_thp_and_fd() in split_huge_page_test.c used the
variable dummy to perform mmap read.
However, this test was skipped even on XFS which has large folio
support. The issue was compiler (gcc 13.2.0) was optimizing out the
dummy variable, therefore, not creating huge page in the page cache.
Add volatile keyword to force compiler not to optimize out the loop
where we read from the mmaped addr.
Signed-off-by: Pankaj Raghav <p.raghav(a)samsung.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index d3c7f5fb3e7b..c573a58f80ab 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -300,7 +300,7 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
char **addr)
{
size_t i;
- int __attribute__((unused)) dummy = 0;
+ volatile int __attribute__((unused)) dummy = 0;
srand(time(NULL));
base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
--
2.44.1
From: Pankaj Raghav <p.raghav(a)samsung.com>
create_pagecache_thp_and_fd() in split_huge_page_test.c used the
variable dummy to perform mmap read.
However, this test was skipped even on XFS which has large folio
support. The issue was compiler (gcc 13.2.0) was optimizing out the
dummy variable, therefore, not creating huge page in the page cache.
Make it as a global variable to force the compiler not to optimize out
the loop where we read from the mmaped addr.
Signed-off-by: Pankaj Raghav <p.raghav(a)samsung.com>
---
Changes since v1:
- Make the dummy variable as a global variable(willy).
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index d3c7f5fb3e7b..c4857de2c042 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -23,6 +23,11 @@
uint64_t pagesize;
unsigned int pageshift;
uint64_t pmd_pagesize;
+/*
+ * Used by create_pagecache_thp_and_fd() to do mmap read.
+ * Made it as global to avoid compiler optimizing out the variable.
+ */
+int dummy;
#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages"
#define SMAP_PATH "/proc/self/smaps"
@@ -300,7 +305,6 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
char **addr)
{
size_t i;
- int __attribute__((unused)) dummy = 0;
srand(time(NULL));
base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
--
2.44.1
While looking at using 'lib.sh' for the MPTCP selftests [1], we found
some small issues with 'lib.sh'. Here they are:
- Patch 1: fix 'errexit' (set -e) support with busywait. 'errexit' is
supported in some functions, not all. A fix for v6.8+.
- Patch 2: avoid confusing error messages linked to the cleaning part
when the netns setup fails. A fix for v6.8+.
- Patch 3: set a variable as local to avoid accidentally changing the
value of a another one with the same name on the caller side. A fix
for v6.10-rc1+.
Link: https://lore.kernel.org/mptcp/5f4615c3-0621-43c5-ad25-55747a4350ce@kernel.o… [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (3):
selftests: net: lib: support errexit with busywait
selftests: net: lib: avoid error removing empty netns name
selftests: net: lib: set 'i' as local
tools/testing/selftests/net/lib.sh | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
---
base-commit: a535d59432370343058755100ee75ab03c0e3f91
change-id: 20240605-upstream-net-20240605-selftests-net-lib-fixes-7a90a1a8d9d2
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
V2:
- Addressed all review commets from Muhammad Usama Anjum
V1:
https://lore.kernel.org/all/20240523063905.3173-1-donettom@linux.ibm.com/#t
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 3b49bc3d0a3b..a1748a4c7df1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..e4f4924179c8
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+ const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
+ const int mmap_prot = PROT_READ | PROT_WRITE;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
+ if (fd < 0)
+ ksft_exit_fail_perror("Error opening file\n");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_perror("Error mapping memory\n");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_perror("Error writing to file\n");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+}
+
--
2.43.0
Fixed MAC addresses help with debugging as last four bytes identify the
network namespace.
Signed-off-by: Lukasz Majewski <lukma(a)denx.de>
---
tools/testing/selftests/net/hsr/hsr_ping.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/net/hsr/hsr_ping.sh b/tools/testing/selftests/net/hsr/hsr_ping.sh
index 3684b813b0f6..f5d207fc770a 100755
--- a/tools/testing/selftests/net/hsr/hsr_ping.sh
+++ b/tools/testing/selftests/net/hsr/hsr_ping.sh
@@ -152,6 +152,15 @@ setup_hsr_interfaces()
ip -net "$ns3" addr add 100.64.0.3/24 dev hsr3
ip -net "$ns3" addr add dead:beef:1::3/64 dev hsr3 nodad
+ ip -net "$ns1" link set address 00:11:22:00:01:01 dev ns1eth1
+ ip -net "$ns1" link set address 00:11:22:00:01:02 dev ns1eth2
+
+ ip -net "$ns2" link set address 00:11:22:00:02:01 dev ns2eth1
+ ip -net "$ns2" link set address 00:11:22:00:02:02 dev ns2eth2
+
+ ip -net "$ns3" link set address 00:11:22:00:03:01 dev ns3eth1
+ ip -net "$ns3" link set address 00:11:22:00:03:02 dev ns3eth2
+
# All Links up
ip -net "$ns1" link set ns1eth1 up
ip -net "$ns1" link set ns1eth2 up
--
2.20.1
Hi all,
This series does a number of cleanups into resctrl_val() and
generalizes it by removing test name specific handling from the
function.
Hopefully these reach also Shuah successfully as I've recently seen
rejects for mail from @linux.intel.com to gmail addresses.
v5:
- Open mem bw file only once and use rewind().
- Add \n to mem bw file read to allow reading fresh values from the file.
- Return 0 if create_grp() is given NULL grp_name (matches the original
behavior). Mention this in function's kerneldoc.
- Cast pid_t to int before printing with %d.
- Caps/typo fixes to kerneldoc and commit messages.
- Use imperative tone in commit messages and improve them based on points
that came up during review.
v4:
- Merged close fix into IMC READ+WRITE rework patch
- Add loop to reset imc_counters_config fds to -1 to be able know which
need closing
- Introduce perf_close_imc_mem_bw() to close fds
- Open resctrl mem bw file (twice) beforehand to avoid opening it during
the test
- Remove MBM .mongrp setup
- Remove mongrp from CMT test
v3:
- Rename init functions to <testname>_init()
- Replace for loops with READ+WRITE statements for clarity
- Don't drop Return: entry from perf_open_imc_mem_bw() func comment
- New patch: Fix closing of IMC fds in case of error
- New patch: Make "bandwidth" consistent in comments & prints
- New patch: Simplify mem bandwidth file code
- Remove wrong comment
- Changed grp_name check to return -1 on fail (internal sanity check)
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W
instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM
tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 10 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 364 ++++++++----------
tools/testing/selftests/resctrl/resctrlfs.c | 67 ++--
8 files changed, 290 insertions(+), 279 deletions(-)
--
2.39.2
Hi,
I'm trying to build arm64 selftests on next-20240529. I'm getting build
failures. Complete logs are attached while some snippets are as following:
gcc pac.c /pauth/pac_corruptor.o /pauth/helper.o -o /pauth/pac -Wall -O2 -g
-I/linux_mainline/tools/testing/selftests/ -I/linux_mainline/tools/include
-mbranch-protection=pac-ret -march=armv8.2-a
In file included from pac.c:13:
../../kselftest_harness.h: In function ‘clone3_vfork’:
../../kselftest_harness.h:88:9: error: variable ‘args’ has initializer but
incomplete type
88 | struct clone_args args = {
CC check_prctl
check_prctl.c: In function ‘set_tagged_addr_ctrl’:
check_prctl.c:19:14: error: ‘PR_SET_TAGGED_ADDR_CTRL’ undeclared (first use
in this function)
19 | ret = prctl(PR_SET_TAGGED_ADDR_CTRL, val, 0, 0, 0);
gcc -mbranch-protection=standard -DBTI=1 -ffreestanding -Wall -Wextra -Wall
-O2 -g -I/linux_mainline/tools/testing/selftests/
-I/linux_mainline/tools/include -c -o /bti/test-bti.o test.c
test.c: In function ‘handler’:
test.c:85:50: error: ‘PSR_BTYPE_MASK’ undeclared (first use in this
function); did you mean ‘PSR_MODE_MASK’?
85 | write(1, &"00011011"[((uc->uc_mcontext.pstate & PSR_BTYPE_MASK)
I've GCC 8 installed. I'm not expecting the errors because of a little
older compiler. Any more ideas about the failures?
--
BR,
Muhammad Usama Anjum
The series composes of two parts. The first part Specifically,
patch 1 adds a comment at a callsite of riscv_setup_vsize to clarify how
vlenb is observed by the system. Patch 2 fixes the issue by failing the
boot process of a secondary core if vlenb mismatches.
Here is the organization of the series:
- Patch 1, 2 provide a fix for mismatching vlen problem [1]. The
solution is to fail secondary cores if their vlenb is not the same as
the boot core.
- Patch 3 is a cleanup for introducing ZVE* Vector subextensions. It
gives the obsolete ISA parser the ability to expand ISA extensions for
sigle letter extensions.
- Patch 4, 5, 6 introduce Zve32x, Zve32f, Zve64x, Zve64f, Zve64d for isa
parsing and hwprobe, and document about it.
- Patch 7 makes has_vector() check against ZVE32X instead of V, so most
userspace Vector supports will be available for bare ZVE32X.
- Patch 8 updates the prctl test so that it runs on ZVE32X.
The series is tested on a QEMU and verified that booting, Vector
programs context-switch, signal, ptrace, prctl interfaces works when we
only report partial V from the ISA.
Note that the signal test was performed after applying the commit
c27fa53b858b ("riscv: Fix vector state restore in rt_sigreturn()")
This patch should be able to apply on risc-v for-next branch on top of
the commit 0a16a1728790 ("riscv: select ARCH_HAS_FAST_MULTIPLIER")
[1]: https://lore.kernel.org/all/20240228-vicinity-cornstalk-4b8eb5fe5730@spud/T…
Changes in v5:
- Rebase on top of for-next
- Update comments (1, 7)
- Reorder the documentation patch to the front of patches that it
documents about. (5->4)
- Include ZVE64D to the list, which single letter V implies (6)
- Remove ZVE32F_IMPLY_LIST (5)
- Change the semantic of has_vector() thus rewrite patch 7
- Remove the patch that fixes integer promotion as it is merged else
place (8)
- Link to v4: https://lore.kernel.org/r/20240412-zve-detection-v4-0-e0c45bb6b253@sifive.c…
Changes in v4:
- Add a patch to trigger prctl test on ZVE32X (9)
- Add a patch to fix integer promotion bug in hwprobe (8)
- Fix a build fail on !CONFIG_RISCV_ISA_V (7)
- Add more comment in the assembly code change (2)
- Link to v3: https://lore.kernel.org/r/20240318-zve-detection-v3-0-e12d42107fa8@sifive.c…
Changelog v3:
- Include correct maintainers and mailing list into CC.
- Cleanup isa string parser code (3)
- Adjust extensions order and name (4, 5)
- Refine commit message (6)
Changelog v2:
- Update comments and commit messages (1, 2, 7)
- Refine isa_exts[] lists for zve extensions (4)
- Add a patch for dt-binding (5)
- Make ZVE* extensions depend on has_vector(ZVE32X) (6, 7)
---
---
Andy Chiu (8):
riscv: vector: add a comment when calling riscv_setup_vsize()
riscv: smp: fail booting up smp if inconsistent vlen is detected
riscv: cpufeature: call match_isa_ext() for single-letter extensions
dt-bindings: riscv: add Zve32[xf] Zve64[xfd] ISA extension description
riscv: cpufeature: add zve32[xf] and zve64[xfd] isa detection
riscv: hwprobe: add zve Vector subextensions into hwprobe interface
riscv: vector: adjust minimum Vector requirement to ZVE32X
selftest: run vector prctl test for ZVE32X
Documentation/arch/riscv/hwprobe.rst | 15 ++++++
.../devicetree/bindings/riscv/extensions.yaml | 30 +++++++++++
arch/riscv/include/asm/hwcap.h | 5 ++
arch/riscv/include/asm/vector.h | 10 ++--
arch/riscv/include/uapi/asm/hwprobe.h | 5 ++
arch/riscv/kernel/cpufeature.c | 60 +++++++++++++++++++---
arch/riscv/kernel/head.S | 19 ++++---
arch/riscv/kernel/smpboot.c | 14 +++--
arch/riscv/kernel/sys_hwprobe.c | 11 +++-
arch/riscv/kernel/vector.c | 5 +-
arch/riscv/lib/uaccess.S | 2 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 6 +--
12 files changed, 151 insertions(+), 31 deletions(-)
---
base-commit: 0a16a172879012c42f55ae8c2883e17c1e4e388f
change-id: 20240318-zve-detection-50106d2da527
Best regards,
--
Andy Chiu <andy.chiu(a)sifive.com>
hsr_redbox.sh test need to create bridge for testing. Add the missing
config CONFIG_BRIDGE in config file.
Fixes: eafbf0574e05 ("test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
tools/testing/selftests/net/hsr/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/hsr/config b/tools/testing/selftests/net/hsr/config
index 22061204fb69..241542441c51 100644
--- a/tools/testing/selftests/net/hsr/config
+++ b/tools/testing/selftests/net/hsr/config
@@ -2,3 +2,4 @@ CONFIG_IPV6=y
CONFIG_NET_SCH_NETEM=m
CONFIG_HSR=y
CONFIG_VETH=y
+CONFIG_BRIDGE=y
--
2.43.0
This patchset makes it possible for MGLRU to consult secondary MMUs
while doing aging, not just during eviction. This allows for more
accurate reclaim decisions, which is especially important for proactive
reclaim.
This series makes the necessary MMU notifier changes to MGLRU and then
includes optimizations on top of that. This series also now includes
changes to access_tracking_perf_test to verify that aging works properly
for pages that are mainly used by KVM.
access_tracking_perf_test also has a mode (-p) to check performance of
MGLRU aging while the VM is faulting memory in. Here are some results:
Testing MGLRU aging while vCPUs are faulting in memory on x86 with the
TDP MMU. THPs disabled.
The test results varied a decent amount from run to run. I did my best to
take representative averages, but nonetheless, the big picture is the
important part.
Main takeaways:
- With the optimizations, the workload is much less impacted
by the presence of aging.
- With the optimizations, MGLRU is able to do aging much more
quickly, especially at 8+ vCPUs on my machine.
./access_tracking_perf_test -p -l -b 1G -v $N_VCPUS # 1G per vCPU
num_vcpus vcpu wall time aging avg pass time
1 (no aging) 0.878822016 n/a
1 (no opt) 0.938250568 0.008236007
1 (opt) 0.912270190 0.007314582
2 (no aging) 0.984959659 n/a
2 (no opt) 1.057880728 0.017989741
2 (opt) 1.037735641 0.013996319
4 (no aging) 1.264881581 n/a
4 (no opt) 1.318849182 0.056164918
4 (opt) 1.314653465 0.029311993
8 (no aging) 1.473883699 n/a
8 (no opt) 1.589441079 0.227419586s
8 (opt) 1.498439592 0.063857740s
16 (no aging) 2.048766096 n/a
16 (no opt) 2.399335597 1.247142841s
16 (opt) 2.000914001 0.121628689s
32 (no aging) 3.316256321 n/a
32 (no opt) 3.955417018 4.347290433
32 (opt) 3.355274507 0.250886289
64 (no aging) 6.498958516 n/a
64 (no opt) 7.127533884 9.815592054
64 (opt) 6.442582168 1.392907010
112 (no aging) 8.498029491 n/a
112 (no opt) 10.21372495 13.47381656
112 (opt) 8.896963554 2.292223850
Previous versions of this series included logic in MGLRU and KVM to
support batching the updates to secondary page tables. This version
removes this logic, as it was complex and not necessary to enable
proactive reclaim. This optimization, as well as the additional
optimizations for arm64 and powerpc, can be done in a later series.
Changes since v3[1]:
- Vastly simplified the series (thanks David). Removed mmu notifier
batching logic entirely.
- Cleaned up how locking is done for mmu_notifier_test/clear_young
(thanks David).
- Look-around is now only done when there are no secondary MMUs
subscribed to MMU notifiers.
- CONFIG_LRU_GEN_WALKS_SECONDARY_MMU has been added.
- Fixed the lockless implementation of kvm_{test,}age_gfn for x86
(thanks David).
- Added MGLRU functional and performance tests to
access_tracking_perf_test (thanks Axel).
- In v3, an mm would be completely ignored (for aging) if there was a
secondary MMU but support for secondary MMU walking was missing. Now,
missing secondary MMU walking support simply skips the notifier
calls (except for eviction).
- Added a sanity check for that range->lockless and range->on_lock are
never both provided for the memslot walk.
For the changes from v2[2] to v3, see v3[1].
This series applies cleanly to mm/mm-unstable and kvm/queue.
[1]: https://lore.kernel.org/linux-mm/20240401232946.1837665-1-jthoughton@google…
[2]: https://lore.kernel.org/kvmarm/20230526234435.662652-1-yuzhao@google.com/
James Houghton (7):
mm/Kconfig: Add LRU_GEN_WALKS_SECONDARY_MMU
mm: multi-gen LRU: Have secondary MMUs participate in aging
KVM: Add lockless memslot walk to KVM
KVM: Move MMU lock acquisition for test/clear_young to architecture
KVM: x86: Relax locking for kvm_test_age_gfn and kvm_age_gfn
KVM: arm64: Relax locking for kvm_test_age_gfn and kvm_age_gfn
KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test
Documentation/admin-guide/mm/multigen_lru.rst | 6 +-
arch/arm64/kvm/hyp/pgtable.c | 9 +-
arch/arm64/kvm/mmu.c | 30 +-
arch/loongarch/kvm/mmu.c | 20 +-
arch/mips/kvm/mmu.c | 21 +-
arch/powerpc/kvm/book3s.c | 14 +-
arch/riscv/kvm/mmu.c | 26 +-
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 10 +-
arch/x86/kvm/mmu/tdp_iter.h | 27 +-
arch/x86/kvm/mmu/tdp_mmu.c | 67 ++-
include/linux/kvm_host.h | 1 +
include/linux/mmzone.h | 6 +-
mm/Kconfig | 8 +
mm/rmap.c | 9 +-
mm/vmscan.c | 144 +++++--
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/access_tracking_perf_test.c | 365 ++++++++++++++--
.../selftests/kvm/include/lru_gen_util.h | 55 +++
.../testing/selftests/kvm/lib/lru_gen_util.c | 391 ++++++++++++++++++
virt/kvm/kvm_main.c | 38 +-
21 files changed, 1104 insertions(+), 145 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/lru_gen_util.h
create mode 100644 tools/testing/selftests/kvm/lib/lru_gen_util.c
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1.288.g0e0cd299f1-goog
Currrentl a 32 bit 1u value is being shifted more than 32 bits causing
overflow and incorrect checking of bits 32-63. Fix this by using the
BIT_ULL macro for shifting bits.
Detected by cppcheck:
sev_init2_tests.c:108:34: error: Shifting 32-bit value by 63 bits is
undefined behaviour [shiftTooManyBits]
Fixes: dfc083a181ba ("selftests: kvm: add tests for KVM_SEV_INIT2")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
V2: Fix incorrect variable in 2nd BIT_ULL(), kudos to Dan Carpenter for
catching this error.
---
tools/testing/selftests/kvm/x86_64/sev_init2_tests.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
index 7a4a61be119b..ea09f7a06aa4 100644
--- a/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
+++ b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
@@ -105,11 +105,11 @@ void test_features(uint32_t vm_type, uint64_t supported_features)
int i;
for (i = 0; i < 64; i++) {
- if (!(supported_features & (1u << i)))
+ if (!(supported_features & BIT_ULL(i)))
test_init2_invalid(vm_type,
&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) },
"unknown feature");
- else if (KNOWN_FEATURES & (1u << i))
+ else if (KNOWN_FEATURES & BIT_ULL(i))
test_init2(vm_type,
&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) });
}
--
2.39.2
Hi Linus,
Please pull this urgent kselftest fixes update for Linux 6.10-rc3.
This kselftest fixes update consists of fixes to build warnings
in several tests and fixes to ftrace tests.
diff for pull request is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0:
Linux 6.10-rc1 (2024-05-26 15:20:12 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.10-rc3
for you to fetch changes up to 4bf15b1c657d22d1d70173e43264e4606dfe75ff:
selftests/futex: don't pass a const char* to asprintf(3) (2024-05-31 14:37:10 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.10-rc3
This kselftest fixes update consists of fixes to build warnings
in several tests and fixes to ftrace tests.
----------------------------------------------------------------
John Hubbard (3):
selftests/futex: pass _GNU_SOURCE without a value to the compiler
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
Mark Brown (1):
kselftest/alsa: Ensure _GNU_SOURCE is defined
Masami Hiramatsu (Google) (3):
selftests/ftrace: Fix to check required event file
selftests/ftrace: Update required config
selftests/tracing: Fix event filter test to retry up to 10 times
Michael Ellerman (3):
selftests: cachestat: Fix build warnings on ppc64
selftests/openat2: Fix build warnings on ppc64
selftests/overlayfs: Fix build error on ppc64
Steven Rostedt (Google) (1):
tracing/selftests: Fix kprobe event name test for .isra. functions
tools/testing/selftests/alsa/Makefile | 2 +-
tools/testing/selftests/cachestat/test_cachestat.c | 1 +
.../selftests/filesystems/overlayfs/dev_in_maps.c | 1 +
tools/testing/selftests/ftrace/config | 26 ++++++++++++++++------
.../ftrace/test.d/dynevent/test_duplicates.tc | 2 +-
.../ftrace/test.d/filter/event-filter-function.tc | 20 ++++++++++++++++-
.../ftrace/test.d/kprobe/kprobe_eventname.tc | 3 ++-
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
.../selftests/futex/functional/futex_requeue_pi.c | 2 +-
tools/testing/selftests/openat2/openat2_test.c | 1 +
11 files changed, 47 insertions(+), 15 deletions(-)
----------------------------------------------------------------
Hi,
This seven series fix an issue reported by kernel test robot [3].
Shuah, I (as well as Kees and Sean [4]) think this should be in -next
really soon to make sure everything works fine for the v6.9 release,
which is not currently the case. I cannot test against all kselftests
though. I would prefer to let you handle this, but I guess you're not
able to do so and I'll push it on my branch without reply from you.
Even if I push it on my branch, please push it on yours too as soon as
you see this and I'll remove it from mine.
Mark, Jakub, could you please test this series?
As reported by Kernel Test Robot [1] and Sean Christopherson [2], some
tests fail since v6.9-rc1 . This is due to the use of vfork() which
introduced some side effects. Similarly, while making it more generic,
a previous commit made some Landlock file system tests flaky, and
subject to the host's file system mount configuration.
This series fixes all these side effects by replacing vfork() with
clone3() and CLONE_VFORK, which is cleaner (no arbitrary shared memory)
and makes the Kselftest framework more robust.
I tried different approaches and I found this one to be the cleaner and
less invasive for current test cases.
I successfully ran the following tests (using TEST_F and
fork/clone/clone3, and KVM_ONE_VCPU_TEST) with this series:
- kvm:fix_hypercall_test
- kvm:sync_regs_test
- kvm:userspace_msr_exit_test
- kvm:vmx_pmu_caps_test
- landlock:fs_test
- landlock:net_test
- landlock:ptrace_test
- move_mount_set_group:move_mount_set_group_test
- net/af_unix:scm_pidfd
- perf_events:remove_on_exec
- pidfd:pidfd_getfd_test
- pidfd:pidfd_setns_test
- seccomp:seccomp_bpf
- user_events:abi_test
[1] https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com
[2] https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com
[3] https://lore.kernel.org/r/202405100339.vfBe0t9C-lkp@intel.com
[4] https://lore.kernel.org/r/202405061002.01D399877A@keescook
Previous versions:
v1: https://lore.kernel.org/r/20240426172252.1862930-1-mic@digikod.net
v2: https://lore.kernel.org/r/20240429130931.2394118-1-mic@digikod.net
v3: https://lore.kernel.org/r/20240429191911.2552580-1-mic@digikod.net
v4: https://lore.kernel.org/r/20240502210926.145539-1-mic@digikod.net
v5: https://lore.kernel.org/r/20240503105820.300927-1-mic@digikod.net
v6: https://lore.kernel.org/r/20240506165518.474504-1-mic@digikod.net
Regards,
Mickaël Salaün (10):
selftests/pidfd: Fix config for pidfd_setns_test
selftests/landlock: Fix FS tests when run on a private mount point
selftests/harness: Fix fixture teardown
selftests/harness: Fix interleaved scheduling leading to race
conditions
selftests/landlock: Do not allocate memory in fixture data
selftests/harness: Constify fixture variants
selftests/pidfd: Fix wrong expectation
selftests/harness: Share _metadata between forked processes
selftests/harness: Fix vfork() side effects
selftests/harness: Handle TEST_F()'s explicit exit codes
tools/testing/selftests/kselftest_harness.h | 127 +++++++++++++-----
tools/testing/selftests/landlock/fs_test.c | 83 +++++++-----
tools/testing/selftests/pidfd/config | 2 +
.../selftests/pidfd/pidfd_setns_test.c | 2 +-
4 files changed, 147 insertions(+), 67 deletions(-)
base-commit: e67572cd2204894179d89bd7b984072f19313b03
--
2.45.0
Add support for (yet again) more RVA23U64 missing extensions. Add
support for Zimop, Zcmop, Zca, Zcf, Zcd and Zcb extensions ISA string
parsing, hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have
been left out since they target microcontrollers/embedded CPUs and are
not needed by RVA23U64.
Since Zc* extensions states that C implies Zca, Zcf (if F and RV32), Zcd
(if D), this series modifies the way ISA string is parsed and now does
it in two phases. First one parses the string and the second one
validates it for the final ISA description.
Link: https://lore.kernel.org/linux-riscv/20240404103254.1752834-1-cleger@rivosin… [1]
Link: https://lore.kernel.org/all/20240409143839.558784-1-cleger@rivosinc.com/ [2]
---
v6:
- Rebased on riscv/for-next
- Remove ternary operator to use 'if()' instead in extension checks
- v5: https://lore.kernel.org/all/20240517145302.971019-1-cleger@rivosinc.com/
v5:
- Merged in Zimop to avoid any uneeded series dependencies
- Rework dependency resolution loop to loop on source isa first rather
than on all extension.
- Disabled extensions in source isa once set in resolved isa
- Rename riscv_resolve_isa() parameters
- v4: https://lore.kernel.org/all/20240429150553.625165-1-cleger@rivosinc.com/
v4:
- Modify validate() callbacks to return 0, -EPROBEDEFER or another
error.
- v3: https://lore.kernel.org/all/20240423124326.2532796-1-cleger@rivosinc.com/
v3:
- Fix typo "exists" -> "exist"
- Remove C implies Zca, Zcd, Zcf, dt-bindings rules
- Rework ISA string resolver to handle dependencies
- v2: https://lore.kernel.org/all/20240418124300.1387978-1-cleger@rivosinc.com/
v2:
- Add Zc* dependencies validation in dt-bindings
- v1: https://lore.kernel.org/lkml/20240410091106.749233-1-cleger@rivosinc.com/
Clément Léger (16):
dt-bindings: riscv: add Zimop ISA extension description
riscv: add ISA extension parsing for Zimop
riscv: hwprobe: export Zimop ISA extension
RISC-V: KVM: Allow Zimop extension for Guest/VM
KVM: riscv: selftests: Add Zimop extension to get-reg-list test
dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension
description
riscv: add ISA extensions validation callback
riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb
riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions
RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM
KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test
dt-bindings: riscv: add Zcmop ISA extension description
riscv: add ISA extension parsing for Zcmop
riscv: hwprobe: export Zcmop ISA extension
RISC-V: KVM: Allow Zcmop extension for Guest/VM
KVM: riscv: selftests: Add Zcmop extension to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 28 ++
.../devicetree/bindings/riscv/extensions.yaml | 95 ++++++
arch/riscv/include/asm/cpufeature.h | 1 +
arch/riscv/include/asm/hwcap.h | 7 +-
arch/riscv/include/uapi/asm/hwprobe.h | 6 +
arch/riscv/include/uapi/asm/kvm.h | 6 +
arch/riscv/kernel/cpufeature.c | 278 ++++++++++++------
arch/riscv/kernel/sys_hwprobe.c | 6 +
arch/riscv/kvm/vcpu_onereg.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 24 ++
10 files changed, 375 insertions(+), 88 deletions(-)
--
2.45.1
Add support for (yet again) more RVA23U64 missing extensions. Add
support for Zimop, Zcmop, Zca, Zcf, Zcd and Zcb extensions ISA string
parsing, hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have
been left out since they target microcontrollers/embedded CPUs and are
not needed by RVA23U64.
Since Zc* extensions states that C implies Zca, Zcf (if F and RV32), Zcd
(if D), this series modifies the way ISA string is parsed and now does
it in two phases. First one parses the string and the second one
validates it for the final ISA description.
Link: https://lore.kernel.org/linux-riscv/20240404103254.1752834-1-cleger@rivosin… [1]
Link: https://lore.kernel.org/all/20240409143839.558784-1-cleger@rivosinc.com/ [2]
---
v5:
- Merged in Zimop to avoid any uneeded series dependencies
- Rework dependency resolution loop to loop on source isa first rather
than on all extensions.
- Disabled extensions in source isa once set in resolved isa
- Rename riscv_resolve_isa() parameters
v4:
- Modify validate() callbacks to return 0, -EPROBEDEFER or another
error.
- v3: https://lore.kernel.org/all/20240423124326.2532796-1-cleger@rivosinc.com/
v3:
- Fix typo "exists" -> "exist"
- Remove C implies Zca, Zcd, Zcf, dt-bindings rules
- Rework ISA string resolver to handle dependencies
- v2: https://lore.kernel.org/all/20240418124300.1387978-1-cleger@rivosinc.com/
v2:
- Add Zc* dependencies validation in dt-bindings
- v1: https://lore.kernel.org/lkml/20240410091106.749233-1-cleger@rivosinc.com/
Clément Léger (16):
dt-bindings: riscv: add Zimop ISA extension description
riscv: add ISA extension parsing for Zimop
riscv: hwprobe: export Zimop ISA extension
RISC-V: KVM: Allow Zimop extension for Guest/VM
KVM: riscv: selftests: Add Zimop extension to get-reg-list test
dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension
description
riscv: add ISA extensions validation callback
riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb
riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions
RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM
KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test
dt-bindings: riscv: add Zcmop ISA extension description
riscv: add ISA extension parsing for Zcmop
riscv: hwprobe: export Zcmop ISA extension
RISC-V: KVM: Allow Zcmop extension for Guest/VM
KVM: riscv: selftests: Add Zcmop extension to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 28 ++
.../devicetree/bindings/riscv/extensions.yaml | 95 +++++++
arch/riscv/include/asm/cpufeature.h | 26 +-
arch/riscv/include/asm/hwcap.h | 6 +
arch/riscv/include/uapi/asm/hwprobe.h | 6 +
arch/riscv/include/uapi/asm/kvm.h | 6 +
arch/riscv/kernel/cpufeature.c | 244 ++++++++++++------
arch/riscv/kernel/sys_hwprobe.c | 6 +
arch/riscv/kvm/vcpu_onereg.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 24 ++
10 files changed, 366 insertions(+), 87 deletions(-)
--
2.43.0
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cced everyone to all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they all build
upon each other. Or the DT part can be merged through the DT tree to
reduce the dependencies.
Changes from v3 (https://lore.kernel.org/r/20230327222159.3509818-1-sboyd@kernel.org):
* No longer depend on Frank's series[1] because it was merged upstream[2]
* Use kunit_add_action_or_reset() to shorten code
* Skip tests properly when CONFIG_OF_OVERLAY isn't set
Changes from v2 (https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org):
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1 (https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org):
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
[2] https://lore.kernel.org/r/20240308195737.GA1174908-robh@kernel.org
Stephen Boyd (10):
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: kunit: Add fixed rate clk consumer test
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add KUnit clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 21 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../bindings/clock/test,clk-parent-data.yaml | 47 ++
.../bindings/test/test,clk-fixed-rate.yaml | 35 ++
.../devicetree/bindings/test/test,empty.yaml | 30 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform_kunit-test.c | 140 ++++++
drivers/base/test/platform_kunit.c | 174 +++++++
drivers/clk/.kunitconfig | 2 +
drivers/clk/Kconfig | 9 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 377 +++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit.c | 198 ++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 451 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 1 +
drivers/of/Kconfig | 10 +
drivers/of/Makefile | 2 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit.c | 99 ++++
drivers/of/overlay_test.c | 115 +++++
include/kunit/clk.h | 28 ++
include/kunit/of.h | 94 ++++
include/kunit/platform_device.h | 15 +
30 files changed, 1967 insertions(+), 2 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,clk-fixed-rate.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,empty.yaml
create mode 100644 drivers/base/test/platform_kunit-test.c
create mode 100644 drivers/base/test/platform_kunit.c
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
base-commit: 4cece764965020c22cff7665b18a012006359095
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset contains some fixes and improvements for test_sockmap.
3-5: switching attachments to bpf_link as Jakub suggested in [1].
1-2, 6-8: Small fixes.
[1]
https://lore.kernel.org/bpf/87zfsiw3a3.fsf@cloudflare.com/
Geliang Tang (8):
selftests/bpf: Fix tx_prog_fd values in test_sockmap
selftests/bpf: Drop duplicate definition of i in test_sockmap
selftests/bpf: Use bpf_link attachments in test_sockmap
selftests/bpf: Replace tx_prog_fd with tx_prog in test_sockmap
selftests/bpf: Drop prog_fd array in test_sockmap
selftests/bpf: Fix size of map_fd in test_sockmap
selftests/bpf: Check length of recv in test_sockmap
selftests/bpf: Drop duplicate bpf_map_lookup_elem in test_sockmap
.../selftests/bpf/progs/test_sockmap_kern.h | 3 -
tools/testing/selftests/bpf/test_sockmap.c | 101 +++++++++---------
2 files changed, 51 insertions(+), 53 deletions(-)
--
2.43.0
Adapt the current test-livepatch.sh script to account the number of
applied livepatches and ensure that an atomic replace livepatch disables
all previously applied livepatches.
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
---
Changes since v1:
* Added checks in the existing test-livepatch.sh instead of creating a
new test file. (Joe)
* Fixed issues reported by ShellCheck (Joe)
---
.../testing/selftests/livepatch/test-livepatch.sh | 46 ++++++++++++++++++++--
1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/livepatch/test-livepatch.sh b/tools/testing/selftests/livepatch/test-livepatch.sh
index e3455a6b1158..d85405d18e54 100755
--- a/tools/testing/selftests/livepatch/test-livepatch.sh
+++ b/tools/testing/selftests/livepatch/test-livepatch.sh
@@ -107,9 +107,12 @@ livepatch: '$MOD_LIVEPATCH': unpatching complete
# - load a livepatch that modifies the output from /proc/cmdline and
# verify correct behavior
-# - load an atomic replace livepatch and verify that only the second is active
-# - remove the first livepatch and verify that the atomic replace livepatch
-# is still active
+# - load two addtional livepatches and check the number of livepatch modules
+# applied
+# - load an atomic replace livepatch and check that the other three modules were
+# disabled
+# - remove all livepatches besides the atomic replace one and verify that the
+# atomic replace livepatch is still active
# - remove the atomic replace livepatch and verify that none are active
start_test "atomic replace livepatch"
@@ -119,12 +122,31 @@ load_lp $MOD_LIVEPATCH
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
+for mod in test_klp_syscall test_klp_callbacks_demo; do
+ load_lp $mod
+done
+
+mods=(/sys/kernel/livepatch/*)
+nmods=${#mods[@]}
+if [ "$nmods" -ne 3 ]; then
+ die "Expecting three modules listed, found $nmods"
+fi
+
load_lp $MOD_REPLACE replace=1
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
-unload_lp $MOD_LIVEPATCH
+mods=(/sys/kernel/livepatch/*)
+nmods=${#mods[@]}
+if [ "$nmods" -ne 1 ]; then
+ die "Expecting only one moduled listed, found $nmods"
+fi
+
+# These modules were disabled by the atomic replace
+for mod in test_klp_callbacks_demo test_klp_syscall $MOD_LIVEPATCH; do
+ unload_lp "$mod"
+done
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
@@ -142,6 +164,20 @@ livepatch: '$MOD_LIVEPATCH': starting patching transition
livepatch: '$MOD_LIVEPATCH': completing patching transition
livepatch: '$MOD_LIVEPATCH': patching complete
$MOD_LIVEPATCH: this has been live patched
+% insmod test_modules/test_klp_syscall.ko
+livepatch: enabling patch 'test_klp_syscall'
+livepatch: 'test_klp_syscall': initializing patching transition
+livepatch: 'test_klp_syscall': starting patching transition
+livepatch: 'test_klp_syscall': completing patching transition
+livepatch: 'test_klp_syscall': patching complete
+% insmod test_modules/test_klp_callbacks_demo.ko
+livepatch: enabling patch 'test_klp_callbacks_demo'
+livepatch: 'test_klp_callbacks_demo': initializing patching transition
+test_klp_callbacks_demo: pre_patch_callback: vmlinux
+livepatch: 'test_klp_callbacks_demo': starting patching transition
+livepatch: 'test_klp_callbacks_demo': completing patching transition
+test_klp_callbacks_demo: post_patch_callback: vmlinux
+livepatch: 'test_klp_callbacks_demo': patching complete
% insmod test_modules/$MOD_REPLACE.ko replace=1
livepatch: enabling patch '$MOD_REPLACE'
livepatch: '$MOD_REPLACE': initializing patching transition
@@ -149,6 +185,8 @@ livepatch: '$MOD_REPLACE': starting patching transition
livepatch: '$MOD_REPLACE': completing patching transition
livepatch: '$MOD_REPLACE': patching complete
$MOD_REPLACE: this has been live patched
+% rmmod test_klp_callbacks_demo
+% rmmod test_klp_syscall
% rmmod $MOD_LIVEPATCH
$MOD_REPLACE: this has been live patched
% echo 0 > /sys/kernel/livepatch/$MOD_REPLACE/enabled
---
base-commit: 6d69b6c12fce479fde7bc06f686212451688a102
change-id: 20240525-lp-atomic-replace-90b33ed018dc
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
Fixed MAC addresses help with debugging as last four bytes identify the
network namespace.
Signed-off-by: Lukasz Majewski <lukma(a)denx.de>
---
tools/testing/selftests/net/hsr/hsr_ping.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/net/hsr/hsr_ping.sh b/tools/testing/selftests/net/hsr/hsr_ping.sh
index 3684b813b0f6..f5d207fc770a 100755
--- a/tools/testing/selftests/net/hsr/hsr_ping.sh
+++ b/tools/testing/selftests/net/hsr/hsr_ping.sh
@@ -152,6 +152,15 @@ setup_hsr_interfaces()
ip -net "$ns3" addr add 100.64.0.3/24 dev hsr3
ip -net "$ns3" addr add dead:beef:1::3/64 dev hsr3 nodad
+ ip -net "$ns1" link set address 00:11:22:00:01:01 dev ns1eth1
+ ip -net "$ns1" link set address 00:11:22:00:01:02 dev ns1eth2
+
+ ip -net "$ns2" link set address 00:11:22:00:02:01 dev ns2eth1
+ ip -net "$ns2" link set address 00:11:22:00:02:02 dev ns2eth2
+
+ ip -net "$ns3" link set address 00:11:22:00:03:01 dev ns3eth1
+ ip -net "$ns3" link set address 00:11:22:00:03:02 dev ns3eth2
+
# All Links up
ip -net "$ns1" link set ns1eth1 up
ip -net "$ns1" link set ns1eth2 up
--
2.20.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The function tracer is tested to see if pid filtering works. Add a test to
test function_graph tracer as well, but only if the function_graph tracer
is enabled for the top level or instance.
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
.../ftrace/test.d/ftrace/func-filter-pid.tc | 27 +++++++++++++++----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
index 2f7211254529..c6fc9d31a496 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
@@ -14,6 +14,11 @@ if [ ! -f options/function-fork ]; then
echo "no option for function-fork found. Option will not be tested."
fi
+if [ ! -f options/funcgraph-proc ]; then
+ do_funcgraph_proc=0
+ echo "no option for function-fork found. Option will not be tested."
+fi
+
read PID _ < /proc/self/stat
if [ $do_function_fork -eq 1 ]; then
@@ -21,12 +26,18 @@ if [ $do_function_fork -eq 1 ]; then
orig_value=`grep function-fork trace_options`
fi
+if [ $do_funcgraph_proc -eq 1 ]; then
+ orig_value2=`cat options/funcgraph-proc`
+fi
+
do_reset() {
- if [ $do_function_fork -eq 0 ]; then
- return
+ if [ $do_function_fork -eq 1 ]; then
+ echo $orig_value > trace_options
fi
- echo $orig_value > trace_options
+ if [ $do_funcgraph_proc -eq 1 ]; then
+ echo $orig_value2 > options/funcgraph-proc
+ fi
}
fail() { # msg
@@ -36,13 +47,15 @@ fail() { # msg
}
do_test() {
+ TRACER=$1
+
disable_tracing
echo do_execve* > set_ftrace_filter
echo $FUNCTION_FORK >> set_ftrace_filter
echo $PID > set_ftrace_pid
- echo function > current_tracer
+ echo $TRACER > current_tracer
if [ $do_function_fork -eq 1 ]; then
# don't allow children to be traced
@@ -82,7 +95,11 @@ do_test() {
fi
}
-do_test
+do_test function
+if grep -s function_graph available_tracers; then
+ do_test function_graph
+fi
+
do_reset
exit 0
--
2.43.0
From: Matteo Croce <teknoraver(a)meta.com>
Some programs need to know the size of the network buffers to operate
correctly, export the following sysctls read-only in network namespaces:
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
Matteo Croce (2):
net: make net.core.{r,w}mem_{default,max} namespaced
selftests: net: tests net.core.{r,w}mem_{default,max} sysctls in a
netns
changes from v1:
- added SPDX header to test
- rewrite test with more detailed error messages
net/core/sysctl_net_core.c | 75 ++++++++++++---------
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/netns-sysctl.sh | 40 +++++++++++
3 files changed, 83 insertions(+), 33 deletions(-)
create mode 100755 tools/testing/selftests/net/netns-sysctl.sh
--
2.45.1
Add CFcommon.arch for the various arch's need for rcutorture.
According to [1] and [2], this patch
Fixes: a6fda6dab93c ("rcutorture: Tweak kvm options") by moving
x86 specific kernel option CONFIG_HYPERVISOR_GUEST to CFcommon.x86
[1] https://lore.kernel.org/all/20240427005626.1365935-1-zhouzhouyi@gmail.com/
[2] https://lore.kernel.org/all/059d36ce-6453-42be-a31e-895abd35d590@paulmck-la…
Tested in x86_64 and PPC VM of Open Source Lab of Oregon State University.
Signed-off-by: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
---
Hi Paul,
I tried very hard to find in Linux kernel on how to dig out
the x86 specific kernel option CONFIG_HYPERVISOR_GUEST before configcheck.sh
generates ConfigFragment.diags.
I can only find this functionality in scripts/kconfig/conf which travels
the Kconfig hierarchy.
But the output of scripts/kconfig/conf, which is .config
is also one of the input of configcheck.sh:
```
kvm-recheck.sh: configcheck.sh $i/.config $i/ConfigFragment > $i/ConfigFragment.diags 2>&1
```
I feel some logic paradox in it ;-)
So, I pick the simplest way.
One more thing, recent change in include/linux/bitmap.h cause the make
of allmodconfig fail because of warning on both x86 platforms, I am
going to do research on it.
Thank your for your guidance
Zhouyi
--
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 9 +++++++++
tools/testing/selftests/rcutorture/configs/rcu/CFcommon | 1 -
.../selftests/rcutorture/configs/rcu/CFcommon.x86 | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index b33cd8753689..5332224238ba 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -62,6 +62,15 @@ config_override_param () {
}
echo > $T/KcList
+if uname -m | grep -q 86
+# TODO: add other architecture-specific common configuration when needed
+then
+ if test -f $config_dir/CFcommon.x86
+ then
+ config_override_param "$config_dir/CFcommon.x86" KcList\
+ "`cat $config_dir/CFcommon.x86 2> /dev/null`"
+ fi
+fi
config_override_param "$config_dir/CFcommon" KcList "`cat $config_dir/CFcommon 2> /dev/null`"
config_override_param "$config_template" KcList "`cat $config_template 2> /dev/null`"
config_override_param "--gdb options" KcList "$TORTURE_KCONFIG_GDB_ARG"
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
index 0e92d85313aa..cf0387ae5358 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
+++ b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
@@ -1,6 +1,5 @@
CONFIG_RCU_TORTURE_TEST=y
CONFIG_PRINTK_TIME=y
-CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
CONFIG_KVM_GUEST=y
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86 b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86
new file mode 100644
index 000000000000..2770560d56a0
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86
@@ -0,0 +1 @@
+CONFIG_HYPERVISOR_GUEST=y
--
2.25.1
Hi,
Here's a few fixes that are part of my effort to get all selftests
building cleanly under clang. Plus one that I noticed by inspection.
Changes since v2:
1) Added a sentence to the .PHONY patch, to show that it is removing
duplicate code.
2) Added the actual clang warning output to the commit description.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
2) Added Reviewed-by's.
...and it turns out that all three patches are still required, on -rc1,
in order to get a clean clang build.
Enjoy!
thanks,
John Hubbard
John Hubbard (3):
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
selftests/futex: pass _GNU_SOURCE without a value to the compiler
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/futex/functional/futex_requeue_pi.c | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
base-commit: b050496579632f86ee1ef7e7501906db579f3457
--
2.45.1
The purpose of this series is to rethink how HID-BPF is invoked.
Currently it implies a jmp table, a prog fd bpf_map, a preloaded tracing
bpf program and a lot of manual work for handling the bpf program
lifetime and addition/removal.
OTOH, bpf_struct_ops take care of most of the bpf handling leaving us
with a simple list of ops pointers, and we can directly call the
struct_ops program from the kernel as a regular function.
The net gain right now is in term of code simplicity and lines of code
removal (though is an API breakage), but udev-hid-bpf is able to handle
such breakages.
In the near future, we will be able to extend the HID-BPF struct_ops
with entrypoints for hid_hw_raw_request() and hid_hw_output_report(),
allowing for covering all of the initial use cases:
- firewalling a HID device
- fixing all of the HID device interactions (not just device events as
it is right now).
The matching user-space loader (udev-hid-bpf) MR is at
https://gitlab.freedesktop.org/libevdev/udev-hid-bpf/-/merge_requests/86
I'll put it out of draft once this is merged.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Benjamin Tissoires (13):
HID: rename struct hid_bpf_ops into hid_ops
HID: bpf: add hid_get/put_device() helpers
HID: bpf: implement HID-BPF through bpf_struct_ops
selftests/hid: convert the hid_bpf selftests with struct_ops
HID: samples: convert the 2 HID-BPF samples into struct_ops
HID: bpf: add defines for HID-BPF SEC in in-tree bpf fixes
HID: bpf: convert in-tree fixes into struct_ops
HID: bpf: remove tracing HID-BPF capability
selftests/hid: add subprog call test
Documentation: HID: amend HID-BPF for struct_ops
Documentation: HID: add a small blurb on udev-hid-bpf
HID: bpf: Artist24: remove unused variable
HID: bpf: error on warnings when compiling bpf objects
Documentation/hid/hid-bpf.rst | 162 +++---
drivers/hid/bpf/Makefile | 2 +-
drivers/hid/bpf/entrypoints/Makefile | 93 ----
drivers/hid/bpf/entrypoints/README | 4 -
drivers/hid/bpf/entrypoints/entrypoints.bpf.c | 25 -
drivers/hid/bpf/entrypoints/entrypoints.lskel.h | 248 ---------
drivers/hid/bpf/hid_bpf_dispatch.c | 266 +++-------
drivers/hid/bpf/hid_bpf_dispatch.h | 12 +-
drivers/hid/bpf/hid_bpf_jmp_table.c | 565 ---------------------
drivers/hid/bpf/hid_bpf_struct_ops.c | 246 +++++++++
drivers/hid/bpf/progs/FR-TEC__Raptor-Mach-2.bpf.c | 9 +-
drivers/hid/bpf/progs/HP__Elite-Presenter.bpf.c | 6 +-
drivers/hid/bpf/progs/Huion__Kamvas-Pro-19.bpf.c | 9 +-
.../hid/bpf/progs/IOGEAR__Kaliber-MMOmentum.bpf.c | 6 +-
drivers/hid/bpf/progs/Makefile | 2 +-
.../hid/bpf/progs/Microsoft__XBox-Elite-2.bpf.c | 6 +-
drivers/hid/bpf/progs/Wacom__ArtPen.bpf.c | 6 +-
drivers/hid/bpf/progs/XPPen__Artist24.bpf.c | 10 +-
drivers/hid/bpf/progs/XPPen__ArtistPro16Gen2.bpf.c | 24 +-
drivers/hid/bpf/progs/hid_bpf.h | 5 +
drivers/hid/hid-core.c | 6 +-
include/linux/hid_bpf.h | 109 ++--
samples/hid/Makefile | 5 +-
samples/hid/hid_bpf_attach.bpf.c | 18 -
samples/hid/hid_bpf_attach.h | 14 -
samples/hid/hid_mouse.bpf.c | 26 +-
samples/hid/hid_mouse.c | 39 +-
samples/hid/hid_surface_dial.bpf.c | 10 +-
samples/hid/hid_surface_dial.c | 53 +-
tools/testing/selftests/hid/hid_bpf.c | 100 +++-
tools/testing/selftests/hid/progs/hid.c | 100 +++-
31 files changed, 744 insertions(+), 1442 deletions(-)
---
base-commit: 70ec81c2e2b4005465ad0d042e90b36087c36104
change-id: 20240513-hid_bpf_struct_ops-e3212a224555
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Here is a couple of patches to fix issues related to runing environment
and kernel configuration.
Thank you,
---
Masami Hiramatsu (Google) (2):
selftests/tracing: Fix event filter test to retry up to 10 times
selftests/tracing: Fix to check the required syscall event
.../ftrace/test.d/dynevent/test_duplicates.tc | 2 +-
.../ftrace/test.d/filter/event-filter-function.tc | 20 +++++++++++++++++++-
2 files changed, 20 insertions(+), 2 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
The kselftests may be built in a couple different ways:
make LLVM=1
make CC=clang
In order to handle both cases, set LLVM=1 if CC=clang. That way,the rest
of lib.mk, and any Makefiles that include lib.mk, can base decisions
solely on whether or not LLVM is set.
Then, build upon that to disable a pair of clang warnings that are
already silenced on gcc.
Doing it this way is much better than the piecemeal approach that I
started with in [1] and [2]. Thanks to Nathan Chancellor for the patch
reviews that led to this approach.
[1] https://lore.kernel.org/20240527214704.300444-1-jhubbard@nvidia.com
[2] https://lore.kernel.org/20240527213641.299458-1-jhubbard@nvidia.com
John Hubbard (2):
selftests/lib.mk: handle both LLVM=1 and CC=clang builds
selftests/lib.mk: silence some clang warnings that gcc already ignores
tools/testing/selftests/lib.mk | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1
Hi all,
This series does a number of cleanups into resctrl_val() and
generalizes it by removing test name specific handling from the
function.
One of the changes improves MBA/MBM measurement by narrowing down the
period the resctrl FS derived memory bandwidth numbers are measured
over. My feel is it didn't cause noticeable difference into the numbers
because they're generally good anyway except for the small number of
outliers. To see the impact on outliers, I'd need to setup a test to
run large number of replications and do a statistical analysis, which
I've not spent my time on. Even without the statistical analysis, the
new way to measure seems obviously better and makes sense even if I
cannot see a major improvement with the setup I'm using.
v4:
- Merged close fix into IMC READ+WRITE rework patch
- Add loop to reset imc_counters_config fds to -1 to be able know which
need closing
- Introduce perf_close_imc_mem_bw() to close fds
- Open resctrl mem bw file (twice) beforehand to avoid opening it during
the test
- Remove MBM .mongrp setup
- Remove mongrp from CMT test
v3:
- Rename init functions to <testname>_init()
- Replace for loops with READ+WRITE statements for clarity
- Don't drop Return: entry from perf_open_imc_mem_bw() func comment
- New patch: Fix closing of IMC fds in case of error
- New patch: Make "bandwidth" consistent in comments & prints
- New patch: Simplify mem bandwidth file code
- Remove wrong comment
- Changed grp_name check to return -1 on fail (internal sanity check)
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W
instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM
tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 6 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 362 ++++++++----------
tools/testing/selftests/resctrl/resctrlfs.c | 64 ++--
8 files changed, 287 insertions(+), 273 deletions(-)
--
2.39.2
Hi,
Just a bunch of build and warnings fixes that show up when
building with clang. Some of these depend on each other, so
I'm sending them as a series.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
Enjoy!
thanks,
John Hubbard
John Hubbard (6):
selftests/x86: build test_FISTTP.c with clang
selftests/x86: build fsgsbase_restore.c with clang
selftests/x86: build sysret_rip.c with clang
selftests/x86: avoid -no-pie warnings from clang during compilation
selftests/x86: remove (or use) unused variables and functions
selftests/x86: fix printk warnings reported by clang
tools/testing/selftests/x86/Makefile | 10 +++++++
tools/testing/selftests/x86/amx.c | 16 -----------
.../testing/selftests/x86/clang_helpers_32.S | 11 ++++++++
.../testing/selftests/x86/clang_helpers_64.S | 28 +++++++++++++++++++
tools/testing/selftests/x86/fsgsbase.c | 6 ----
.../testing/selftests/x86/fsgsbase_restore.c | 11 ++++----
tools/testing/selftests/x86/sigreturn.c | 2 +-
.../testing/selftests/x86/syscall_arg_fault.c | 1 -
tools/testing/selftests/x86/sysret_rip.c | 20 ++++---------
tools/testing/selftests/x86/test_FISTTP.c | 8 +++---
tools/testing/selftests/x86/test_vsyscall.c | 15 ++++------
tools/testing/selftests/x86/vdso_restorer.c | 2 ++
12 files changed, 72 insertions(+), 58 deletions(-)
create mode 100644 tools/testing/selftests/x86/clang_helpers_32.S
create mode 100644 tools/testing/selftests/x86/clang_helpers_64.S
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
prerequisite-patch-id: 39d606b9b165077aa1a3a3b0a3b396dba0c20070
--
2.45.1
Hi,
Here's a few fixes that are part of my effort to get all selftests
building cleanly under clang. Plus one that I noticed by inspection.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
2) Added Reviewed-by's.
...and it turns out that all three patches are still required, on -rc1,
in order to get a clean clang build.
Enjoy!
thanks,
John Hubbard
John Hubbard (3):
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
selftests/futex: pass _GNU_SOURCE without a value to the compiler
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/futex/functional/futex_requeue_pi.c | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1
The MBM (Memory Bandwidth Monitoring) and MBA (Memory Bandwidth Allocation)
features are not enabled for AMD systems. The reason was lack of perf
counters to compare the resctrl test results.
Starting with the commit
25e56847821f ("perf/x86/amd/uncore: Add memory controller support"), AMD
now supports the UMC (Unified Memory Controller) perf events. These events
can be used to compare the test results.
This series adds the support to detect the UMC events and enable MBM/MBA
tests for AMD systems.
Babu Moger (4):
selftests/resctrl: Rename variable imcs and num_of_imcs() to generic
names
selftests/resctrl: Pass sysfs controller name of the vendor
selftests/resctrl: Add support for MBM and MBA tests on AMD
selftests/resctrl: Skip the tests if iMC/UMC counters are unavailable
tools/testing/selftests/resctrl/resctrl.h | 1 +
.../testing/selftests/resctrl/resctrl_tests.c | 16 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 105 ++++++++++++++----
3 files changed, 96 insertions(+), 26 deletions(-)
--
2.34.1
Currently if we request a feature that is not set in the Kernel
config we fail silently and return the available features. However, the
documentation indicates we should return an EINVAL.
We need to fix this issue since we can end up with a Kernel warning
should a program request the feature UFFD_FEATURE_WP_UNPOPULATED on
a kernel with the config not set with this feature.
Signed-off-by: Audra Mitchell <audra(a)redhat.com>
---
fs/userfaultfd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 60dcfafdc11a..17210558de79 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -2073,6 +2073,11 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
#endif
+
+ ret = -EINVAL;
+ if (features & ~uffdio_api.features)
+ goto err_out;
+
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
--
2.44.0
When building with clang via:
make LLVM=1 -C tools/testing/selftests
two distinct failures occur:
1) gcc requires -static-libasan in order to ensure that Address
Sanitizer's library is the first one loaded. However, this leads to
build failures on clang, when building via:
make LLVM=1 -C tools/testing/selftests
However, clang already does the right thing by default: it statically
links the Address Sanitizer if -fsanitize is specified. Therefore, fix
this by simply omitting -static-libasan for clang builds. And leave
behind a comment, because the whole reason for static linking might not
be obvious.
2) clang won't accept invocations of this form, but gcc will:
$(CC) file1.c header2.h
Fix this by using selftests/lib.mk facilities for tracking local header
file dependencies: add them to LOCAL_HDRS, leaving only the .c files to
be passed to the compiler.
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
tools/testing/selftests/openat2/Makefile | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/openat2/Makefile b/tools/testing/selftests/openat2/Makefile
index 254d676a2689..185dc76ebb5f 100644
--- a/tools/testing/selftests/openat2/Makefile
+++ b/tools/testing/selftests/openat2/Makefile
@@ -1,8 +1,18 @@
# SPDX-License-Identifier: GPL-2.0-or-later
-CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined -static-libasan
+CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
+# gcc requires -static-libasan in order to ensure that Address Sanitizer's
+# library is the first one loaded. However, clang already statically links the
+# Address Sanitizer if -fsanitize is specified. Therefore, simply omit
+# -static-libasan for clang builds.
+ifeq ($(LLVM),)
+ CFLAGS += -static-libasan
+endif
+
+LOCAL_HDRS += helpers.h
+
include ../lib.mk
-$(TEST_GEN_PROGS): helpers.c helpers.h
+$(TEST_GEN_PROGS): helpers.c
base-commit: ddb4c3f25b7b95df3d6932db0b379d768a6ebdf7
prerequisite-patch-id: b901ece2a5b78503e2fb5480f20e304d36a0ea27
--
2.45.0
Fix build error on ppc64:
dev_in_maps.c: In function ‘get_file_dev_and_inode’:
dev_in_maps.c:60:59: error: format ‘%llu’ expects argument of type
‘long long unsigned int *’, but argument 7 has type ‘__u64 *’ {aka ‘long
unsigned int *’} [-Werror=format=]
By switching to unsigned long long for u64 for ppc64 builds.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
index 759f86e7d263..2862aae58b79 100644
--- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
+++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <inttypes.h>
#include <unistd.h>
--
2.45.1
Fix warnings like:
openat2_test.c: In function ‘test_openat2_flags’:
openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type
‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long
unsigned int’} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/openat2/openat2_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index 9024754530b2..5790ab446527 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -5,6 +5,7 @@
*/
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
--
2.45.1
Fix warnings like:
test_cachestat.c: In function ‘print_cachestat’:
test_cachestat.c:30:38: warning: format ‘%llu’ expects argument of
type ‘long long unsigned int’, but argument 2 has type ‘__u64’ {aka
‘long unsigned int’} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/cachestat/test_cachestat.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/cachestat/test_cachestat.c b/tools/testing/selftests/cachestat/test_cachestat.c
index b171fd53b004..632ab44737ec 100644
--- a/tools/testing/selftests/cachestat/test_cachestat.c
+++ b/tools/testing/selftests/cachestat/test_cachestat.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <stdio.h>
#include <stdbool.h>
--
2.45.1
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
net: page_pool: create hooks for custom page providers
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 258 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 11 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 230 +++++++++-
include/net/page_pool/helpers.h | 157 +++++--
include/net/page_pool/types.h | 33 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 376 ++++++++++++++++
net/core/gro.c | 8 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 107 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 364 +++++++++-------
net/core/skbuff.c | 83 +++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 254 ++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 10 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 542 ++++++++++++++++++++++++
46 files changed, 2756 insertions(+), 267 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.45.0.118.g7fe29c98d7-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses post_socket_cb and post_connect_cb callbacks of struct
network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp
test dedicated code out of do_test() into test_dctcp().
v5:
- address Martin's comments in v4 (thanks)
- add patch 4, use start_server_str in test_dctcp_fallback too
- ASSERT_* is already used in settcpca, use this helper in cc_cb (patch 3)
and stg_post_socket_cb (patch 6)
- add ASSERT_* in stg_post_socket_cb in patch 6
v4:
- address Martin's comments in v3 (thanks).
- drop 2 patches, keep "type" as the individual arg to start_server_addr,
connect_to_addr and start_server_str.
v3:
- Add 4 new patches, 1-3 are cleanups. 4 adds a new helper.
- address Martin's comments in v2.
v2:
- rebased on commit "selftests/bpf: Add test for the use of new args in
cong_control"
Geliang Tang (7):
selftests/bpf: Drop struct post_socket_opts
selftests/bpf: Add start_server_str helper
selftests/bpf: Use post_socket_cb in connect_to_fd_opts
selftests/bpf: Use post_socket_cb in start_server_str
selftests/bpf: Use start_server_str in do_test in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
selftests/bpf: Add post_connect_cb callback
tools/testing/selftests/bpf/network_helpers.c | 39 +++--
tools/testing/selftests/bpf/network_helpers.h | 9 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 153 +++++++++++++-----
.../bpf/prog_tests/sockopt_inherit.c | 2 +-
.../bpf/test_tcp_check_syncookie_user.c | 4 +-
5 files changed, 145 insertions(+), 62 deletions(-)
--
2.43.0
In commit b5b73b26b3ca ("taprio: Fix allowing too small intervals"), a
comparison of user input against length_to_duration(q, ETH_ZLEN) was
introduced, to avoid RCU stalls due to frequent hrtimers.
The implementation of length_to_duration() depends on q->picos_per_byte
being set for the link speed. The blamed commit in the Fixes: tag has
moved this too late, so the checks introduced above are ineffective.
The q->picos_per_byte is zero at parse_taprio_schedule() ->
parse_sched_list() -> parse_sched_entry() -> fill_sched_entry() time.
Move the taprio_set_picos_per_byte() call as one of the first things in
taprio_change(), before the bulk of the netlink attribute parsing is
done. That's because it is needed there.
Add a selftest to make sure the issue doesn't get reintroduced.
Fixes: 09dbdf28f9f9 ("net/sched: taprio: fix calculation of maximum gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
net/sched/sch_taprio.c | 4 +++-
.../tc-testing/tc-tests/qdiscs/taprio.json | 22 +++++++++++++++++++
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 1ab17e8a7260..118915055360 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1848,6 +1848,9 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
}
q->flags = taprio_flags;
+ /* Needed for length_to_duration() during netlink attribute parsing */
+ taprio_set_picos_per_byte(dev, q);
+
err = taprio_parse_mqprio_opt(dev, mqprio, extack, q->flags);
if (err < 0)
return err;
@@ -1907,7 +1910,6 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
if (err < 0)
goto free_sched;
- taprio_set_picos_per_byte(dev, q);
taprio_update_queue_max_sdu(q, new_admin, stab);
if (FULL_OFFLOAD_IS_ENABLED(q->flags))
diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
index 12da0a939e3e..8f12f00a4f57 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
@@ -132,6 +132,28 @@
"echo \"1\" > /sys/bus/netdevsim/del_device"
]
},
+ {
+ "id": "6f62",
+ "name": "Add taprio Qdisc with too short interval",
+ "category": [
+ "qdisc",
+ "taprio"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "echo \"1 1 8\" > /sys/bus/netdevsim/new_device"
+ ],
+ "cmdUnderTest": "$TC qdisc add dev $ETH root handle 1: taprio num_tc 2 queues 1@0 1@1 sched-entry S 01 300 sched-entry S 02 1700 clockid CLOCK_TAI",
+ "expExitCode": "2",
+ "verifyCmd": "$TC qdisc show dev $ETH",
+ "matchPattern": "qdisc taprio 1: root refcnt",
+ "matchCount": "0",
+ "teardown": [
+ "echo \"1\" > /sys/bus/netdevsim/del_device"
+ ]
+ },
{
"id": "3e1e",
"name": "Add taprio Qdisc with an invalid cycle-time",
--
2.34.1
Hi,
Here's a few fixes that are part of my effort to get all selftests
building cleanly under clang. Plus one that I noticed by inspection.
Enjoy!
thanks,
John Hubbard
John Hubbard (3):
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
selftests/futex: pass _GNU_SOURCE without a value to the compiler
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/futex/functional/futex_requeue_pi.c | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
base-commit: f03359bca01bf4372cf2c118cd9a987a5951b1c8
prerequisite-patch-id: b901ece2a5b78503e2fb5480f20e304d36a0ea27
--
2.45.0
When building with clang, via:
make LLVM=1 -C tools/testing/selftest
...clang warns that "a variable sized type not at the end of a struct or
class is a GNU extension".
These cases are not easily changed, because they involve structs that
are part of the API. Fortunately, however, the tests seem to be doing
just fine (specifically, neither affected test runs any differently with
gcc vs. clang builds, on my test system) regardless of the warning. So,
all the warning is doing is preventing a clean build of selftests/net.
Fix this by suppressing this particular clang warning for the
selftests/net suite.
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
thanks,
John Hubbard
tools/testing/selftests/net/Makefile | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index bd01e4a0be2c..9a3b766c8781 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,6 +6,10 @@ CFLAGS += -I../../../../usr/include/ $(KHDR_INCLUDES)
# Additional include paths needed by kselftest.h
CFLAGS += -I../
+ifneq ($(LLVM),)
+ CFLAGS += -Wno-gnu-variable-sized-type-not-at-end
+endif
+
TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh \
rtnetlink.sh xfrm_policy.sh test_blackhole_dev.sh
TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
--
2.45.1
When building with clang, via:
make LLVM=1 -C tools/testing/selftest
...clang warns about "taking address of packed member 'write_index' ".
This is not particularly helpful, because the test code really wants to
write to exactly this location, and if it ends up being unaligned, then
the test won't work (and may segfault, depending on the CPU type).
If that ever comes up, it will be obvious and can be fixed. But all it's
doing now is prevent a clean clang build, so disable the warning.
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
thanks,
John Hubbard
tools/testing/selftests/user_events/Makefile | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/user_events/Makefile b/tools/testing/selftests/user_events/Makefile
index 10fcd0066203..617e94344711 100644
--- a/tools/testing/selftests/user_events/Makefile
+++ b/tools/testing/selftests/user_events/Makefile
@@ -1,5 +1,10 @@
# SPDX-License-Identifier: GPL-2.0
CFLAGS += -Wl,-no-as-needed -Wall $(KHDR_INCLUDES)
+
+ifneq ($(LLVM),)
+ CFLAGS += -Wno-address-of-packed-member
+endif
+
LDLIBS += -lrt -lpthread -lm
TEST_GEN_PROGS = ftrace_test dyn_test perf_test abi_test
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
--
2.45.1
Some subtests can be unstable, failing once every X runs. Fixing them
can take time: there could be an issue in the kernel or in the subtest,
and it is then important to do a proper analysis, not to hide real bugs.
To avoid creating noises on the different CIs where tests are more
unstable than on our side, some subtests have been marked as flaky. As a
result, errors with these subtests (if any) are ignored.
Note that the MPTCP CI will continue to track these flaky subtests. All
these unstable subtests are also tracked by our bug tracker.
These are fixes for the -net tree, because the instabilities are visible
there. The first patch introducing the flake support has no 'Fixes'
tags, mainly because it requires recent and important refactoring done
in all MPTCP selftests. Backporting that to old versions where the flaky
tests have been introduced would be too difficult, and probably not
worth it. The other patches, adding MPTCP_LIB_SUBTEST_FLAKY=1, have a
Fixes tag, simply to ease the backport of the future fixes removing them
along with the proper fix.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (4):
selftests: mptcp: lib: support flaky subtests
selftests: mptcp: simult flows: mark 'unbalanced' tests as flaky
selftests: mptcp: join: mark 'fastclose' tests as flaky
selftests: mptcp: join: mark 'fail' tests as flaky
tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 +++++++-
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 30 +++++++++++++++++++++--
tools/testing/selftests/net/mptcp/simult_flows.sh | 6 ++---
3 files changed, 40 insertions(+), 6 deletions(-)
---
base-commit: 0b4f5add9fa59bfd42c1030f572db2e4c395181b
change-id: 20240524-upstream-net-20240524-selftests-mptcp-flaky-5b6836a59f72
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
When building with clang, via:
make LLVM=1 -C tools/testing/selftest
...clang warns about several cases of using a signed integer for the
priority argument to mq_receive(3), which expects an unsigned int.
Fix this by declaring the type as unsigned int in all cases.
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
2) Reviewed-by's added.
thanks,
John Hubbard
tools/testing/selftests/mqueue/mq_perf_tests.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mqueue/mq_perf_tests.c b/tools/testing/selftests/mqueue/mq_perf_tests.c
index 5c16159d0bcd..fb898850867c 100644
--- a/tools/testing/selftests/mqueue/mq_perf_tests.c
+++ b/tools/testing/selftests/mqueue/mq_perf_tests.c
@@ -323,7 +323,8 @@ void *fake_cont_thread(void *arg)
void *cont_thread(void *arg)
{
char buff[MSG_SIZE];
- int i, priority;
+ int i;
+ unsigned int priority;
for (i = 0; i < num_cpus_to_pin; i++)
if (cpu_threads[i] == pthread_self())
@@ -425,7 +426,8 @@ struct test test2[] = {
void *perf_test_thread(void *arg)
{
char buff[MSG_SIZE];
- int prio_out, prio_in;
+ int prio_out;
+ unsigned int prio_in;
int i;
clockid_t clock;
pthread_t *t;
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
--
2.45.1
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 7c76b841b17bb..21bde60c95230 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -183,6 +192,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -194,6 +204,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 7c76b841b17bb..21bde60c95230 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -183,6 +192,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -194,6 +204,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 088fcad138c98..38c6e9f16f41e 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -189,6 +198,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -200,6 +210,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 334bdfeab9403..365a2c7a89bad 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -190,6 +199,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -201,6 +211,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0