6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ Upstream commit 3f47c1ebe5ca9c5883e596c7888dec4bec0176d8 ]
The GCC 13.2.0 compiler issued the following warning:
mixer-test.c: In function ‘ctl_value_index_valid’:
mixer-test.c:322:79: warning: format ‘%lld’ expects argument of type ‘long long int’, \
but argument 5 has type ‘long int’ [-Wformat=]
322 | ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
| ~~~^
| |
| long long int
| %ld
323 | ctl->name, index, int64_val,
324 | snd_ctl_elem_info_get_max(ctl->info));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| long int
Fixing the format specifier as advised by the compiler suggestion removes the
warning.
Fixes: 3f48b137d88e7 ("kselftest: alsa: Factor out check that values meet constraints")
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-sound(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Acked-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20240107173704.937824-3-mirsad.todorovac@alu.uniz…
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/alsa/mixer-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/alsa/mixer-test.c b/tools/testing/selftests/alsa/mixer-test.c
index 208c2170c074..df942149c6f6 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -319,7 +319,7 @@ static bool ctl_value_index_valid(struct ctl_data *ctl,
}
if (int64_val > snd_ctl_elem_info_get_max64(ctl->info)) {
- ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
+ ksft_print_msg("%s.%d value %lld more than maximum %ld\n",
ctl->name, index, int64_val,
snd_ctl_elem_info_get_max(ctl->info));
return false;
--
2.43.0
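The mismatch that GCC reports can be reproduced outside the selftest. The following is a minimal sketch, not the selftest code: format_limit_msg() is a hypothetical helper standing in for the ksft_print_msg() call, and only the conversion specifiers mirror the patch.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the ksft_print_msg() call in the patch.
 * 'value' is long long, so it is printed with %lld; 'max' is long (the
 * type GCC reported for argument 5), so it is printed with %ld.
 */
static int format_limit_msg(char *buf, size_t len, const char *name,
			    int index, long long value, long max)
{
	return snprintf(buf, len, "%s.%d value %lld more than maximum %ld\n",
			name, index, value, max);
}
```

Each conversion specifier has to match the promoted type of its argument; where long and long long differ in width, a mismatch can print garbage instead of only producing a warning.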
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ Upstream commit 8c51c13dc63d46e754c44215eabc0890a8bd9bfb ]
Minor fix in the number of arguments to error reporting function in the
test program as reported by GCC 13.2.0 warning.
mixer-test.c: In function ‘find_controls’:
mixer-test.c:169:44: warning: too many arguments for format [-Wformat-extra-args]
169 | ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The number of arguments in call to ksft_exit_fail_msg() doesn't correspond
to the format specifiers, so this is adjusted resembling the sibling calls
to the error function.
Fixes: b1446bda56456 ("kselftest: alsa: Check for event generation when we write to controls")
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-sound(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Acked-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20240107173704.937824-2-mirsad.todorovac@alu.uniz…
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/alsa/mixer-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/alsa/mixer-test.c b/tools/testing/selftests/alsa/mixer-test.c
index 23df154fcdd7..208c2170c074 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -166,7 +166,7 @@ static void find_controls(void)
err = snd_ctl_poll_descriptors(card_data->handle,
&card_data->pollfd, 1);
if (err != 1) {
- ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
+ ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for card %d: %d\n",
card, err);
}
--
2.43.0
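Warnings such as -Wformat-extra-args are only emitted for functions whose declaration tells the compiler where the format string is. The sketch below assumes a hypothetical report_failure() helper (it is not the kselftest API) and uses the GCC/Clang format attribute to get the same checking. The message text is the corrected one from the patch.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical reporting helper. The format attribute says that argument 3
 * is a printf-style format string and that the variadic arguments start at
 * position 4, so the compiler can compare them against the conversions.
 */
__attribute__((format(printf, 3, 4)))
static int report_failure(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* One conversion per argument: the card number and the error code. */
static int report_poll_failure(char *buf, size_t len, int card, int err)
{
	return report_failure(buf, len,
			      "snd_ctl_poll_descriptors() failed for card %d: %d\n",
			      card, err);
}
```

Dropping one of the two %d conversions from the format string while keeping both arguments should bring back the "too many arguments for format" warning when building with -Wall.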
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ Upstream commit 3f47c1ebe5ca9c5883e596c7888dec4bec0176d8 ]
The GCC 13.2.0 compiler issued the following warning:
mixer-test.c: In function ‘ctl_value_index_valid’:
mixer-test.c:322:79: warning: format ‘%lld’ expects argument of type ‘long long int’, \
but argument 5 has type ‘long int’ [-Wformat=]
322 | ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
| ~~~^
| |
| long long int
| %ld
323 | ctl->name, index, int64_val,
324 | snd_ctl_elem_info_get_max(ctl->info));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| long int
Fixing the format specifier as advised by the compiler suggestion removes the
warning.
Fixes: 3f48b137d88e7 ("kselftest: alsa: Factor out check that values meet constraints")
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-sound(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Acked-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20240107173704.937824-3-mirsad.todorovac@alu.uniz…
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/alsa/mixer-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/alsa/mixer-test.c b/tools/testing/selftests/alsa/mixer-test.c
index 208c2170c074..df942149c6f6 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -319,7 +319,7 @@ static bool ctl_value_index_valid(struct ctl_data *ctl,
}
if (int64_val > snd_ctl_elem_info_get_max64(ctl->info)) {
- ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
+ ksft_print_msg("%s.%d value %lld more than maximum %ld\n",
ctl->name, index, int64_val,
snd_ctl_elem_info_get_max(ctl->info));
return false;
--
2.43.0
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ Upstream commit 8c51c13dc63d46e754c44215eabc0890a8bd9bfb ]
Minor fix in the number of arguments to error reporting function in the
test program as reported by GCC 13.2.0 warning.
mixer-test.c: In function ‘find_controls’:
mixer-test.c:169:44: warning: too many arguments for format [-Wformat-extra-args]
169 | ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The number of arguments in call to ksft_exit_fail_msg() doesn't correspond
to the format specifiers, so this is adjusted resembling the sibling calls
to the error function.
Fixes: b1446bda56456 ("kselftest: alsa: Check for event generation when we write to controls")
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-sound(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Acked-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20240107173704.937824-2-mirsad.todorovac@alu.uniz…
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/alsa/mixer-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/alsa/mixer-test.c b/tools/testing/selftests/alsa/mixer-test.c
index 23df154fcdd7..208c2170c074 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -166,7 +166,7 @@ static void find_controls(void)
err = snd_ctl_poll_descriptors(card_data->handle,
&card_data->pollfd, 1);
if (err != 1) {
- ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
+ ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for card %d: %d\n",
card, err);
}
--
2.43.0
Currently the seccomp benchmark selftest produces non-standard output,
meaning that while it makes a number of checks of the performance it
observes, the results have to be parsed by humans. This means that
automated systems running this suite of tests are almost certainly
ignoring the results, which isn't ideal for spotting problems. Let's
rework things so that each check that the program does is reported as a
test result to the framework.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.8-rc1.
- Link to v1: https://lore.kernel.org/r/20231219-b4-kselftest-seccomp-benchmark-ktap-v1-0…
---
Mark Brown (2):
kselftest/seccomp: Use kselftest output functions for benchmark
kselftest/seccomp: Report each expectation we assert as a KTAP test
.../testing/selftests/seccomp/seccomp_benchmark.c | 105 +++++++++++++--------
1 file changed, 65 insertions(+), 40 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20231219-b4-kselftest-seccomp-benchmark-ktap-357603823708
Best regards,
--
Mark Brown <broonie(a)kernel.org>
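To illustrate what "reported as a test result" means in KTAP terms, here is a rough sketch that formats one result line per check. The struct and function names are hypothetical and are not the kselftest helpers used by the series; only the "ok N description" / "not ok N description" line shape comes from KTAP.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical counters; the real kselftest framework keeps its own. */
struct ktap_state {
	unsigned int run;
	unsigned int failed;
};

/*
 * Format one KTAP result line ("ok N desc" or "not ok N desc") into buf
 * and update the counters. Returns the snprintf() result.
 */
static int ktap_result(struct ktap_state *st, char *buf, size_t len,
		       int passed, const char *desc)
{
	st->run++;
	if (!passed)
		st->failed++;
	return snprintf(buf, len, "%s %u %s\n",
			passed ? "ok" : "not ok", st->run, desc);
}
```

With one such line per expectation, an automated runner can count passes and failures without interpreting the benchmark's free-form text.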
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v10:
- Move pgtable.h definitions into a no __ASSEMBLY__ region to resolve compilation
conflicts (pointed out by Conor)
- Will now compile with allmodconfig
v9:
- Raise the mmap_end default to STACK_TOP_MAX to allow the address space to grow
beyond the default of sv48 on sv57 machines as suggested by Alexandre
- Some of the mmap macros had unnecessary conditionals that I have removed
v8:
- Fix RV32 and the RV32 compat mode of RV64 (suggested by Conor)
- Extract out addr and base from the mmap macros (suggested by Alexandre)
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readable by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 33 ++++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 261 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.34.1
Changes in v6:
- Rebased on top of 70d201a40823 (thanks Alexander Gordeev!)
- Resolved a conflict because of 43e8832fed08 being reverted
- Resolved a missing static declaration for lp_sys_getpid, since
-Wmissing-prototypes warning was enabled.
- Retested everything: running the livepatch selftests from the kernel
source, running from a directory where the tests were installed (Joe's
use case), and running from a gen_tar'ed directory. All of them
executed correctly.
- Added Petr review tags (Thanks!)
- Link to v5: https://lore.kernel.org/r/20240109-send-lp-kselftests-v5-0-364d59a69f12@sus…
Changes in v5:
* Fixed an issue found by Joe that copied Kbuild files along with the
test modules to the installation directory.
* Added Joe Lawrence review tags.
Changes in v4:
* Documented how to compile the livepatch selftests without running the
tests (Joe)
* Removed the mention to lib/livepatch on MAINTAINERS file, reported by
checkpatch.
Changes in v3:
* Rebased on top of v6.6-rc5
* The commits messages were improved (Thanks Petr!)
* Created TEST_GEN_MODS_DIR variable to point to a directory that contains kernel
modules, and adapted selftests to build it before running the test.
* Moved test_klp-call_getpid out of test_programs, since the gen_tar
would just copy the generated test programs to the livepatches dir,
and so scripts relying on test_programs/test_klp-call_getpid will fail.
* Added a module_param for klp_pids, describing its usage.
* Simplified the call_getpid program to ignore the return of getpid syscall,
since we only want to make sure the process transitions correctly to the
patched state
* The test-syscall.sh now prints a log message showing the number of remaining
processes to transition into the livepatched state, and check_output expects it
to be 0.
* Added MODULE_AUTHOR and MODULE_DESCRIPTION to test_klp_syscall.c
- Link to v3: https://lore.kernel.org/r/20231031-send-lp-kselftests-v3-0-2b1655c2605f@sus…
- Link to v2: https://lore.kernel.org/linux-kselftest/20220630141226.2802-1-mpdesouza@sus…
This patchset moves the current kernel testing livepatch modules from
lib/livepatch to tools/testing/selftests/livepatch/test_modules, and compiles
them as out-of-tree modules before testing.
There is also a new test being added. This new test exercises multiple processes
calling a syscall while a livepatch patches the syscall.
Why this move is an improvement:
* The modules are now compiled as out-of-tree modules against the current
running kernel, making them capable of being tested on different systems with
newer or older kernels.
* This approach now needs the kernel-devel package to be installed, since they are
out-of-tree modules. These can be generated by running "make rpm-pkg" in the
kernel source.
What needs to be solved:
* Currently gen_tar only packages the resulting binaries of the tests, and not
the sources. For the current approach, the newly added modules would be
compiled and then packaged. It works when testing on a system with the same
kernel version. But it will fail when running on a machine with different kernel
version, since module was compiled against the kernel currently running.
This is not a new problem, just aligning the expectations. For the current
approach to be truly system agnostic gen_tar would need to include the module
and program sources to be compiled in the target systems.
Thanks in advance!
Marcos
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
---
Marcos Paulo de Souza (3):
kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
Documentation/dev-tools/kselftest.rst | 4 +
MAINTAINERS | 1 -
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 ----
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 ---
tools/testing/selftests/lib.mk | 26 ++++-
tools/testing/selftests/livepatch/Makefile | 5 +-
tools/testing/selftests/livepatch/README | 25 +++--
tools/testing/selftests/livepatch/config | 1 -
tools/testing/selftests/livepatch/functions.sh | 34 +++---
.../testing/selftests/livepatch/test-callbacks.sh | 50 ++++-----
tools/testing/selftests/livepatch/test-ftrace.sh | 6 +-
.../testing/selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
tools/testing/selftests/livepatch/test-state.sh | 18 ++--
tools/testing/selftests/livepatch/test-syscall.sh | 53 ++++++++++
tools/testing/selftests/livepatch/test-sysfs.sh | 6 +-
.../selftests/livepatch/test_klp-call_getpid.c | 44 ++++++++
.../selftests/livepatch/test_modules/Makefile | 20 ++++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../livepatch/test_modules}/test_klp_livepatch.c | 0
.../livepatch/test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 116 +++++++++++++++++++++
32 files changed, 340 insertions(+), 121 deletions(-)
---
base-commit: 70d201a40823acba23899342d62bc2644051ad2e
change-id: 20231031-send-lp-kselftests-4c917dcd4565
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
On systems with a 64k page size and 512M huge pages, the allocation
and test succeed but the test errors out at the munmap. As the comment
states, munmap will fail if the length is not hugepage aligned. This
happens because the mapping length is half the size of the hugepage, so
the munmap is not hugepage aligned. Fix this by making the mapping
length the full hugepage size if the hugepage is larger than the length
of the mapping.
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/map_hugetlb.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/testing/selftests/mm/map_hugetlb.c b/tools/testing/selftests/mm/map_hugetlb.c
index 193281560b61..86e8f2048a40 100644
--- a/tools/testing/selftests/mm/map_hugetlb.c
+++ b/tools/testing/selftests/mm/map_hugetlb.c
@@ -15,6 +15,7 @@
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
+#include "vm_util.h"
#define LENGTH (256UL*1024*1024)
#define PROTECTION (PROT_READ | PROT_WRITE)
@@ -58,10 +59,16 @@ int main(int argc, char **argv)
{
void *addr;
int ret;
+ size_t hugepage_size;
size_t length = LENGTH;
int flags = FLAGS;
int shift = 0;
+ hugepage_size = default_huge_page_size();
+ /* munmap with fail if the length is not page aligned */
+ if (hugepage_size > length)
+ length = hugepage_size;
+
if (argc > 1)
length = atol(argv[1]) << 20;
if (argc > 2) {
--
2.43.0
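The length selection in the patch can be checked in isolation. This is a small sketch with a hypothetical helper name, hugetlb_test_length(); the sizes in the usage note are the ones quoted in the commit message.

```c
#include <stddef.h>

/*
 * Pick the mapping length for a hugetlb test: use the requested length
 * unless a single huge page is larger, in which case use one huge page.
 */
static size_t hugetlb_test_length(size_t requested, size_t hugepage_size)
{
	return hugepage_size > requested ? hugepage_size : requested;
}
```

With the default 256M length and a 512M huge page the helper returns 512M, so the later munmap covers a whole huge page. With 2M huge pages the 256M length is returned unchanged.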
ksm_tests was previously mmapping a region of memory, aligning the
returned pointer to a PMD boundary, then setting MADV_HUGEPAGE, but was
setting it past the end of the mmapped area due to not taking the
pointer alignment into consideration. Fix this behaviour.
Up until commit efa7df3e3bb5 ("mm: align larger anonymous mappings on
THP boundaries"), this buggy behavior was (usually) masked because the
alignment difference was always less than PMD-size. But since the
mentioned commit, `ksm_tests -H -s 100` started failing.
Fixes: 325254899684 ("selftests: vm: add KSM huge pages merging time test")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
Applies on top of mm-unstable.
Thanks,
Ryan
tools/testing/selftests/mm/ksm_tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
index 380b691d3eb9..b748c48908d9 100644
--- a/tools/testing/selftests/mm/ksm_tests.c
+++ b/tools/testing/selftests/mm/ksm_tests.c
@@ -566,7 +566,7 @@ static int ksm_merge_hugepages_time(int merge_type, int mapping, int prot,
if (map_ptr_orig == MAP_FAILED)
err(2, "initial mmap");
- if (madvise(map_ptr, len + HPAGE_SIZE, MADV_HUGEPAGE))
+ if (madvise(map_ptr, len, MADV_HUGEPAGE))
err(2, "MADV_HUGEPAGE");
pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
--
2.25.1
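The overrun can be worked through with plain address arithmetic. The sketch below assumes, as the old madvise() length suggests, that the test maps len + HPAGE_SIZE bytes and then aligns the start upwards. align_up() and overrun() are hypothetical helpers, not code from ksm_tests.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align is a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Bytes by which [start, start + range_len) runs past the end of the
 * mapping [map, map + map_len); 0 if the range fits.
 */
static size_t overrun(uintptr_t map, size_t map_len,
		      uintptr_t start, size_t range_len)
{
	uintptr_t map_end = map + map_len;
	uintptr_t range_end = start + range_len;

	return range_end > map_end ? (size_t)(range_end - map_end) : 0;
}
```

For a mapping at 0x10001000 with len = 100M and a 2M huge page, the aligned start is 0x10200000. Advising len bytes from there fits inside the mapping, while advising len + HPAGE_SIZE bytes runs 0x1ff000 bytes past its end, which is exactly the alignment offset.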
Calling get_system_loc_code() before checking devfd and errno fails the
test when the device is not available; the expected result is a SKIP.
Change the order of the SKIP_IF_MSG checks so the test correctly SKIPs
when the /dev/papr-vpd device is not available.
Without patch: Test FAILED on line 271
With patch: [SKIP] Test skipped on line 266: /dev/papr-vpd not present
Signed-off-by: R Nageswara Sastry <rnsastry(a)linux.ibm.com>
---
tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c b/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
index 98cbb9109ee6..505294da1b9f 100644
--- a/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
+++ b/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
@@ -263,10 +263,10 @@ static int papr_vpd_system_loc_code(void)
off_t size;
int fd;
- SKIP_IF_MSG(get_system_loc_code(&lc),
- "Cannot determine system location code");
SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
DEVPATH " not present");
+ SKIP_IF_MSG(get_system_loc_code(&lc),
+ "Cannot determine system location code");
FAIL_IF(devfd < 0);
--
2.37.1 (Apple Git-137.1)
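One plausible reading of the failure is that work done between open() and the errno check can overwrite errno, so the ENOENT test no longer matches. That is an assumption, not something the commit message states. The sketch below shows the general pattern of capturing errno immediately; unrelated_call() and device_missing() are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper that, like many library calls, may overwrite errno. */
static void unrelated_call(void)
{
	errno = 0;
}

/*
 * Returns 1 if the device node is absent (caller should skip), 0 otherwise.
 * errno is captured right after open() so later calls cannot disturb it.
 */
static int device_missing(const char *path)
{
	int fd = open(path, O_RDONLY);
	int saved_errno = errno;

	unrelated_call();	/* errno is no longer trustworthy here */

	if (fd >= 0) {
		close(fd);
		return 0;
	}
	return saved_errno == ENOENT;
}
```

The patch gets the same effect by ordering: the check that reads errno now runs before any other helper is called.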
By allowing the filter_glob parameter to be written to, it's possible to
tweak the testsuites that will be executed on new module loads. This
makes it easier to run specific tests without having to reload kunit and
provides a way to filter tests on real HW even if kunit is builtin.
Example for xe driver:
1) Run just 1 test
# echo -n xe_bo > /sys/module/kunit/parameters/filter_glob
# modprobe -r xe_live_test
# modprobe xe_live_test
# ls /sys/kernel/debug/kunit/
xe_bo
2) Run all tests
# echo \* > /sys/module/kunit/parameters/filter_glob
# modprobe -r xe_live_test
# modprobe xe_live_test
# ls /sys/kernel/debug/kunit/
xe_bo xe_dma_buf xe_migrate xe_mocs
References: https://lore.kernel.org/intel-xe/dzacvbdditbneiu3e3fmstjmttcbne44yspumpkd6s…
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
---
lib/kunit/executor.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 1236b3cd2fbb..30ed9d321c19 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -31,7 +31,7 @@ static char *filter_glob_param;
static char *filter_param;
static char *filter_action_param;
-module_param_named(filter_glob, filter_glob_param, charp, 0400);
+module_param_named(filter_glob, filter_glob_param, charp, 0600);
MODULE_PARM_DESC(filter_glob,
"Filter which KUnit test suites/tests run at boot-time, e.g. list* or list*.*del_test");
module_param_named(filter, filter_param, charp, 0400);
--
2.40.1
If there are more than 32 CPUs the bitmask will start to contain
commas, leading to:
./rps_default_mask.sh: line 36: [: 00000000,00000000: integer expression expected
Remove the commas; bash doesn't interpret leading zeroes as octal,
so that should be good enough.
Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
v2:
- remove all commas
v1: https://lore.kernel.org/all/20240119151248.3476897-1-kuba@kernel.org/
CC: shuah(a)kernel.org
CC: horms(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/net/rps_default_mask.sh | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/rps_default_mask.sh b/tools/testing/selftests/net/rps_default_mask.sh
index a26c5624429f..4729e7026a73 100755
--- a/tools/testing/selftests/net/rps_default_mask.sh
+++ b/tools/testing/selftests/net/rps_default_mask.sh
@@ -33,6 +33,10 @@ chk_rps() {
rps_mask=$($cmd /sys/class/net/$dev_name/queues/rx-0/rps_cpus)
printf "%-60s" "$msg"
+
+ # In case there is more than 32 CPUs we need to remove commas from masks
+ rps_mask=${rps_mask//,}
+ expected_rps_mask=${expected_rps_mask//,}
if [ $rps_mask -eq $expected_rps_mask ]; then
echo "[ ok ]"
else
--
2.43.0
From: Jeff Xu <jeffxu(a)chromium.org>
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.
The new mseal() is an architecture-independent syscall with the
following signature:
mseal(void *addr, size_t len, unsigned long types, unsigned long flags)
addr/len: memory range. Must be continuous/allocated memory, or else
mseal() will fail and no VMA is updated. For details on acceptable
arguments, please refer to documentation patch (mseal.rst) of this
patch set. Those are also fully covered by the selftest.
types: bit mask to specify the sealing types.
MM_SEAL_BASE
MM_SEAL_PROT_PKEY
MM_SEAL_DISCARD_RO_ANON
MM_SEAL_SEAL
The MM_SEAL_BASE:
The base package includes the features common to all VMA sealing
types. It prevents sealed VMAs from:
1> Unmapping, moving to another location, or shrinking, via munmap()
and mremap(). Each of these can leave an empty space that can then be
filled by a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location, via
mremap().
3> Modification via mmap(MAP_FIXED).
4> Size expansion, via mremap(). This does not appear to pose any
specific risk to sealed VMAs, but it is blocked anyway because the use
case is unclear. In any case, users can rely on merging to expand a
sealed VMA.
We consider MM_SEAL_BASE the base feature on which the other sealing
features depend. For instance, it probably does not make sense to seal
PROT_PKEY without sealing the BASE, so the kernel will implicitly add
SEAL_BASE for SEAL_PROT_PKEY.
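The implied-BASE rule can be sketched as a mask expansion. The bit values and the EX_ prefix below are hypothetical, chosen only for illustration (the patch set defines the real MM_SEAL_* constants), and effective_seals() is not a function from the series. DISCARD_RO_ANON is treated the same way as PROT_PKEY, following the "Open discussions" notes.

```c
/* Hypothetical bit values for illustration; the patch set defines the real ones. */
#define EX_SEAL_BASE            (1UL << 0)
#define EX_SEAL_PROT_PKEY       (1UL << 1)
#define EX_SEAL_DISCARD_RO_ANON (1UL << 2)
#define EX_SEAL_SEAL            (1UL << 3)

/* Expand a requested mask so that dependent seal types also carry BASE. */
static unsigned long effective_seals(unsigned long types)
{
	if (types & (EX_SEAL_PROT_PKEY | EX_SEAL_DISCARD_RO_ANON))
		types |= EX_SEAL_BASE;
	return types;
}
```

A caller asking only for the PROT_PKEY seal would therefore end up with both PROT_PKEY and BASE applied, while asking only for SEAL_SEAL leaves the mask unchanged.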
The MM_SEAL_PROT_PKEY:
Seal PROT and PKEY of the address range, i.e. mprotect() and
pkey_mprotect() will be denied if the memory is sealed with
MM_SEAL_PROT_PKEY.
The MM_SEAL_DISCARD_RO_ANON:
Certain types of madvise() operations are destructive [6], such as
MADV_DONTNEED, which can effectively alter region contents by
discarding pages, especially when memory is anonymous. This blocks
such operations for anonymous memory which is not writable to the
user.
The MM_SEAL_SEAL:
MM_SEAL_SEAL denies adding a new seal for a VMA.
This is similar to F_SEAL_SEAL in fcntl.
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable; the lifetime of those mappings
is tied to the lifetime of the process.
Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings is not tied to the lifetime of the process, therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory. For example, with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.
Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.
--------------------------------------------------------------------
Change history:
===============
V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as
MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
destructive operations of madvise. (Suggested by Jann Horn and
Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap() (Detail see new discussions)
- Add documentation - mseal.rst
Work in progress:
=================
- update man page for mseal() and mmap()
Open discussions:
=================
Several open discussions from V1/V2 were not incorporated into V3. I
believe these are worth more discussion for future versions of this
patch series.
1> mseal() vs mimmutable()
mseal(bitmasks for multiple seal types)
BASE + PROT_PKEY+ MM_SEAL_DISCARD_RO_ANON
Apply PROT_PKEY implies BASE, same for DISCARD_RO_ANON
mimmutable() (OpenBSD)
This is equal to SEAL_BASE + SEAL_PROT_PKEY in mseal()
Plus allowing downgrade from W=>NW (OpenBSD)
Doesn’t have MM_SEAL_DISCARD_RO_ANON
mimmutable() is designed for memory sealing in libc, and mseal()
is designed for both Chrome browser and libc.
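The "implies BASE" rule from the comparison above can be sketched as a small bitmask model. This is only an illustration: the bit values and the helper name effective_seals() are made up for the example and are not the UAPI definitions from the patch set.

```python
from enum import IntFlag


class Seal(IntFlag):
    # Placeholder bit values for illustration; not the kernel's UAPI values.
    BASE = 1 << 0
    PROT_PKEY = 1 << 1
    DISCARD_RO_ANON = 1 << 2


def effective_seals(requested: Seal) -> Seal:
    """Return the requested seals, adding BASE when a seal that implies it is set."""
    if requested & (Seal.PROT_PKEY | Seal.DISCARD_RO_ANON):
        requested |= Seal.BASE
    return requested


# Roughly what mimmutable() provides according to the comparison above
# (ignoring the W=>NW downgrade that OpenBSD additionally allows).
MIMMUTABLE_EQUIVALENT = Seal.BASE | Seal.PROT_PKEY
```

With this model, requesting only PROT_PKEY yields the same set of bits as the mimmutable() equivalent, while DISCARD_RO_ANON stays a separate, optional bit.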
For the two memory ranges that Chrome browser wants to seal, as
discussed previously, the allocator still needs to free or discard
memory for the sealed memory. For performance reasons, we have
explored two solutions in the past: first, using PKEY-based munmap()
[7], and second, separating SEAL_MPROTECT (v1 of this patch set).
Recently, we have experimented with an alternative approach that
allows us to remove the separation of SEAL_MPROTECT. For those two
memory ranges, Chrome browser will use BASE + PROT_PKEY +
DISCARD_RO_ANON for all its sealing needs.
In the case of libc, the .text segment can be sealed with the BASE and
PROT_PKEY, and the RO data segments can be sealed with the BASE +
PROT_PKEY + DISCARD_RO_ANON.
From a flexibility standpoint, separating BASE out could be beneficial
for future extensions of sealing features. For instance, applications
might desire downgradable "prot" permissions (X=>NX, W=>NW, R=>NR),
which would conflict with SEAL_PROT_PKEY.
The more sealing features integrated into a single sealing type, the
fewer applications can utilize these features. For example, some
applications might programmatically require DISCARD_RO_ANON memory,
which would render such VMA unsuitable for sealing.
I'd like to get the community's input on this. From Chrome's
perspective, the separation isn't as important anymore, at least in
the short term. However, I prefer the multiple bits approach because
it's more extensible in the long term.
2> mseal() vs mprotect() vs madvise() for setting the seal.
mprotect():
Using prot field, but prot supports unset. It's workable, i.e. let
applications carry the sealing type and set in all subsequent calls to
mprotect(), but it feels like this is an extra thing to care about.
madvise():
uses enum, multiple sealing types might require multiple roundtrips.
IMO: sealing is a major departure from other memory syscalls because
it takes away capabilities. The other memory APIs add behaviors or
change attributes, but sealing does the opposite: it reduces
capabilities. The name of the syscall, mseal(), can help emphasize the
"taking away" part.
My second choice would be madvise().
3> Other:
There is also a topic about ptrace, /proc/self/mem, and userfaultfd,
which I think can be followed up in the v1 thread, since it has the
most context.
New discussion topics:
======================
During the development of V3, I had new questions and thoughts that I
would like to discuss.
1> shm/aio
From reading the code, it seems to me that aio/shm can mmap/munmap
mappings on behalf of userspace, e.g. ksys_shmdt() in shm.c. The
lifetime of those mappings is not tied to the lifetime of the process.
If that memory is sealed from userspace, then the unmap will fail. This
isn't a huge problem, since the memory will eventually be freed at exit
or exec. However, it feels like the solution is not complete, because
VMA address space leaks during the lifetime of the process.
There could be two solutions to address this, which I will discuss
later.
2> Brk (heap/stack)
Currently, userspace applications can seal parts of the heap by
calling malloc() and mseal(). This raises the question of what the
expected behavior is when sealing the heap is attempted.
Let's assume the following calls from user space:
ptr = malloc(size);
mprotect(ptr, size, RO);
mseal(ptr, size, SEAL_PROT_PKEY);
free(ptr);
Technically, before mseal() is added, the user can change the
protection of the heap by calling mprotect(RO). As long as the user
changes the protection back to RW before free(), the memory can be
reused.
With mseal() in the picture, however, the heap is then partially
sealed. The user can still free the memory, but it remains RO, and the
result of a brk shrink is nondeterministic, depending on whether
munmap() tries to free the sealed memory (brk uses munmap() to shrink
the heap).
3> The above two cases lead to the third topic:
There are two options to address the problem mentioned above.
Option 1: A “MAP_SEALABLE” flag in mmap().
If a map is created without this flag, the mseal() operation will
fail. Applications that are not concerned with sealing will expect
their behavior to be unchanged. For those that are concerned, adding a
flag at mmap time to opt in is not difficult. For the short term, this
solves problems 1 and 2 above. The memory in shm/aio/brk will not have
the MAP_SEALABLE flag at mmap(), and the same is true for the heap.
Option 2: Add MM_SEAL_SEAL during mmap()
It is possible to defensively set MM_SEAL_SEAL for the selected mappings at
creation time. Specifically, we can find all the mmaps that we do not want to
seal, and add the MM_SEAL_SEAL flag in the mmap() call. The difference
between MAP_SEALABLE and MM_SEAL_SEAL is that the first option starts from a
small size and incrementally increases, while the second option is the
opposite.
In my opinion, MAP_SEALABLE is the preferred option. Only a limited set of
mappings need to be sealed, and these are typically created by the runtime. For
the few dedicated applications that manage their own mappings, such as Chrome,
adding an extra flag at mmap() is not a difficult task. It is also a safer
option in terms of regression risk. This is the option included in this
version.
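The opt-in behaviour of Option 1 can be sketched with a toy model. The Mapping class and the mseal()/munmap() functions below are hypothetical stand-ins written for this example, not kernel code; they only capture the two rules described above: sealing fails on a mapping created without MAP_SEALABLE, and unmapping fails on a sealed mapping.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Toy stand-in for a VMA; not a kernel structure."""
    sealable: bool = False  # True when created with MAP_SEALABLE
    sealed: bool = False
    mapped: bool = True


def mseal(m: Mapping) -> bool:
    """Seal a mapping; fails unless it was created as sealable."""
    if not m.sealable:
        return False
    m.sealed = True
    return True


def munmap(m: Mapping) -> bool:
    """Unmap a mapping; fails when the mapping is sealed."""
    if m.sealed:
        return False
    m.mapped = False
    return True


heap = Mapping()                  # brk/shm/aio mappings lack MAP_SEALABLE
runtime = Mapping(sealable=True)  # opted in at mmap() time
```

In this model the heap can never be sealed, so a later brk shrink or shmdt keeps working, while a mapping that opted in stays mapped once sealed.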
4>
I think it might be possible to seal the stack or other special
mappings created at runtime (vdso, vsyscall, vvar). This means we can
enforce and seal W^X for certain types of application. For instance,
the stack is typically used in read-write mode, but in some cases, it
can become executable. To defend against unintended addition of the
executable bit to the stack, we could let the application seal it.
Sealing the heap (for adding X) requires special handling, since the
heap can shrink, and shrink is implemented through munmap().
Indeed, it might be possible that all virtual memory accessible to user
space, regardless of its usage pattern, could be sealed. However, this
would require additional research and development work.
------------------------------------------------------------------------
v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/
v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/
----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/
Jeff Xu (11):
mseal: Add mseal syscall.
mseal: Wire up mseal syscall
mseal: add can_modify_mm and can_modify_vma
mseal: add MM_SEAL_BASE
mseal: add MM_SEAL_PROT_PKEY
mseal: add sealing support for mmap
mseal: make sealed VMA mergeable.
mseal: add MM_SEAL_DISCARD_RO_ANON
mseal: add MAP_SEALABLE to mmap()
selftest mm/mseal memory sealing
mseal:add documentation
Documentation/userspace-api/mseal.rst | 189 ++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/mips/kernel/vdso.c | 10 +-
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/userfaultfd.c | 8 +-
include/linux/mm.h | 178 +-
include/linux/mm_types.h | 8 +
include/linux/syscalls.h | 2 +
include/uapi/asm-generic/mman-common.h | 16 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/mman.h | 5 +
kernel/sys_ni.c | 1 +
mm/Kconfig | 9 +
mm/Makefile | 1 +
mm/madvise.c | 14 +-
mm/mempolicy.c | 2 +-
mm/mlock.c | 2 +-
mm/mmap.c | 77 +-
mm/mprotect.c | 12 +-
mm/mremap.c | 44 +-
mm/mseal.c | 376 ++++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/mseal_test.c | 2141 +++++++++++++++++++
41 files changed, 3091 insertions(+), 32 deletions(-)
create mode 100644 Documentation/userspace-api/mseal.rst
create mode 100644 mm/mseal.c
create mode 100644 tools/testing/selftests/mm/mseal_test.c
--
2.43.0.472.g3155946c3a-goog
Hi,
Compilation of lsm_cgroup.c will fail if the vmlinux.h comes from a
kernel that does _not_ have CONFIG_PACKET=y. The reason is that the
definition of struct sockaddr_ll is not present in vmlinux.h and the
compiler will complain that it has an incomplete type.
CLNG-BPF [test_maps] lsm_cgroup.bpf.o
progs/lsm_cgroup.c:105:21: error: variable has incomplete type 'struct sockaddr_ll'
105 | struct sockaddr_ll sa = {};
| ^
progs/lsm_cgroup.c:105:9: note: forward declaration of 'struct sockaddr_ll'
105 | struct sockaddr_ll sa = {};
| ^
1 error generated.
While including linux/if_packet.h somehow made the compilation work for
me, IIUC this isn't a proper solution because vmlinux.h and kernel
headers should not be used at the same time (and it would lead to a
redefinition error when the kernel is built with CONFIG_PACKET=y, e.g.
on BPF CI).
What would be the suggested way to work around this?
Thanks,
Shung-Hsi
---
tools/testing/selftests/bpf/progs/lsm_cgroup.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/progs/lsm_cgroup.c b/tools/testing/selftests/bpf/progs/lsm_cgroup.c
index 02c11d16b692..5394ec7ae1d8 100644
--- a/tools/testing/selftests/bpf/progs/lsm_cgroup.c
+++ b/tools/testing/selftests/bpf/progs/lsm_cgroup.c
@@ -2,6 +2,7 @@
#include "vmlinux.h"
#include "bpf_tracing_net.h"
+#include <linux/if_packet.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
If there are more than 32 CPUs the bitmask will start to contain
commas, leading to:
./rps_default_mask.sh: line 36: [: 00000000,00000000: integer expression expected
Remove the commas; bash doesn't interpret leading zeroes as octal
here, so that should be good enough.
Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: horms(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/net/rps_default_mask.sh | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/rps_default_mask.sh b/tools/testing/selftests/net/rps_default_mask.sh
index a26c5624429f..f8e786e220b6 100755
--- a/tools/testing/selftests/net/rps_default_mask.sh
+++ b/tools/testing/selftests/net/rps_default_mask.sh
@@ -33,6 +33,10 @@ chk_rps() {
rps_mask=$($cmd /sys/class/net/$dev_name/queues/rx-0/rps_cpus)
printf "%-60s" "$msg"
+
+ # In case there are more than 32 CPUs we need to remove all commas from masks
+ rps_mask=${rps_mask//,}
+ expected_rps_mask=${expected_rps_mask//,}
if [ $rps_mask -eq $expected_rps_mask ]; then
echo "[ ok ]"
else
--
2.43.0
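The comma handling in the patch above can be sketched in Python. This is an illustration under assumptions: the helper names are invented for the example, and the masks are compared as hexadecimal numbers, which is a deliberate swap for the script's `[ ... -eq ... ]` test. Note that str.replace() drops every comma, which matters once a mask has more than two 32-bit groups (more than 64 CPUs).

```python
def strip_commas(mask: str) -> str:
    """Drop every comma from a sysfs CPU mask such as '00000000,00000001'."""
    return mask.replace(",", "")


def masks_equal(actual: str, expected: str) -> bool:
    """Compare two CPU masks as hexadecimal numbers after removing commas."""
    return int(strip_commas(actual), 16) == int(strip_commas(expected), 16)
```

For example, masks_equal("00000000,00000003", "3") holds, while a mask with a bit set in the upper group does not compare equal to "1".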
From: Benjamin Poirier <benjamin.poirier(a)gmail.com>
Two small fixes for net selftests.
These patches were carved out of the following RFC series:
https://lore.kernel.org/netdev/20231222135836.992841-1-bpoirier@nvidia.com/
I'm planning to send the rest of the series to net-next after it opens up.
Benjamin Poirier (2):
selftests: bonding: Change script interpreter
selftests: forwarding: Remove executable bits from lib.sh
.../selftests/drivers/net/bonding/mode-1-recovery-updelay.sh | 2 +-
.../selftests/drivers/net/bonding/mode-2-recovery-updelay.sh | 2 +-
tools/testing/selftests/net/forwarding/lib.sh | 0
3 files changed, 2 insertions(+), 2 deletions(-)
mode change 100755 => 100644 tools/testing/selftests/net/forwarding/lib.sh
--
2.43.0
On systems with 64k page size and 512M huge page sizes, the allocation
and test succeed but the test errors out at the munmap. As the comment
states, munmap will fail if it's not HUGEPAGE aligned. This is due to
the length of the mapping being 1/2 the size of the hugepage, causing
the munmap to not be hugepage aligned. Fix this by making the mapping
length the full hugepage if the hugepage is larger than the length of
the mapping.
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/map_hugetlb.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/mm/map_hugetlb.c b/tools/testing/selftests/mm/map_hugetlb.c
index 193281560b61..dcb8095fcd45 100644
--- a/tools/testing/selftests/mm/map_hugetlb.c
+++ b/tools/testing/selftests/mm/map_hugetlb.c
@@ -58,10 +58,16 @@ int main(int argc, char **argv)
{
void *addr;
int ret;
+ size_t maplength;
size_t length = LENGTH;
int flags = FLAGS;
int shift = 0;
+ maplength = default_huge_page_size();
+ /* munmap will fail if the length is not hugepage aligned */
+ if (maplength > length)
+ length = maplength;
+
if (argc > 1)
length = atol(argv[1]) << 20;
if (argc > 2) {
--
2.43.0
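The length adjustment in the patch above amounts to simple arithmetic, sketched here. The 256M default length is an assumption inferred from the description ("1/2 the size" of a 512M huge page); the function names are made up for this example.

```python
MB = 1 << 20


def mapping_length(requested: int, huge_page_size: int) -> int:
    """Use at least one full huge page, as the patch does for the default length."""
    return max(requested, huge_page_size)


def is_hugepage_aligned(length: int, huge_page_size: int) -> bool:
    """True when length is a whole multiple of the huge page size."""
    return length % huge_page_size == 0
```

With a 512M huge page, a 256M length is not aligned, while mapping_length(256 * MB, 512 * MB) returns 512M, which is. With 2M huge pages the 256M length is left unchanged.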
As for the QEMU command, print the command used to run tests with UML.
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: David Gow <davidgow(a)google.com>
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
---
tools/testing/kunit/kunit_kernel.py | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 0b6488efed47..7254c110ff23 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -146,6 +146,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
"""Runs the Linux UML binary. Must be named 'linux'."""
linux_bin = os.path.join(build_dir, 'linux')
params.extend(['mem=1G', 'console=tty', 'kunit_shutdown=halt'])
+ print('Running tests with:\n$', linux_bin, ' '.join(shlex.quote(arg) for arg in params))
return subprocess.Popen([linux_bin] + params,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
--
2.43.0
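The quoting done by the added print() can be tried in isolation. In this sketch, format_command() is a hypothetical helper written for the example; like the patch, it quotes only the parameters and leaves the binary path as-is.

```python
import shlex


def format_command(binary: str, params: list) -> str:
    """Build a copy-pasteable shell command line, quoting each parameter."""
    return " ".join([binary] + [shlex.quote(arg) for arg in params])
```

Parameters made only of shell-safe characters, such as mem=1G, come out unchanged; a parameter containing a space is wrapped in single quotes so that the printed line can be pasted into a shell.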
When tests are run by runner.sh, bond_options.sh gets killed before
it can complete:
make -C tools/testing/selftests run_tests TARGETS="drivers/net/bonding"
[...]
# timeout set to 120
# selftests: drivers/net/bonding: bond_options.sh
# TEST: prio (active-backup miimon primary_reselect 0) [ OK ]
# TEST: prio (active-backup miimon primary_reselect 1) [ OK ]
# TEST: prio (active-backup miimon primary_reselect 2) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 0) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 1) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 2) [ OK ]
#
not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds
This test includes many sleep statements, at least some of which are
related to timers in the operation of the bonding driver itself. Increase
the test timeout to allow the test to complete.
I ran the test in slightly different VMs (including one without HW
virtualization support) and got runtimes of 13m39.760s, 13m31.238s, and
13m2.956s. Use a ~1.5x "safety factor" and set the timeout to 1200s.
Fixes: 42a8d4aaea84 ("selftests: bonding: add bonding prio option test")
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Closes: https://lore.kernel.org/netdev/20240116104402.1203850a@kernel.org/#t
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier(a)nvidia.com>
---
tools/testing/selftests/drivers/net/bonding/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/bonding/settings b/tools/testing/selftests/drivers/net/bonding/settings
index 6091b45d226b..79b65bdf05db 100644
--- a/tools/testing/selftests/drivers/net/bonding/settings
+++ b/tools/testing/selftests/drivers/net/bonding/settings
@@ -1 +1 @@
-timeout=120
+timeout=1200
--
2.43.0
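The timeout choice in the patch above can be checked with a few lines of arithmetic. The runtimes are the three values quoted in the commit message; the helper name is invented for the example.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a 'XmY.Zs' runtime into seconds."""
    return minutes * 60 + seconds


# Runtimes reported in the commit message.
runtimes = [to_seconds(13, 39.760), to_seconds(13, 31.238), to_seconds(13, 2.956)]
slowest = max(runtimes)

# Ratio between the new timeout and the slowest observed run.
factor = 1200 / slowest
```

The slowest run is about 820 seconds, so a 1200 second timeout corresponds to a factor a little under 1.5, and every measured run is far above the old 120 second limit.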
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
cannot be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages; a data abort is generated if they are not.
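The BL/RET behaviour described above can be sketched as a toy model. This is not how the hardware or the kernel is implemented; the ToyCpu class and GcsFault exception are invented for the example and only mirror the push-on-BL and compare-on-RET rule.

```python
class GcsFault(Exception):
    """Raised when a return address does not match the GCS entry."""


class ToyCpu:
    """Very small model of BL/RET with a Guarded Control Stack."""

    def __init__(self):
        self.lr = 0    # link register
        self.gcs = []  # guarded control stack, top is the last element

    def bl(self, return_address: int) -> None:
        """Branch with link: set LR and push the same value onto the GCS."""
        self.lr = return_address
        self.gcs.append(return_address)

    def ret(self) -> int:
        """Return: pop the GCS and fault if the value disagrees with LR."""
        expected = self.gcs.pop()
        if expected != self.lr:
            raise GcsFault(f"LR {self.lr:#x} != GCS {expected:#x}")
        return self.lr
```

A matching BL/RET pair returns normally; if LR is overwritten between the two (as a ROP attack on the normal stack would arrange), ret() raises GcsFault.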
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2; this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it. It is
expected that this will be done very early in application execution by
the dynamic linker or other startup code. For dynamic linking this will
be done by checking that everything in the executable is marked as GCS
compatible.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am conscious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable. We do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
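The enable/disable policy described above can be sketched as a small state machine. The class below is a hypothetical model for illustration only; how the real prctl() reports a refused re-enable is not stated here, so the sketch simply returns False.

```python
class GcsThreadState:
    """Toy model: GCS may be enabled once, disabled, and never re-enabled."""

    def __init__(self):
        self.enabled = False
        self.allocated = False     # whether a GCS exists for the thread
        self.was_disabled = False  # set once GCS has been turned off

    def enable(self) -> bool:
        """Enable GCS; refused after an earlier disable."""
        if self.was_disabled:
            return False
        self.enabled = True
        self.allocated = True
        return True

    def disable(self) -> None:
        """Disable GCS but keep the stack allocated, unlike x86."""
        if self.enabled:
            self.enabled = False
            self.was_disabled = True
```

Keeping the stack allocated after disable() is what avoids the race with anything still walking the GCS.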
x86 uses an arch_prctl() to manage enable and disable. Since only x86
and S/390 use arch_prctl(), a generic prctl() was proposed[1] as part of
a patch set for the equivalent RISC-V Zicfiss feature. I initially
adopted this fairly directly, but following review feedback it has been
revised quite a bit.
We currently maintain the x86 pattern of implicitly allocating a shadow
stack for threads started with shadow stack enabled, there has been some
discussion of removing this support and requiring the use of clone3()
with explicit allocation of shadow stacks instead. I have no strong
feelings either way, implicit allocation is not really consistent with
anything else we do and creates the potential for errors around thread
exit but on the other hand it is existing ABI on x86 and minimises the
changes needed in userspace code.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
The series depends on support for shadow stacks in clone3(), that series
includes the addition of ARCH_HAS_USER_SHADOW_STACK.
https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
It also depends on the addition of more waitpid() flags to nolibc:
https://lore.kernel.org/r/20231023-nolibc-waitpid-flags-v2-1-b09d096f091f@k…
You can see a branch with the full set of dependencies against Linus'
tree at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there, the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (39):
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
mman: Add map_shadow_stack() flags
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide put_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
kselftest/clone3: Enable GCS in the clone3 selftests
Documentation/admin-guide/kernel-parameters.txt | 6 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 233 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 20 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 107 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/mman.h | 23 +-
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 40 ++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 81 +++
arch/arm64/kernel/ptrace.c | 59 ++
arch/arm64/kernel/signal.c | 236 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 79 ++-
arch/arm64/mm/gcs.c | 259 +++++++
arch/arm64/mm/mmap.c | 13 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 ++
arch/x86/include/uapi/asm/mman.h | 3 -
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/uapi/asm-generic/mman.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 742 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/clone3/clone3.c | 37 +
73 files changed, 4234 insertions(+), 40 deletions(-)
---
base-commit: 3d0134d322380292c055454d9633738733992d61
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
A BPF application, e.g., a TCP congestion control, might benefit from or
even require precise (=hardware) packet timestamps. These timestamps are
already available through __sk_buff.hwtstamp and
bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
not allowed to set SO_TIMESTAMPING* on sockets.
Enable BPF programs to actively request the generation of timestamps
from a stream socket. The ioctl(SIOCSHWTSTAMP) on the network device,
which is also required, must still be done separately, in user space.
This patch had previously been submitted in a two-part series (first
link below). The second patch has been independently applied in commit
7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)")
(second link below).
On the earlier submission, there was the open question whether to only
allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
For a BPF program, this won't make a difference: A timestamp, when
accessed through the fields mentioned above, is directly read from
skb_shared_info.hwtstamps, independent of the places where NEW/OLD is
relevant. See bpf_convert_ctx_access(), among others.
I am unsure, though, when it comes to the interconnection of user space
and BPF "space", when both are interested in the timestamps. I think it
would cause an unsolvable conflict when user space is bound to use
SO_TIMESTAMPING_OLD with a BPF program only allowed to set
SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
mistaken.
Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berl…
Link: https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin…
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Deepa Dinamani <deepa.kernel(a)gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel(a)gmail.com>
Signed-off-by: Jörn-Thorben Hinz <j-t.hinz(a)alumni.tu-berlin.de>
---
include/uapi/linux/bpf.h | 3 ++-
net/core/filter.c | 2 ++
tools/include/uapi/linux/bpf.h | 3 ++-
tools/testing/selftests/bpf/progs/bpf_tracing_net.h | 2 ++
tools/testing/selftests/bpf/progs/setget_sockopt.c | 4 ++++
5 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 754e68ca8744..8825d0648efe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@ union bpf_attr {
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/net/core/filter.c b/net/core/filter.c
index 8c9f67c81e22..4f5280874fd8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5144,6 +5144,8 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
case SO_MAX_PACING_RATE:
case SO_BINDTOIFINDEX:
case SO_TXREHASH:
+ case SO_TIMESTAMPING_NEW:
+ case SO_TIMESTAMPING_OLD:
if (*optlen != sizeof(int))
return -EINVAL;
break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7f24d898efbb..09eaafa6ab43 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@ union bpf_attr {
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 1bdc680b0e0e..95f5f169819e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -15,8 +15,10 @@
#define SO_RCVLOWAT 18
#define SO_BINDTODEVICE 25
#define SO_MARK 36
+#define SO_TIMESTAMPING_OLD 37
#define SO_MAX_PACING_RATE 47
#define SO_BINDTOIFINDEX 62
+#define SO_TIMESTAMPING_NEW 65
#define SO_TXREHASH 74
#define __SO_ACCEPTCON (1 << 16)
diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c
index 7a438600ae98..54205d10793c 100644
--- a/tools/testing/selftests/bpf/progs/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c
@@ -48,6 +48,10 @@ static const struct sockopt_test sol_socket_tests[] = {
{ .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, },
{ .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, },
{ .opt = SO_TXREHASH, .flip = 1, },
+ { .opt = SO_TIMESTAMPING_NEW, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+ .expected = SOF_TIMESTAMPING_RX_HARDWARE, },
+ { .opt = SO_TIMESTAMPING_OLD, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+ .expected = SOF_TIMESTAMPING_RX_HARDWARE, },
{ .opt = 0, },
};
--
2.39.2
This extends the KVM RISC-V ONE_REG interface to report more ISA extensions,
namely: Zbc, scalar crypto, vector crypto, Zfh[min], Zihintntl, Zvfh[min],
and Zfa.
This series depends upon the "riscv: report more ISA extensions through
hwprobe" series from Clement.
(Link: https://lore.kernel.org/lkml/20231114141256.126749-1-cleger@rivosinc.com/)
To test these patches, use KVMTOOL from the riscv_more_exts_v1 branch at:
https://github.com/avpatel/kvmtool.git
These patches can also be found in the riscv_kvm_more_exts_v1 branch at:
https://github.com/avpatel/linux.git
Anup Patel (15):
KVM: riscv: selftests: Generate ISA extension reg_list using macros
RISC-V: KVM: Allow Zbc extension for Guest/VM
KVM: riscv: selftests: Add Zbc extension to get-reg-list test
RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list
test
RISC-V: KVM: Allow vector crypto extensions for Guest/VM
KVM: riscv: selftests: Add vector crypto extensions to get-reg-list
test
RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zihintntl extension for Guest/VM
KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zfa extension for Guest/VM
KVM: riscv: selftests: Add Zfa extension to get-reg-list test
arch/riscv/include/uapi/asm/kvm.h | 27 ++
arch/riscv/kvm/vcpu_onereg.c | 54 +++
.../selftests/kvm/riscv/get-reg-list.c | 439 ++++++++----------
3 files changed, 265 insertions(+), 255 deletions(-)
--
2.34.1
As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
build configs"), add more networking-specific config options which are
needed for bonding tests.
For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. All bonding tests passed.
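As a rough illustration of the check described above, the following sketch reports which of the newly required options are absent from a kernel config. The helper name missing_options() and the sample text are hypothetical; only the option names come from this patch.

```python
def missing_options(required, config_text):
    """Return the required options that are not set to 'y' or 'm'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Options this patch adds to the bonding selftest config.
BONDING_EXTRA = [
    "CONFIG_IPV6",
    "CONFIG_NET_ACT_GACT",
    "CONFIG_NET_CLS_FLOWER",
    "CONFIG_NET_SCH_INGRESS",
    "CONFIG_NLMON",
]

if __name__ == "__main__":
    sample = "CONFIG_BONDING=y\nCONFIG_IPV6=y\n# CONFIG_NLMON is not set\n"
    print(missing_options(BONDING_EXTRA, sample))
```

Running it against the .config of a minimal build gives a quick hint about which options still need to be enabled before the bonding tests can pass.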
Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6
Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options
Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier(a)nvidia.com>
---
tools/testing/selftests/drivers/net/bonding/config | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
index f85b16fc5128..899d7fb6ea8e 100644
--- a/tools/testing/selftests/drivers/net/bonding/config
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -1,5 +1,10 @@
CONFIG_BONDING=y
CONFIG_BRIDGE=y
CONFIG_DUMMY=y
+CONFIG_IPV6=y
CONFIG_MACVLAN=y
+CONFIG_NET_ACT_GACT=y
+CONFIG_NET_CLS_FLOWER=y
+CONFIG_NET_SCH_INGRESS=y
+CONFIG_NLMON=y
CONFIG_VETH=y
--
2.43.0
The device is exported with a fuzz of 4, meaning that the `+ t` here
is removed by the fuzz algorithm, making those tests fail.
Not sure why, but when I ran this locally it was passing, but not in the
VM.
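To see why a step of 1 can disappear while a step of 11 survives, here is a small sketch modelled on my understanding of the defuzz logic in the kernel input core. The function name defuzz() and the exact thresholds are assumptions made for illustration, not a copy of the kernel code.

```python
def defuzz(value, old, fuzz):
    """Simplified model of axis defuzzing (assumed thresholds)."""
    if fuzz:
        # Tiny movements are treated as noise and dropped entirely.
        if old - fuzz // 2 < value < old + fuzz // 2:
            return old
        # Slightly larger movements are heavily smoothed toward the old value.
        if old - fuzz < value < old + fuzz:
            return (old * 3 + value) // 4
        # Medium movements are averaged with the old value.
        if old - fuzz * 2 < value < old + fuzz * 2:
            return (old + value) // 2
    return value


if __name__ == "__main__":
    # x = 50 + t with fuzz 4: the movement is swallowed.
    print(defuzz(51, 50, 4))
    # x = 50 + t * 11 with fuzz 4: the movement is reported unchanged.
    print(defuzz(61, 50, 4))
```

With a fuzz of 4, any step of 8 or more falls outside every smoothing window in this model, which is consistent with the choice of `t * 11` in the patch.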
Link: https://gitlab.freedesktop.org/bentiss/hid/-/jobs/53692957#L3315
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Over the break the test suite wasn't properly running on my runner,
and this small issue sneaked in.
---
tools/testing/selftests/hid/tests/test_wacom_generic.py | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/hid/tests/test_wacom_generic.py b/tools/testing/selftests/hid/tests/test_wacom_generic.py
index 352fc39f3c6c..b62c7dba6777 100644
--- a/tools/testing/selftests/hid/tests/test_wacom_generic.py
+++ b/tools/testing/selftests/hid/tests/test_wacom_generic.py
@@ -880,8 +880,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
does not overlap with other contacts. The value of `t` may be
incremented over time to move the point along a linear path.
"""
- x = 50 + 10 * contact_id + t
- y = 100 + 100 * contact_id + t
+ x = 50 + 10 * contact_id + t * 11
+ y = 100 + 100 * contact_id + t * 11
return test_multitouch.Touch(contact_id, x, y)
def make_contacts(self, n, t=0):
@@ -902,8 +902,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
tracking_id = contact_ids.tracking_id
slot_num = contact_ids.slot_num
- x = 50 + 10 * contact_id + t
- y = 100 + 100 * contact_id + t
+ x = 50 + 10 * contact_id + t * 11
+ y = 100 + 100 * contact_id + t * 11
# If the data isn't supposed to be stored in any slots, there is
# nothing we can check for in the evdev stream.
---
base-commit: 80d5a73edcfbd1d8d6a4c2b755873c5d63a1ebd7
change-id: 20240117-b4-wip-wacom-tests-fixes-298b50bea47f
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
On Wed, Jan 17, 2024 at 7:12 PM Jason Gerecke <killertofu(a)gmail.com> wrote:
>
> LGTM. Acked-by: Jason Gerecke <jason.gerecke(a)wacom.com>
Thanks!
I'll add a:
Fixes: b0fb904d074e ("HID: wacom: Add additional tests of confidence behavior")
And send to Linus in the next round for 6.8 so we also fix the future
for-6.9 branches
Cheers,
Benjamin
>
>
> Jason
> ---
> Now instead of four in the eights place /
> you’ve got three, ‘Cause you added one /
> (That is to say, eight) to the two, /
> But you can’t take seven from three, /
> So you look at the sixty-fours....
>
>
>
Hi Mohammad,
On 1/16/24 21:48, Mohammad Nassiri wrote:
> The end_server() function only operates in the server thread
> and always takes an accept socket instead of a listen socket as
> its input argument. To align with this, invert the boolean values
> used when calling verify_counters() within the end_server() function.
>
> Fixes: ("3c3ead555648 selftests/net: Add TCP-AO key-management test")
> Signed-off-by: Mohammad Nassiri <mnassiri(a)ciena.com>
> Link: https://lore.kernel.org/all/934627c5-eebb-4626-be23-cfb134c01d1a@arista.com/
As I wrote to you off-list, the patch probably was not delivered to the
mailing lists because the SPF check did not pass. Please fix your send-email
setup when/if you want to send more patches.
Related to this patch: I'm going to carry and resend it together with 2
more patches, as this fix made 3 selftests fail and I've looked into that.
> ---
> tools/testing/selftests/net/tcp_ao/key-management.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/net/tcp_ao/key-management.c b/tools/testing/selftests/net/tcp_ao/key-management.c
> index c48b4970ca17..f6a9395e3cd7 100644
> --- a/tools/testing/selftests/net/tcp_ao/key-management.c
> +++ b/tools/testing/selftests/net/tcp_ao/key-management.c
> @@ -843,7 +843,7 @@ static void end_server(const char *tst_name, int sk,
> synchronize_threads(); /* 4: verified => closed */
> close(sk);
>
> - verify_counters(tst_name, true, false, begin, &end);
> + verify_counters(tst_name, false, true, begin, &end);
> synchronize_threads(); /* 5: counters */
> }
>
Thanks,
Dmitry
When running with CATEGORY= (thp | hugetlb) we see a large number of
tests failing. These failures are due to not being able to allocate a
hugepage and normally occur on memory-constrained systems or when using
large page sizes.
Drop caches and compact memory before the tests for a higher chance of a
successful hugepage allocation.
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 246d53a5d7f2..040f27e21f47 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -206,6 +206,15 @@ pretty_name() {
# Usage: run_test [test binary] [arbitrary test arguments...]
run_test() {
if test_selected ${CATEGORY}; then
+ # On memory-constrained systems some tests can fail to allocate hugepages.
+ # Perform some cleanup before the test for a higher success rate.
+ if [ "${CATEGORY}" == "thp" ] || [ "${CATEGORY}" == "hugetlb" ]; then
+ echo 3 > /proc/sys/vm/drop_caches
+ sleep 2
+ echo 1 > /proc/sys/vm/compact_memory
+ sleep 2
+ fi
+
local test=$(pretty_name "$*")
local title="running $*"
local sep=$(echo -n "$title" | tr "[:graph:][:space:]" -)
--
2.43.0
hugetlb_madv_vs_map selftest was not part of the mm test-suite since we
didn't have a fix for the problem it found.
Now that the problem is already fixed (see previous commit), let's
enable this selftest in the default test-suite.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
tools/testing/selftests/mm/run_vmtests.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index a5e6ba8d3579..f41e1978e4d4 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -256,6 +256,7 @@ nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
# For this test, we need one and just one huge page
echo 1 > /proc/sys/vm/nr_hugepages
CATEGORY="hugetlb" run_test ./hugetlb_fault_after_madv
+CATEGORY="hugetlb" run_test ./hugetlb_madv_vs_map
# Restore the previous number of huge pages, since further tests rely on it
echo "$nr_hugepages_tmp" > /proc/sys/vm/nr_hugepages
--
2.34.1
From: Amit Cohen <amcohen(a)nvidia.com>
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
using a shaper somewhere in the flow of the packets. In this area, the
buffer is smaller than the buffer at the beginning of the flow, so it fills
up until there is no more space left. The test configures PFC there,
which is supposed to notice that the headroom is filling up and send PFC
Xoff to tell the transmitter to stop sending traffic for the priorities
sharing this PG.
The Xon/Xoff threshold is auto-configured and always equal to
2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
traffic will keep arriving until the transmitter receives and processes
the PFC packet. This amount of traffic is known as the PFC delay allowance.
Currently the buffer for the delay traffic is configured as 100KB. The
MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
This allows 80KB extra to be stored in this buffer.
8-lane ports use two buffers among which the configured buffer is split;
the Xoff threshold then applies to each buffer in parallel.
The test does not take into account the behavior of 8-lane ports. When the
ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
packets are dropped and the test fails.
Check if the relevant ports use 8 lanes and, in such a case, double the size of
the buffer, as the headroom is split half-half.
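The arithmetic above can be sketched as follows. This is only a back-of-the-envelope model: the function name is hypothetical, sizes are in KB, and rounding the MTU up to the cell size is ignored.

```python
def delay_allowance_kb(buffer_kb, mtu_kb, lanes):
    """Space left for in-flight traffic after Xoff is sent, per buffer."""
    # 8-lane ports split the configured buffer across two buffers.
    halves = 2 if lanes == 8 else 1
    per_buffer_kb = buffer_kb // halves
    # The Xoff threshold is 2 * MTU and applies to each buffer separately.
    xoff_threshold_kb = 2 * mtu_kb
    return per_buffer_kb - xoff_threshold_kb


if __name__ == "__main__":
    print(delay_allowance_kb(100, 10, 4))  # configuration the test assumed
    print(delay_allowance_kb(100, 10, 8))  # same buffer on an 8-lane port
    print(delay_allowance_kb(200, 10, 8))  # doubled buffer, as in this patch
```

Under these assumptions, an 8-lane port with the original 100KB setting has far less delay allowance per buffer than the test expects, and doubling the configured size restores it.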
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
.../selftests/drivers/net/mlxsw/qos_pfc.sh | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
index 49bef76083b8..0f0f4f05807c 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
@@ -119,6 +119,9 @@ h2_destroy()
switch_create()
{
+ local lanes_swp4
+ local pg1_size
+
# pools
# -----
@@ -228,7 +231,20 @@ switch_create()
dcb pfc set dev $swp4 prio-pfc all:off 1:on
# PG0 will get autoconfigured to Xoff, give PG1 arbitrarily 100K, which
# is (-2*MTU) about 80K of delay provision.
- dcb buffer set dev $swp4 buffer-size all:0 1:$_100KB
+ pg1_size=$_100KB
+
+ setup_wait_dev_with_timeout $swp4
+
+ lanes_swp4=$(ethtool $swp4 | grep 'Lanes:')
+ lanes_swp4=${lanes_swp4#*"Lanes: "}
+
+ # 8-lane ports use two buffers among which the configured buffer
+ # is split, so double the size to get twice (20K + 80K).
+ if [[ $lanes_swp4 -eq 8 ]]; then
+ pg1_size=$((pg1_size * 2))
+ fi
+
+ dcb buffer set dev $swp4 buffer-size all:0 1:$pg1_size
# bridges
# -------
--
2.42.0
A build issue comes up because dev_in_maps.c includes both <linux/mount.h> and <sys/mount.h>:
In file included from dev_in_maps.c:10:
/usr/include/sys/mount.h:35:3: error: expected identifier before numeric constant
35 | MS_RDONLY = 1, /* Mount read-only. */
| ^~~~~~~~~
In file included from dev_in_maps.c:13:
Removing one of them resolves the conflict, but another error comes up:
dev_in_maps.c:170:6: error: implicit declaration of function ‘mount’ [-Werror=implicit-function-declaration]
170 | if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
| ^~~~~
cc1: all warnings being treated as errors
Adding a sys_mount() definition solves that.
With both changes, dev_in_maps.c builds correctly on my machine (gcc 10.2, glibc 2.32, kernel 5.10).
Signed-off-by: Hu Yadi <hu.yadi(a)h3c.com>
---
.../selftests/filesystems/overlayfs/dev_in_maps.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
index e19ab0e85709..759f86e7d263 100644
--- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
+++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
@@ -10,7 +10,6 @@
#include <linux/mount.h>
#include <sys/syscall.h>
#include <sys/stat.h>
-#include <sys/mount.h>
#include <sys/mman.h>
#include <sched.h>
#include <fcntl.h>
@@ -32,7 +31,11 @@ static int sys_fsmount(int fd, unsigned int flags, unsigned int attr_flags)
{
return syscall(__NR_fsmount, fd, flags, attr_flags);
}
-
+static int sys_mount(const char *src, const char *tgt, const char *fst,
+ unsigned long flags, const void *data)
+{
+ return syscall(__NR_mount, src, tgt, fst, flags, data);
+}
static int sys_move_mount(int from_dfd, const char *from_pathname,
int to_dfd, const char *to_pathname,
unsigned int flags)
@@ -166,8 +169,7 @@ int main(int argc, char **argv)
ksft_test_result_skip("unable to create a new mount namespace\n");
return 1;
}
-
- if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
+ if (sys_mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
pr_perror("mount");
return 1;
}
--
2.39.3
Running charge_reserved_hugetlb.sh generates errors if sh is set to
dash:
./charge_reserved_hugetlb.sh: 9: [[: not found
./charge_reserved_hugetlb.sh: 19: [[: not found
./charge_reserved_hugetlb.sh: 27: [[: not found
./charge_reserved_hugetlb.sh: 37: [[: not found
./charge_reserved_hugetlb.sh: 45: Syntax error: "(" unexpected
Switch to using /bin/bash instead of /bin/sh. Make the switch for
write_hugetlb_memory.sh as well which is called from
charge_reserved_hugetlb.sh.
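A quick way to spot other scripts with the same problem is to look for the bash-only `[[` test in files that declare /bin/sh. The sketch below is a rough heuristic with a hypothetical function name; it only catches `[[` and its comment stripping is deliberately naive.

```python
import re


def sh_script_uses_bash_test(script_text):
    """Return True if a '#!/bin/sh' script uses the bash-only '[[' test."""
    lines = script_text.splitlines()
    if not lines or lines[0].strip() != "#!/bin/sh":
        return False
    for line in lines[1:]:
        # Naive comment stripping: everything after the first '#' is ignored.
        code = line.split("#", 1)[0]
        if re.search(r"(^|\s)\[\[(\s|$)", code):
            return True
    return False


if __name__ == "__main__":
    sample = '#!/bin/sh\nif [[ -z "$1" ]]; then\n  exit 1\nfi\n'
    print(sh_script_uses_bash_test(sample))
```

A dedicated tool would be more thorough; this is only meant to show the kind of mismatch that produced the dash errors above.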
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 2 +-
tools/testing/selftests/mm/write_hugetlb_memory.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index 0899019a7fcb..e14bdd4455f2 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Kselftest framework requirement - SKIP code is 4.
diff --git a/tools/testing/selftests/mm/write_hugetlb_memory.sh b/tools/testing/selftests/mm/write_hugetlb_memory.sh
index 70a02301f4c2..3d2d2eb9d6ff 100755
--- a/tools/testing/selftests/mm/write_hugetlb_memory.sh
+++ b/tools/testing/selftests/mm/write_hugetlb_memory.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
set -e
--
2.42.0
From: Jeff Xu <jeffxu(a)chromium.org>
This is V4 of the patch. The patch has improved significantly since V1,
thanks to diverse input. A few discussions remain open; please read those
in the "Open discussions" section under V4 of the change history.
-----------------------------------------------------------------
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is a syscall available on 64-bit CPUs, with the
following signature:
int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks the following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore can
be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
In addition, mmap() has two related changes.
The PROT_SEAL bit in prot field of mmap(). When present, it marks
the map sealed since creation.
The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
the map as sealable. A map created without MAP_SEALABLE will not support
sealing, i.e. mseal() will fail.
Applications that don't care about sealing can expect their behavior to be
unchanged. Those that need sealing support opt in by adding
MAP_SEALABLE in mmap().
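The rules above can be pictured with a toy userspace model. This is not the kernel implementation: the Mapping class, its method names, and the use of PermissionError are all hypothetical stand-ins, and treating a repeated seal as a no-op is an assumption.

```python
class Mapping:
    """Toy stand-in for a VMA; not the kernel's data structure."""

    def __init__(self, prot, sealable=False, sealed=False):
        self.prot = prot
        self.sealable = sealable  # models MAP_SEALABLE at mmap() time
        self.sealed = sealed      # models PROT_SEAL at mmap() time
        self.mapped = True

    def mseal(self):
        if self.sealed:
            return  # assumption: sealing twice is a no-op
        if not self.sealable:
            raise PermissionError("mapping was not created as sealable")
        self.sealed = True

    def mprotect(self, prot):
        self._check_not_sealed()
        self.prot = prot

    def munmap(self):
        self._check_not_sealed()
        self.mapped = False

    def _check_not_sealed(self):
        if self.sealed:
            raise PermissionError("mapping is sealed")


if __name__ == "__main__":
    text = Mapping("r-x", sealable=True)
    text.mseal()
    try:
        text.mprotect("rwx")
    except PermissionError as err:
        print("blocked:", err)
```

A mapping created without the sealable flag keeps today's behavior in this model: mprotect() and munmap() succeed, and only the attempt to seal it fails.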
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable; the lifetime of those mappings
is tied to the lifetime of the process.
Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings is not tied to the lifetime of the process; therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory, for example with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.
Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.
--------------------------------------------------------------------
Change history:
===============
V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
Open discussions:
=================
The discussions below were brought up in V3 and did not receive any input.
The one important to this patch is MAP_SEALABLE in mmap(), which is in
the current version of the patch; it is listed here for input/comments.
---------------------------------------------------------------------
During the development of V3, I had new questions and thoughts and
wished to discuss.
1> shm/aio
From reading the code, it seems to me that aio/shm can mmap/munmap
maps on behalf of userspace, e.g. ksys_shmdt() in shm.c. The lifetime
of those mappings is not tied to the lifetime of the process. If that
memory is sealed from userspace, then unmap will fail. This isn’t a
huge problem, since the memory will eventually be freed at exit or
exec. However, it feels like the solution is not complete, because of
the leaks in VMA address space during the lifetime of the process.
2> Brk (heap/stack)
Currently, userspace applications can seal parts of the heap by
calling malloc() and mseal(). This raises the question of what the
expected behavior is when sealing the heap is attempted.
Let's assume the following calls from user space:
ptr = malloc(size);
mprotect(ptr, size, RO);
mseal(ptr, size, SEAL_PROT_PKEY);
free(ptr);
Technically, before mseal() is added, the user can change the
protection of the heap by calling mprotect(RO). As long as the user
changes the protection back to RW before free(), the memory can be
reused.
With mseal() added to the picture, however, the heap is then partially
sealed. The user can still free it, but the memory remains RO,
and the result of brk-shrink is nondeterministic, depending on whether
munmap() will try to free the sealed memory (brk uses munmap to shrink
the heap).
3> The above two cases lead to the third topic:
There is one option to address the problem mentioned above.
Option 1: A “MAP_SEALABLE” flag in mmap().
If a map is created without this flag, the mseal() operation will
fail. Applications that are not concerned with sealing will expect
their behavior to be unchanged. For those that are concerned, adding a
flag at mmap time to opt in is not difficult. For the short term, this
solves problems 1 and 2 above. The memory in shm/aio/brk will not have
the MAP_SEALABLE flag at mmap(), and the same is true for the heap.
If we choose not to go with this path, all mappings will be sealable by
default. We could document the above-mentioned limitations so devs are
more careful when choosing what memory to seal. I think
denial of service through mseal() by an attacker is probably not a concern:
if attackers have access to mseal() and unsealed memory, then they can
also do other harmful things to the memory, such as munmap, etc.
4>
I think it might be possible to seal the stack or other special
mappings created at runtime (vdso, vsyscall, vvar). This means we can
enforce and seal W^X for certain types of application. For instance,
the stack is typically used in read-write mode, but in some cases, it
can become executable. To defend against unintended addition of the executable
bit to the stack, we could let the application seal it.
Sealing the heap (for adding X) requires special handling, since the
heap can shrink, and shrink is implemented through munmap().
Indeed, it might be possible that all virtual memory accessible to user
space, regardless of its usage pattern, could be sealed. However, this
would require additional research and development work.
=====================================================================
V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as
MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
destructive operations of madvise. (Suggested by Jann Horn and
Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…
v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/
v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/
----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/
Jeff Xu (4):
mseal: Wire up mseal syscall
mseal: add mseal syscall
selftest mm/mseal memory sealing
mseal:add documentation
Documentation/userspace-api/mseal.rst | 181 ++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/mm.h | 60 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/mman-common.h | 7 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/Makefile | 4 +
mm/madvise.c | 12 +
mm/mmap.c | 27 +
mm/mprotect.c | 10 +
mm/mremap.c | 31 +
mm/mseal.c | 330 ++++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/mseal_test.c | 1971 +++++++++++++++++++
32 files changed, 2659 insertions(+), 2 deletions(-)
create mode 100644 Documentation/userspace-api/mseal.rst
create mode 100644 mm/mseal.c
create mode 100644 tools/testing/selftests/mm/mseal_test.c
--
2.43.0.195.gebba966016-goog
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 100888fb6d8a185866b1520031ee7e3182b173de ]
With the latest clang18 (main branch of the llvm-project repo), building the bpf selftests with
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
fails with the following compilation error:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens with cpu=v2 and cpu=v3. cpu=v4 is okay
since it supports 32-bit branch target offsets.
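As a rough illustration of the limit mentioned above, the following Python sketch checks whether a branch distance (counted in instructions) fits a signed 16-bit offset, as used by cpu v1/v2/v3, or a signed 32-bit offset, as available with cpu=v4. The function name and the example distances are made up for illustration; this is not how LLVM performs the check.

```python
def branch_offset_fits(distance: int, cpu_version: int) -> bool:
    """Return True if a branch distance fits the offset width of the cpu version.

    Assumption for this sketch: cpu v1/v2/v3 encode the branch target as a
    signed 16-bit offset, while cpu v4 can use a signed 32-bit offset.
    """
    bits = 32 if cpu_version >= 4 else 16
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= distance <= highest


# Hypothetical distances, chosen only to sit on either side of the 16-bit limit.
print(branch_offset_fits(32767, 3))   # largest forward offset for v3
print(branch_offset_fits(40000, 3))   # too far for a 16-bit offset
print(branch_offset_fits(40000, 4))   # fits once 32-bit offsets are available
```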
The above failure is due to upstream llvm patch [1], which changed some inlining
behavior in clang18.
Previously, all 180 loop iterations were fully unrolled. To work around the issue,
the bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used so that
the unrolling amount is reduced only when the cpu version is below v4. If
__BPF_CPU_VERSION__ is not available and the compiler is clang18, the unrolling
amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: Alan Maguire <alan.maguire(a)oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/pyperf180.c | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..42c4a8b62e36 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#define STACK_MAX_LEN 180
+
+/* llvm upstream commit at clang18
+ * https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __BPF_CPU_VERSION__ is later implemented in clang18
+ * to specify which cpu version is used for compilation. So a smaller
+ * unroll_count can be set if __BPF_CPU_VERSION__ is less than 4, which
+ * reduced some branch target distances and resolved the compilation failure.
+ *
+ * To capture the case where a developer/ci uses clang18 but the corresponding
+ * repo checkpoint does not have __BPF_CPU_VERSION__, a smaller unroll_count
+ * will be set as well to prevent potential compilation failures.
+ */
+#ifdef __BPF_CPU_VERSION__
+#if __BPF_CPU_VERSION__ < 4
+#define UNROLL_COUNT 90
+#endif
+#elif __clang_major__ == 18
+#define UNROLL_COUNT 90
+#endif
+
#include "pyperf.h"
--
2.43.0
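The preprocessor block added to pyperf180.c can be modeled in a few lines of Python to make its three cases explicit. This is only a sketch: `select_unroll_count` is a hypothetical name, `None` stands for "UNROLL_COUNT is left undefined", and what pyperf.h does in that case is not modeled here.

```python
from typing import Optional


def select_unroll_count(bpf_cpu_version: Optional[int],
                        clang_major: int) -> Optional[int]:
    """Model of the #ifdef/#elif block added to pyperf180.c.

    bpf_cpu_version is None when the compiler does not define
    __BPF_CPU_VERSION__. The return value is the UNROLL_COUNT the
    preprocessor would define, or None if it stays undefined.
    """
    if bpf_cpu_version is not None:      # #ifdef __BPF_CPU_VERSION__
        if bpf_cpu_version < 4:          # #if __BPF_CPU_VERSION__ < 4
            return 90
        return None                      # cpu=v4: nothing is defined
    if clang_major == 18:                # #elif __clang_major__ == 18
        return 90
    return None


for version, major in [(3, 18), (4, 18), (None, 18), (None, 17)]:
    print(version, major, select_unroll_count(version, major))
```

Note that the fallback branch is only reached when the macro is absent, which matches the comment in the patch about a clang18 checkout that predates __BPF_CPU_VERSION__.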
Rae has been shouldering a lot of the KUnit review burden for the last
year, and will continue to do so in the future. Thanks!
Signed-off-by: David Gow <davidgow(a)google.com>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index f8efcb72ad4b..2316d89806dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11599,6 +11599,7 @@ F: fs/smb/server/
KERNEL UNIT TESTING FRAMEWORK (KUnit)
M: Brendan Higgins <brendanhiggins(a)google.com>
M: David Gow <davidgow(a)google.com>
+R: Rae Moar <rmoar(a)google.com>
L: linux-kselftest(a)vger.kernel.org
L: kunit-dev(a)googlegroups.com
S: Maintained
--
2.43.0.275.g3460e3d667-goog
A number of tests fail when netdev renaming is active
on the system. Add udevadm settle to the logic that determines
the names.
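The ordering this fix relies on can be sketched in Python: wait for udev to finish processing pending events, and only then read the interface names. The helper below is a hypothetical illustration, not a port of the shell scripts. The `settle` and `list_netdevs` callables are injected (an assumption of this sketch) so the logic can be tried without root or netdevsim; on a real system `settle` could run `udevadm settle` and `list_netdevs` could list /sys/class/net.

```python
from typing import Callable, Iterable, List


def new_netdev_names(old: Iterable[str],
                     settle: Callable[[], None],
                     list_netdevs: Callable[[], List[str]]) -> List[str]:
    """Return interfaces that appeared since `old` was recorded.

    `settle` is called first so that any pending rename has finished
    before the names are read.
    """
    settle()
    known = set(old)
    return [name for name in list_netdevs() if name not in known]


# Fake system state for demonstration: the kernel name "eth0" is renamed
# to "eni1np1" once the pending udev events have been processed.
state = {"names": ["lo", "eth0"]}


def fake_settle() -> None:
    state["names"] = ["lo", "eni1np1"]


print(new_netdev_names(["lo"], fake_settle, lambda: list(state["names"])))
```

Without the settle step the caller could observe the transient name and then fail when using it later, which is the kind of race the patch is meant to avoid.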
Fixes: 242aaf03dc9b ("selftests: add a test for ethtool pause stats")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: saeedm(a)nvidia.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh | 1 +
tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh | 1 +
2 files changed, 2 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
index 922744059aaa..80160579e0cc 100644
--- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
@@ -51,6 +51,7 @@ function make_netdev {
fi
echo $NSIM_ID $@ > /sys/bus/netdevsim/new_device
+ udevadm settle
# get new device name
ls /sys/bus/netdevsim/devices/netdevsim${NSIM_ID}/net/
}
diff --git a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
index 1b08e042cf94..4855ef597a15 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
@@ -233,6 +233,7 @@ function print_tables {
function get_netdev_name {
local -n old=$1
+ udevadm settle
new=$(ls /sys/class/net)
for netdev in $new; do
--
2.43.0
Nested translation is a hardware feature supported by many modern IOMMUs.
It uses two stages of address translation (stage-1 and stage-2) to reach
the physical address. The stage-1 translation table is owned by userspace
(e.g. by a guest OS), while the stage-2 table is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Accordingly, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>
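To make the need for invalidation concrete, here is a toy Python model of nested translation with a cache in front of it. Everything in it (the class name, the dictionary-based page tables, the addresses) is invented for illustration and does not reflect the real VT-d data structures; it only shows why a change to the stage-1 table is not visible until the cached translation is invalidated.

```python
class ToyNestedIommu:
    """Toy model: stage-1 maps GIOVA->GPA, stage-2 maps GPA->HPA."""

    def __init__(self, stage1: dict, stage2: dict) -> None:
        self.stage1 = stage1      # owned by the guest (userspace)
        self.stage2 = stage2      # owned by the host kernel
        self.iotlb = {}           # cached GIOVA->HPA translations

    def translate(self, giova: int) -> int:
        # Walk both stages only on a cache miss.
        if giova not in self.iotlb:
            gpa = self.stage1[giova]
            self.iotlb[giova] = self.stage2[gpa]
        return self.iotlb[giova]

    def invalidate(self, giova: int) -> None:
        self.iotlb.pop(giova, None)


iommu = ToyNestedIommu(stage1={0x1000: 0x8000},
                       stage2={0x8000: 0xA000, 0x9000: 0xB000})
first = iommu.translate(0x1000)

# The guest changes a present stage-1 mapping ...
iommu.stage1[0x1000] = 0x9000
stale = iommu.translate(0x1000)      # still served from the cache

# ... so it has to invalidate before the new mapping takes effect.
iommu.invalidate(0x1000)
fresh = iommu.translate(0x1000)
print(hex(first), hex(stale), hex(fresh))
```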
This series builds on the first part, which was merged [1]. It adds the cache
invalidation interface for userspace to invalidate the cache after modifying
the stage-1 page table. This includes both the iommufd changes and the
VT-d driver changes.
The complete code can be found in [2]; the QEMU code can be found in [3].
Finally, this is team work together with Nicolin Chen and Lu Baolu. Thanks
to them for the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v11:
- Drop the hw_error field in the vtd cache invalidation uapi. A devTLB
invalidation error is a serious security emergency that the host kernel must
handle. There is no need to expose it to userspace (especially given that
existing VMs don't issue devTLB invalidation at all).
- The vtd qi_submit_sync() and related callers are reverted back to the
original state due to above drop.
- Align with the vtd path, drop the hw_error reporting in mock driver and
selftest as well since selftest is a demo of the real driver.
- Drop iommu_respond_struct_to_user_array() since no driver needs to
respond to a single entry in the user_array any more.
- Two typos from Wubinbin
v10: https://lore.kernel.org/all/20240102143834.146165-1-yi.l.liu@intel.com/
- Minor tweak to patch 07 (Kevin)
- Rebase on top of 6.7-rc8
v9: https://lore.kernel.org/linux-iommu/20231228150629.13149-1-yi.l.liu@intel.c…
- Add a test case which sets both IOMMU_TEST_INVALIDATE_FLAG_ALL and
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR in flags, and expect to succeed
and see an 'error'. (Kevin)
- Return -ETIMEDOUT in qi_check_fault() if the caller is interested in the
fault when a timeout happens. Otherwise, qi_submit_sync() keeps retrying
and is unable to report the error back to the user. For now, only the user cache
invalidation path is interested in the timeout error, so this change only
affects the user cache invalidation path. Other paths will still hang in
qi_submit_sync() when a timeout happens. (Kevin)
v8: https://lore.kernel.org/linux-iommu/20231227161354.67701-1-yi.l.liu@intel.c…
- Pass invalidation hint to the cache invalidation helper in the cache_invalidate_user
op path (Kevin)
- Move the devTLB invalidation out of info->iommu loop (Kevin, Weijiang)
- Clear *fault per restart in qi_submit_sync() to avoid error accumulation
across submissions. (Kevin)
- Define the vtd cache invalidation uapi structure in separate patch (Kevin)
- Rename inv_error to be hw_error (Kevin)
- Rename 'reqs_uptr', 'req_type', 'req_len' and 'req_num' to be 'data_uptr',
'data_type', 'entry_len' and 'entry_num' (Kevin)
- Allow user to set IOMMU_TEST_INVALIDATE_FLAG_ALL and IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR
at the same time (Kevin)
v7: https://lore.kernel.org/linux-iommu/20231221153948.119007-1-yi.l.liu@intel.…
- Remove domain->ops->cache_invalidate_user check in hwpt alloc path due
to failure in bisect (Baolu)
- Remove out_driver_error_code from struct iommu_hwpt_invalidate after
discussion in v6. Should expect per-entry error code.
- Rework the selftest cache invalidation part to report a per-entry error
- Allow user to pass in an empty array to have a try-and-fail mechanism for
user to check if a given req_type is supported by the kernel (Jason)
- Define a separate enum type for cache invalidation data (Jason)
- Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before
returning (Nicolin)
- Merge the VT-d nesting part 2/2
https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
into this series to avoid defining empty enum in the middle of the series.
The major difference is adding the VT-d related invalidation uapi structures
together with the generic data structures in patch 02 of this series.
- The VT-d driver was refined to report ICE/ITE errors from the bottom cache
invalidation submit helpers, hence the cache_invalidate_user op can
report such errors to the user via the per-entry error field. The VT-d driver
does not stop walking the invalidation array on ICE/ITE errors:
such errors are defined by the VT-d spec, so userspace should be able to
handle them and let the real user (say, a virtual machine) know about them.
Other errors, such as an invalid uapi data structure configuration or a
memory copy failure, should stop the array walk, since continuing may
cause more issues.
- Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
- Not much change, just a rebase on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
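The array-based design described in the change log (an array of requests, an empty array as a way to probe for support, and the entry count updated before returning) can be modeled roughly as follows. This Python sketch is not the kernel implementation: the function name, the `SUPPORTED_TYPES` set, the string type name, the `handle_entry` callback and the choice of error numbers are all assumptions made for illustration.

```python
import errno
from typing import Callable, List, Tuple

# Hypothetical set of data types understood by this toy "kernel".
SUPPORTED_TYPES = {"vtd_s1"}


def invalidate_array(data_type: str,
                     entries: List[dict],
                     handle_entry: Callable[[dict], None]) -> Tuple[int, int]:
    """Walk an invalidation array; return (errno-style result, entries handled).

    An unsupported type fails even for an empty array, which lets a caller
    probe for support. A failing entry stops the walk, and the handled
    count is reported in every case.
    """
    if data_type not in SUPPORTED_TYPES:
        return -errno.EOPNOTSUPP, 0
    handled = 0
    for entry in entries:
        try:
            handle_entry(entry)
        except ValueError:
            return -errno.EINVAL, handled   # malformed entry: stop walking
        handled += 1
    return 0, handled


def reject_negative(entry: dict) -> None:
    # Stand-in for per-entry validation; negative addresses are "malformed".
    if entry["addr"] < 0:
        raise ValueError("bad entry")


print(invalidate_array("vtd_s1", [], reject_negative))
print(invalidate_array("vtd_s1",
                       [{"addr": 0}, {"addr": -1}, {"addr": 4096}],
                       reject_negative))
```

Returning the handled count alongside the error lets the caller tell which entry the walk stopped at, which is the point of always updating the count before returning.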
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add cache_invalidate_user op
iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (2):
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
drivers/iommu/intel/nested.c | 88 ++++++++++
drivers/iommu/iommufd/hw_pagetable.c | 41 +++++
drivers/iommu/iommufd/iommufd_private.h | 10 ++
drivers/iommu/iommufd/iommufd_test.h | 23 +++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 76 +++++++++
include/linux/iommu.h | 79 +++++++++
include/uapi/linux/iommufd.h | 79 +++++++++
tools/testing/selftests/iommu/iommufd.c | 152 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 57 +++++++
10 files changed, 608 insertions(+)
--
2.34.1
From: Rae Moar <rmoar(a)google.com>
[ Upstream commit 8ae27bc7fff4ef467a7964821a6cedb34a05d3b2 ]
Add parsing of attributes as diagnostic data. Fixes issue with test plan
being parsed incorrectly as diagnostic data when located after
suite-level attributes.
Note that if there does not exist a test plan line, the diagnostic lines
between the suite header and the first result will be saved in the suite
log rather than the first test case log.
Signed-off-by: Rae Moar <rmoar(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_parser.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 79d8832c862a..ce34be15c929 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -450,7 +450,7 @@ def parse_diagnostic(lines: LineStream) -> List[str]:
Log of diagnostic lines
"""
log = [] # type: List[str]
- non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START]
+ non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START, TEST_PLAN]
while lines and not any(re.match(lines.peek())
for re in non_diagnostic_lines):
log.append(lines.pop())
@@ -726,6 +726,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# test plan
test.name = "main"
ktap_line = parse_ktap_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
parent_test = True
else:
@@ -737,6 +738,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
if parent_test:
# If KTAP version line and/or subtest header is found, attempt
# to parse test plan and print test header
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
print_test_header(test)
expected_count = test.expected_count
--
2.43.0
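As a rough illustration of the parser change in the patch above, the following Python sketch shows why adding a test plan pattern to the stop list matters. The regexes and the plain list used in place of LineStream are simplified assumptions, not the actual kunit_parser.py definitions.

```python
import re
from typing import List

# Simplified stand-ins for the kunit_parser.py patterns (assumed, not copied).
TEST_RESULT = re.compile(r'^\s*(ok|not ok) [0-9]+ ')
KTAP_START = re.compile(r'^\s*KTAP version [0-9]+$')
TEST_PLAN = re.compile(r'^\s*1\.\.[0-9]+')


def parse_diagnostic(lines: List[str], stop: List[re.Pattern]) -> List[str]:
    """Pop leading lines into a log until one matches a stop pattern."""
    log = []
    while lines and not any(p.match(lines[0]) for p in stop):
        log.append(lines.pop(0))
    return log


def demo(include_test_plan: bool) -> List[str]:
    """Return the diagnostic log for a suite attribute followed by a test plan."""
    lines = ['# module: kunit_example_test', '1..2', 'ok 1 example_a']
    stop = [TEST_RESULT, KTAP_START]
    if include_test_plan:
        stop.append(TEST_PLAN)
    return parse_diagnostic(lines, stop)
```

Without the test plan pattern in the stop list, the sketch swallows the `1..2` line as diagnostic data; with it, only the attribute line is logged and the plan is left for the next parsing step.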
From: Thomas Weißschuh <linux(a)weissschuh.net>
[ Upstream commit bdeeeaba83682225a7bf5f100fe8652a59590d33 ]
qemu for LoongArch does not work properly with direct kernel boot.
The kernel will panic during initialization and hang without any output.
When booting in EFI mode everything works correctly.
While users most likely don't have the LoongArch EFI binary installed, at
least an explicit error about 'file not found' is better than a hanging
test without output that can never succeed.
Link: https://lore.kernel.org/loongarch/1738d60a-df3a-4102-b1da-d16a29b6e06a@t-8c…
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Acked-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/r/20231031-nolibc-out-of-tree-v1-1-47c92f73590a@wei…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/nolibc/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index dfe66776a331..be7711014ade 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -88,6 +88,13 @@ QEMU_ARCH_s390 = s390x
QEMU_ARCH_loongarch = loongarch64
QEMU_ARCH = $(QEMU_ARCH_$(XARCH))
+QEMU_BIOS_DIR = /usr/share/edk2/
+QEMU_BIOS_loongarch = $(QEMU_BIOS_DIR)/loongarch64/OVMF_CODE.fd
+
+ifneq ($(QEMU_BIOS_$(XARCH)),)
+QEMU_ARGS_BIOS = -bios $(QEMU_BIOS_$(XARCH))
+endif
+
# QEMU_ARGS : some arch-specific args to pass to qemu
QEMU_ARGS_i386 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_x86_64 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
@@ -101,7 +108,7 @@ QEMU_ARGS_ppc64le = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC
QEMU_ARGS_riscv = -M virt -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_s390 = -M s390-ccw-virtio -m 1G -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_loongarch = -M virt -append "console=ttyS0,115200 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
-QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_EXTRA)
+QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_BIOS) $(QEMU_ARGS_EXTRA)
# OUTPUT is only set when run from the main makefile, otherwise
# it defaults to this nolibc directory.
--
2.43.0
From: Michal Wajdeczko <michal.wajdeczko(a)intel.com>
[ Upstream commit 342fb9789267ee3908959bfa136b82e88e2ce918 ]
If we run a parameterized test that uses test->priv to prepare some
custom data, then the value of test->priv will leak into the next param
iteration and may be unexpected. This can easily be seen if
we promote example_priv_test to a parameterized test, as then only
the first test iteration will be successful:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_priv*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] ==================== example_priv_test ====================
[ ] [PASSED] example value 3
[ ] # example_priv_test: initializing
[ ] # example_priv_test: ASSERTION FAILED at lib/kunit/kunit-example-test.c:230
[ ] Expected test->priv == ((void *)0), but
[ ] test->priv == 0000000060dfe290
[ ] ((void *)0) == 0000000000000000
[ ] # example_priv_test: cleaning up
[ ] [FAILED] example value 2
[ ] # example_priv_test: initializing
[ ] # example_priv_test: ASSERTION FAILED at lib/kunit/kunit-example-test.c:230
[ ] Expected test->priv == ((void *)0), but
[ ] test->priv == 0000000060dfe290
[ ] ((void *)0) == 0000000000000000
[ ] # example_priv_test: cleaning up
[ ] [FAILED] example value 1
[ ] # example_priv_test: initializing
[ ] # example_priv_test: ASSERTION FAILED at lib/kunit/kunit-example-test.c:230
[ ] Expected test->priv == ((void *)0), but
[ ] test->priv == 0000000060dfe290
[ ] ((void *)0) == 0000000000000000
[ ] # example_priv_test: cleaning up
[ ] [FAILED] example value 0
[ ] # example_priv_test: initializing
[ ] # example_priv_test: cleaning up
[ ] # example_priv_test: pass:1 fail:3 skip:0 total:4
[ ] ================ [FAILED] example_priv_test ================
[ ] # example: initializing suite
[ ] # module: kunit_example_test
[ ] # example: exiting suite
[ ] # Totals: pass:1 fail:3 skip:0 total:4
[ ] ===================== [FAILED] example =====================
Fix that by resetting test->priv after each param iteration, similar
to what we already do for test->status.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko(a)intel.com>
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kunit/test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 7aceb07a1af9..1cdc405daa30 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -660,6 +660,7 @@ int kunit_run_tests(struct kunit_suite *suite)
test.param_index++;
test.status = KUNIT_SUCCESS;
test.status_comment[0] = '\0';
+ test.priv = NULL;
}
}
--
2.43.0
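The leak described in the patch above can be modelled with a small Python sketch. It is only an analogy for the C loop in kunit_run_tests(): FakeTest, example_priv_case and run_params are hypothetical names, and the statuses are plain strings.

```python
class FakeTest:
    """Hypothetical stand-in for struct kunit with only the fields we need."""

    def __init__(self):
        self.priv = None
        self.status = "SUCCESS"


def example_priv_case(test, param):
    """Mimic a test case that expects priv to be unset on entry."""
    if test.priv is not None:
        test.status = "FAILURE"
        return
    test.priv = {"param": param}


def run_params(params, reset_priv):
    """Run one case per parameter, reusing the same test object."""
    test = FakeTest()
    results = []
    for param in params:
        example_priv_case(test, param)
        results.append(test.status)
        # Per-iteration reset: status was already reset before the patch.
        test.status = "SUCCESS"
        if reset_priv:
            test.priv = None  # the one-line fix
    return results
```

With reset_priv set to False the sketch reproduces the pass:1 fail:3 pattern from the log in the commit message; with True every iteration starts from a clean priv.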
From: Thomas Weißschuh <linux(a)weissschuh.net>
[ Upstream commit bdeeeaba83682225a7bf5f100fe8652a59590d33 ]
qemu for LoongArch does not work properly with direct kernel boot.
The kernel will panic during initialization and hang without any output.
When booting in EFI mode everything works correctly.
While users most likely don't have the LoongArch EFI binary installed, at
least an explicit error about 'file not found' is better than a hanging
test without output that can never succeed.
Link: https://lore.kernel.org/loongarch/1738d60a-df3a-4102-b1da-d16a29b6e06a@t-8c…
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Acked-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/r/20231031-nolibc-out-of-tree-v1-1-47c92f73590a@wei…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/nolibc/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index a0fc07253baf..eb258ae1d948 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -88,6 +88,13 @@ QEMU_ARCH_s390 = s390x
QEMU_ARCH_loongarch = loongarch64
QEMU_ARCH = $(QEMU_ARCH_$(XARCH))
+QEMU_BIOS_DIR = /usr/share/edk2/
+QEMU_BIOS_loongarch = $(QEMU_BIOS_DIR)/loongarch64/OVMF_CODE.fd
+
+ifneq ($(QEMU_BIOS_$(XARCH)),)
+QEMU_ARGS_BIOS = -bios $(QEMU_BIOS_$(XARCH))
+endif
+
# QEMU_ARGS : some arch-specific args to pass to qemu
QEMU_ARGS_i386 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_x86_64 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
@@ -101,7 +108,7 @@ QEMU_ARGS_ppc64le = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC
QEMU_ARGS_riscv = -M virt -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_s390 = -M s390-ccw-virtio -m 1G -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_loongarch = -M virt -append "console=ttyS0,115200 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
-QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_EXTRA)
+QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_BIOS) $(QEMU_ARGS_EXTRA)
# OUTPUT is only set when run from the main makefile, otherwise
# it defaults to this nolibc directory.
--
2.43.0
Hi,
An essential part of any big kernel submission is selftests.
At the beginning of the TCP-AO project, I made patches to fcnal-test.sh
and nettest.c to get the benefits of easy refactoring, noticing
breakages early, putting a moat around the code, and documenting
and designing the uAPI.
While tests based on fcnal-test.sh/nettest.c provided initial testing*
and were very easy to add, the pile of TCP-AO tests quickly outgrew
one-binary + shell-script testing.
The design of the TCP-AO testing is a bit different from the one-big
selftest binary approach I used previously in net/ipsec.c. I found it
beneficial to avoid implementing a test runner/scheduler and to delegate
that to the user or the Makefile. The approach is heavily influenced
by CRIU/ZDTM testing[1]: it provides a static library with helper
functions and selftest binaries that create specific scenarios.
I also tried to utilize kselftest.h.
The test_init() function does all the needed preparations. To not leave
any traces after a selftest exits, it creates a network namespace
and, if the test wants to establish a TCP connection, a child netns.
The parent and child netns have a veth pair with proper IP addresses
and routes set up. Both peers, the client and the server, are different
pthreads. The threading model was chosen over forking mostly for ease
of cleanup on a failure: no need to search for children, handle SIGCHLD,
make sure not to wait for a dead peer to perform anything, etc.
Any thread that does exit() naturally kills the tests, sweet!
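The cleanup argument can be illustrated with a loose Python analog (the selftests themselves are C with pthreads). In this sketch os._exit() plays the role of C exit() called from one peer thread; the thread bodies and the exit code 3 are made up for illustration.

```python
import subprocess
import sys
import textwrap

# Child program: two peer threads in one process, no fork() involved.
CHILD = textwrap.dedent("""
    import os, threading, time

    def server():
        time.sleep(60)   # a peer that would otherwise block for a long time

    def client():
        os._exit(3)      # process-wide exit, like exit() from a pthread in C

    threading.Thread(target=server).start()
    threading.Thread(target=client).start()
    time.sleep(60)
""")


def run_child() -> int:
    """Run the child and return its exit code."""
    proc = subprocess.run([sys.executable, "-c", CHILD], timeout=30)
    return proc.returncode
```

The failing peer takes the whole process down, so there is nothing left to reap and no surviving peer to wait on, which is the property the threading model relies on.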
The selftests are currently compiled in two variants: ipv4 and ipv6.
IPv4-mapped-IPv6 addresses might be a third variant to add, but it's not
there in this version. As pretty much all tests are shared between the two
address families, most of the code can be shared, too. To distinguish in code
which kind of test is running, the Makefile supplies -DIPV6_TEST to the compiler
and ifdeffery in tests can do things that have to be different between
address families. This is similar to TARGETS_C_BOTHBITS in x86 selftests
and also to test code sharing in CRIU/ZDTM.
The total number of tests is 832.
Of them, rst_ipv{4,6} currently has one flaky subtest that may fail:
> not ok 9 client connection was not reset: 0
I'll investigate what happens there. Also, unsigned-md5_ipv{4,6}
are flaky because of netns counter checks: they don't expect that
there may be retransmitted TCP segments from a previous sub-selftest.
That will be fixed. Besides, key-management_ipv{4,6} has 3 sub-tests
passing with XFAIL:
> ok 15 # XFAIL listen() after current/rnext keys set: the socket has current/rnext keys: 100:200
> ok 16 # XFAIL listen socket, delete current key from before listen(): failed to delete the key 100:100 -16
> ok 17 # XFAIL listen socket, delete rnext key from before listen(): failed to delete the key 200:200 -16
...
> # Totals: pass:117 fail:0 xfail:3 xpass:0 skip:0 error:0
Those need some more kernel work to pass instead of xfail.
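For readers who want to tally such results themselves, here is a minimal Python sketch that counts pass/fail/xfail/skip from result lines shaped like the ones quoted above. The regex is an assumption based only on those quoted lines and does not claim to cover the full KTAP format.

```python
import re

# Matches "ok N ..." / "not ok N ...", with an optional "# XFAIL" or "# SKIP".
RESULT = re.compile(r'^(ok|not ok) (\d+)(?: # (XFAIL|SKIP)\b)?')


def tally(lines):
    """Count result lines by outcome; non-result lines are ignored."""
    totals = {"pass": 0, "fail": 0, "xfail": 0, "skip": 0}
    for line in lines:
        m = RESULT.match(line)
        if not m:
            continue
        status, _number, directive = m.groups()
        if directive == "XFAIL":
            totals["xfail"] += 1
        elif directive == "SKIP":
            totals["skip"] += 1
        elif status == "ok":
            totals["pass"] += 1
        else:
            totals["fail"] += 1
    return totals
```

Feeding it the three XFAIL lines from key-management would report xfail:3, matching the totals line quoted above.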
The overview of selftests (see the diffstat at the bottom):
├── lib
│ ├── aolib.h
│ │ The header for all selftests to include.
│ ├── kconfig.c
│ │ Kernel kconfig detector to SKIP tests that depend on something.
│ ├── netlink.c
│ │ Netlink helper to add/modify/delete VETH/IPs/routes/VRFs
│ │ I considered just using libmnl, but this is around 400 lines
│ │ and avoids selftests dependency on out-of-tree sources/packages.
│ ├── proc.c
│ │ SNMP/netstat procfs parser and the counters comparator.
│ ├── repair.c
│ │ Heavily influenced by libsoccr and reduced to minimum TCP
│ │ socket checkpoint/repair. Shouldn't be used out of selftests,
│ │ though.
│ ├── setup.c
│ │ All the needed netns/veth/ips/etc preparations for test init.
│ ├── sock.c
│ │ Socket helpers: {s,g}etsockopt()s/connect()/listen()/etc.
│ └── utils.c
│ Random stuff (a pun intended).
├── bench-lookups.c
│ The only benchmark in selftests currently: checks how well TCP-AO
│ setsockopt()s perform, depending on the amount of keys on a socket.
├── connect.c
│ Trivial sample, can be used as a boilerplate to write a new test.
├── connect-deny.c
│ More-or-less what could be expected for TCP-AO in fcnal-test.sh
├── icmps-accept.c -> icmps-discard.c
├── icmps-discard.c
│ Verifies RFC5925 (7.8) by checking that TCP-AO connection can be
│ broken if ICMPs are accepted and survives when ::accept_icmps = 0
├── key-management.c
│ Key manipulations, rotations between randomized hashing algorithms
│ and counter checks for those scenarios.
├── restore.c
│ TCP_AO_REPAIR: verifies that a socket can be re-created without
│ TCP-AO connection being interrupted.
├── rst.c
│ As RST segments are signed on a separate code-path in kernel,
│ verifies passive/active TCP send_reset().
├── self-connect.c
│ Verifies that TCP self-connect and also simultaneous open work.
├── seq-ext.c
│ Utilizes TCP_AO_REPAIR to check that on SEQ roll-over SNE
│ increment is performed and segments with different SNEs fail to
│ pass verification.
├── setsockopt-closed.c
│ Checks that {s,g}etsockopt()s are extendable syscalls and common
│ error-paths for them.
└── unsigned-md5.c
Checks listen() socket for (non-)matching peers with: AO/MD5/none
keys. As well as their interaction with VRFs and AO_REQUIRED flag.
There are certainly more test scenarios that can be added, but even so,
I'm pretty happy that this much of TCP-AO functionality and uAPIs got
covered. These selftests were iteratively developed by me during TCP-AO
kernel upstreaming and the resulting kernel patches would have been
worse without having these tests. They provided the user-side
perspective but also allowed safer refactoring with less possibility
of introducing a regression. Now it's time to use them to dig
a moat around the TCP-AO code!
There are also people from other network companies that work on TCP-AO
(+testing), so sharing these selftests will allow them to contribute,
and the selftests may benefit from their efforts.
The following changes since commit c7402612e2e61b76177f22e6e7f705adcbecc6fe:
Merge tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2023-12-14 13:11:49 -0800)
are available in the Git repository at:
git@github.com:0x7f454c46/linux.git tcp-ao-selftests-v1
for you to fetch changes up to 85dc9bc676985d81f9043fd9c3a506f30851597b:
selftests/net: Add TCP-AO key-management test (2023-12-15 00:44:49 +0000)
----------------------------------------------------------------
* Planning to submit basic TCP-AO tests to fcnal-test.sh/nettest.c
separately.
[1]: https://github.com/checkpoint-restore/criu/tree/criu-dev/test/zdtm/static
Signed-off-by: Dmitry Safonov <dima(a)arista.com>
---
Dmitry Safonov (12):
selftests/net: Add TCP-AO library
selftests/net: Verify that TCP-AO complies with ignoring ICMPs
selftests/net: Add TCP-AO ICMPs accept test
selftests/net: Add a test for TCP-AO keys matching
selftests/net: Add test for TCP-AO add setsockopt() command
selftests/net: Add TCP-AO + TCP-MD5 + no sign listen socket tests
selftests/net: Add test/benchmark for removing MKTs
selftests/net: Add TCP_REPAIR TCP-AO tests
selftests/net: Add SEQ number extension test
selftests/net: Add TCP-AO RST test
selftests/net: Add TCP-AO selfconnect/simultaneous connect test
selftests/net: Add TCP-AO key-management test
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/net/tcp_ao/.gitignore | 2 +
tools/testing/selftests/net/tcp_ao/Makefile | 59 +
tools/testing/selftests/net/tcp_ao/bench-lookups.c | 358 ++++++
tools/testing/selftests/net/tcp_ao/connect-deny.c | 264 +++++
tools/testing/selftests/net/tcp_ao/connect.c | 90 ++
tools/testing/selftests/net/tcp_ao/icmps-accept.c | 1 +
tools/testing/selftests/net/tcp_ao/icmps-discard.c | 449 ++++++++
.../testing/selftests/net/tcp_ao/key-management.c | 1180 ++++++++++++++++++++
tools/testing/selftests/net/tcp_ao/lib/aolib.h | 605 ++++++++++
tools/testing/selftests/net/tcp_ao/lib/kconfig.c | 148 +++
tools/testing/selftests/net/tcp_ao/lib/netlink.c | 415 +++++++
tools/testing/selftests/net/tcp_ao/lib/proc.c | 273 +++++
tools/testing/selftests/net/tcp_ao/lib/repair.c | 254 +++++
tools/testing/selftests/net/tcp_ao/lib/setup.c | 342 ++++++
tools/testing/selftests/net/tcp_ao/lib/sock.c | 592 ++++++++++
tools/testing/selftests/net/tcp_ao/lib/utils.c | 30 +
tools/testing/selftests/net/tcp_ao/restore.c | 236 ++++
tools/testing/selftests/net/tcp_ao/rst.c | 415 +++++++
tools/testing/selftests/net/tcp_ao/self-connect.c | 197 ++++
tools/testing/selftests/net/tcp_ao/seq-ext.c | 245 ++++
.../selftests/net/tcp_ao/setsockopt-closed.c | 835 ++++++++++++++
tools/testing/selftests/net/tcp_ao/unsigned-md5.c | 742 ++++++++++++
23 files changed, 7733 insertions(+)
---
base-commit: c7402612e2e61b76177f22e6e7f705adcbecc6fe
change-id: 20231213-tcp-ao-selftests-d0f323006667
Best regards,
--
Dmitry Safonov <dima(a)arista.com>
Hi folks,
This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework for nested translation. Nested
translation is a hardware feature that supports two-stage translation
tables for IOMMU. The second-stage translation table is managed by the
host VMM, while the first-stage translation table is owned by user
space. This allows user space to control the IOMMU mappings for its
devices.
When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through the IOMMUFD. This allows user space to
implement its own IO page fault handling policies.
User space indicates its capability of handling IO page faults by
setting the IOMMU_HWPT_ALLOC_IOPF_CAPABLE flag when allocating a
hardware page table (HWPT). IOMMUFD will then set up its infrastructure
for page fault delivery. On a successful return of HWPT allocation, the
user can retrieve and respond to page faults by reading and writing to
the file descriptor (FD) returned in out_fault_fd.
The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.
This series is based on the latest implementation of nested translation
under discussion [1] and the page fault handling framework refactoring in
the IOMMU core [2].
The series and related patches are available on GitHub: [3]
[1] https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
[2] https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
[3] https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-…
Best regards,
baolu
Change log:
v2:
- Move all iommu refactoring patches into a separate series and discuss
it in a different thread. The latest patch series [v6] is available at
https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
- We discussed the timeout of the pending page fault messages. We
agreed that we shouldn't apply any timeout policy for the page fault
handling in user space.
https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
- Jason suggested that we adopt a simple file descriptor interface for
reading and responding to I/O page requests, so that user space
applications can improve performance using io_uring.
https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/
v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.…
Lu Baolu (6):
iommu: Add iommu page fault cookie helpers
iommufd: Add iommu page fault uapi data
iommufd: Initializing and releasing IO page fault data
iommufd: Deliver fault messages to user space
iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_IOPF test support
iommufd/selftest: Add coverage for IOMMU_TEST_OP_TRIGGER_IOPF
include/linux/iommu.h | 9 +
drivers/iommu/iommu-priv.h | 15 +
drivers/iommu/iommufd/iommufd_private.h | 12 +
drivers/iommu/iommufd/iommufd_test.h | 8 +
include/uapi/linux/iommufd.h | 65 +++++
tools/testing/selftests/iommu/iommufd_utils.h | 66 ++++-
drivers/iommu/io-pgfault.c | 50 ++++
drivers/iommu/iommufd/device.c | 69 ++++-
drivers/iommu/iommufd/hw_pagetable.c | 260 +++++++++++++++++-
drivers/iommu/iommufd/selftest.c | 56 ++++
tools/testing/selftests/iommu/iommufd.c | 24 +-
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
12 files changed, 620 insertions(+), 16 deletions(-)
--
2.34.1
This adds the pasid attach/detach uAPIs for userspace to attach/detach
a PASID of a device to/from a given ioas/hwpt. Only the vfio-pci driver is
enabled in this series. After this series, PASID-capable devices bound
to vfio-pci can report PASID capability to userspace and VMs to enable
PASID usages like Shared Virtual Addressing (SVA).
This series first adds the helpers for pasid attach in vfio core, then
adds the device cdev ioctls for pasid attach/detach, and finally exposes the
device PASID capability to the user. It depends on the iommufd pasid
attach/detach series [1].
Complete code can be found at [2], tested with a draft QEMU branch [3].
[1] https://lore.kernel.org/linux-iommu/20231127063428.127436-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1%…
Change log:
v1:
- Report PASID capability via VFIO_DEVICE_FEATURE (Alex)
rfc: https://lore.kernel.org/linux-iommu/20230926093121.18676-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Kevin Tian (1):
vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
Yi Liu (2):
vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT
vfio: Report PASID capability via VFIO_DEVICE_FEATURE ioctl
drivers/vfio/device_cdev.c | 45 +++++++++++++++++++++
drivers/vfio/iommufd.c | 48 ++++++++++++++++++++++
drivers/vfio/pci/vfio_pci.c | 2 +
drivers/vfio/pci/vfio_pci_core.c | 47 ++++++++++++++++++++++
drivers/vfio/vfio.h | 4 ++
drivers/vfio/vfio_main.c | 8 ++++
include/linux/vfio.h | 11 ++++++
include/uapi/linux/vfio.h | 68 ++++++++++++++++++++++++++++++++
8 files changed, 233 insertions(+)
--
2.34.1
From: Jeff Xu <jeffxu(a)google.com>
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is a syscall on 64-bit CPUs, with the
following signature:
int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
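A hedged sketch of invoking the syscall from user space through ctypes follows. The syscall number 462 is an assumption (the changelog only says the number was adjusted), the sketch targets 64-bit Linux, and whether the call succeeds depends on the running kernel. It does not use PROT_SEAL or MAP_SEALABLE, since their values are not given here.

```python
import ctypes
import mmap

SYS_MSEAL = 462  # assumption: syscall number on the running kernel

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.syscall.restype = ctypes.c_long
libc.syscall.argtypes = [ctypes.c_long, ctypes.c_void_p,
                         ctypes.c_size_t, ctypes.c_ulong]

MAP_FAILED = ctypes.c_void_p(-1).value


def try_mseal():
    """Map one anonymous page, try to seal it, then try to unmap it.

    Returns (mseal return value, errno if mseal failed else 0,
    munmap return value).
    """
    length = mmap.PAGESIZE
    addr = libc.mmap(None, length, mmap.PROT_READ | mmap.PROT_WRITE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr is None or addr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    ret = libc.syscall(SYS_MSEAL, addr, length, 0)  # flags are reserved: 0
    err = ctypes.get_errno() if ret != 0 else 0
    unmap_ret = libc.munmap(addr, length)  # expected to fail once sealed
    return ret, err, unmap_ret
```

On a kernel without mseal() the first value is -1 with ENOSYS and the page unmaps normally. If the seal is applied, the later munmap() is expected to be refused, and the page is simply left mapped until the process exits.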
mseal() blocks the following operations for the given memory range:
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(). These can leave an empty space that can
then be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(). It does not appear to pose any specific
risks to sealed VMAs, but it is blocked anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
In addition, mmap() has two related changes:
The PROT_SEAL bit in the prot field of mmap(). When present, it marks
the map as sealed from creation.
The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
the map as sealable. A map created without MAP_SEALABLE will not support
sealing, i.e. mseal() will fail.
Applications that don't care about sealing can expect their behavior to
remain unchanged. Those that need sealing support opt in by adding
MAP_SEALABLE in mmap().
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable; the lifetime of those mappings
is tied to the lifetime of the process.
Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings is not tied to the lifetime of the process; therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory, for example with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example, if a jump instruction crosses a page
boundary and the second page gets discarded, the discard will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.
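The "discard acts like memset(0)" behavior that motivates this check can be observed from user space. The sketch below assumes Linux and Python 3.8 or later (for mmap.madvise); the four bytes written are arbitrary placeholders rather than a real jump target.

```python
import mmap


def discard_demo():
    """Write to a private anonymous page, discard it, and read it back."""
    m = mmap.mmap(-1, mmap.PAGESIZE,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    m[:4] = b"\xeb\xfe\x90\x90"      # placeholder bytes standing in for code
    before = bytes(m[:4])
    m.madvise(mmap.MADV_DONTNEED)    # drop the page; next access is zero-filled
    after = bytes(m[:4])
    m.close()
    return before, after
```

The second read returns zeros even though no store to the page happened in between, which is why the series treats such discards as a modification for memory the caller cannot write to.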
Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.
Change history:
===============
V6:
- Drop RFC from subject, given Linus's general approval.
- Adjust syscall number for mseal (main Jan.11/2024)
- Code style fix (Matthew Wilcox)
- selftest: use ksft macros (Muhammad Usama Anjum)
- Document fix. (Randy Dunlap)
V5:
- fix build issue in mseal-Wire-up-mseal-syscall
(Suggested by Linus Torvalds, and Greg KH)
- updates on selftest.
https://lore.kernel.org/lkml/20240109154547.1839886-1-jeffxu@chromium.org/#r
V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
https://lore.kernel.org/all/20240104185138.169307-1-jeffxu@chromium.org/
V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as
MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
destructive operations of madvise. (Suggested by Jann Horn and
Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…
v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/
v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/
----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/
Jeff Xu (4):
mseal: Wire up mseal syscall
mseal: add mseal syscall
selftest mm/mseal memory sealing
mseal:add documentation
Documentation/userspace-api/mseal.rst | 181 ++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/mm.h | 60 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/mman-common.h | 8 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/Makefile | 4 +
mm/madvise.c | 12 +
mm/mmap.c | 27 +
mm/mprotect.c | 10 +
mm/mremap.c | 31 +
mm/mseal.c | 330 +++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/mseal_test.c | 1997 +++++++++++++++++++
32 files changed, 2686 insertions(+), 2 deletions(-)
create mode 100644 Documentation/userspace-api/mseal.rst
create mode 100644 mm/mseal.c
create mode 100644 tools/testing/selftests/mm/mseal_test.c
--
2.43.0.275.g3460e3d667-goog
=== Description ===
This is a bpf-treewide change that annotates all kfuncs as such inside
.BTF_ids. This annotation eventually allows us to automatically generate
kfunc prototypes from bpftool.
We store this metadata inside a yet-unused flags field inside struct
btf_id_set8 (thanks Kumar!). pahole will be taught where to look.
More details about the full chain of events are available in commit 3's
description.
The accompanying pahole changes (still need some cleanup) can be viewed
here on this "frozen" branch [0].
[0]: https://github.com/danobi/pahole/tree/kfunc_btf-mailed
=== Changelog ===
Changes from v2:
* Only WARN() for vmlinux kfuncs
Changes from v1:
* Move WARN_ON() up a call level
* Also return error when kfunc set is not properly tagged
* Use BTF_KFUNCS_START/END instead of flags
* Rename BTF_SET8_KFUNC to BTF_SET8_KFUNCS
Daniel Xu (3):
bpf: btf: Support flags for BTF_SET8 sets
bpf: btf: Add BTF_KFUNCS_START/END macro pair
bpf: treewide: Annotate BPF kfuncs in BTF
drivers/hid/bpf/hid_bpf_dispatch.c | 8 +++----
fs/verity/measure.c | 4 ++--
include/linux/btf_ids.h | 21 +++++++++++++++----
kernel/bpf/btf.c | 8 +++++++
kernel/bpf/cpumask.c | 4 ++--
kernel/bpf/helpers.c | 8 +++----
kernel/bpf/map_iter.c | 4 ++--
kernel/cgroup/rstat.c | 4 ++--
kernel/trace/bpf_trace.c | 8 +++----
net/bpf/test_run.c | 8 +++----
net/core/filter.c | 16 +++++++-------
net/core/xdp.c | 4 ++--
net/ipv4/bpf_tcp_ca.c | 4 ++--
net/ipv4/fou_bpf.c | 4 ++--
net/ipv4/tcp_bbr.c | 4 ++--
net/ipv4/tcp_cubic.c | 4 ++--
net/ipv4/tcp_dctcp.c | 4 ++--
net/netfilter/nf_conntrack_bpf.c | 4 ++--
net/netfilter/nf_nat_bpf.c | 4 ++--
net/xfrm/xfrm_interface_bpf.c | 4 ++--
net/xfrm/xfrm_state_bpf.c | 4 ++--
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 8 +++----
22 files changed, 81 insertions(+), 60 deletions(-)
--
2.42.1
From: Maxim Mikityanskiy <maxim(a)isovalent.com>
The goal of this series is to extend the verifier's capabilities of
tracking scalars when they are spilled to stack, especially when the
spill or fill is narrowing. It also contains a fix by Eduard for
infinite loop detection and a state pruning optimization by Eduard that
compensates for a verification complexity regression introduced by
tracking unbounded scalars. These improvements reduce the surface of
false rejections that I saw while working on the Cilium codebase.
Patch 1 (Maxim): Fix for an existing test, it will matter later in the
series.
Patches 2-3 (Eduard): Fixes for false rejections in infinite loop
detection that happen in the selftests when my patches are applied.
Patches 4-5 (Maxim): Fix the inconsistency of find_equal_scalars that
was possible if 32-bit spills were made.
Patches 6-11 (Maxim): Support the case when boundary checks are first
performed after the register was spilled to the stack.
Patches 12-13 (Maxim): Support narrowing fills.
Patches 14-15 (Eduard): Optimization for state pruning in stacksafe() to
mitigate the verification complexity regression.
veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...
* Without patch 14:
File Program States (A) States (B) States (DIFF)
-------------------- ------------ ---------- ---------- ----------------
bpf_xdp.o tail_lb_ipv6 3877 2936 -941 (-24.27%)
pyperf180.bpf.o on_event 8422 10456 +2034 (+24.15%)
pyperf600.bpf.o on_event 22259 37319 +15060 (+67.66%)
pyperf600_iter.bpf.o on_event 400 540 +140 (+35.00%)
strobemeta.bpf.o on_event 4702 13435 +8733 (+185.73%)
* With patch 14:
File Program States (A) States (B) States (DIFF)
-------------------- ------------ ---------- ---------- --------------
bpf_xdp.o tail_lb_ipv6 3877 2937 -940 (-24.25%)
pyperf600_iter.bpf.o on_event 400 500 +100 (+25.00%)
v2 changes:
Fixed comments in patch 1, moved endianness checks to header files in
patch 12 where possible, added Eduard's ACKs.
Eduard Zingerman (4):
bpf: make infinite loop detection in is_state_visited() exact
selftests/bpf: check if imprecise stack spills confuse infinite loop
detection
bpf: Optimize state pruning for spilled scalars
selftests/bpf: states pruning checks for scalar vs STACK_{MISC,ZERO}
Maxim Mikityanskiy (11):
selftests/bpf: Fix the u64_offset_to_skb_data test
bpf: Make bpf_for_each_spilled_reg consider narrow spills
selftests/bpf: Add a test case for 32-bit spill tracking
bpf: Add the assign_scalar_id_before_mov function
bpf: Add the get_reg_width function
bpf: Assign ID to scalars on spill
selftests/bpf: Test assigning ID to scalars on spill
bpf: Track spilled unbounded scalars
selftests/bpf: Test tracking spilled unbounded scalars
bpf: Preserve boundaries and track scalars on narrowing fill
selftests/bpf: Add test cases for narrowing fill
include/linux/bpf_verifier.h | 4 +-
include/linux/filter.h | 12 +
kernel/bpf/verifier.c | 155 ++++-
.../bpf/progs/verifier_direct_packet_access.c | 2 +-
.../selftests/bpf/progs/verifier_loops1.c | 24 +
.../selftests/bpf/progs/verifier_spill_fill.c | 533 +++++++++++++++++-
.../testing/selftests/bpf/verifier/precise.c | 6 +-
7 files changed, 685 insertions(+), 51 deletions(-)
--
2.43.0
The livepatching kselftests rely on comparing expected vs. observed
dmesg output. After each test, new dmesg entries are determined by the
'comm' utility comparing a saved, pre-test copy of dmesg to post-test
dmesg output.
Alexander reports that the 'comm --nocheck-order -13' invocation used by
the tests can be confused when dmesg entry timestamps vary in magnitude
(i.e., "[ 98.820331]" vs. "[ 100.031067]"), in which case, additional
messages are reported as new. The unexpected entries then spoil the
test results.
Instead of relying on 'comm' or 'diff' to determine new testing dmesg
entries, refactor the code:
- pre-test : log a unique canary dmesg entry
- test : run tests, log messages
- post-test : filter dmesg starting from pre-test message
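The three steps above can be sketched in C as follows. This is only an
illustration of the canary idea, not the shell code used by the
selftests; the function name first_entry_after_canary() and the
in-memory array of log lines are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first log line that follows the canary entry,
 * or n when the canary is missing (treated as "no new entries").
 * Only an exact whole-line match counts, so the width of the timestamp
 * prefix does not matter.
 */
static size_t first_entry_after_canary(const char *const *lines, size_t n,
				       const char *canary)
{
	size_t i;

	assert(lines != NULL && canary != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(lines[i], canary) == 0)
			return i + 1;
	}
	return n;
}
```

Because the canary is matched as a whole line, nothing depends on how
the lines sort, which is the property the 'comm' based comparison
lacked. A missing canary yields an empty result, which the caller can
report as a possible log buffer overrun.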
Reported-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Closes: https://lore.kernel.org/live-patching/ZYAimyPYhxVA9wKg@li-008a6a4c-3549-11b…
Signed-off-by: Joe Lawrence <joe.lawrence(a)redhat.com>
---
.../testing/selftests/livepatch/functions.sh | 37 +++++++++----------
1 file changed, 17 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index c8416c54b463..b1fd7362c2fe 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -42,17 +42,6 @@ function die() {
exit 1
}
-# save existing dmesg so we can detect new content
-function save_dmesg() {
- SAVED_DMESG=$(mktemp --tmpdir -t klp-dmesg-XXXXXX)
- dmesg > "$SAVED_DMESG"
-}
-
-# cleanup temporary dmesg file from save_dmesg()
-function cleanup_dmesg_file() {
- rm -f "$SAVED_DMESG"
-}
-
function push_config() {
DYNAMIC_DEBUG=$(grep '^kernel/livepatch' /sys/kernel/debug/dynamic_debug/control | \
awk -F'[: ]' '{print "file " $1 " line " $2 " " $4}')
@@ -99,7 +88,6 @@ function set_ftrace_enabled() {
function cleanup() {
pop_config
- cleanup_dmesg_file
}
# setup_config - save the current config and set a script exit trap that
@@ -280,7 +268,15 @@ function set_pre_patch_ret {
function start_test {
local test="$1"
- save_dmesg
+ # Dump something unique into the dmesg log, then stash the entry
+ # in LAST_DMESG. The check_result() function will use it to
+ # find new kernel messages since the test started.
+ local last_dmesg_msg="livepatch kselftest timestamp: $(date --rfc-3339=ns)"
+ log "$last_dmesg_msg"
+ loop_until 'dmesg | grep -q "$last_dmesg_msg"' ||
+ die "buffer busy? can't find canary dmesg message: $last_dmesg_msg"
+ LAST_DMESG=$(dmesg | grep "$last_dmesg_msg")
+
echo -n "TEST: $test ... "
log "===== TEST: $test ====="
}
@@ -291,23 +287,24 @@ function check_result {
local expect="$*"
local result
- # Note: when comparing dmesg output, the kernel log timestamps
- # help differentiate repeated testing runs. Remove them with a
- # post-comparison sed filter.
-
- result=$(dmesg | comm --nocheck-order -13 "$SAVED_DMESG" - | \
+ # Test results include any new dmesg entry since LAST_DMESG, then:
+ # - include lines matching keywords
+ # - exclude lines matching keywords
+ # - filter out dmesg timestamp prefixes
+ result=$(dmesg | awk -v last_dmesg="$LAST_DMESG" 'p; $0 == last_dmesg { p=1 }' | \
grep -e 'livepatch:' -e 'test_klp' | \
grep -v '\(tainting\|taints\) kernel' | \
sed 's/^\[[ 0-9.]*\] //')
if [[ "$expect" == "$result" ]] ; then
echo "ok"
+ elif [[ "$result" == "" ]] ; then
+ echo -e "not ok\n\nbuffer overrun? can't find canary dmesg entry: $LAST_DMESG\n"
+ die "livepatch kselftest(s) failed"
else
echo -e "not ok\n\n$(diff -upr --label expected --label result <(echo "$expect") <(echo "$result"))\n"
die "livepatch kselftest(s) failed"
fi
-
- cleanup_dmesg_file
}
# check_sysfs_rights(modname, rel_path, expected_rights) - check sysfs
--
2.41.0
This series updates all instances of LLVM Phabricator and Bugzilla links
to point to GitHub commits directly and LLVM's Bugzilla to GitHub issue
shortlinks respectively.
I split up the Phabricator patch into BPF selftests and the rest of the
kernel in case the BPF folks want to take it separately from the rest of
the series; there are obviously no dependency issues in that case. The
Bugzilla change was mechanical enough and should have no conflicts.
I am aiming this at Andrew and CC'ing other lists, in case maintainers
want to chime in, but I think this is pretty uncontroversial (famous
last words...).
---
Nathan Chancellor (3):
selftests/bpf: Update LLVM Phabricator links
arch and include: Update LLVM Phabricator links
treewide: Update LLVM Bugzilla links
arch/arm64/Kconfig | 4 +--
arch/powerpc/Makefile | 4 +--
arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
arch/riscv/Kconfig | 2 +-
arch/riscv/include/asm/ftrace.h | 2 +-
arch/s390/include/asm/ftrace.h | 2 +-
arch/x86/power/Makefile | 2 +-
crypto/blake2b_generic.c | 2 +-
drivers/firmware/efi/libstub/Makefile | 2 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 +-
drivers/media/test-drivers/vicodec/codec-fwht.c | 2 +-
drivers/regulator/Kconfig | 2 +-
include/asm-generic/vmlinux.lds.h | 2 +-
include/linux/compiler-clang.h | 2 +-
lib/Kconfig.kasan | 2 +-
lib/raid6/Makefile | 2 +-
lib/stackinit_kunit.c | 2 +-
mm/slab_common.c | 2 +-
net/bridge/br_multicast.c | 2 +-
security/Kconfig | 2 +-
tools/testing/selftests/bpf/README.rst | 32 +++++++++++-----------
tools/testing/selftests/bpf/prog_tests/xdpwall.c | 2 +-
.../selftests/bpf/progs/test_core_reloc_type_id.c | 2 +-
23 files changed, 40 insertions(+), 40 deletions(-)
---
base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
change-id: 20240109-update-llvm-links-d03f9d649e1e
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Hi all:
The core frequency is subject to process variation in semiconductors.
Not all cores are able to reach the maximum frequency while respecting the
infrastructure limits. Consequently, AMD has redefined the concept of the
maximum frequency of a part: only a fraction of the cores can reach the
maximum frequency. To find the best process scheduling policy for a given
scenario, the OS needs to know the core ordering that the platform reports
through the highest performance capability register of the CPPC interface.
Earlier implementations of amd-pstate preferred core only supported a static
core ranking and targeted performance. Now it can dynamically change the
preferred core based on the workload and platform conditions, accounting
for thermals and aging.
The amd-pstate driver uses the functions and data structures provided by
the ITMT architecture to enable the scheduler to favor scheduling on cores
which can reach a higher frequency with lower voltage.
We call it amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU. A higher value means a higher priority.
The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, so that the OS chooses the cores with the
highest performance first when scheduling a process. When the amd-pstate
driver receives a message that the highest performance has changed, it
updates the core ranking.
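As a rough illustration of "a higher value means a higher priority",
the following userspace sketch orders cores by their highest
performance value. It is not driver code: struct core_rank and
rank_cores() are hypothetical names for this example, and in the kernel
the ordering is left to the scheduler after the priorities are set with
sched_set_itmt_core_prio().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-CPU record; not a structure from the driver. */
struct core_rank {
	int cpu;			/* logical CPU number */
	unsigned int highest_perf;	/* CPPC highest performance value */
};

/* qsort() comparator: higher highest_perf first, lower CPU number on ties. */
static int by_priority_desc(const void *a, const void *b)
{
	const struct core_rank *x = a, *y = b;

	if (x->highest_perf != y->highest_perf)
		return x->highest_perf < y->highest_perf ? 1 : -1;
	return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

/* Sort the cores in place so that the preferred core comes first. */
static void rank_cores(struct core_rank *cores, size_t n)
{
	if (n == 0)
		return;
	assert(cores != NULL);
	qsort(cores, n, sizeof(*cores), by_priority_desc);
}
```

When the platform reports a changed highest performance value, a
ranking like this would simply be recomputed from the new values.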
Changes from V12->V13:
- ACPI: CPPC:
- - modify commit message.
- - modify handle function of the notify(0x85).
- cpufreq: amd-pstate:
- - implement update_limits() callback function.
- x86:
- - pick up Acked-By flag added by Petkov.
Changes from V11->V12:
- all:
- - pick up Reviewed-By flag added by Perry.
- cpufreq: amd-pstate:
- - rebase the latest linux-next and fixed conflicts.
- - fixed the issue about cpudata without init in amd_pstate_update_highest_perf().
Changes from V10->V11:
- cpufreq: amd-pstate:
- - according to Perry's comments, I replaced the string with str_enabled_disable().
Changes from V9->V10:
- cpufreq: amd-pstate:
- - add a check for highest_perf. When it is less than 255, the
preferred core feature is enabled and the priority is set.
- - delete "static u32 max_highest_perf" etc., because amd p-state
preferred core does not require special processing for hotplug.
Changes from V8->V9:
- all:
- - pick up Tested-By flag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Review-By flag added by Wyes.
- - ignore modification of bug.
- - add an attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Review-By flag added by Wyes.
Changes from V7->V8:
- all:
- - pick up Review-By flag added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embeds into cpudata structure.
- - delete preferred core init from cpu online/off.
Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- ACPI: CPPC:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.
Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.
Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``
Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V1->V2:
- ACPI: CPPC:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.
Meng Li (7):
x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
ACPI: CPPC: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update amd-pstate preferred core ranking
dynamically
Documentation: amd-pstate: introduce amd-pstate preferred core
Documentation: introduce amd-pstate preferrd core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++-
arch/x86/Kconfig | 5 +-
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 183 +++++++++++++++++-
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 10 +
8 files changed, 274 insertions(+), 12 deletions(-)
--
2.34.1
Add a test to exercise cpu hotplug with the function tracer active to
ensure that sensitive functions in the idle path are excluded from being
traced. This helps catch issues such as the one fixed by commit
4b3338aaa74d ("powerpc/ftrace: Fix stack teardown in ftrace_no_trace").
Signed-off-by: Naveen N Rao <naveen(a)kernel.org>
---
v2: Add a check for next available online cpu, as suggested by Masami.
.../ftrace/test.d/ftrace/func_hotplug.tc | 42 +++++++++++++++++++
1 file changed, 42 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
new file mode 100644
index 000000000000..ccfbfde3d942
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
@@ -0,0 +1,42 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# description: ftrace - function trace across cpu hotplug
+# requires: function:tracer
+
+if ! which nproc ; then
+ nproc() {
+ ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l
+ }
+fi
+
+NP=`nproc`
+
+if [ $NP -eq 1 ] ;then
+ echo "We cannot test cpu hotplug in UP environment"
+ exit_unresolved
+fi
+
+# Find online cpu
+for i in /sys/devices/system/cpu/cpu[1-9]*; do
+ if [ -f $i/online ] && [ "$(cat $i/online)" = "1" ]; then
+ cpu=$i
+ break
+ fi
+done
+
+if [ -z "$cpu" ]; then
+ echo "We cannot test cpu hotplug with a single cpu online"
+ exit_unresolved
+fi
+
+echo 0 > tracing_on
+echo > trace
+
+: "Set $(basename $cpu) offline/online with function tracer enabled"
+echo function > current_tracer
+echo 1 > tracing_on
+(echo 0 > $cpu/online)
+(echo "forked"; sleep 1)
+(echo 1 > $cpu/online)
+echo 0 > tracing_on
+echo nop > current_tracer
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.43.0
Use two separate variables of types int and unsigned long long instead of
conflating them. This gives each of them the correct print format
and removes the build warning:
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long unsigned int’
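For reference, a small standalone sketch of the pairing between
argument types and conversion specifiers that this warning is about.
The function format_offsets() and its message text are invented for
this example and do not appear in mremap_test.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one int and one unsigned long long into buf.
 * %d pairs with int, %llu pairs with unsigned long long; swapping them
 * is what triggers the -Wformat warning quoted above.
 * Returns the snprintf() result.
 */
static int format_offsets(char *buf, size_t len, int d, unsigned long long t)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "preamble offset %d, data offset %llu", d, t);
}
```

Keeping a separate variable per type, as this patch does, means each
print call can use the specifier that matches its argument without a
cast.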
Fixes: a4cb3b243343 ("selftests: mm: add a test for remapping to area immediately after existing mapping")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Changes since v1:
- Don't just fix the print format, instead use different variables
---
tools/testing/selftests/mm/mremap_test.c | 27 ++++++++++++------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 1d4c1589c305..2f8b991f78cb 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -360,7 +360,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
char pattern_seed)
{
void *addr, *src_addr, *dest_addr, *dest_preamble_addr;
- unsigned long long i;
+ int d;
+ unsigned long long t;
struct timespec t_start = {0, 0}, t_end = {0, 0};
long long start_ns, end_ns, align_mask, ret, offset;
unsigned long long threshold;
@@ -378,8 +379,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
/* Set byte pattern for source block. */
srand(pattern_seed);
- for (i = 0; i < threshold; i++)
- memset((char *) src_addr + i, (char) rand(), 1);
+ for (t = 0; t < threshold; t++)
+ memset((char *) src_addr + t, (char) rand(), 1);
/* Mask to zero out lower bits of address for alignment */
align_mask = ~(c.dest_alignment - 1);
@@ -420,8 +421,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
/* Set byte pattern for the dest preamble block. */
srand(pattern_seed);
- for (i = 0; i < c.dest_preamble_size; i++)
- memset((char *) dest_preamble_addr + i, (char) rand(), 1);
+ for (d = 0; d < c.dest_preamble_size; d++)
+ memset((char *) dest_preamble_addr + d, (char) rand(), 1);
}
clock_gettime(CLOCK_MONOTONIC, &t_start);
@@ -437,14 +438,14 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
/* Verify byte pattern after remapping */
srand(pattern_seed);
- for (i = 0; i < threshold; i++) {
+ for (t = 0; t < threshold; t++) {
char c = (char) rand();
- if (((char *) dest_addr)[i] != c) {
+ if (((char *) dest_addr)[t] != c) {
ksft_print_msg("Data after remap doesn't match at offset %llu\n",
- i);
+ t);
ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
- ((char *) dest_addr)[i] & 0xff);
+ ((char *) dest_addr)[t] & 0xff);
ret = -1;
goto clean_up_dest;
}
@@ -453,14 +454,14 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
/* Verify the dest preamble byte pattern after remapping */
if (c.dest_preamble_size) {
srand(pattern_seed);
- for (i = 0; i < c.dest_preamble_size; i++) {
+ for (d = 0; d < c.dest_preamble_size; d++) {
char c = (char) rand();
- if (((char *) dest_preamble_addr)[i] != c) {
+ if (((char *) dest_preamble_addr)[d] != c) {
ksft_print_msg("Preamble data after remap doesn't match at offset %d\n",
- i);
+ d);
ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
- ((char *) dest_preamble_addr)[i] & 0xff);
+ ((char *) dest_preamble_addr)[d] & 0xff);
ret = -1;
goto clean_up_dest;
}
--
2.42.0
Fix the following build warning:
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long unsigned int’
Fixes: a4cb3b243343 ("selftests: mm: add a test for remapping to area immediately after existing mapping")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/mremap_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 1d4c1589c305..dd1cbb068982 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -457,7 +457,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
char c = (char) rand();
if (((char *) dest_preamble_addr)[i] != c) {
- ksft_print_msg("Preamble data after remap doesn't match at offset %d\n",
+ ksft_print_msg("Preamble data after remap doesn't match at offset %llu\n",
i);
ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
((char *) dest_preamble_addr)[i] & 0xff);
--
2.42.0
From: Christoph Müllner <christoph.muellner(a)vrull.eu>
When building the RISC-V selftests with a riscv32 compiler I ran into
a couple of compiler warnings. While riscv32 support for these tests is
questionable, the fixes are so trivial that it is probably best to simply
apply them.
Note that the missing-include patch and some format string warnings
are also relevant for riscv64.
Christoph Müllner (5):
tools: selftests: riscv: Fix compile warnings in hwprobe
tools: selftests: riscv: Fix compile warnings in cbo
tools: selftests: riscv: Add missing include for vector test
tools: selftests: riscv: Fix compile warnings in vector tests
tools: selftests: riscv: Fix compile warnings in mm tests
tools/testing/selftests/riscv/hwprobe/cbo.c | 6 +++---
tools/testing/selftests/riscv/hwprobe/hwprobe.c | 4 ++--
tools/testing/selftests/riscv/mm/mmap_test.h | 3 +++
tools/testing/selftests/riscv/vector/v_initval_nolibc.c | 2 +-
tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c | 3 +++
tools/testing/selftests/riscv/vector/vstate_prctl.c | 4 ++--
6 files changed, 14 insertions(+), 8 deletions(-)
--
2.41.0
When executing the dirty_log_test on some aarch64 machines, it sometimes
triggers the ASSERT:
==== Test Assertion Failure ====
dirty_log_test.c:384: dirty_ring_vcpu_ring_full
pid=14854 tid=14854 errno=22 - Invalid argument
1 0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
2 0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
3 (inlined by) run_test at dirty_log_test.c:802
4 0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
5 0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
6 0x0000ffff9be173c7: ?? ??:0
7 0x0000ffff9be1749f: ?? ??:0
8 0x000000000040206f: _start at ??:?
Didn't continue vcpu even without ring full
The dirty_log_test fails when executing the dirty-ring test because
sem_vcpu_cont and sem_vcpu_stop are non-zero when the
dirty_ring_collect_dirty_pages() function runs. When those two
sem_t variables are non-zero, the dirty_ring_wait_vcpu() call at the
beginning of dirty_ring_collect_dirty_pages() does not wait for the
vcpu to stop, but continues to execute the following code. In this case,
suppose that dirty_ring_vcpu_ring_full is true before the vcpu stops, and
that dirty_ring_collect_dirty_pages() has passed the check of
dirty_ring_vcpu_ring_full but has not yet executed the check of
continued_vcpu. If the vcpu then stops and sets dirty_ring_vcpu_ring_full
to false, dirty_ring_collect_dirty_pages() triggers the ASSERT.
Why can sem_vcpu_cont and sem_vcpu_stop be non-zero? Because
dirty_ring_before_vcpu_join() executes sem_post(&sem_vcpu_cont)
at the end of each dirty-ring test. This can lead to two cases:
1. sem_vcpu_cont is non-zero. When we set host_quit to true, the
vcpu_worker sees host_quit directly and quits. The
log_mode_before_vcpu_join() function then sets sem_vcpu_cont
to 1, and since the vcpu_worker has quit, nothing consumes it.
2. sem_vcpu_stop is non-zero. When we set host_quit to true, the
vcpu_worker has already entered the guest state. The next time it
exits from the guest state, it sets sem_vcpu_stop to 1 and then sees
host_quit, so nothing consumes sem_vcpu_stop.
As more and more dirty-ring tests are executed, sem_vcpu_cont and
sem_vcpu_stop can grow larger and larger, so many code paths no longer
wait on the sem_t. This finally causes the problem.
To fix this problem, we can wait a while before setting host_quit to
true, which gives the vcpu time to enter the guest state, so it will
exit again. Then we can wait for the vcpu to exit and let it continue
again, after which the vcpu will see host_quit. Thus sem_vcpu_cont and
sem_vcpu_stop are both zero when the test finishes.
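The imbalance can be reproduced in isolation with a plain POSIX
semaphore. The sketch below is not taken from dirty_log_test.c; the
helper leftover_after() is a made-up name, and it only shows that every
sem_post() without a matching sem_wait() leaves the count above zero,
so a later sem_wait() would return immediately instead of blocking.

```c
#include <assert.h>
#include <semaphore.h>

/*
 * Post `posts` times and wait `waits` times on a fresh semaphore, then
 * return the leftover count, or -1 if sem_init() fails.
 * Requires waits <= posts so that no sem_wait() call blocks.
 */
static int leftover_after(unsigned int posts, unsigned int waits)
{
	sem_t sem;
	int val = -1;
	unsigned int i;

	assert(waits <= posts);
	if (sem_init(&sem, 0, 0) != 0)
		return -1;
	for (i = 0; i < posts; i++)
		sem_post(&sem);
	for (i = 0; i < waits; i++)
		sem_wait(&sem);
	sem_getvalue(&sem, &val);
	sem_destroy(&sem);
	return val;
}
```

leftover_after(1, 0) returns 1, which corresponds to a post that the
exited vcpu_worker never consumed, while leftover_after(3, 3) returns 0,
the balanced state that the new assertions in run_test() check for.
On older C libraries this may need to be linked with -pthread.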
Signed-off-by: Shaoqin Huang <shahuang(a)redhat.com>
---
v1->v2:
- Fix the real logic bug, not just fresh the context.
v1: https://lore.kernel.org/all/20231116093536.22256-1-shahuang@redhat.com/
---
tools/testing/selftests/kvm/dirty_log_test.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 936f3a8d1b83..a6e0ff46a07c 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -417,7 +417,8 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
static void dirty_ring_before_vcpu_join(void)
{
- /* Kick another round of vcpu just to make sure it will quit */
+ /* Wait vcpu exit, and let it continue to see the host_quit. */
+ dirty_ring_wait_vcpu();
sem_post(&sem_vcpu_cont);
}
@@ -719,6 +720,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
struct kvm_vm *vm;
unsigned long *bmap;
uint32_t ring_buf_idx = 0;
+ int sem_val;
if (!log_mode_supported()) {
print_skip("Log mode '%s' not supported",
@@ -726,6 +728,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
return;
}
+ sem_getvalue(&sem_vcpu_stop, &sem_val);
+ assert(sem_val == 0);
+ sem_getvalue(&sem_vcpu_cont, &sem_val);
+ assert(sem_val == 0);
+
/*
* We reserve page table for 2 times of extra dirty mem which
* will definitely cover the original (1G+) test range. Here
@@ -825,6 +832,13 @@ static void run_test(enum vm_guest_mode mode, void *arg)
sync_global_to_guest(vm, iteration);
}
+ /*
+ *
+ * Before we set the host_quit, let the vcpu has time to run, to make
+ * sure we consume the sem_vcpu_stop and the vcpu consume the
+ * sem_vcpu_cont, to keep the semaphore balance.
+ */
+ usleep(p->interval * 1000);
/* Tell the vcpu thread to quit */
host_quit = true;
log_mode_before_vcpu_join();
--
2.40.1
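To illustrate the semaphore balance described in the commit message above, here is a minimal user-space sketch. It is not the dirty_log_test code: sem_stop, sem_cont and quit_flag are hypothetical stand-ins for sem_vcpu_stop, sem_vcpu_cont and host_quit, and the shutdown order is simplified (the quit flag is set only after the worker's final stop has been consumed, so no sleep is needed here).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for sem_vcpu_stop, sem_vcpu_cont and host_quit. */
static sem_t sem_stop, sem_cont;
static atomic_bool quit_flag;

/* Worker: report a stop, wait to be continued, then check the quit flag. */
static void *worker_main(void *arg)
{
	(void)arg;
	for (;;) {
		sem_post(&sem_stop);
		sem_wait(&sem_cont);
		if (atomic_load(&quit_flag))
			return NULL;
	}
}

/*
 * Run `rounds` stop/continue handshakes, then shut down by consuming the
 * worker's final stop before posting the last continue. Returns 0 on
 * success and stores the leftover semaphore values.
 */
int run_handshake(int rounds, int *stop_left, int *cont_left)
{
	pthread_t tid;
	int i;

	assert(rounds >= 0);
	if (sem_init(&sem_stop, 0, 0) || sem_init(&sem_cont, 0, 0))
		return -1;
	atomic_store(&quit_flag, false);
	if (pthread_create(&tid, NULL, worker_main, NULL))
		return -1;

	for (i = 0; i < rounds; i++) {
		sem_wait(&sem_stop);
		sem_post(&sem_cont);
	}

	sem_wait(&sem_stop);		/* pairs with the worker's last sem_post() */
	atomic_store(&quit_flag, true);	/* worker is now parked before its check */
	sem_post(&sem_cont);		/* lets the worker observe quit_flag */
	pthread_join(tid, NULL);

	sem_getvalue(&sem_stop, stop_left);
	sem_getvalue(&sem_cont, cont_left);
	sem_destroy(&sem_stop);
	sem_destroy(&sem_cont);
	return 0;
}
```

Every sem_post() in the sketch has a matching sem_wait(), which is the property the new assert() calls in run_test() check at the start of the next run.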
Now that we have the VISIBLE_IF_KUNIT and EXPORT_SYMBOL_IF_KUNIT macros,
update the instructions to recommend this way of testing static
functions.
Signed-off-by: Arthur Grillo <arthurgrillo(a)riseup.net>
---
Changes in v3:
- Maintain the old '#include' way
- Link to v2: https://lore.kernel.org/r/20240108-kunit-doc-export-v2-1-8f2dd3395fed@riseu…
Changes in v2:
- Fix #if condition
- Link to v1: https://lore.kernel.org/r/20240108-kunit-doc-export-v1-1-119368df0d96@riseu…
---
Documentation/dev-tools/kunit/usage.rst | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index c27e1646ecd9..8e35b94a17ec 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -671,8 +671,23 @@ Testing Static Functions
------------------------
If we do not want to expose functions or variables for testing, one option is to
-conditionally ``#include`` the test file at the end of your .c file. For
-example:
+conditionally export the used symbol. For example:
+
+.. code-block:: c
+
+ /* In my_file.c */
+
+ VISIBLE_IF_KUNIT int do_interesting_thing();
+ EXPORT_SYMBOL_IF_KUNIT(do_interesting_thing);
+
+ /* In my_file.h */
+
+ #if IS_ENABLED(CONFIG_KUNIT)
+ int do_interesting_thing(void);
+ #endif
+
+Alternatively, you could conditionally ``#include`` the test file at the end of
+your .c file. For example:
.. code-block:: c
---
base-commit: eeb8e8d9f124f279e80ae679f4ba6e822ce4f95f
change-id: 20240108-kunit-doc-export-eec1f910ab67
Best regards,
--
Arthur Grillo <arthurgrillo(a)riseup.net>
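As a rough user-space analogy of the conditional-visibility idea in the documentation change above, the sketch below switches a function between static and external linkage with one macro. MY_TEST_BUILD and MY_VISIBLE_IF_TEST are hypothetical names made up for this illustration; they are not the kernel's VISIBLE_IF_KUNIT/EXPORT_SYMBOL_IF_KUNIT macros, and there is no module-export step in user space.

```c
#include <assert.h>

/*
 * Hypothetical user-space stand-ins. MY_TEST_BUILD plays the role of
 * CONFIG_KUNIT; none of these names exist in the kernel.
 */
#define MY_TEST_BUILD 1

#if MY_TEST_BUILD
#define MY_VISIBLE_IF_TEST		/* external linkage: tests can call it */
#else
#define MY_VISIBLE_IF_TEST static	/* private to this translation unit */
#endif

/* "my_file.h": the prototype is only published for test builds. */
#if MY_TEST_BUILD
int do_interesting_thing(int x);
#endif

/* "my_file.c": one definition, static or not depending on the build. */
MY_VISIBLE_IF_TEST int do_interesting_thing(int x)
{
	return 2 * x + 1;
}
```

With MY_TEST_BUILD set to 0 the function becomes static again, so a separate test file could no longer link against it.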
The rules to link selftests are:
> $(OUTPUT)/%_ipv4: %.c
> $(LINK.c) $^ $(LDLIBS) -o $@
>
> $(OUTPUT)/%_ipv6: %.c
> $(LINK.c) -DIPV6_TEST $^ $(LDLIBS) -o $@
The Intel test robot uses only the selftests Makefile, not the top-level Linux
Makefile:
> make W=1 O=/tmp/kselftest -C tools/testing/selftests
So, $(LINK.c) is determined by the environment rather than by the kernel
Makefiles. On my machine (as well as for other people who ran the tcp-ao
selftests), the GNU Make implicit definition does use $(LDFLAGS):
> [dima@Mindolluin ~]$ make -p -f/dev/null | grep '^LINK.c\>'
> make: *** No targets. Stop.
> LINK.c = $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
But, according to the build robot report, that's not the case for them.
While I could just avoid using the pre-defined $(LINK.c), it's also used by
selftests/lib.mk by default.
Anyway, according to the GNU Make documentation [1], I should have used
$(LDLIBS) instead of $(LDFLAGS) in the first place, so let's just do it:
> LDFLAGS
> Extra flags to give to compilers when they are supposed to invoke
> the linker, ‘ld’, such as -L. Libraries (-lfoo) should be added
> to the LDLIBS variable instead.
> LDLIBS
> Library flags or names given to compilers when they are supposed
> to invoke the linker, ‘ld’. LOADLIBES is a deprecated (but still
> supported) alternative to LDLIBS. Non-library linker flags, such
> as -L, should go in the LDFLAGS variable.
[1]: https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html
Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401011151.veyYTJzq-lkp@intel.com/
Signed-off-by: Dmitry Safonov <dima(a)arista.com>
---
tools/testing/selftests/net/tcp_ao/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/tcp_ao/Makefile b/tools/testing/selftests/net/tcp_ao/Makefile
index 8e60bae67aa9..522d991e310e 100644
--- a/tools/testing/selftests/net/tcp_ao/Makefile
+++ b/tools/testing/selftests/net/tcp_ao/Makefile
@@ -52,5 +52,5 @@ $(OUTPUT)/%_ipv6: %.c
$(OUTPUT)/icmps-accept_ipv4: CFLAGS+= -DTEST_ICMPS_ACCEPT
$(OUTPUT)/icmps-accept_ipv6: CFLAGS+= -DTEST_ICMPS_ACCEPT
-$(OUTPUT)/bench-lookups_ipv4: LDFLAGS+= -lm
-$(OUTPUT)/bench-lookups_ipv6: LDFLAGS+= -lm
+$(OUTPUT)/bench-lookups_ipv4: LDLIBS+= -lm
+$(OUTPUT)/bench-lookups_ipv6: LDLIBS+= -lm
---
base-commit: 8cb47d7cd090a690c1785385b2f3d407d4a53ad0
change-id: 20240110-tcp_ao-selftests-makefile-3dafb1e96df8
Best regards,
--
Dmitry Safonov <dima(a)arista.com>
Changes in v5:
* Fixed an issue found by Joe that copied Kbuild files along with the
test modules to the installation directory.
* Added Joe Lawrence's review tags.
Changes in v4:
* Documented how to compile the livepatch selftests without running the
tests (Joe)
* Removed the mention of lib/livepatch from the MAINTAINERS file, reported by
checkpatch.
Changes in v3:
* Rebased on top of v6.6-rc5
* The commit messages were improved (Thanks Petr!)
* Created TEST_GEN_MODS_DIR variable to point to a directory that contains kernel
modules, and adapted selftests to build it before running the test.
* Moved test_klp-call_getpid out of test_programs, since the gen_tar
would just copy the generated test programs to the livepatches dir,
and so scripts relying on test_programs/test_klp-call_getpid will fail.
* Added a module_param for klp_pids, describing its usage.
* Simplified the call_getpid program to ignore the return of getpid syscall,
since we only want to make sure the process transitions correctly to the
patched state.
* The test-syscall.sh now prints a log message showing the number of remaining
processes to transition into the livepatched state, and check_output expects it
to be 0.
* Added MODULE_AUTHOR and MODULE_DESCRIPTION to test_klp_syscall.c
- Link to v3: https://lore.kernel.org/r/20231031-send-lp-kselftests-v3-0-2b1655c2605f@sus…
- Link to v2: https://lore.kernel.org/linux-kselftest/20220630141226.2802-1-mpdesouza@sus…
This patchset moves the current kernel testing livepatch modules from
lib/livepatch to tools/testing/selftests/livepatch/test_modules, and compiles
them as out-of-tree modules before testing.
There is also a new test being added. This new test exercises multiple processes
calling a syscall while a livepatch patches the syscall.
Why this move is an improvement:
* The modules are now compiled as out-of-tree modules against the current
running kernel, making them capable of being tested on different systems with
newer or older kernels.
* This approach now needs the kernel-devel package to be installed, since the
modules are built out-of-tree. The package can be generated by running
"make rpm-pkg" in the kernel source.
What needs to be solved:
* Currently gen_tar only packages the resulting binaries of the tests, and not
the sources. For the current approach, the newly added modules would be
compiled and then packaged. This works when testing on a system with the same
kernel version, but it will fail when running on a machine with a different
kernel version, since the module was compiled against the kernel currently running.
This is not a new problem, just aligning the expectations. For the current
approach to be truly system agnostic, gen_tar would need to include the module
and program sources to be compiled on the target systems.
Thanks in advance!
Marcos
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
---
Marcos Paulo de Souza (3):
kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
Documentation/dev-tools/kselftest.rst | 4 +
MAINTAINERS | 1 -
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 ----
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 ---
tools/testing/selftests/lib.mk | 25 ++++-
tools/testing/selftests/livepatch/Makefile | 5 +-
tools/testing/selftests/livepatch/README | 25 +++--
tools/testing/selftests/livepatch/config | 1 -
tools/testing/selftests/livepatch/functions.sh | 34 +++---
.../testing/selftests/livepatch/test-callbacks.sh | 50 ++++-----
tools/testing/selftests/livepatch/test-ftrace.sh | 6 +-
.../testing/selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
tools/testing/selftests/livepatch/test-state.sh | 18 ++--
tools/testing/selftests/livepatch/test-syscall.sh | 53 ++++++++++
tools/testing/selftests/livepatch/test-sysfs.sh | 6 +-
.../selftests/livepatch/test_klp-call_getpid.c | 44 ++++++++
.../selftests/livepatch/test_modules/Makefile | 20 ++++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../livepatch/test_modules}/test_klp_livepatch.c | 0
.../livepatch/test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 116 +++++++++++++++++++++
32 files changed, 339 insertions(+), 121 deletions(-)
---
base-commit: 89ecef4cb0ac442d5ad48c1aae1e2e1e7744d46f
change-id: 20231031-send-lp-kselftests-4c917dcd4565
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
Hi all:
The core frequency is subject to process variation in semiconductors.
Not all cores are able to reach the maximum frequency while respecting the
infrastructure limits. Consequently, AMD has redefined the concept of
maximum frequency of a part. This means that only a fraction of cores can reach
the maximum frequency. To find the best process scheduling policy for a given
scenario, the OS needs to know the core ordering reported by the platform through
the highest performance capability register of the CPPC interface.
Earlier implementations of amd-pstate preferred core only supported a static
core ranking and targeted performance. Now it has the ability to dynamically
change the preferred core based on the workload and platform conditions,
accounting for thermals and aging.
The amd-pstate driver utilizes the functions and data structures provided by
the ITMT architecture to enable the scheduler to favor scheduling on cores
which can reach a higher frequency with lower voltage.
We call it amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU. A higher value means a higher priority.
The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, to make sure that the OS chooses the cores
with the highest performance first when scheduling processes. When the amd-pstate
driver receives a message that the highest performance has changed, it
updates the core ranking.
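The ranking rule described above (a higher highest-performance value means a higher priority, re-evaluated when the value changes) can be sketched in plain C. This is only an illustration under assumptions: struct core_rank, rank_cores() and update_core() are hypothetical names for this sketch, not driver code, and the real driver passes priorities to the scheduler through sched_set_itmt_core_prio() rather than sorting an array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for this sketch; not a structure from the driver. */
struct core_rank {
	int cpu;
	unsigned int highest_perf;	/* CPPC highest performance value */
};

/* Order by highest_perf descending; break ties by CPU number ascending. */
static int by_perf_desc(const void *a, const void *b)
{
	const struct core_rank *x = a, *y = b;

	if (x->highest_perf != y->highest_perf)
		return x->highest_perf > y->highest_perf ? -1 : 1;
	return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

/* Sort so that the most preferred core comes first. */
void rank_cores(struct core_rank *cores, size_t n)
{
	assert(cores != NULL || n == 0);
	qsort(cores, n, sizeof(*cores), by_perf_desc);
}

/* Apply a "highest performance changed" update and re-rank. */
void update_core(struct core_rank *cores, size_t n, int cpu, unsigned int perf)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cores[i].cpu == cpu)
			cores[i].highest_perf = perf;
	rank_cores(cores, n);
}
```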
Changes from V11->V12:
- all:
- - pick up Reviewed-By flag added by Perry.
- cpufreq: amd-pstate:
- - rebase the latest linux-next and fixed conflicts.
- - fixed the issue about cpudata without init in amd_pstate_update_highest_perf().
Changes from V10->V11:
- cpufreq: amd-pstate:
- - according to Perry's comments, replace the string with str_enabled_disable().
Changes from V9->V10:
- cpufreq: amd-pstate:
- - add judgement for highest_perf. When it is less than 255, the
preferred core feature is enabled. And it will set the priority.
- - delete "static u32 max_highest_perf" etc., because amd-pstate
preferred core does not require special processing for hotplug.
Changes from V8->V9:
- all:
- - pick up Tested-By flag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Review-By flag added by Wyes.
- - ignore modification of bug.
- - add an attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Review-By flag added by Wyes.
Changes from V7->V8:
- all:
- - pick up Review-By flag added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embeds into cpudata structure.
- - delete preferred core init from cpu online/off.
Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.
Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.
Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstate``
Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.
Meng Li (7):
x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
acpi: cppc: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update amd-pstate preferred core ranking
dynamically
Documentation: amd-pstate: introduce amd-pstate preferred core
Documentation: introduce amd-pstate preferrd core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++-
arch/x86/Kconfig | 5 +-
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 175 +++++++++++++++++-
drivers/cpufreq/cpufreq.c | 13 ++
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 10 +
include/linux/cpufreq.h | 5 +
10 files changed, 284 insertions(+), 12 deletions(-)
--
2.34.1
Nested translation is a hardware feature that is supported by many modern
IOMMUs. It uses two stages (stage-1, stage-2) of address translation
to get to the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
This series is based on the first part, which was merged [1]. This series
adds the cache invalidation interface for userspace to invalidate caches after
modifying the stage-1 page table. This includes both the iommufd changes and the
VT-d driver changes.
Complete code can be found in [2]; QEMU code can be found in [3].
Finally, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks to
them for the help. ^_^. Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v8:
- Pass invalidation hint to the cache invalidation helper in the cache_invalidate_user
op path (Kevin)
- Move the devTLB invalidation out of info->iommu loop (Kevin, Weijiang)
- Clear *fault per restart in qi_submit_sync() to avoid error accumulation
across submissions. (Kevin)
- Define the vtd cache invalidation uapi structure in separate patch (Kevin)
- Rename inv_error to be hw_error (Kevin)
- Rename 'reqs_uptr', 'req_type', 'req_len' and 'req_num' to be 'data_uptr',
'data_type', 'entry_len' and 'entry_num' (Kevin)
- Allow user to set IOMMU_TEST_INVALIDATE_FLAG_ALL and IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR
at the same time (Kevin)
v7: https://lore.kernel.org/linux-iommu/20231221153948.119007-1-yi.l.liu@intel.…
- Remove domain->ops->cache_invalidate_user check in hwpt alloc path due
to failure in bisect (Baolu)
- Remove out_driver_error_code from struct iommu_hwpt_invalidate after
discussion in v6. Should expect per-entry error code.
- Rework the selftest cache invalidation part to report a per-entry error
- Allow user to pass in an empty array to have a try-and-fail mechanism for
user to check if a given req_type is supported by the kernel (Jason)
- Define a separate enum type for cache invalidation data (Jason)
- Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before
returning (Nicolin)
- Merge the VT-d nesting part 2/2
https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
into this series to avoid defining empty enum in the middle of the series.
The major difference is adding the VT-d related invalidation uapi structures
together with the generic data structures in patch 02 of this series.
- VT-d driver was refined to report ICE/ITE error from the bottom cache
invalidation submit helpers, hence the cache_invalidate_user op could
report such errors via the per-entry error field to user. VT-d driver
will not stop the invalidation array walking due to the ICE/ITE errors,
as such errors are defined by the VT-d spec; userspace should be able to
handle them and let the real user (say Virtual Machine) know about them.
But other errors, like invalid uapi data structure configuration or
memory copy failure, should stop the array walking, as continuing
may cause more issues.
- Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
- Not much change, just rebased on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
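The array-walking rule from the v7 notes above (hardware errors are recorded per entry and the walk continues, malformed input stops the walk, and the entry count is always written back) can be sketched as follows. Everything here is hypothetical and for illustration only: struct inv_entry, submit_one() and invalidate_array() are not the uAPI structures or driver functions from this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-entry request; not the uAPI layout from the series. */
struct inv_entry {
	unsigned long addr;
	unsigned int flags;	/* must be 0 in this sketch */
	unsigned int hw_error;	/* out: hardware-reported error bits, 0 if none */
};

/* Hypothetical backend: pretend odd addresses trigger a hardware error. */
static unsigned int submit_one(const struct inv_entry *e)
{
	return (e->addr & 1) ? 0x1 : 0;
}

/*
 * Walk the array. Hardware errors are recorded per entry and the walk goes
 * on; a malformed entry stops the walk. *num is always updated to the number
 * of entries handled before returning.
 */
int invalidate_array(struct inv_entry *entries, size_t *num)
{
	size_t i, total;
	int ret = 0;

	assert(entries != NULL && num != NULL);
	total = *num;
	for (i = 0; i < total; i++) {
		if (entries[i].flags != 0) {
			ret = -EINVAL;
			break;
		}
		entries[i].hw_error = submit_one(&entries[i]);
	}
	*num = i;
	return ret;
}
```

A caller can then tell how far the walk got from *num and inspect hw_error on each handled entry.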
Thanks,
Yi Liu
Lu Baolu (4):
iommu: Add cache_invalidate_user op
iommu/vt-d: Allow qi_submit_sync() to return the QI faults
iommu/vt-d: Convert stage-1 cache invalidation to return QI fault
iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (2):
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
drivers/iommu/intel/dmar.c | 38 ++--
drivers/iommu/intel/iommu.c | 12 +-
drivers/iommu/intel/iommu.h | 8 +-
drivers/iommu/intel/irq_remapping.c | 2 +-
drivers/iommu/intel/nested.c | 118 ++++++++++++
drivers/iommu/intel/pasid.c | 14 +-
drivers/iommu/intel/svm.c | 14 +-
drivers/iommu/iommufd/hw_pagetable.c | 41 ++++
drivers/iommu/iommufd/iommufd_private.h | 10 +
drivers/iommu/iommufd/iommufd_test.h | 39 ++++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 86 +++++++++
include/linux/iommu.h | 100 ++++++++++
include/uapi/linux/iommufd.h | 98 ++++++++++
tools/testing/selftests/iommu/iommufd.c | 179 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 57 ++++++
16 files changed, 781 insertions(+), 38 deletions(-)
--
2.34.1
From: Jeff Xu <jeffxu(a)chromium.org>
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is a syscall on 64-bit CPUs, with the
following signature:
int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks the following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore can
be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
In addition, mmap() has two related changes.
The PROT_SEAL bit in the prot field of mmap(). When present, it marks
the map as sealed from creation.
The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
the map as sealable. A map created without MAP_SEALABLE will not support
sealing, i.e. mseal() will fail.
Applications that don't care about sealing can expect their behavior to be
unchanged. Those that need sealing support opt in by adding
MAP_SEALABLE in mmap().
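A minimal user-space probe of the behavior described above might look like the sketch below. It rests on several assumptions: no libc wrapper is assumed, so try_mseal() is a hypothetical helper around syscall(); the syscall number is taken from the build headers (__NR_mseal) and the probe reports "unavailable" when it is missing; and MAP_SEALABLE/PROT_SEAL are not passed because their values are not given in this cover letter, so on a kernel carrying exactly this series the seal request would be refused and the probe would return 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper. If the build headers do not define __NR_mseal,
 * report ENOSYS instead of guessing a syscall number.
 */
static int try_mseal(void *addr, size_t len, unsigned long flags)
{
#ifdef __NR_mseal
	return (int)syscall(__NR_mseal, addr, len, flags);
#else
	(void)addr; (void)len; (void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Map one read-only anonymous page, try to seal it, then try to make it
 * writable. Returns:
 *   1  sealed, and mprotect() was refused
 *   0  sealing unavailable or refused on this kernel (mapping released)
 *  -1  unexpected failure (mmap failed, or a sealed page became writable)
 */
int seal_and_probe(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t len;
	void *p;

	assert(pagesz > 0);
	len = (size_t)pagesz;
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	if (try_mseal(p, len, 0) != 0) {
		munmap(p, len);
		return 0;
	}
	/* A sealed range must reject permission changes. */
	if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0)
		return -1;
	return 1;	/* the sealed page cannot be unmapped; it lives until exit */
}
```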
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable; the lifetime of those mappings
is tied to the lifetime of the process.
Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings is not tied to the lifetime of the process; therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory, for example with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.
Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.
Change history:
===============
V5:
- fix build issue in mseal-Wire-up-mseal-syscall
(Suggested by Linus Torvalds, and Greg KH)
- updates on selftest.
V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
https://lore.kernel.org/all/20240104185138.169307-1-jeffxu@chromium.org/
V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as
MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
destructive operations of madvise. (Suggested by Jann Horn and
Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…
v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/
v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/
----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/
Jeff Xu (4):
mseal: Wire up mseal syscall
mseal: add mseal syscall
selftest mm/mseal memory sealing
mseal:add documentation
Documentation/userspace-api/mseal.rst | 181 ++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/mm.h | 60 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/mman-common.h | 7 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/Makefile | 4 +
mm/madvise.c | 12 +
mm/mmap.c | 27 +
mm/mprotect.c | 10 +
mm/mremap.c | 31 +
mm/mseal.c | 330 +++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/mseal_test.c | 1989 +++++++++++++++++++
32 files changed, 2677 insertions(+), 2 deletions(-)
create mode 100644 Documentation/userspace-api/mseal.rst
create mode 100644 mm/mseal.c
create mode 100644 tools/testing/selftests/mm/mseal_test.c
--
2.43.0.195.gebba966016-goog
The kernel selftest mm/hugepage-vmemmap fails on architectures
whose page size is not 4K. hugepage-vmemmap hardcodes a 4K page
size, so the pfn calculation goes wrong on systems with a different
page size. The length of a MAP_HUGETLB mapping must be hugepage
aligned, but hugepage-vmemmap hardcodes a 2M map length, which is
not aligned if the system uses a different hugepage size.
Use psize() to get the page size and default_huge_page_size() to
get the default hugepage size at run time. With this change the
hugepage-vmemmap test passes on powerpc with 64K page size and on
x86 with 4K page size.
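The page-size dependence described above can be sketched in plain C. This is only an illustration under stated assumptions, not the selftest's code: it uses sysconf(_SC_PAGESIZE) where the patch uses psize() from vm_util.h, and pagemap_offset() and is_hugepage_aligned() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Runtime page size; the patch itself calls psize() from vm_util.h. */
size_t runtime_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	return sz > 0 ? (size_t)sz : 0;
}

/*
 * Byte offset of the /proc/self/pagemap entry for a virtual address.
 * pagemap holds one 64-bit entry per virtual page, so the entry index
 * is addr / page_size. (Hypothetical helper, for illustration only.)
 */
uint64_t pagemap_offset(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0);
	return (addr / page_size) * sizeof(uint64_t);
}

/* A MAP_HUGETLB length must be a multiple of the huge page size. */
int is_hugepage_aligned(uint64_t len, uint64_t hpage_size)
{
	return hpage_size != 0 && len % hpage_size == 0;
}
```

With a hardcoded 4096 on a 64K-page system, the computed entry index is 16 times too large, which is consistent with the bogus pfn in the "without patch" output. Likewise, a fixed 2M length passes is_hugepage_aligned() only when the default huge page size divides 2M.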
Result on powerpc without patch (page size 64K)
*# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 0
Head page flags (100000000) is invalid
check_page_flags: Invalid argument
*#
Result on powerpc with patch (page size 64K)
*# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 600
*#
Result on x86 with patch (page size 4K)
*# ./hugepage-vmemmap
Returned address is 0x7fc7c2c00000 whose pfn is 1dac00
*#
Signed-off-by: Donet Tom <donettom(a)linux.vnet.ibm.com>
Reported-by : Geetika Moolchandani (geetika(a)linux.ibm.com)
Tested-by : Geetika Moolchandani (geetika(a)linux.ibm.com)
---
tools/testing/selftests/mm/hugepage-vmemmap.c | 29 ++++++++++++-------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/mm/hugepage-vmemmap.c b/tools/testing/selftests/mm/hugepage-vmemmap.c
index 5b354c209e93..894d28c3dd47 100644
--- a/tools/testing/selftests/mm/hugepage-vmemmap.c
+++ b/tools/testing/selftests/mm/hugepage-vmemmap.c
@@ -10,10 +10,7 @@
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
-
-#define MAP_LENGTH (2UL * 1024 * 1024)
-
-#define PAGE_SIZE 4096
+#include "vm_util.h"
#define PAGE_COMPOUND_HEAD (1UL << 15)
#define PAGE_COMPOUND_TAIL (1UL << 16)
@@ -39,6 +36,9 @@
#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
#endif
+static size_t pagesize;
+static size_t maplength;
+
static void write_bytes(char *addr, size_t length)
{
unsigned long i;
@@ -56,7 +56,7 @@ static unsigned long virt_to_pfn(void *addr)
if (fd < 0)
return -1UL;
- lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
+ lseek(fd, (unsigned long)addr / pagesize * sizeof(pagemap), SEEK_SET);
read(fd, &pagemap, sizeof(pagemap));
close(fd);
@@ -86,7 +86,7 @@ static int check_page_flags(unsigned long pfn)
* this also verifies kernel has correctly set the fake page_head to tail
* while hugetlb_free_vmemmap is enabled.
*/
- for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+ for (i = 1; i < maplength / pagesize; i++) {
read(fd, &pageflags, sizeof(pageflags));
if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
(pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
@@ -106,18 +106,25 @@ int main(int argc, char **argv)
void *addr;
unsigned long pfn;
- addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
+ pagesize = psize();
+ maplength = default_huge_page_size();
+ if (!maplength) {
+ printf("Unable to determine huge page size\n");
+ exit(1);
+ }
+
+ addr = mmap(MAP_ADDR, maplength, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(1);
}
/* Trigger allocation of HugeTLB page. */
- write_bytes(addr, MAP_LENGTH);
+ write_bytes(addr, maplength);
pfn = virt_to_pfn(addr);
if (pfn == -1UL) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("virt_to_pfn");
exit(1);
}
@@ -125,13 +132,13 @@ int main(int argc, char **argv)
printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
if (check_page_flags(pfn) < 0) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("check_page_flags");
exit(1);
}
/* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
- if (munmap(addr, MAP_LENGTH)) {
+ if (munmap(addr, maplength)) {
perror("munmap");
exit(1);
}
--
2.43.0
This test case triggers a race between madvise(MADV_DONTNEED) and
mmap() in a single huge page, which got stolen (while reserved).
Once the only page is stolen, the memory that was previously mmap()ed (and
passed to madvise(MADV_DONTNEED)) gets a SIGBUS when accessed.
I am not adding this test to the run_vmtests.sh script, since this test
currently fails upstream.
Breno Leitao (1):
selftests/mm: add a new test for madv and hugetlb mmap
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb_madv_vs_map.c | 124 ++++++++++++++++++
3 files changed, 126 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_madv_vs_map.c
--
2.34.1
Hi Linus,
Please pull the nolibc update for Linux 6.8-rc1.
This nolibc update for Linux 6.8-rc1 consists of:
* Support for PIC mode on MIPS.
* Support for getrlimit()/setrlimit().
* Replace some custom declarations with UAPI includes.
* A new script "run-tests.sh" to run the testsuite over different architectures
and configurations.
* A few non-functional code cleanups.
* Minor improvements to nolibc-test, primarily to support the test script.
There are no urgent fixes available at this time.
diff is attached. Build and nolibc tests were run on next.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit b85ea95d086471afb4ad062012a4d73cd328fa86:
Linux 6.7-rc1 (2023-11-12 16:19:07 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-nolibc-6.8-rc1
for you to fetch changes up to d543d9ddf593b1f4cb1d57d9ac0ad279fe18adaf:
selftests/nolibc: disable coredump via setrlimit (2023-12-11 22:38:37 +0100)
----------------------------------------------------------------
linux_kselftest-nolibc-6.8-rc1
This nolibc update for Linux 6.8-rc1 consists of:
* Support for PIC mode on MIPS.
* Support for getrlimit()/setrlimit().
* Replace some custom declarations with UAPI includes.
* A new script "run-tests.sh" to run the testsuite over different architectures
and configurations.
* A few non-functional code cleanups.
* Minor improvements to nolibc-test, primarily to support the test script.
There are no urgent fixes available at this time.
----------------------------------------------------------------
Mark Brown (1):
tools/nolibc: Use linux/wait.h rather than duplicating it
Thomas Weißschuh (21):
selftests/nolibc: don't hang on config input
selftests/nolibc: use EFI -bios for LoongArch qemu
selftests/nolibc: anchor paths in $(srcdir) if possible
selftests/nolibc: support out-of-tree builds
selftests/nolibc: add script to run testsuite
tools/nolibc: error out on unsupported architecture
tools/nolibc: move MIPS ABI validation into arch-mips.h
selftests/nolibc: use XARCH for MIPS
selftests/nolibc: explicitly specify ABI for MIPS
selftests/nolibc: extraconfig support
selftests/nolibc: add configuration for mipso32be
selftests/nolibc: fix testcase status alignment
selftests/nolibc: introduce QEMU_ARCH_USER
selftests/nolibc: run-tests.sh: enable testing via qemu-user
tools/nolibc: mips: add support for PIC
selftests/nolibc: make result alignment more robust
tools/nolibc: annotate va_list printf formats
tools/nolibc: drop duplicated testcase ioctl_tiocinq
tools/nolibc: drop custom definition of struct rusage
tools/nolibc: add support for getrlimit/setrlimit
selftests/nolibc: disable coredump via setrlimit
tools/include/nolibc/arch-mips.h | 11 +-
tools/include/nolibc/arch.h | 4 +-
tools/include/nolibc/stdio.h | 4 +-
tools/include/nolibc/sys.h | 38 ++++++
tools/include/nolibc/types.h | 25 +---
tools/testing/selftests/nolibc/.gitignore | 1 +
tools/testing/selftests/nolibc/Makefile | 65 ++++++++---
tools/testing/selftests/nolibc/nolibc-test.c | 51 ++++++--
tools/testing/selftests/nolibc/run-tests.sh | 169 +++++++++++++++++++++++++++
9 files changed, 318 insertions(+), 50 deletions(-)
create mode 100755 tools/testing/selftests/nolibc/run-tests.sh
----------------------------------------------------------------
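As a side note on the "disable coredump via setrlimit" change listed above: a test program can suppress core files with the standard POSIX setrlimit() call. The sketch below shows the general technique only; it is not the nolibc-test code, and disable_coredump() and core_soft_limit() are hypothetical names.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Disable core dumps for the calling process by setting both the soft
 * and hard RLIMIT_CORE limits to zero. Returns 0 on success, -1 on error.
 */
int disable_coredump(void)
{
	const struct rlimit no_core = { .rlim_cur = 0, .rlim_max = 0 };

	return setrlimit(RLIMIT_CORE, &no_core);
}

/* Read back the current soft limit; returns 0 on success, -1 on error. */
int core_soft_limit(rlim_t *out)
{
	struct rlimit cur;

	assert(out);
	if (getrlimit(RLIMIT_CORE, &cur) != 0)
		return -1;
	*out = cur.rlim_cur;
	return 0;
}
```

Lowering a limit never needs privileges, so this works for unprivileged test runs. Note that a hard limit lowered this way cannot be raised again by an unprivileged process.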
Hi Linus,
Please pull the following Kselftest update for Linux 6.8-rc1.
This kselftest update for Linux 6.8-rc1 consists of enhancements
to reporting test results, fixes to root and user run behavior
and fixing ksft_print_msg() calls.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit b85ea95d086471afb4ad062012a4d73cd328fa86:
Linux 6.7-rc1 (2023-11-12 16:19:07 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-next-6.8-rc1
for you to fetch changes up to ee9793be08b1a1c29308a099c01790a3befb390a:
tracing/selftests: Add ownership modification tests for eventfs (2023-12-22 10:01:41 -0700)
----------------------------------------------------------------
linux_kselftest-next-6.8-rc1
This kselftest update for Linux 6.8-rc1 consists of enhancements
to reporting test results, fixes to root and user run behavior
and fixing ksft_print_msg() calls.
----------------------------------------------------------------
Atul Kumar Pant (1):
selftests: sched: Remove initialization to 0 for a static variable
Mark Brown (3):
kselftest/vDSO: Make test name reporting for vdso_abi_test tooling friendly
kselftest/vDSO: Fix message formatting for clock_id logging
kselftest/vDSO: Use ksft_print_msg() rather than printf in vdso_test_abi
Osama Muhammad (1):
selftests: prctl: Add prctl test for PR_GET_NAME
Steven Rostedt (Google) (1):
tracing/selftests: Add ownership modification tests for eventfs
Swarup Laxman Kotiaklapudi (1):
selftests: capabilities: namespace create varies for root and normal user
angquan yu (3):
selftests:breakpoints: Fix Format String Warning in breakpoint_test
selftests/breakpoints: Fix format specifier in ksft_print_msg in step_after_suspend_test.c
selftests:x86: Fix Format String Warnings in lam.c
.../selftests/breakpoints/breakpoint_test.c | 4 +-
.../breakpoints/step_after_suspend_test.c | 2 +-
tools/testing/selftests/capabilities/test_execve.c | 6 +-
.../ftrace/test.d/00basic/test_ownership.tc | 114 +++++++++++++++++++++
tools/testing/selftests/prctl/set-process-name.c | 32 ++++++
tools/testing/selftests/sched/cs_prctl_test.c | 2 +-
tools/testing/selftests/vDSO/vdso_test_abi.c | 72 +++++++------
tools/testing/selftests/x86/lam.c | 4 +-
8 files changed, 192 insertions(+), 44 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
----------------------------------------------------------------
Hi Linus,
Please pull the following KUnit next update for Linux 6.8-rc1.
This KUnit update for Linux 6.8-rc1 consists of:
- a new feature that adds APIs for managing devices introducing
a set of helper functions which allow devices (internally a
struct kunit_device) to be created and managed by KUnit.
These devices will be automatically unregistered on
test exit. These helpers can either use a user-provided
struct device_driver, or have one automatically created and
managed by KUnit. In both cases, the device lives on a new
kunit_bus.
- changes to switch drm/tests to use kunit devices
- several fixes and enhancements to attribute feature
- changes to reorganize deferred action function introducing
KUNIT_DEFINE_ACTION_WRAPPER
- new feature adds ability to run tests after boot using debugfs
- fixes and enhancements to string-stream-test:
- parse ERR_PTR in string_stream_destroy()
- unchecked dereference bug fix in debugfs_print_results()
- handling errors from alloc_string_stream()
- NULL-dereference bug fix in kunit_init_suite()
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit ceb6a6f023fd3e8b07761ed900352ef574010bcb:
Linux 6.7-rc6 (2023-12-17 15:19:28 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-6.8-rc1
for you to fetch changes up to 539e582a375dedee95a4fa9ca3f37cdb25c441ec:
kunit: Fix some comments which were mistakenly kerneldoc (2024-01-03 09:10:37 -0700)
----------------------------------------------------------------
linux_kselftest-kunit-6.8-rc1
This KUnit update for Linux 6.8-rc1 consists of:
- a new feature that adds APIs for managing devices introducing
a set of helper functions which allow devices (internally a
struct kunit_device) to be created and managed by KUnit.
These devices will be automatically unregistered on
test exit. These helpers can either use a user-provided
struct device_driver, or have one automatically created and
managed by KUnit. In both cases, the device lives on a new
kunit_bus.
- changes to switch drm/tests to use kunit devices
- several fixes and enhancements to attribute feature
- changes to reorganize deferred action function introducing
KUNIT_DEFINE_ACTION_WRAPPER
- new feature adds ability to run tests after boot using debugfs
- fixes and enhancements to string-stream-test:
- parse ERR_PTR in string_stream_destroy()
- unchecked dereference bug fix in debugfs_print_results()
- handling errors from alloc_string_stream()
- NULL-dereference bug fix in kunit_init_suite()
----------------------------------------------------------------
David Gow (4):
kunit: Add a macro to wrap a deferred action function
drm/tests: Use KUNIT_DEFINE_ACTION_WRAPPER()
drm/vc4: tests: Use KUNIT_DEFINE_ACTION_WRAPPER
kunit: Fix some comments which were mistakenly kerneldoc
Maxime Ripard (1):
drm/tests: Switch to kunit devices
Michal Wajdeczko (2):
kunit: Add example for using test->priv
kunit: Reset test->priv after each param iteration
Rae Moar (8):
kunit: tool: fix parsing of test attributes
kunit: tool: add test for parsing attributes
kunit: move KUNIT_TABLE out of INIT_DATA
kunit: add KUNIT_INIT_TABLE to init linker section
kunit: add example suite to test init suites
kunit: add is_init test attribute
kunit: add ability to run tests after boot using debugfs
Documentation: Add debugfs docs with run after boot
Richard Fitzgerald (8):
kunit: string-stream-test: Avoid cast warning when testing gfp_t flags
kunit: string-stream: Allow ERR_PTR to be passed to string_stream_destroy()
kunit: debugfs: Fix unchecked dereference in debugfs_print_results()
kunit: debugfs: Handle errors from alloc_string_stream()
kunit: Fix NULL-dereference in kunit_init_suite() if suite->log is NULL
kunit: Allow passing function pointer to kunit_activate_static_stub()
kunit: Add example of kunit_activate_static_stub() with pointer-to-function
kunit: Protect string comparisons against NULL
davidgow(a)google.com (4):
kunit: Add APIs for managing devices
fortify: test: Use kunit_device
overflow: Replace fake root_device with kunit_device
ASoC: topology: Replace fake root_device with kunit_device in tests
Documentation/dev-tools/kunit/api/resource.rst | 9 +
Documentation/dev-tools/kunit/run_manual.rst | 51 +++++-
Documentation/dev-tools/kunit/running_tips.rst | 7 +
Documentation/dev-tools/kunit/usage.rst | 60 ++++++-
drivers/gpu/drm/tests/drm_kunit_helpers.c | 78 +--------
drivers/gpu/drm/vc4/tests/vc4_mock.c | 9 +-
include/asm-generic/vmlinux.lds.h | 11 +-
include/kunit/device.h | 80 +++++++++
include/kunit/resource.h | 21 +++
include/kunit/static_stub.h | 2 +-
include/kunit/test.h | 33 ++--
include/linux/module.h | 2 +
kernel/module/main.c | 3 +
lib/fortify_kunit.c | 5 +-
lib/kunit/Makefile | 3 +-
lib/kunit/attributes.c | 60 +++++++
lib/kunit/debugfs.c | 102 +++++++++++-
lib/kunit/device-impl.h | 17 ++
lib/kunit/device.c | 181 +++++++++++++++++++++
lib/kunit/executor.c | 68 +++++++-
lib/kunit/kunit-example-test.c | 87 ++++++++++
lib/kunit/kunit-test.c | 139 +++++++++++++++-
lib/kunit/string-stream-test.c | 2 +-
lib/kunit/string-stream.c | 2 +-
lib/kunit/test.c | 48 +++++-
lib/overflow_kunit.c | 5 +-
sound/soc/soc-topology-test.c | 10 +-
tools/testing/kunit/kunit_parser.py | 4 +-
tools/testing/kunit/kunit_tool_test.py | 16 ++
.../kunit/test_data/test_parse_attributes.log | 9 +
30 files changed, 978 insertions(+), 146 deletions(-)
create mode 100644 include/kunit/device.h
create mode 100644 lib/kunit/device-impl.h
create mode 100644 lib/kunit/device.c
create mode 100644 tools/testing/kunit/test_data/test_parse_attributes.log
----------------------------------------------------------------
Changes in v4:
* Documented how to compile the livepatch selftests without running the
tests (Joe)
* Removed the mention of lib/livepatch from the MAINTAINERS file, as reported by
checkpatch.
Changes in v3:
* Rebased on top of v6.6-rc5
* The commit messages were improved (Thanks Petr!)
* Created TEST_GEN_MODS_DIR variable to point to a directory that contains kernel
modules, and adapted selftests to build it before running the test.
* Moved test_klp-call_getpid out of test_programs, since the gen_tar
would just copy the generated test programs to the livepatches dir,
and so scripts relying on test_programs/test_klp-call_getpid will fail.
* Added a module_param for klp_pids, describing its usage.
* Simplified the call_getpid program to ignore the return of getpid syscall,
since we only want to make sure the process transitions correctly to the
patched state.
* The test-syscall.sh now prints a log message showing the number of remaining
processes to transition into the livepatched state, and check_output expects it
to be 0.
* Added MODULE_AUTHOR and MODULE_DESCRIPTION to test_klp_syscall.c
- Link to v3: https://lore.kernel.org/r/20231031-send-lp-kselftests-v3-0-2b1655c2605f@sus…
- Link to v2: https://lore.kernel.org/linux-kselftest/20220630141226.2802-1-mpdesouza@sus…
This patchset moves the current kernel testing livepatch modules from
lib/livepatch to tools/testing/selftests/livepatch/test_modules, and compiles
them as out-of-tree modules before testing.
There is also a new test being added. This new test exercises multiple processes
calling a syscall while a livepatch patches the syscall.
Why this move is an improvement:
* The modules are now compiled as out-of-tree modules against the current
running kernel, making them capable of being tested on different systems with
newer or older kernels.
* This approach now requires the kernel-devel package to be installed, since they are
out-of-tree modules. The package can be generated by running "make rpm-pkg" in the
kernel source.
What needs to be solved:
* Currently gen_tar only packages the resulting binaries of the tests, and not
the sources. For the current approach, the newly added modules would be
compiled and then packaged. It works when testing on a system with the same
kernel version. But it will fail when running on a machine with a different kernel
version, since the modules were compiled against the kernel currently running.
This is not a new problem, just aligning the expectations. For the current
approach to be truly system agnostic gen_tar would need to include the module
and program sources to be compiled in the target systems.
Thanks in advance!
Marcos
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
---
Marcos Paulo de Souza (3):
kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
Documentation/dev-tools/kselftest.rst | 4 +
MAINTAINERS | 1 -
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 ----
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 ---
tools/testing/selftests/lib.mk | 20 +++-
tools/testing/selftests/livepatch/Makefile | 5 +-
tools/testing/selftests/livepatch/README | 25 +++--
tools/testing/selftests/livepatch/config | 1 -
tools/testing/selftests/livepatch/functions.sh | 34 +++---
.../testing/selftests/livepatch/test-callbacks.sh | 50 ++++-----
tools/testing/selftests/livepatch/test-ftrace.sh | 6 +-
.../testing/selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
tools/testing/selftests/livepatch/test-state.sh | 18 ++--
tools/testing/selftests/livepatch/test-syscall.sh | 53 ++++++++++
tools/testing/selftests/livepatch/test-sysfs.sh | 6 +-
.../selftests/livepatch/test_klp-call_getpid.c | 44 ++++++++
.../selftests/livepatch/test_modules/Makefile | 20 ++++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../livepatch/test_modules}/test_klp_livepatch.c | 0
.../livepatch/test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 116 +++++++++++++++++++++
32 files changed, 334 insertions(+), 121 deletions(-)
---
base-commit: 206ed72d6b33f53b2a8bf043f54ed6734121d26b
change-id: 20231031-send-lp-kselftests-4c917dcd4565
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
Minor consistency fixes.
They clear a couple of compiler format string warnings.
[1/2] is the fix of an obvious typo in the format specifier
[2/2] is securing the print function against spurious format specifiers
in the passed parameter string
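The general issue behind [2/2] can be illustrated with standard C. This is a sketch of the well-known pattern, not the code from the patch, and format_message() is a hypothetical helper: text that comes from a caller should be passed as an argument to a constant "%s" format rather than used as the format string itself.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Unsafe pattern (shown only as a comment): using caller-supplied text
 * as the format string, e.g. snprintf(buf, len, msg), lets any '%' in
 * msg be interpreted as a conversion specifier.
 *
 * Safe pattern: use a constant "%s" format and pass the text as an
 * argument, so it is copied verbatim.
 */
int format_message(char *buf, size_t len, const char *msg)
{
	assert(buf && msg);
	return snprintf(buf, len, "%s", msg);
}
```

With the constant format, a message such as "100%d done %s" is printed literally instead of making the function read arguments that were never passed. It also silences the compiler's format-security style warnings.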
Mirsad Todorovac (2):
selftest: breakpoints: fix a minor typo in the format string
selftest: breakpoints: clear the format string warning and secure the
output
tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
tools/testing/selftests/breakpoints/step_after_suspend_test.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
--
2.40.1
Minor fixes of compiler warnings and one bug in the number of parameters, which
would not crash the test but is better fixed for correctness' sake.
As the general climate in the Linux kernel community is to fix all compiler
warnings, this could be on the right track, even if only in the testing suite.
Changelog:
v1 -> v2:
- Compared to v1, commit subject lines have been adjusted to reflect the style
of the subsystem, as suggested by Mark.
- 1/4 was already acked and unchanged (adjusted the subject line as suggested)
(code unchanged)
- 2/4 was acked with suggestion to adjust the subject line (done).
(code unchanged)
- 3/4 The format specifier was changed from %d to %u as suggested.
- The 4/4 submitted for review (in the v1 it was delayed by an omission).
(code unchanged)
Mirsad Todorovac (4):
kselftest/alsa - mixer-test: fix the number of parameters to
ksft_exit_fail_msg()
kselftest/alsa - mixer-test: Fix the print format specifier warning
kselftest/alsa - mixer-test: Fix the print format specifier warning
kselftest/alsa - conf: Stringify the printed errno in sysfs_get()
tools/testing/selftests/alsa/conf.c | 2 +-
tools/testing/selftests/alsa/mixer-test.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
--
2.40.1
Some aarch64 systems running a PREEMPT_RT patched kernel need
more time to complete the test.
This change mirrors:
commit ba83af059153 ("Improve stability of find_vma BPF test")
addressing similar requirements and allowing the QTI SA8775P based
systems, and others, to complete the test when running an RT kernel.
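One detail reviewers may want to check in the diff that follows: on platforms where int is 32 bits, 2500000000 is larger than INT_MAX (2147483647), so it cannot be stored in a "const int" without an implementation-defined conversion. The sketch below only illustrates the range check and a wider loop counter; fits_in_int() and count_to() are hypothetical helpers and are not part of the posted patch.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if value is representable as an int on this platform. */
int fits_in_int(long long value)
{
	return value >= INT_MIN && value <= INT_MAX;
}

/*
 * Count up to `limit` with a 64-bit counter, so a limit above INT_MAX
 * can still be reached. `step` lets a caller skip ahead to keep a demo
 * fast; the loop body is a placeholder for the busy-wait work.
 */
long long count_to(long long limit, long long step)
{
	long long i;

	assert(step > 0);
	for (i = 0; i < limit; i += step) {
		/* busy-wait placeholder */
	}
	return i;
}
```

If a larger wait is needed, widening both the bound and the loop counter to a 64-bit type is one way to keep the comparison well defined.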
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
---
tools/testing/selftests/bpf/prog_tests/find_vma.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/find_vma.c b/tools/testing/selftests/bpf/prog_tests/find_vma.c
index 5165b38f0e59..43d62db8d57b 100644
--- a/tools/testing/selftests/bpf/prog_tests/find_vma.c
+++ b/tools/testing/selftests/bpf/prog_tests/find_vma.c
@@ -51,7 +51,7 @@ static void test_find_vma_pe(struct find_vma *skel)
struct bpf_link *link = NULL;
volatile int j = 0;
int pfd, i;
- const int one_bn = 1000000000;
+ const int dummy_wait = 2500000000;
pfd = open_pe();
if (pfd < 0) {
@@ -68,10 +68,10 @@ static void test_find_vma_pe(struct find_vma *skel)
if (!ASSERT_OK_PTR(link, "attach_perf_event"))
goto cleanup;
- for (i = 0; i < one_bn && find_vma_pe_condition(skel); ++i)
+ for (i = 0; i < dummy_wait && find_vma_pe_condition(skel); ++i)
++j;
- test_and_reset_skel(skel, -EBUSY /* in nmi, irq_work is busy */, i == one_bn);
+ test_and_reset_skel(skel, -EBUSY /* in nmi, irq_work is busy */, i == dummy_wait);
cleanup:
bpf_link__destroy(link);
close(pfd);
--
2.34.1
Now that we have the VISIBLE_IF_KUNIT and EXPORT_SYMBOL_IF_KUNIT macros,
update the instructions to stop recommending including .c files.
Signed-off-by: Arthur Grillo <arthurgrillo(a)riseup.net>
---
Changes in v2:
- Fix #if condition
- Link to v1: https://lore.kernel.org/r/20240108-kunit-doc-export-v1-1-119368df0d96@riseu…
---
Documentation/dev-tools/kunit/usage.rst | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index c27e1646ecd9..f095c6bb76ff 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -671,19 +671,22 @@ Testing Static Functions
------------------------
If we do not want to expose functions or variables for testing, one option is to
-conditionally ``#include`` the test file at the end of your .c file. For
-example:
+conditionally export the used symbol.
.. code-block:: c
/* In my_file.c */
- static int do_interesting_thing();
+ VISIBLE_IF_KUNIT int do_interesting_thing();
+ EXPORT_SYMBOL_IF_KUNIT(do_interesting_thing);
- #ifdef CONFIG_MY_KUNIT_TEST
- #include "my_kunit_test.c"
+ /* In my_file.h */
+
+ #if IS_ENABLED(CONFIG_KUNIT)
+ int do_interesting_thing(void);
#endif
+
Injecting Test-Only Code
------------------------
---
base-commit: eeb8e8d9f124f279e80ae679f4ba6e822ce4f95f
change-id: 20240108-kunit-doc-export-eec1f910ab67
Best regards,
--
Arthur Grillo <arthurgrillo(a)riseup.net>
Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed a wild-memory-access bug that could have
happened during the loading phase of test suites built and executed as
loadable modules. However, it also introduced a problematic side effect
that causes test suite modules to crash when they attempt to register
fake devices.
When a module is loaded, it traverses the MODULE_STATE_UNFORMED and
MODULE_STATE_COMING states before reaching the normal operating state
MODULE_STATE_LIVE. Finally, when the module is removed, it moves to
MODULE_STATE_GOING before being released. However, if the loading
function load_module() fails between complete_formation() and
do_init_module(), the module goes directly from MODULE_STATE_COMING to
MODULE_STATE_GOING without passing through MODULE_STATE_LIVE.
This behavior was causing kunit_module_exit() to be called without
having first executed kunit_module_init(). Since kunit_module_exit() is
responsible for freeing the memory allocated by kunit_module_init()
through kunit_filter_suites(), this behavior was resulting in a
wild-memory-access bug.
Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed this issue by running the tests when the
module is still in MODULE_STATE_COMING. However, modules in that state
are not fully initialized, lacking sysfs kobjects. Therefore, if a test
module attempts to register a fake device, it will inevitably crash.
This patch proposes a different approach to fix the original
wild-memory-access bug while restoring the normal module execution flow
by making kunit_module_exit() able to detect if kunit_module_init() has
previously initialized the test suite set. In this way, test modules
can once again register fake devices without crashing.
This behavior is achieved by checking whether mod->kunit_suites is a
virtual or direct mapping address. If it is a virtual address, then
kunit_module_init() has allocated the suite_set in kunit_filter_suites()
using kmalloc_array(). On the contrary, if mod->kunit_suites is still
pointing to the original address that was set when looking up the
.kunit_test_suites section of the module, then the loading phase has
failed and there's no memory to be freed.
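The ordering problem and the guard described above can be modelled in userspace. This is a simplified sketch with hypothetical names, not kernel code: it collapses the state machine to three states, and it tells "init ran" from "init never ran" by comparing pointers, whereas the patch uses virt_addr_valid() on the start address.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified userspace model; all names here are hypothetical. */
enum model_state { MODEL_COMING, MODEL_LIVE, MODEL_GOING };

struct model_module {
	int *section_suites;	/* stands in for the .kunit_test_suites section */
	int *suites;		/* section pointer, or heap copy made by init */
	int frees;		/* how many heap copies exit has released */
};

/* LIVE: replace the section pointer with a heap-allocated suite set. */
static void model_init(struct model_module *m)
{
	m->suites = malloc(sizeof(*m->suites));
}

/*
 * GOING: free only what init allocated. The kernel patch tells the two
 * cases apart with virt_addr_valid(); this model compares pointers.
 */
static void model_exit(struct model_module *m)
{
	if (!m->suites || m->suites == m->section_suites)
		return;	/* load failed before LIVE: nothing to free */
	free(m->suites);
	m->suites = NULL;
	m->frees++;
}

/* Dispatch on state changes, mirroring the reordered notifier. */
void model_notify(struct model_module *m, enum model_state state)
{
	assert(m);
	switch (state) {
	case MODEL_COMING:
		break;	/* tests are no longer set up this early */
	case MODEL_LIVE:
		model_init(m);
		break;
	case MODEL_GOING:
		model_exit(m);
		break;
	}
}
```

A module that goes COMING, LIVE, GOING has its copy freed exactly once, while one that goes straight from COMING to GOING is left untouched, which is the failure path the guard is meant to cover.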
v3:
- add a comment to clarify why the start address is checked
v2:
- add include <linux/mm.h>
Fixes: 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()")
Tested-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Signed-off-by: Marco Pagani <marpagan(a)redhat.com>
---
lib/kunit/test.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 7aceb07a1af9..3263e0d5e0f6 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -16,6 +16,7 @@
#include <linux/panic.h>
#include <linux/sched/debug.h>
#include <linux/sched.h>
+#include <linux/mm.h>
#include "debugfs.h"
#include "hooks-impl.h"
@@ -775,12 +776,19 @@ static void kunit_module_exit(struct module *mod)
};
const char *action = kunit_action();
+ /*
+ * Check if the start address is a valid virtual address to detect
+ * if the module load sequence has failed and the suite set has not
+ * been initialized and filtered.
+ */
+ if (!suite_set.start || !virt_addr_valid(suite_set.start))
+ return;
+
if (!action)
__kunit_test_suites_exit(mod->kunit_suites,
mod->num_kunit_suites);
- if (suite_set.start)
- kunit_free_suite_set(suite_set);
+ kunit_free_suite_set(suite_set);
}
static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
@@ -790,12 +798,12 @@ static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
switch (val) {
case MODULE_STATE_LIVE:
+ kunit_module_init(mod);
break;
case MODULE_STATE_GOING:
kunit_module_exit(mod);
break;
case MODULE_STATE_COMING:
- kunit_module_init(mod);
break;
case MODULE_STATE_UNFORMED:
break;
base-commit: 33cc938e65a98f1d29d0a18403dbbee050dcad9a
--
2.43.0
This is the second part to add Intel VT-d nested translation based on IOMMUFD
nesting infrastructure. As described in the iommufd nesting infrastructure series [1],
the iommu core supports new ops to invalidate the cache after modifications
to the stage-1 page table. So far, the cache invalidation data is vendor specific;
the data_type (IOMMU_HWPT_DATA_VTD_S1) defined for the vendor specific HWPT
allocation is reused in the cache invalidation path. The user should provide the
correct data_type, matching the type used in HWPT allocation.
The IOMMU_HWPT_INVALIDATE ioctl returns an error in @out_driver_error_code. However,
Intel VT-d does not define an error code so far, so it is not easy to pre-define it
in iommufd either. As a result, this field should just be ignored on VT-d platforms.
Complete code can be found in [2]; the corresponding QEMU code can be found in [3].
[1] https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v7:
- Not much change, just rebased on top of 6.7-rc1
v6: https://lore.kernel.org/linux-iommu/20231020093719.18725-1-yi.l.liu@intel.c…
- Address comments from Kevin
- Split the VT-d nesting series into two parts (Jason)
v5: https://lore.kernel.org/linux-iommu/20230921075431.125239-1-yi.l.liu@intel.…
- Add Kevin's r-b for patches 2, 3, 5, 8, 10
- Drop enforce_cache_coherency callback from the nested type domain ops (Kevin)
- Remove duplicate agaw check in patch 04 (Kevin)
- Remove duplicate domain_update_iommu_cap() in patch 06 (Kevin)
- Check parent's force_snooping to set pgsnp in the pasid entry (Kevin)
- uapi data structure check (Kevin)
- Simplify the errata handling as user can allocate nested parent domain
v4: https://lore.kernel.org/linux-iommu/20230724111335.107427-1-yi.l.liu@intel.…
- Remove ascii art tables (Jason)
- Drop EMT (Tina, Jason)
- Drop MTS and related definitions (Kevin)
- Rename macro IOMMU_VTD_PGTBL_ to IOMMU_VTD_S1_ (Kevin)
- Rename struct iommu_hwpt_intel_vtd_ to iommu_hwpt_vtd_ (Kevin)
- Rename struct iommu_hwpt_intel_vtd to iommu_hwpt_vtd_s1 (Kevin)
- Put the vendor specific hwpt alloc data structure before enum iommu_hwpt_type (Kevin)
- Do not trim the higher page levels of S2 domain in nested domain attachment as the
S2 domain may have been used independently. (Kevin)
- Remove the first-stage pgd check against the maximum address of s2_domain as hw
can check it anyhow. It would make sense to check every pfn used in the stage-1 page
table, but that is not feasible, so just leave it to hw. (Kevin)
- Split the iotlb flush part into an order of uapi, helper and callback implementation (Kevin)
- Change the policy of VT-d nesting errata, disallow RO mapping once a domain is used
as parent domain of a nested domain. This removes the nested_users counting. (Kevin)
- Minor fix for "make htmldocs"
v3: https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.c…
- Further split the patches into an order of adding helpers for nested
domain, iotlb flush, nested domain attachment and nested domain allocation
callback, then report the hw_info to userspace.
- Add batch support in cache invalidation from userspace
- Disallow nested translation usage if RO mappings exist in stage-2 domain
due to errata on readonly mappings on Sapphire Rapids platform.
v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.…
- The iommufd infrastructure is split to be separate series.
v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Yi Liu (3):
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
iommu/vt-d: Make iotlb flush helpers to be extern
iommu/vt-d: Add iotlb flush for nested domain
drivers/iommu/intel/iommu.c | 10 +++----
drivers/iommu/intel/iommu.h | 6 ++++
drivers/iommu/intel/nested.c | 54 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/iommufd.h | 36 ++++++++++++++++++++++++
4 files changed, 101 insertions(+), 5 deletions(-)
--
2.34.1
Now that we have the VISIBLE_IF_KUNIT and EXPORT_SYMBOL_IF_KUNIT macros,
update the instructions to stop recommending including .c files.
Signed-off-by: Arthur Grillo <arthurgrillo(a)riseup.net>
---
Documentation/dev-tools/kunit/usage.rst | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index c27e1646ecd9..7410b39ec5b7 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -671,19 +671,22 @@ Testing Static Functions
------------------------
If we do not want to expose functions or variables for testing, one option is to
-conditionally ``#include`` the test file at the end of your .c file. For
-example:
+conditionally export the used symbol.
.. code-block:: c
/* In my_file.c */
- static int do_interesting_thing();
+ VISIBLE_IF_KUNIT int do_interesting_thing();
+ EXPORT_SYMBOL_IF_KUNIT(do_interesting_thing);
+
+ /* In my_file.h */
#ifdef CONFIG_MY_KUNIT_TEST
- #include "my_kunit_test.c"
+ int do_interesting_thing(void);
#endif
+
Injecting Test-Only Code
------------------------
---
base-commit: eeb8e8d9f124f279e80ae679f4ba6e822ce4f95f
change-id: 20240108-kunit-doc-export-eec1f910ab67
Best regards,
--
Arthur Grillo <arthurgrillo(a)riseup.net>
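For readers unfamiliar with the two macros named in the patch above, here is a
userspace sketch of the idea behind them. It is not the kernel header:
SKETCH_KUNIT_ENABLED is a hypothetical stand-in for CONFIG_KUNIT, and the export
macro is reduced to a no-op because there is no symbol export outside the kernel.
The point is only that the same definition is file-local in a production build
and externally visible in a test build.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_KUNIT; in the kernel this comes from Kconfig. */
#ifndef SKETCH_KUNIT_ENABLED
#define SKETCH_KUNIT_ENABLED 1
#endif

#if SKETCH_KUNIT_ENABLED
/* Test build: external linkage, so a separate test file can call the function. */
#define VISIBLE_IF_KUNIT
#else
/* Production build: the function stays static, i.e. private to this file. */
#define VISIBLE_IF_KUNIT static
#endif

/*
 * In the kernel this macro exports the symbol for test modules when KUnit is
 * enabled. In this sketch it expands to a harmless static assertion so that
 * the trailing semicolon is still valid C at file scope.
 */
#define EXPORT_SYMBOL_IF_KUNIT(symbol) _Static_assert(1, #symbol)

/* In my_file.c */
VISIBLE_IF_KUNIT int do_interesting_thing(void)
{
	return 42;
}
EXPORT_SYMBOL_IF_KUNIT(do_interesting_thing);
```

With the sketch built in its default (test) configuration, a second file could
declare "int do_interesting_thing(void);" and call it directly, which is what
the header snippet in the documentation patch relies on.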
Nested translation is a hardware feature that is supported by many modern
IOMMU implementations. It uses two stages of address translation (stage-1, stage-2)
to reach the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while the stage-2 table is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Taking Intel VT-d as an example, the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and is used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |---------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
This series adds the cache invalidation path for the userspace to invalidate
cache after modifying the stage-1 page table. This is based on the first part
of nesting [1]
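Why an invalidation is needed after a stage-1 change can be shown with a toy
model. The sketch below is purely illustrative: the struct, the function names
and the four-page address space are hypothetical and have nothing to do with the
real iommufd or VT-d data structures. It only models a cached GIOVA->HPA result
that goes stale when the guest-owned stage-1 table changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES   4            /* hypothetical: tiny 4-page address space */
#define TOY_INVALID UINT32_MAX   /* returned when a lookup has no mapping */

struct toy_iommu {
	uint32_t stage1[TOY_PAGES];    /* GIOVA page -> GPA page, guest-owned */
	uint32_t stage2[TOY_PAGES];    /* GPA page -> HPA page, host-owned */
	uint32_t iotlb[TOY_PAGES];     /* cached GIOVA page -> HPA page */
	bool iotlb_valid[TOY_PAGES];
};

/* Nested lookup: use the cached result if present, else walk both stages. */
static uint32_t toy_translate(struct toy_iommu *mmu, uint32_t giova_page)
{
	uint32_t gpa_page, hpa_page;

	assert(giova_page < TOY_PAGES);
	if (mmu->iotlb_valid[giova_page])
		return mmu->iotlb[giova_page];

	gpa_page = mmu->stage1[giova_page];
	if (gpa_page >= TOY_PAGES)
		return TOY_INVALID;
	hpa_page = mmu->stage2[gpa_page];

	mmu->iotlb[giova_page] = hpa_page;
	mmu->iotlb_valid[giova_page] = true;
	return hpa_page;
}

/* What the guest must request after editing a present stage-1 mapping. */
static void toy_invalidate(struct toy_iommu *mmu, uint32_t giova_page)
{
	assert(giova_page < TOY_PAGES);
	mmu->iotlb_valid[giova_page] = false;
}
```

In this model, changing stage1[] after a successful toy_translate() has no
visible effect until toy_invalidate() is called for that page, which is the
situation the userspace invalidation path is meant to handle.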
Complete code can be found in [2]; the QEMU code can be found in [3].
Finally, this is joint work with Nicolin Chen and Lu Baolu. Thanks to
them for the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v6:
- Not much change, just a rebase on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (1):
iommu: Add cache_invalidate_user op
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (1):
iommufd: Add IOMMU_HWPT_INVALIDATE
drivers/iommu/iommufd/hw_pagetable.c | 35 ++++++++
drivers/iommu/iommufd/iommufd_private.h | 9 ++
drivers/iommu/iommufd/iommufd_test.h | 22 +++++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 69 +++++++++++++++
include/linux/iommu.h | 84 +++++++++++++++++++
include/uapi/linux/iommufd.h | 35 ++++++++
tools/testing/selftests/iommu/iommufd.c | 75 +++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++++++++++++++
9 files changed, 395 insertions(+)
--
2.34.1
Minor fixes for compiler warnings, plus one bug in the number of parameters which
would not crash the test but is better fixed for correctness' sake.
As the general climate in the Linux kernel community is to fix all compiler
warnings, this could be on the right track, even if only in the testing suite.
Mirsad Todorovac (4):
kselftest: alsa: fix the number of parameters to ksft_exit_fail_msg()
kselftest: alsa: Fix the printf format specifier in call to
ksft_print_msg()
ksellftest: alsa: Fix the printf format specifier to unsigned int
selftests: alsa: Fix the exit error message parameter in sysfs_get()
tools/testing/selftests/alsa/conf.c | 2 +-
tools/testing/selftests/alsa/mixer-test.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
--
2.40.1
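As a small illustration of the class of warning this series addresses, the
sketch below formats a "long" and a "long long" with matching conversion
specifiers. The helper name format_limits() is hypothetical; only the
snprintf() usage and the message text (taken from the mixer-test.c warning) are
from known sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * %lld must be paired with a long long argument and %ld with a long one.
 * Mixing them up is what -Wformat reports, and on targets where the two
 * types differ in size it is undefined behaviour rather than just noise.
 */
static int format_limits(char *buf, size_t len, long max, long long value)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "value %lld more than maximum %ld", value, max);
}
```

Keeping the specifier tied to the declared type of each argument, rather than
to the size it happens to have on one architecture, is what keeps the format
string portable.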
In particular, fcnal-test.sh timed out on slower hardware after
some new permutations of tests were added.
This single test ran for almost an hour instead of the expected
25 min (1500s). 75 minutes should suffice for most systems.
Cc: David Ahern <dsahern(a)kernel.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
---
tools/testing/selftests/net/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/settings b/tools/testing/selftests/net/settings
index dfc27cdc6c05..ed8418e8217a 100644
--- a/tools/testing/selftests/net/settings
+++ b/tools/testing/selftests/net/settings
@@ -1 +1 @@
-timeout=1500
+timeout=4500
--
2.40.1
Hi, all,
The default timeout for tools/testing/selftests/net groups of tests is 1500s (25m).
This is less than half of what is required to run the full fcnal-test.sh on my hardware
(53m48s).
With the timeout adjusted, tests passed 914 of 914 OK.
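The arithmetic behind the numbers above can be checked with a few lines of C.
This is only a sketch with hypothetical helper names; the inputs are the
measured "real 53m48.460s" and the timeout values that appear in this thread.

```c
#include <assert.h>

/* Converts a "real XmY.ZZZs" duration to whole seconds, rounding up. */
static long duration_to_seconds(long minutes, double seconds)
{
	long whole = (long)seconds;

	assert(minutes >= 0 && seconds >= 0.0);
	if ((double)whole < seconds)
		whole++;
	return minutes * 60 + whole;
}

/* A run fits if it finishes no later than the configured timeout. */
static int fits_timeout(long runtime_s, long timeout_s)
{
	return runtime_s <= timeout_s;
}
```

For this run, 53 * 60 + 49 = 3229 seconds, which is more than twice the 1500 s
default and leaves 371 seconds of headroom under a 3600 s timeout.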
Best regards,
Mirsad Todorovac
diff --git a/tools/testing/selftests/net/settings b/tools/testing/selftests/net/settings
index dfc27cdc6c05..ed8418e8217a 100644
--- a/tools/testing/selftests/net/settings
+++ b/tools/testing/selftests/net/settings
@@ -1 +1 @@
-timeout=1500
+timeout=3600
-----------------------------------------------------------------
[snip]
#################################################################
Ping LLA with multiple interfaces
TEST: Pre cycle, ping out ns-B - multicast IP [ OK ]
TEST: Pre cycle, ping out ns-C - multicast IP [ OK ]
TEST: Post cycle ns-A eth1, ping out ns-B - multicast IP [ OK ]
TEST: Post cycle ns-A eth1, ping out ns-C - multicast IP [ OK ]
TEST: Post cycle ns-A eth2, ping out ns-B - multicast IP [ OK ]
TEST: Post cycle ns-A eth2, ping out ns-C - multicast IP [ OK ]
#################################################################
SNAT on VRF
TEST: IPv4 TCP connection over VRF with SNAT [ OK ]
TEST: IPv6 TCP connection over VRF with SNAT [ OK ]
Tests passed: 914
Tests failed: 0
real 53m48.460s
user 0m32.885s
sys 2m41.509s
root@hostname:/
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
"I see something approaching fast ... Will it be friends with me?"
Hi, all,
There is a minor omission in selftests/alsa/conf.c: it passes errno to a "%s" conversion
where the sibling printf()-style calls pass strerror(errno).
The bug was apparently introduced with the commit aba51cd0949ae
("selftests: alsa - add PCM test").
As a diff speaks like a thousand words, the fix is simple:
Regards,
Mirsad
----- cut -----
diff --git a/tools/testing/selftests/alsa/conf.c b/tools/testing/selftests/alsa/conf.c
index 00925eb8d9f4..89e3656a042d 100644
--- a/tools/testing/selftests/alsa/conf.c
+++ b/tools/testing/selftests/alsa/conf.c
@@ -179,7 +179,7 @@ static char *sysfs_get(const char *sysfs_root, const char *id)
close(fd);
if (len < 0)
ksft_exit_fail_msg("sysfs: unable to read value '%s': %s\n",
- path, errno);
+ path, strerror(errno));
while (len > 0 && path[len-1] == '\n')
len--;
path[len] = '\0';
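The same pattern as a standalone sketch, for anyone who wants to try it outside
the selftest. The helper name format_read_error() is hypothetical and
snprintf() stands in for ksft_exit_fail_msg(); the format string is the one
from the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * "%s" expects a char pointer. Passing the integer errno there is undefined
 * behaviour; strerror() turns the error number into the string it needs.
 */
static int format_read_error(char *buf, size_t len, const char *path, int err)
{
	assert(buf && path && len > 0);
	return snprintf(buf, len, "sysfs: unable to read value '%s': %s",
			path, strerror(err));
}
```

Note that errno should be read (or saved) before any other library call that
might overwrite it, which is why the sketch takes the error number as a
parameter.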
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
"I see something approaching fast ... Will it be friends with me?"
=== Description ===
This is a bpf-treewide change that annotates all kfuncs as such inside
.BTF_ids. This annotation eventually allows us to automatically generate
kfunc prototypes from bpftool.
We store this metadata inside a yet-unused flags field inside struct
btf_id_set8 (thanks Kumar!). pahole will be taught where to look.
More details about the full chain of events are available in commit 3's
description.
The accompanying pahole changes (still need some cleanup) can be viewed
here on this "frozen" branch [0].
[0]: https://github.com/danobi/pahole/tree/kfunc_btf-mailed
=== Changelog ===
Changes from v1:
* Move WARN_ON() up a call level
* Also return error when kfunc set is not properly tagged
* Use BTF_KFUNCS_START/END instead of flags
* Rename BTF_SET8_KFUNC to BTF_SET8_KFUNCS
Daniel Xu (3):
bpf: btf: Support flags for BTF_SET8 sets
bpf: btf: Add BTF_KFUNCS_START/END macro pair
bpf: treewide: Annotate BPF kfuncs in BTF
drivers/hid/bpf/hid_bpf_dispatch.c | 8 +++----
fs/verity/measure.c | 4 ++--
include/linux/btf_ids.h | 21 +++++++++++++++----
kernel/bpf/btf.c | 4 ++++
kernel/bpf/cpumask.c | 4 ++--
kernel/bpf/helpers.c | 8 +++----
kernel/bpf/map_iter.c | 4 ++--
kernel/cgroup/rstat.c | 4 ++--
kernel/trace/bpf_trace.c | 8 +++----
net/bpf/test_run.c | 8 +++----
net/core/filter.c | 16 +++++++-------
net/core/xdp.c | 4 ++--
net/ipv4/bpf_tcp_ca.c | 4 ++--
net/ipv4/fou_bpf.c | 4 ++--
net/ipv4/tcp_bbr.c | 4 ++--
net/ipv4/tcp_cubic.c | 4 ++--
net/ipv4/tcp_dctcp.c | 4 ++--
net/netfilter/nf_conntrack_bpf.c | 4 ++--
net/netfilter/nf_nat_bpf.c | 4 ++--
net/xfrm/xfrm_interface_bpf.c | 4 ++--
net/xfrm/xfrm_state_bpf.c | 4 ++--
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 8 +++----
22 files changed, 77 insertions(+), 60 deletions(-)
--
2.42.1
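To make the "flags field inside the set" idea concrete, here is a simplified
userspace model. Everything prefixed with sketch_ or SKETCH_ is hypothetical:
the layout is reduced to a fixed-size array, and the registration check only
mirrors the changelog item "return error when kfunc set is not properly
tagged". It is not the code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Set-wide tag saying "every ID in this set is a kfunc" (assumed bit value). */
#define SKETCH_SET8_KFUNCS (1u << 0)

struct sketch_id_pair {
	uint32_t id;
	uint32_t flags;      /* per-function flags */
};

struct sketch_id_set8 {
	uint32_t cnt;
	uint32_t flags;      /* previously unused; now carries the set-wide tag */
	struct sketch_id_pair pairs[4];
};

/* Refuse to register a set that was not declared through the tagged macros. */
static int sketch_register_kfunc_set(const struct sketch_id_set8 *set)
{
	assert(set);
	if (!(set->flags & SKETCH_SET8_KFUNCS))
		return -EINVAL;
	return 0;
}
```

Because the tag lives in the set header rather than in each entry, a tool that
reads the section only has to inspect one word per set to know whether its IDs
are kfuncs.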
The expression "source ../lib.sh" added to net/forwarding/lib.sh in commit
25ae948b4478 ("selftests/net: add lib.sh") does not work for tests outside
net/forwarding which source net/forwarding/lib.sh (1). It also does not
work in some cases where only a subset of tests are exported (2).
Avoid the problems mentioned above by replacing the faulty expression with
a copy of the content from net/lib.sh which is used by files under
net/forwarding.
A more thorough solution which avoids duplicating content between
net/lib.sh and net/forwarding/lib.sh has been posted here:
https://lore.kernel.org/netdev/20231222135836.992841-1-bpoirier@nvidia.com/
The approach in the current patch is a stopgap solution to avoid submitting
large changes at the eleventh hour of this development cycle.
Example of problem 1)
tools/testing/selftests/drivers/net/bonding$ ./dev_addr_lists.sh
./net_forwarding_lib.sh: line 41: ../lib.sh: No such file or directory
TEST: bonding cleanup mode active-backup [ OK ]
TEST: bonding cleanup mode 802.3ad [ OK ]
TEST: bonding LACPDU multicast address to slave (from bond down) [ OK ]
TEST: bonding LACPDU multicast address to slave (from bond up) [ OK ]
An error message is printed but since the test does not use functions from
net/lib.sh, the test results are not affected.
Example of problem 2)
tools/testing/selftests$ make install TARGETS="net/forwarding"
tools/testing/selftests$ cd kselftest_install/net/forwarding/
tools/testing/selftests/kselftest_install/net/forwarding$ ./pedit_ip.sh veth{0..3}
lib.sh: line 41: ../lib.sh: No such file or directory
TEST: ping [ OK ]
TEST: ping6 [ OK ]
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth1 ingress pedit ip src set 198.51.100.1 [FAIL]
Expected to get 10 packets, but got .
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth2 egress pedit ip src set 198.51.100.1 [FAIL]
Expected to get 10 packets, but got .
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth1 ingress pedit ip dst set 198.51.100.1 [FAIL]
Expected to get 10 packets, but got .
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth2 egress pedit ip dst set 198.51.100.1 [FAIL]
Expected to get 10 packets, but got .
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth1 ingress pedit ip6 src set 2001:db8:2::1 [FAIL]
Expected to get 10 packets, but got .
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth2 egress pedit ip6 src set 2001:db8:2::1 [FAIL]
Expected to get 10 packets, but got .
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth1 ingress pedit ip6 dst set 2001:db8:2::1 [FAIL]
Expected to get 10 packets, but got .
./pedit_ip.sh: line 135: busywait: command not found
TEST: dev veth2 egress pedit ip6 dst set 2001:db8:2::1 [FAIL]
Expected to get 10 packets, but got .
In this case, the test results are affected.
Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
Suggested-by: Ido Schimmel <idosch(a)nvidia.com>
Suggested-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Tested-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier(a)nvidia.com>
---
tools/testing/selftests/net/forwarding/lib.sh | 27 ++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 69ef2a40df21..8a61464ab6eb 100755
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -38,7 +38,32 @@ if [[ -f $relative_path/forwarding.config ]]; then
source "$relative_path/forwarding.config"
fi
-source ../lib.sh
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+busywait()
+{
+ local timeout=$1; shift
+
+ local start_time="$(date -u +%s%3N)"
+ while true
+ do
+ local out
+ out=$("$@")
+ local ret=$?
+ if ((!ret)); then
+ echo -n "$out"
+ return 0
+ fi
+
+ local current_time="$(date -u +%s%3N)"
+ if ((current_time - start_time > timeout)); then
+ echo -n "$out"
+ return 1
+ fi
+ done
+}
+
##############################################################################
# Sanity checks
--
2.43.0
This series attempts to reduce the parsing overhead of IPv6 extension
headers in GRO and GSO, by removing extension header specific code and
enabling the frag0 fast path.
The following changes were made:
- Removed some unnecessary HBH conditionals by adding HBH offload
to inet6_offloads
- Added a utility function to support frag0 fast path in ipv6_gro_receive
- Added selftests for IPv6 packets with extension headers in GRO
Richard
v2 -> v3:
- Removed previously added IPv6 extension header length constant and
using sizeof(*opth) instead.
- Removed unnecessary conditional in gro selftest framework
- v2:
https://lore.kernel.org/netdev/127b8199-1cd4-42d7-9b2b-875abaad93fe@gmail.c…
v1 -> v2:
- Added a minimum IPv6 extension header length constant to make code self
documenting.
- Added new selftest which checks that packets with different extension
header payloads do not coalesce.
- Added more info in the second commit message regarding the code changes.
- v1:
https://lore.kernel.org/netdev/f4eff69d-3917-4c42-8c6b-d09597ac4437@gmail.c…
Richard Gobert (3):
net: gso: add HBH extension header offload support
net: gro: parse ipv6 ext headers without frag0 invalidation
selftests/net: fix GRO coalesce test and add ext header coalesce tests
net/ipv6/exthdrs_offload.c | 11 ++++
net/ipv6/ip6_offload.c | 76 +++++++++++++++++--------
tools/testing/selftests/net/gro.c | 93 +++++++++++++++++++++++++++++--
3 files changed, 150 insertions(+), 30 deletions(-)
--
2.36.1
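For context on what "parsing IPv6 extension headers" involves, below is a
userspace sketch of walking an extension header chain. It is not the code from
these patches: the sketch_ names are hypothetical, and treating only
hop-by-hop, routing and destination options as skippable is an assumption made
for the example. The length rule, (hdrlen + 1) * 8 bytes, is the standard one
for these headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv6 next-header values (IANA protocol numbers). */
#define SKETCH_NEXTHDR_HOP      0
#define SKETCH_NEXTHDR_TCP      6
#define SKETCH_NEXTHDR_ROUTING 43
#define SKETCH_NEXTHDR_DEST    60

/* Common prefix of the extension headers handled here. */
struct sketch_opt_hdr {
	uint8_t nexthdr;
	uint8_t hdrlen;   /* in 8-byte units, not counting the first 8 bytes */
};

static int sketch_is_ext_hdr(uint8_t proto)
{
	return proto == SKETCH_NEXTHDR_HOP ||
	       proto == SKETCH_NEXTHDR_ROUTING ||
	       proto == SKETCH_NEXTHDR_DEST;
}

/*
 * Returns the total size of the extension header chain starting at buf, and
 * stores the first non-extension protocol in *final_proto. Returns -1 if the
 * buffer is too short to hold a header it claims to contain.
 */
static long sketch_skip_ext_hdrs(const uint8_t *buf, size_t len,
				 uint8_t first_proto, uint8_t *final_proto)
{
	size_t off = 0;
	uint8_t proto = first_proto;

	assert(buf && final_proto);
	while (sketch_is_ext_hdr(proto)) {
		size_t optlen;

		/* The two-byte prefix must be readable before it is used. */
		if (len - off < sizeof(struct sketch_opt_hdr))
			return -1;
		optlen = ((size_t)buf[off + 1] + 1) << 3;
		if (len - off < optlen)
			return -1;
		proto = buf[off];
		off += optlen;
	}
	*final_proto = proto;
	return (long)off;
}
```

The first bounds check uses the size of the two-byte prefix, which is the same
role sizeof(*opth) plays in the v3 changelog entry above.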
Hi,
for this v3 I changed the approach for identifying devices in a stable
way from the match fields back to the hardware topology (used in v1).
The match fields were proposed as a way to avoid the possible issue of
PCI topology being reconfigured, but that wasn't observed on any real
system so far. However using match fields does allow for a real issue if
an external device similar to an internal one is connected to the
system, which results in a change of the match count and therefore a
test failure. So using the HW topology was chosen as the most reliable
approach.
The per-platform device description file now uses YAML following a
suggestion from Chris Obbard, and the test script was re-written in
python to handle the new YAML format.
A second sample board file is also now included for an x86 platform,
which contains an USB controller behind a PCI controller, which wasn't
possible to describe in v1.
Thanks,
Nícolas
v2: https://lore.kernel.org/all/20231127233558.868365-1-nfraprado@collabora.com
v1: https://lore.kernel.org/all/20231024211818.365844-1-nfraprado@collabora.com
Original cover letter:
This is part of an effort to improve detection of regressions impacting
device probe on all platforms. The recently merged DT kselftest [3]
detects probe issues for all devices described statically in the DT.
That leaves out devices discovered at run-time from discoverable busses.
This is where this test comes in. All of the devices that are connected
through discoverable busses (ie USB and PCI), and which are internal and
therefore always present, can be described in a per-platform file so
they can be checked for. The test will check that the device has been
instantiated and bound to a driver.
Patch 1 introduces the test. Patch 2 and 3 add the device definitions
for the google,spherion machine (Acer Chromebook 514) and XPS 13 as
examples.
This is the output from the test running on Spherion:
TAP version 13
Using board file: boards/google,spherion.yaml
1..8
ok 1 /usb2-controller@11200000/1.4.1/camera.device
ok 2 /usb2-controller@11200000/1.4.1/camera.0.driver
ok 3 /usb2-controller@11200000/1.4.1/camera.1.driver
ok 4 /usb2-controller@11200000/1.4.2/bluetooth.device
ok 5 /usb2-controller@11200000/1.4.2/bluetooth.0.driver
ok 6 /usb2-controller@11200000/1.4.2/bluetooth.1.driver
ok 7 /pci-controller@11230000/0.0/0.0/wifi.device
ok 8 /pci-controller@11230000/0.0/0.0/wifi.driver
Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0
[3] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
Changes in v3:
- Reverted approach of encoding stable device reference in test file
from device match fields (from modalias) back to HW topology (from v1)
- Changed board file description to YAML
- Rewrote test script in python to handle YAML and support x86 platforms
Changes in v2:
- Changed approach of encoding stable device reference in test file from
HW topology to device match fields (the ones from modalias)
- Better documented test format
Nícolas F. R. A. Prado (3):
kselftest: Add test to verify probe of devices from discoverable
busses
kselftest: devices: Add sample board file for google,spherion
kselftest: devices: Add sample board file for XPS 13 9300
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/devices/Makefile | 4 +
.../devices/boards/Dell Inc.,XPS 13 9300.yaml | 40 +++
.../devices/boards/google,spherion.yaml | 50 +++
tools/testing/selftests/devices/ksft.py | 90 +++++
.../devices/test_discoverable_devices.py | 318 ++++++++++++++++++
6 files changed, 503 insertions(+)
create mode 100644 tools/testing/selftests/devices/Makefile
create mode 100644 tools/testing/selftests/devices/boards/Dell Inc.,XPS 13 9300.yaml
create mode 100644 tools/testing/selftests/devices/boards/google,spherion.yaml
create mode 100644 tools/testing/selftests/devices/ksft.py
create mode 100755 tools/testing/selftests/devices/test_discoverable_devices.py
--
2.43.0
Add NULL checks to KUNIT_BINARY_STR_ASSERTION() so that it will fail
cleanly if either pointer is NULL, instead of causing a NULL pointer
dereference in the strcmp().
A test failure could be that a string is unexpectedly NULL. This could
be trapped by KUNIT_ASSERT_NOT_NULL() but that would terminate the test
at that point. It's preferable that the KUNIT_EXPECT_STR*() macros can
handle NULL pointers as a failure.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
include/kunit/test.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index b163b9984b33..c2ce379c329b 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -758,7 +758,7 @@ do { \
.right_text = #right, \
}; \
\
- if (likely(strcmp(__left, __right) op 0)) \
+ if (likely((__left) && (__right) && (strcmp(__left, __right) op 0))) \
break; \
\
\
--
2.30.2
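The effect of the added NULL checks can be reproduced in userspace with the
sketch below. The SKETCH_STR_CHECK macro and the two wrappers are hypothetical
names; the boolean expression has the same shape as the one in the hunk above,
with "op" being == or !=.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Short-circuits before strcmp() if either pointer is NULL. */
#define SKETCH_STR_CHECK(left, op, right) \
	((left) && (right) && (strcmp((left), (right)) op 0))

/* "Expect equal": a NULL on either side counts as a failed expectation. */
static bool sketch_str_eq(const char *left, const char *right)
{
	return SKETCH_STR_CHECK(left, ==, right);
}

/* "Expect not equal": a NULL still fails instead of trivially passing. */
static bool sketch_str_ne(const char *left, const char *right)
{
	return SKETCH_STR_CHECK(left, !=, right);
}
```

One consequence worth noting is that a NULL pointer fails both the "equal" and
the "not equal" form, so an unexpectedly NULL string is always reported rather
than silently satisfying a not-equal expectation.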
The RISC-V arch_timer selftest is used to validate Sstc timer
functionality in a guest. It sets up periodic timer interrupts
and checks the basic interrupt status upon their receipt.
This KVM selftest was ported from the aarch64 arch_timer test and tested
with Linux v6.7-rc4 on a QEMU riscv64 virt machine.
---
Changed since v4:
* Rebased to Linux 6.7-rc4
* Included Paolo's patch(01/11) to fix issues with SPLIT_TESTS
* Dropped the patch(KVM: selftests: Unify the makefile rule for split targets)
since Paolo's patch had included the fix
* Added new patch(05/11) to include header file vdso/processor.h from linux
source tree to leverage the cpu_relax() definition - Conor/Andrew
* Added new patch(11/11) to enable user configuration of timer error margin
parameter which alleviates the intermittent failure in stress test - Andrew
* Other minor fixes per Andrew's comments
Haibo Xu (10):
KVM: arm64: selftests: Split arch_timer test code
KVM: selftests: Add CONFIG_64BIT definition for the build
tools: riscv: Add header file csr.h
tools: riscv: Add header file vdso/processor.h
KVM: riscv: selftests: Switch to use macro from csr.h
KVM: riscv: selftests: Add exception handling support
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add sstc timer test
KVM: selftests: Enable tunning of err_margin_us in arch timer test
Paolo Bonzini (1):
selftests/kvm: Fix issues with $(SPLIT_TESTS)
tools/arch/riscv/include/asm/csr.h | 521 ++++++++++++++++++
tools/arch/riscv/include/asm/vdso/processor.h | 32 ++
tools/testing/selftests/kvm/Makefile | 27 +-
.../selftests/kvm/aarch64/arch_timer.c | 295 +---------
tools/testing/selftests/kvm/arch_timer.c | 259 +++++++++
.../selftests/kvm/include/aarch64/processor.h | 4 -
.../selftests/kvm/include/kvm_util_base.h | 9 +
.../selftests/kvm/include/riscv/arch_timer.h | 71 +++
.../selftests/kvm/include/riscv/processor.h | 65 ++-
.../testing/selftests/kvm/include/test_util.h | 2 +
.../selftests/kvm/include/timer_test.h | 45 ++
.../selftests/kvm/lib/riscv/handlers.S | 101 ++++
.../selftests/kvm/lib/riscv/processor.c | 87 +++
.../testing/selftests/kvm/riscv/arch_timer.c | 111 ++++
.../selftests/kvm/riscv/get-reg-list.c | 11 +-
15 files changed, 1333 insertions(+), 307 deletions(-)
create mode 100644 tools/arch/riscv/include/asm/csr.h
create mode 100644 tools/arch/riscv/include/asm/vdso/processor.h
create mode 100644 tools/testing/selftests/kvm/arch_timer.c
create mode 100644 tools/testing/selftests/kvm/include/riscv/arch_timer.h
create mode 100644 tools/testing/selftests/kvm/include/timer_test.h
create mode 100644 tools/testing/selftests/kvm/lib/riscv/handlers.S
create mode 100644 tools/testing/selftests/kvm/riscv/arch_timer.c
--
2.34.1
The patch set [1] added a general lib.sh in net selftests, and converted
several test scripts to source the lib.sh.
unicast_extensions.sh (converted in [1]) and pmtu.sh (converted in [2])
have a /bin/sh shebang which may point to various shells in different
distributions, but "source" is only available in some of them. For
example, "source" is a built-in command in bash, but it cannot be
used in dash.
Following the other scripts that were converted together, simply change the
shebang to bash to fix the following issues when the default /bin/sh
points to other shells.
# selftests: net: unicast_extensions.sh
# ./unicast_extensions.sh: 31: source: not found
# ###########################################################################
# Unicast address extensions tests (behavior of reserved IPv4 addresses)
# ###########################################################################
# TEST: assign and ping within 240/4 (1 of 2) (is allowed) [FAIL]
# TEST: assign and ping within 240/4 (2 of 2) (is allowed) [FAIL]
# TEST: assign and ping within 0/8 (1 of 2) (is allowed) [FAIL]
# TEST: assign and ping within 0/8 (2 of 2) (is allowed) [FAIL]
# TEST: assign and ping inside 255.255/16 (is allowed) [FAIL]
# TEST: assign and ping inside 255.255.255/24 (is allowed) [FAIL]
# TEST: route between 240.5.6/24 and 255.1.2/24 (is allowed) [FAIL]
# TEST: route between 0.200/16 and 245.99/16 (is allowed) [FAIL]
# TEST: assign and ping lowest address (/24) [FAIL]
# TEST: assign and ping lowest address (/26) [FAIL]
# TEST: routing using lowest address [FAIL]
# TEST: assigning 0.0.0.0 (is forbidden) [ OK ]
# TEST: assigning 255.255.255.255 (is forbidden) [ OK ]
# TEST: assign and ping inside 127/8 (is forbidden) [ OK ]
# TEST: assign and ping class D address (is forbidden) [ OK ]
# TEST: routing using class D (is forbidden) [ OK ]
# TEST: routing using 127/8 (is forbidden) [ OK ]
not ok 51 selftests: net: unicast_extensions.sh # exit=1
v1 -> v2:
- Fix pmtu.sh which has the same issue as unicast_extensions.sh,
suggested by Hangbin
- Change the style of the "source" line to be consistent with other
tests, suggested by Hangbin
Link: https://lore.kernel.org/all/20231202020110.362433-1-liuhangbin@gmail.com/ [1]
Link: https://lore.kernel.org/all/20231219094856.1740079-1-liuhangbin@gmail.com/ [2]
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Signed-off-by: Yujie Liu <yujie.liu(a)intel.com>
---
tools/testing/selftests/net/pmtu.sh | 4 ++--
tools/testing/selftests/net/unicast_extensions.sh | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 175d3d1d773b..f10879788f61 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# Check that route PMTU values match expectations, and that initial device MTU
@@ -198,7 +198,7 @@
# - pmtu_ipv6_route_change
# Same as above but with IPv6
-source ./lib.sh
+source lib.sh
PAUSE_ON_FAIL=no
VERBOSE=0
diff --git a/tools/testing/selftests/net/unicast_extensions.sh b/tools/testing/selftests/net/unicast_extensions.sh
index b7a2cb9e7477..f52aa5f7da52 100755
--- a/tools/testing/selftests/net/unicast_extensions.sh
+++ b/tools/testing/selftests/net/unicast_extensions.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# By Seth Schoen (c) 2021, for the IPv4 Unicast Extensions Project
@@ -28,7 +28,7 @@
# These tests provide an easy way to flip the expected result of any
# of these behaviors for testing kernel patches that change them.
-source ./lib.sh
+source lib.sh
# nettest can be run from PATH or from same directory as this selftest
if ! which nettest >/dev/null; then
base-commit: cd4d7263d58ab98fd4dee876776e4da6c328faa3
--
2.34.1
This series attempts to reduce the parsing overhead of IPv6 extension
headers in GRO and GSO, by removing extension header specific code and
enabling the frag0 fast path.
The following changes were made:
- Removed some unnecessary HBH conditionals by adding HBH offload
to inet6_offloads
- Added a utility function to support frag0 fast path in ipv6_gro_receive
- Added selftests for IPv6 packets with extension headers in GRO
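To make the extension header parsing cost concrete, here is a minimal user-space sketch of walking an IPv6 extension header chain. It is not the kernel GRO code. It only assumes the generic RFC 8200 layout (first octet is Next Header, second octet is Hdr Ext Len in 8-octet units, not counting the first 8 octets). The names EXT_HDR_MIN_LEN, is_ext_hdr() and ext_hdrs_len() are hypothetical, and the Fragment header is deliberately left out because its second octet is reserved rather than a length.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers for the extension headers handled in this sketch. */
#define NEXTHDR_HOP      0   /* Hop-by-Hop options */
#define NEXTHDR_ROUTING  43
#define NEXTHDR_DEST     60  /* Destination options */

/* Smallest possible extension header: Hdr Ext Len == 0 means 8 octets. */
#define EXT_HDR_MIN_LEN  8   /* hypothetical name */

static int is_ext_hdr(uint8_t nexthdr)
{
	return nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
	       nexthdr == NEXTHDR_DEST;
}

/*
 * Walk the chain that starts at buf, where *nexthdr is the type of the
 * first header. On success, return the total length of all extension
 * headers and store the upper-layer protocol in *nexthdr. Return -1 if
 * the buffer is too short for a header in the chain.
 */
static int ext_hdrs_len(const uint8_t *buf, size_t buflen, uint8_t *nexthdr)
{
	size_t off = 0;
	uint8_t proto = *nexthdr;

	while (is_ext_hdr(proto)) {
		size_t hdrlen;

		if (buflen - off < EXT_HDR_MIN_LEN)
			return -1;
		hdrlen = ((size_t)buf[off + 1] + 1) * 8;
		if (buflen - off < hdrlen)
			return -1;
		proto = buf[off];   /* type of the header that follows */
		off += hdrlen;
	}
	*nexthdr = proto;
	return (int)off;
}
```

A caller would pass the Next Header value from the fixed IPv6 header and a pointer to the first octet after it. Every header in the chain needs a bounds check before it is read, which is the per-packet work a fast path tries to keep cheap.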
Richard
v1 -> v2:
- Added a minimum IPv6 extension header length constant to make code self
documenting.
- Added new selftest which checks that packets with different extension
header payloads do not coalesce.
- Added more info in the second commit message regarding the code changes.
- v1:
https://lore.kernel.org/netdev/f4eff69d-3917-4c42-8c6b-d09597ac4437@gmail.c…
Richard Gobert (3):
net: gso: add HBH extension header offload support
net: gro: parse ipv6 ext headers without frag0 invalidation
selftests/net: fix GRO coalesce test and add ext header coalesce tests
include/net/ipv6.h | 1 +
net/ipv6/exthdrs_offload.c | 11 ++++
net/ipv6/ip6_offload.c | 76 +++++++++++++++++--------
tools/testing/selftests/net/gro.c | 94 +++++++++++++++++++++++++++++--
4 files changed, 152 insertions(+), 30 deletions(-)
--
2.36.1
Nested translation is a hardware feature that is supported by many modern
IOMMUs. It uses two stages of address translation (stage-1 and stage-2)
to get access to the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |---------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
This series builds on the first part, which was merged [1]. It adds the
cache invalidation interface for userspace to invalidate the cache after
modifying the stage-1 page table. This includes both the iommufd changes and the
VT-d driver changes.
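Why a stage-1 change needs an explicit invalidation can be illustrated with a toy model. This is a sketch under heavy assumptions, not the VT-d or iommufd implementation: page tables are flat arrays indexed by page number, the IOTLB caches the combined result, and every name (toy_iommu, toy_translate(), toy_invalidate()) is hypothetical.

```c
#include <stdint.h>

#define NPAGES  4            /* tiny address space: page numbers 0..3 */
#define INVALID UINT32_MAX   /* marks an empty IOTLB slot */

struct toy_iommu {
	uint32_t s1[NPAGES];     /* stage-1: GIOVA page -> GPA page (guest-owned) */
	uint32_t s2[NPAGES];     /* stage-2: GPA page -> HPA page (kernel-owned) */
	uint32_t iotlb[NPAGES];  /* cached GIOVA page -> HPA page */
};

static void toy_init(struct toy_iommu *iommu)
{
	for (uint32_t i = 0; i < NPAGES; i++) {
		iommu->s1[i] = i;          /* identity stage-1 mapping */
		iommu->s2[i] = 100 + i;    /* arbitrary host pages */
		iommu->iotlb[i] = INVALID;
	}
}

/* Nested translation: walk both stages on a miss, then cache the result. */
static uint32_t toy_translate(struct toy_iommu *iommu, uint32_t giova_pfn)
{
	if (iommu->iotlb[giova_pfn] == INVALID)
		iommu->iotlb[giova_pfn] = iommu->s2[iommu->s1[giova_pfn]];
	return iommu->iotlb[giova_pfn];
}

/* What the guest must request after it modifies a present stage-1 entry. */
static void toy_invalidate(struct toy_iommu *iommu, uint32_t giova_pfn)
{
	iommu->iotlb[giova_pfn] = INVALID;
}
```

In this model, rewriting s1[1] after a first lookup leaves toy_translate() returning the old host page until toy_invalidate() is called. That stale window is what the userspace invalidation interface is meant to close.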
Complete code can be found in [2]; the QEMU code can be found in [3].
Finally, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks
to them for the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v10:
- Minor tweak to patch 07 (Kevin)
- Rebase on top of 6.7-rc8
v9: https://lore.kernel.org/linux-iommu/20231228150629.13149-1-yi.l.liu@intel.c…
- Add a test case which sets both IOMMU_TEST_INVALIDATE_FLAG_ALL and
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR in flags, and expect to succeed
and see an 'error'. (Kevin)
- Returns -ETIMEOUT in qi_check_fault() if the caller is interested in the
fault when a timeout happens. If not, qi_submit_sync() keeps retrying and
is hence unable to report the error back to the user. For now, only the user
cache invalidation path is interested in the timeout error, so this change only
affects the user cache invalidation path. Other paths will still hang in
qi_submit_sync() when a timeout happens. (Kevin)
v8: https://lore.kernel.org/linux-iommu/20231227161354.67701-1-yi.l.liu@intel.c…
- Pass invalidation hint to the cache invalidation helper in the cache_invalidate_user
op path (Kevin)
- Move the devTLB invalidation out of info->iommu loop (Kevin, Weijiang)
- Clear *fault per restart in qi_submit_sync() to avoid across submission error
accumulation. (Kevin)
- Define the vtd cache invalidation uapi structure in separate patch (Kevin)
- Rename inv_error to be hw_error (Kevin)
- Rename 'reqs_uptr', 'req_type', 'req_len' and 'req_num' to be 'data_uptr',
'data_type', 'entry_len' and 'entry_num' (Kevin)
- Allow user to set IOMMU_TEST_INVALIDATE_FLAG_ALL and IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR
at the same time (Kevin)
v7: https://lore.kernel.org/linux-iommu/20231221153948.119007-1-yi.l.liu@intel.…
- Remove domain->ops->cache_invalidate_user check in hwpt alloc path due
to failure in bisect (Baolu)
- Remove out_driver_error_code from struct iommu_hwpt_invalidate after
discussion in v6. Should expect per-entry error code.
- Rework the selftest cache invalidation part to report a per-entry error
- Allow user to pass in an empty array to have a try-and-fail mechanism for
user to check if a given req_type is supported by the kernel (Jason)
- Define a separate enum type for cache invalidation data (Jason)
- Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before
returning (Nicolin)
- Merge the VT-d nesting part 2/2
https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
into this series to avoid defining empty enum in the middle of the series.
The major difference is adding the VT-d related invalidation uapi structures
together with the generic data structures in patch 02 of this series.
- VT-d driver was refined to report ICE/ITE error from the bottom cache
invalidation submit helpers, hence the cache_invalidate_user op could
report such errors via the per-entry error field to user. VT-d driver
will not stop the invalidation array walking due to the ICE/ITE errors
as such errors are defined by VT-d spec, userspace should be able to
handle it and let the real user (say Virtual Machine) know about it.
But other errors, like invalid uapi data structure configuration or
memory copy failure, should stop the array walking, as continuing
may cause more issues.
- Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
- Not much change, just rebase on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (4):
iommu: Add cache_invalidate_user op
iommu/vt-d: Allow qi_submit_sync() to return the QI faults
iommu/vt-d: Convert stage-1 cache invalidation to return QI fault
iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (2):
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
drivers/iommu/intel/dmar.c | 42 ++--
drivers/iommu/intel/iommu.c | 12 +-
drivers/iommu/intel/iommu.h | 8 +-
drivers/iommu/intel/irq_remapping.c | 2 +-
drivers/iommu/intel/nested.c | 107 ++++++++++
drivers/iommu/intel/pasid.c | 14 +-
drivers/iommu/intel/svm.c | 14 +-
drivers/iommu/iommufd/hw_pagetable.c | 41 ++++
drivers/iommu/iommufd/iommufd_private.h | 10 +
drivers/iommu/iommufd/iommufd_test.h | 39 ++++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 86 ++++++++
include/linux/iommu.h | 100 +++++++++
include/uapi/linux/iommufd.h | 101 ++++++++++
tools/testing/selftests/iommu/iommufd.c | 190 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 57 ++++++
16 files changed, 787 insertions(+), 39 deletions(-)
--
2.34.1
For now, we have to call some helpers when we need to update the csum,
such as bpf_l4_csum_replace, bpf_l3_csum_replace, etc. These helpers are
not inlined, which causes poor performance.
In fact, we can define our own csum update functions in the BPF program
instead of using bpf_l3_csum_replace; these are fully inlined and efficient.
However, we can't do this for bpf_l4_csum_replace for now, as we can't
update skb->csum, which can leave skb->csum invalid in the rx path with
CHECKSUM_COMPLETE mode.
What's more, we can't use direct data access and have to use
skb_store_bytes() with the BPF_F_RECOMPUTE_CSUM flag in some cases, such
as modifying the vni in the vxlan header when the underlay udp header has
no checksum.
In the first patch, we make skb->csum readable and writable, and we make
skb->ip_summed readable. For now, for tc only. With these 2 fields, we
don't need to call bpf helpers for csum update any more.
In the second patch, we add some testcases for the read/write testing for
skb->csum and skb->ip_summed.
If this series is acceptable, we can define the inlined functions for csum
update in libbpf in the next step.
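As an illustration of what an inlined csum update could look like, here is a minimal sketch of the incremental Internet checksum update from RFC 1624 (HC' = ~(~HC + ~m + m')). It is plain user-space C, not a BPF program and not the proposed libbpf code. The function names are hypothetical, and 16-bit words are taken as host-order values for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Reference: full one's complement checksum over n 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum = csum_fold32(sum + words[i]);
	return (uint16_t)~sum;
}

/*
 * Incremental update: given the old checksum and one 16-bit word that
 * changes from old_val to new_val, return the new checksum without
 * touching the rest of the data.
 */
static uint16_t csum_replace16(uint16_t check, uint16_t old_val,
			       uint16_t new_val)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_val;
	sum += new_val;
	return (uint16_t)~csum_fold32(sum);
}
```

The update costs a handful of additions regardless of packet size, which is why it inlines well. The same arithmetic would also have to be applied to skb->csum when the device reported CHECKSUM_COMPLETE, which is the part that needs the field to be writable.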
Menglong Dong (2):
bpf: add csum/ip_summed fields to __sk_buff
testcases/bpf: add testcases for skb->csum to ctx_skb.c
include/linux/skbuff.h | 2 +
include/uapi/linux/bpf.h | 2 +
net/core/filter.c | 22 ++++++++++
tools/include/uapi/linux/bpf.h | 2 +
.../testing/selftests/bpf/verifier/ctx_skb.c | 43 +++++++++++++++++++
5 files changed, 71 insertions(+)
--
2.39.2
While testing the split PMD path with lockdep enabled I've got an
"Invalid wait context" error caused by split_huge_page_to_list() trying
to lock anon_vma->rwsem while inside RCU read section. The issue is due
to move_pages_pte() calling split_folio() under RCU read lock. Fix this
by unmapping the PTEs and exiting RCU read section before splitting the
folio and then retrying. The same retry pattern is used when locking the
folio or anon_vma in this function. After splitting the large folio we
unlock and release it because after the split the old folio might not be
the one that contains the src_addr.
Fixes: 94b01c885131 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
---
Changes from v1 [1]:
1. Reset src_folio and src_folio_pte after folio is split, per Peter Xu
[1] https://lore.kernel.org/all/20231230025607.2476912-1-surenb@google.com/
mm/userfaultfd.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 5e718014e671..216ab4c8621f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1078,9 +1078,18 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
+ /* split_folio() can block */
+ pte_unmap(&orig_src_pte);
+ pte_unmap(&orig_dst_pte);
+ src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
goto out;
+ /* have to reacquire the folio after it got split */
+ folio_unlock(src_folio);
+ folio_put(src_folio);
+ src_folio = NULL;
+ goto retry;
}
if (!src_anon_vma) {
--
2.43.0.472.g3155946c3a-goog
While testing the split PMD path with lockdep enabled I've got an
"Invalid wait context" error caused by split_huge_page_to_list() trying
to lock anon_vma->rwsem while inside RCU read section. The issue is due
to move_pages_pte() calling split_folio() under RCU read lock. Fix this
by unmapping the PTEs and exiting RCU read section before splitting the
folio and then retrying. The same retry pattern is used when locking the
folio or anon_vma in this function.
Fixes: 94b01c885131 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
---
Patch applies over mm-unstable.
Please note that the SHA in Fixes tag is unstable.
mm/userfaultfd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 5e718014e671..71393410e028 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1078,9 +1078,14 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
+ /* split_folio() can block */
+ pte_unmap(&orig_src_pte);
+ pte_unmap(&orig_dst_pte);
+ src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
goto out;
+ goto retry;
}
if (!src_anon_vma) {
--
2.43.0.472.g3155946c3a-goog
From: Roberto Sassu <roberto.sassu(a)huawei.com>
IMA and EVM are not effectively LSMs, especially due to the fact that in
the past they could not provide a security blob while there is another LSM
active.
That changed in recent years: the LSM stacking feature now makes it
possible to stack together multiple LSMs, and allows them to provide a
security blob for most kernel objects. While the LSM stacking feature has
some limitations being worked out, it is already suitable for making IMA and
EVM LSMs.
The main purpose of this patch set is to remove IMA and EVM function calls,
hardcoded in the LSM infrastructure and other places in the kernel, and to
register them as LSM hook implementations, so that those functions are
called by the LSM infrastructure like other regular LSMs.
This patch set introduces two new LSMs 'ima' and 'evm', so that functions
can be registered to their respective LSM, and removes the 'integrity' LSM.
integrity_kernel_module_request() was moved to IMA, since it was related to
appraisal. integrity_inode_free() was replaced with
ima_inode_free_security() (EVM does not need to free memory).
In order to make 'ima' and 'evm' independent LSMs, it was necessary to
split integrity metadata used by both IMA and EVM, and to let them manage
their own. The special case of the IMA_NEW_FILE flag, managed by IMA and
used by EVM, was handled by introducing a new flag in EVM, EVM_NEW_FILE,
managed by two additional LSM hooks, evm_post_path_mknod() and
evm_file_free(), equivalent to their counterparts ima_post_path_mknod() and
ima_file_free().
In addition to splitting metadata, it was decided to embed the full
structure into the inode security blob, rather than using a cache of
objects and allocating them on demand. This opens up new possibilities,
such as improving locking in IMA.
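The idea of embedding per-LSM metadata in one shared inode blob can be sketched in user space as follows. This is only an illustration of the general reserve-an-offset scheme, not the kernel's LSM code. All names (toy_blob_reserve(), toy_inode_alloc(), toy_blob_at(), and the two toy structures) are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-LSM inode metadata, standing in for the real structures. */
struct toy_ima_iint { unsigned long flags; int version; };
struct toy_evm_iint { unsigned long flags; int status; };

struct toy_inode { void *security; };

static size_t toy_blob_size;   /* total bytes reserved so far */

/* Reserve size bytes in the shared blob and return this LSM's offset. */
static size_t toy_blob_reserve(size_t size)
{
	size_t align = sizeof(void *);
	size_t offset = (toy_blob_size + align - 1) / align * align;

	toy_blob_size = offset + size;
	return offset;
}

/* One zeroed allocation per inode covers every LSM that reserved space. */
static int toy_inode_alloc(struct toy_inode *inode)
{
	inode->security = calloc(1, toy_blob_size);
	return inode->security ? 0 : -1;
}

/* Each LSM finds its own structure at the offset it reserved. */
static void *toy_blob_at(const struct toy_inode *inode, size_t offset)
{
	return (char *)inode->security + offset;
}
```

With this layout the metadata exists for as long as the inode does, so lookups become pointer arithmetic instead of a search in a separate cache. The trade-off is that every inode pays for the full structure whether or not it is ever measured.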
Another follow-up change was removing the iint parameter from
evm_verifyxattr(), that IMA used to pass integrity metadata to EVM. After
splitting metadata, and aligning EVM_NEW_FILE with IMA_NEW_FILE, this
parameter was not necessary anymore.
The last part was to ensure that the order of IMA and EVM functions is
respected after they become LSMs. Since the order of lsm_info structures in
the .lsm_info.init section depends on the order object files containing
those structures are passed to the linker of the kernel image, and since
IMA is before EVM in the Makefile, that is sufficient to assert that IMA
functions are executed before EVM ones.
The patch set is organized as follows.
Patches 1-9 make IMA and EVM functions suitable to be registered to the LSM
infrastructure, by aligning function parameters.
Patches 10-18 add new LSM hooks in the same places where IMA and EVM
functions are called, if there is no LSM hook already.
Patches 19-21 introduce the new standalone LSMs 'ima' and 'evm', and move
hardcoded calls to IMA, EVM and integrity functions to those LSMs.
Patches 22-23 remove the dependency on the 'integrity' LSM by splitting
integrity metadata, so that the 'ima' and 'evm' LSMs can use their own.
They also duplicate iint_lockdep_annotate() in ima_main.c, since the mutex
field was moved from integrity_iint_cache to ima_iint_cache.
Patch 24 finally removes the 'integrity' LSM, since 'ima' and 'evm' are now
self-contained and independent.
The patch set applies on top of lsm/dev, commit 80b4ff1d2c9b ("selftests:
remove the LSM_ID_IMA check in lsm/lsm_list_modules_test"). The
linux-integrity/next-integrity-testing at commit f17167bea279 ("ima: Remove
EXPERIMENTAL from Kconfig") was merged.
Changelog:
v7:
- Use return instead of goto in __vfs_removexattr_locked() (suggested by
Casey)
- Clarify in security/integrity/Makefile that the order of 'ima' and 'evm'
LSMs depends on the order in which IMA and EVM are compiled
- Move integrity_iint_cache flags to ima.h and evm.h in security/ and
duplicate IMA_NEW_FILE to EVM_NEW_FILE
- Rename evm_inode_get_iint() to evm_iint_inode() and ima_inode_get_iint()
to ima_iint_inode(), check if inode->i_security is NULL, and just return
the pointer from the inode security blob
- Restore the non-NULL checks after ima_iint_inode() and evm_iint_inode()
(suggested by Casey)
- Introduce evm_file_free() to clear EVM_NEW_FILE
- Remove comment about LSM_ORDER_LAST not guaranteeing the order of 'ima'
and 'evm' LSMs
- Lock iint->mutex before reading IMA_COLLECTED flag in __ima_inode_hash()
and restored ima_policy_flag check
- Remove patch about the hardcoded ordering of 'ima' and 'evm' LSMs in
security.c
- Add missing ima_inode_free_security() to free iint->ima_hash
- Add the cases for LSM_ID_IMA and LSM_ID_EVM in lsm_list_modules_test.c
- Mention the change in IMA and EVM post functions for private
inodes
v6:
- See v7
v5:
- Rename security_file_pre_free() to security_file_release() and the LSM
hook file_pre_free_security to file_release (suggested by Paul)
- Move integrity_kernel_module_request() to ima_main.c (renamed to
ima_kernel_module_request())
- Split the integrity_iint_cache structure into ima_iint_cache and
evm_iint_cache, so that IMA and EVM can use disjoint metadata and
reserve space with the LSM infrastructure
- Reserve space for the entire ima_iint_cache and evm_iint_cache
structures, not just the pointer (suggested by Paul)
- Introduce ima_inode_get_iint() and evm_inode_get_iint() to retrieve
respectively the ima_iint_cache and evm_iint_cache structure from the
security blob
- Remove the various non-NULL checks for the ima_iint_cache and
evm_iint_cache structures, since the LSM infrastructure ensures that they
always exist
- Remove the iint parameter from evm_verifyxattr() since IMA and EVM
use disjoint integrity metadata
- Introduce the evm_post_path_mknod() to set the IMA_NEW_FILE flag
- Register the inode_alloc_security LSM hook in IMA and EVM to
initialize the respective integrity metadata structures
- Remove the 'integrity' LSM completely and instead make 'ima' and 'evm'
proper standalone LSMs
- Add the inode parameter to ima_get_verity_digest(), since the inode
field is not present in ima_iint_cache
- Move iint_lockdep_annotate() to ima_main.c (renamed to
ima_iint_lockdep_annotate())
- Remove ima_get_lsm_id() and evm_get_lsm_id(), since IMA and EVM directly
register the needed LSM hooks
- Enforce 'ima' and 'evm' LSM ordering at LSM infrastructure level
v4:
- Improve short and long description of
security_inode_post_create_tmpfile(), security_inode_post_set_acl(),
security_inode_post_remove_acl() and security_file_post_open()
(suggested by Mimi)
- Improve commit message of 'ima: Move to LSM infrastructure' (suggested
by Mimi)
v3:
- Drop 'ima: Align ima_post_path_mknod() definition with LSM
infrastructure' and 'ima: Align ima_post_create_tmpfile() definition
with LSM infrastructure', define the new LSM hooks with the same
IMA parameters instead (suggested by Mimi)
- Do IS_PRIVATE() check in security_path_post_mknod() and
security_inode_post_create_tmpfile() on the new inode rather than the
parent directory (in the post method it is available)
- Don't export ima_file_check() (suggested by Stefan)
- Remove redundant check of file mode in ima_post_path_mknod() (suggested
by Mimi)
- Mention that ima_post_path_mknod() is now conditionally invoked when
CONFIG_SECURITY_PATH=y (suggested by Mimi)
- Mention when a LSM hook will be introduced in the IMA/EVM alignment
patches (suggested by Mimi)
- Simplify the commit messages when introducing a new LSM hook
- Still keep the 'extern' in the function declaration, until the
declaration is removed (suggested by Mimi)
- Improve documentation of security_file_pre_free()
- Register 'ima' and 'evm' as standalone LSMs (suggested by Paul)
- Initialize the 'ima' and 'evm' LSMs from 'integrity', to keep the
original ordering of IMA and EVM functions as when they were hardcoded
- Return the IMA and EVM LSM IDs to 'integrity' for registration of the
integrity-specific hooks
- Reserve an xattr slot from the 'evm' LSM instead of 'integrity'
- Pass the LSM ID to init_ima_appraise_lsm()
v2:
- Add description for newly introduced LSM hooks (suggested by Casey)
- Clarify in the description of security_file_pre_free() that actions can
be performed while the file is still open
v1:
- Drop 'evm: Complete description of evm_inode_setattr()', 'fs: Fix
description of vfs_tmpfile()' and 'security: Introduce LSM_ORDER_LAST',
they were sent separately (suggested by Christian Brauner)
- Replace dentry with file descriptor parameter for
security_inode_post_create_tmpfile()
- Introduce mode_stripped and pass it as mode argument to
security_path_mknod() and security_path_post_mknod()
- Use goto in do_mknodat() and __vfs_removexattr_locked() (suggested by
Mimi)
- Replace __lsm_ro_after_init with __ro_after_init
- Modify short description of security_inode_post_create_tmpfile() and
security_inode_post_set_acl() (suggested by Stefan)
- Move security_inode_post_setattr() just after security_inode_setattr()
(suggested by Mimi)
- Modify short description of security_key_post_create_or_update()
(suggested by Mimi)
- Add back exported functions ima_file_check() and
evm_inode_init_security() respectively to ima.h and evm.h (reported by
kernel robot)
- Remove extern from prototype declarations and fix style issues
- Remove unnecessary include of linux/lsm_hooks.h in ima_main.c and
ima_appraise.c
Roberto Sassu (24):
ima: Align ima_inode_post_setattr() definition with LSM infrastructure
ima: Align ima_file_mprotect() definition with LSM infrastructure
ima: Align ima_inode_setxattr() definition with LSM infrastructure
ima: Align ima_inode_removexattr() definition with LSM infrastructure
ima: Align ima_post_read_file() definition with LSM infrastructure
evm: Align evm_inode_post_setattr() definition with LSM infrastructure
evm: Align evm_inode_setxattr() definition with LSM infrastructure
evm: Align evm_inode_post_setxattr() definition with LSM
infrastructure
security: Align inode_setattr hook definition with EVM
security: Introduce inode_post_setattr hook
security: Introduce inode_post_removexattr hook
security: Introduce file_post_open hook
security: Introduce file_release hook
security: Introduce path_post_mknod hook
security: Introduce inode_post_create_tmpfile hook
security: Introduce inode_post_set_acl hook
security: Introduce inode_post_remove_acl hook
security: Introduce key_post_create_or_update hook
ima: Move to LSM infrastructure
ima: Move IMA-Appraisal to LSM infrastructure
evm: Move to LSM infrastructure
evm: Make it independent from 'integrity' LSM
ima: Make it independent from 'integrity' LSM
integrity: Remove LSM
fs/attr.c | 5 +-
fs/file_table.c | 3 +-
fs/namei.c | 12 +-
fs/nfsd/vfs.c | 3 +-
fs/open.c | 1 -
fs/posix_acl.c | 5 +-
fs/xattr.c | 9 +-
include/linux/evm.h | 111 +-------
include/linux/fs.h | 2 -
include/linux/ima.h | 142 ----------
include/linux/integrity.h | 27 --
include/linux/lsm_hook_defs.h | 20 +-
include/linux/security.h | 59 ++++
include/uapi/linux/lsm.h | 2 +
security/integrity/Makefile | 1 +
security/integrity/digsig_asymmetric.c | 23 --
security/integrity/evm/evm.h | 19 ++
security/integrity/evm/evm_crypto.c | 4 +-
security/integrity/evm/evm_main.c | 195 ++++++++++---
security/integrity/iint.c | 197 +------------
security/integrity/ima/ima.h | 120 +++++++-
security/integrity/ima/ima_api.c | 15 +-
security/integrity/ima/ima_appraise.c | 64 +++--
security/integrity/ima/ima_init.c | 2 +-
security/integrity/ima/ima_main.c | 201 +++++++++++---
security/integrity/ima/ima_policy.c | 2 +-
security/integrity/integrity.h | 80 +-----
security/keys/key.c | 10 +-
security/security.c | 261 +++++++++++-------
security/selinux/hooks.c | 3 +-
security/smack/smack_lsm.c | 4 +-
.../selftests/lsm/lsm_list_modules_test.c | 6 +
32 files changed, 783 insertions(+), 825 deletions(-)
--
2.34.1
This MIB counter is similar to the one of TCP -- CurrEstab -- available
in /proc/net/snmp. This is useful to quickly list the number of MPTCP
connections without having to iterate over all of them.
Patch 1 prepares its support by adding new helper functions:
- MPTCP_DEC_STATS(): similar to MPTCP_INC_STATS(), but this time to
decrement a counter.
- mptcp_set_state(): similar to tcp_set_state(), to change the state of
an MPTCP socket, and to inc/decrement the new counter when needed.
Patch 2 uses mptcp_set_state() instead of directly calling
inet_sk_state_store() to change the state of MPTCP sockets.
Patch 3 and 4 validate the new feature in MPTCP "join" and "diag"
selftests.
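The counter bookkeeping described for mptcp_set_state() can be sketched as a small user-space model. This is an assumption-laden illustration, not the MPTCP code: the toy_* names and the reduced state enum are hypothetical, and a plain global stands in for the per-netns MIB counter.

```c
/* Reduced, hypothetical state set; the real socket has more states. */
enum toy_state { TOY_CLOSE, TOY_SYN_SENT, TOY_ESTABLISHED, TOY_FIN_WAIT1 };

struct toy_sock {
	enum toy_state state;
};

/* Stand-in for the CurrEstab MIB counter. */
static long toy_curr_estab;

/*
 * Single choke point for state changes: the counter moves only on
 * transitions into or out of ESTABLISHED, so it always equals the
 * number of sockets currently in that state.
 */
static void toy_set_state(struct toy_sock *sk, enum toy_state state)
{
	enum toy_state old = sk->state;

	if (state == TOY_ESTABLISHED && old != TOY_ESTABLISHED)
		toy_curr_estab++;
	else if (state != TOY_ESTABLISHED && old == TOY_ESTABLISHED)
		toy_curr_estab--;

	sk->state = state;
}
```

Routing every state change through one helper is what keeps the counter accurate. Any direct store to the state field that bypasses it would let the count drift.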
Signed-off-by: Matthieu Baerts <matttbe(a)kernel.org>
---
Geliang Tang (4):
mptcp: add CurrEstab MIB counter support
mptcp: use mptcp_set_state
selftests: mptcp: join: check CURRESTAB counters
selftests: mptcp: diag: check CURRESTAB counters
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 8 ++++
net/mptcp/pm_netlink.c | 5 +++
net/mptcp/protocol.c | 56 ++++++++++++++++---------
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 2 +-
tools/testing/selftests/net/mptcp/diag.sh | 17 +++++++-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 46 +++++++++++++++++---
8 files changed, 110 insertions(+), 26 deletions(-)
---
base-commit: 56794e5358542b7c652f202946e53bfd2373b5e0
change-id: 20231221-upstream-net-next-20231221-mptcp-currestab-5a2867b4020b
Best regards,
--
Matthieu Baerts <matttbe(a)kernel.org>
suite->log must be checked for NULL before passing it to
string_stream_clear(). This was done in kunit_init_test() but was missing
from kunit_init_suite().
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Fixes: 6d696c4695c5 ("kunit: add ability to run tests after boot using debugfs")
---
lib/kunit/test.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e803d998e855..ea7f0913e55a 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -658,7 +658,9 @@ static void kunit_init_suite(struct kunit_suite *suite)
kunit_debugfs_create_suite(suite);
suite->status_comment[0] = '\0';
suite->suite_init_err = 0;
- string_stream_clear(suite->log);
+
+ if (suite->log)
+ string_stream_clear(suite->log);
}
bool kunit_enabled(void)
--
2.30.2
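The guard added to kunit_init_suite() in the patch above follows a common pattern: an optional pointer member is checked before being handed to a helper that dereferences it unconditionally. A minimal user-space sketch of that pattern, with hypothetical toy_* names rather than the KUnit types:

```c
#include <stddef.h>

struct toy_stream {
	size_t length;
	char buf[64];
};

struct toy_suite {
	struct toy_stream *log;   /* may be NULL if no log was set up */
	int init_err;
};

/* Dereferences its argument unconditionally, so callers must not pass NULL. */
static void toy_stream_clear(struct toy_stream *stream)
{
	stream->length = 0;
	stream->buf[0] = '\0';
}

static void toy_init_suite(struct toy_suite *suite)
{
	suite->init_err = 0;

	/* Only clear the log when one exists. */
	if (suite->log)
		toy_stream_clear(suite->log);
}
```

Putting the check at the call site keeps the helper's contract simple. The alternative of making the clear function itself tolerate NULL would work too, but it would hide the fact that the log is optional.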
This makes the uevent selftests build not write to the source tree
unconditionally, as that breaks out of tree builds when the source tree
is read-only. It also avoids leaving a git repository in a dirty state
after a build.
v2: drop spurious extra SPDX-License-Identifier
Signed-off-by: Antonio Terceiro <antonio.terceiro(a)linaro.org>
---
tools/testing/selftests/uevent/Makefile | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/uevent/Makefile b/tools/testing/selftests/uevent/Makefile
index f7baa9aa2932..872969f42694 100644
--- a/tools/testing/selftests/uevent/Makefile
+++ b/tools/testing/selftests/uevent/Makefile
@@ -1,17 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
-include ../lib.mk
-
-.PHONY: all clean
-
-BINARIES := uevent_filtering
-CFLAGS += -Wl,-no-as-needed -Wall
+CFLAGS += -Wl,-no-as-needed -Wall $(KHDR_INCLUDES)
-uevent_filtering: uevent_filtering.c ../kselftest.h ../kselftest_harness.h
- $(CC) $(CFLAGS) $< -o $@
+TEST_GEN_PROGS = uevent_filtering
-TEST_PROGS += $(BINARIES)
-EXTRA_CLEAN := $(BINARIES)
-
-all: $(BINARIES)
+include ../lib.mk
--
2.43.0
Nested translation is a hardware feature that is supported by many modern
IOMMUs. It uses two stages of address translation (stage-1 and stage-2)
to get access to the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
This series is based on the first part, which was merged [1]. It adds
the cache invalidation interface for userspace to invalidate the cache after
modifying the stage-1 page table. This includes both the iommufd changes and the
VT-d driver changes.
Complete code can be found in [2]; the QEMU code can be found in [3].
Finally, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks
to them for the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v9:
- Add a test case which sets both IOMMU_TEST_INVALIDATE_FLAG_ALL and
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR in flags, and expect to succeed
and see an 'error'. (Kevin)
- Return -ETIMEDOUT in qi_check_fault() if the caller is interested in the
fault when a timeout happens. Otherwise qi_submit_sync() keeps retrying
and is unable to report the error back to the user. For now, only the user cache
invalidation path is interested in the timeout error, so this change only
affects the user cache invalidation path. Other paths will still hang in
qi_submit_sync() when a timeout happens. (Kevin)
v8: https://lore.kernel.org/linux-iommu/20231227161354.67701-1-yi.l.liu@intel.c…
- Pass invalidation hint to the cache invalidation helper in the cache_invalidate_user
op path (Kevin)
- Move the devTLB invalidation out of info->iommu loop (Kevin, Weijiang)
- Clear *fault per restart in qi_submit_sync() to avoid error accumulation
across submissions. (Kevin)
- Define the vtd cache invalidation uapi structure in separate patch (Kevin)
- Rename inv_error to be hw_error (Kevin)
- Rename 'reqs_uptr', 'req_type', 'req_len' and 'req_num' to be 'data_uptr',
'data_type', 'entry_len' and 'entry_num' (Kevin)
- Allow user to set IOMMU_TEST_INVALIDATE_FLAG_ALL and IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR
at the same time (Kevin)
v7: https://lore.kernel.org/linux-iommu/20231221153948.119007-1-yi.l.liu@intel.…
- Remove domain->ops->cache_invalidate_user check in hwpt alloc path due
to failure in bisect (Baolu)
- Remove out_driver_error_code from struct iommu_hwpt_invalidate after
discussion in v6. Should expect per-entry error code.
- Rework the selftest cache invalidation part to report a per-entry error
- Allow user to pass in an empty array to have a try-and-fail mechanism for
user to check if a given req_type is supported by the kernel (Jason)
- Define a separate enum type for cache invalidation data (Jason)
- Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before
returning (Nicolin)
- Merge the VT-d nesting part 2/2
https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
into this series to avoid defining empty enum in the middle of the series.
The major difference is adding the VT-d related invalidation uapi structures
together with the generic data structures in patch 02 of this series.
- VT-d driver was refined to report ICE/ITE errors from the bottom cache
invalidation submit helpers, hence the cache_invalidate_user op can
report such errors to the user via the per-entry error field. The VT-d driver
will not stop walking the invalidation array on ICE/ITE errors:
such errors are defined by the VT-d spec, so userspace should be able to
handle them and let the real user (say, a virtual machine) know about them.
Other errors, such as an invalid uapi data structure configuration or a
memory copy failure, should stop the array walk, as continuing
may cause more issues.
- Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
- Not much change, just a rebase on top of 6.7-rc1, as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (4):
iommu: Add cache_invalidate_user op
iommu/vt-d: Allow qi_submit_sync() to return the QI faults
iommu/vt-d: Convert stage-1 cache invalidation to return QI fault
iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (2):
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
drivers/iommu/intel/dmar.c | 49 +++--
drivers/iommu/intel/iommu.c | 12 +-
drivers/iommu/intel/iommu.h | 8 +-
drivers/iommu/intel/irq_remapping.c | 2 +-
drivers/iommu/intel/nested.c | 107 ++++++++++
drivers/iommu/intel/pasid.c | 14 +-
drivers/iommu/intel/svm.c | 14 +-
drivers/iommu/iommufd/hw_pagetable.c | 41 ++++
drivers/iommu/iommufd/iommufd_private.h | 10 +
drivers/iommu/iommufd/iommufd_test.h | 39 ++++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 86 ++++++++
include/linux/iommu.h | 100 +++++++++
include/uapi/linux/iommufd.h | 101 ++++++++++
tools/testing/selftests/iommu/iommufd.c | 190 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 57 ++++++
16 files changed, 794 insertions(+), 39 deletions(-)
--
2.34.1
The patch set [1] added a general lib.sh in net selftests, and converted
several test scripts to source the lib.sh.
The shebang of unicast_extensions.sh is /bin/sh, which may point to various
shells in different distributions, but "source" is only available in some
of them. For example, "source" is a built-in command in bash, but it
cannot be used in dash.
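As a hedged illustration of the point above (this is not part of the patch):
the POSIX "." command loads a file under any /bin/sh, whereas "source" has to
be probed for. The temporary file and the lib_hello function are made up for
this sketch and do not come from the net selftests.

```shell
#!/bin/sh
# Illustrative sketch only: lib_hello and the temporary file are hypothetical,
# they are not taken from the net selftests.

# Create a tiny library file to load.
lib=$(mktemp)
printf 'lib_hello() { echo "hello from lib"; }\n' > "$lib"

# "." is specified by POSIX, so this works in dash, bash and other sh variants.
. "$lib"
lib_hello

# "source" is a shell extension; probe for it rather than assuming it exists.
if command -v source >/dev/null 2>&1; then
	echo "'source' is available in this shell"
else
	echo "'source' is not available; use '.' or a bash shebang"
fi

rm -f "$lib"
```

A script that must keep a /bin/sh shebang could use "." instead; the patch
here takes the other route and switches the shebang to bash, matching the
sibling scripts.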
Following the other scripts that were converted together, simply change the
shebang to bash to avoid the following errors when the default /bin/sh
points to another shell.
# selftests: net: unicast_extensions.sh
# ./unicast_extensions.sh: 31: source: not found
# ###########################################################################
# Unicast address extensions tests (behavior of reserved IPv4 addresses)
# ###########################################################################
# TEST: assign and ping within 240/4 (1 of 2) (is allowed) [FAIL]
# TEST: assign and ping within 240/4 (2 of 2) (is allowed) [FAIL]
# TEST: assign and ping within 0/8 (1 of 2) (is allowed) [FAIL]
# TEST: assign and ping within 0/8 (2 of 2) (is allowed) [FAIL]
# TEST: assign and ping inside 255.255/16 (is allowed) [FAIL]
# TEST: assign and ping inside 255.255.255/24 (is allowed) [FAIL]
# TEST: route between 240.5.6/24 and 255.1.2/24 (is allowed) [FAIL]
# TEST: route between 0.200/16 and 245.99/16 (is allowed) [FAIL]
# TEST: assign and ping lowest address (/24) [FAIL]
# TEST: assign and ping lowest address (/26) [FAIL]
# TEST: routing using lowest address [FAIL]
# TEST: assigning 0.0.0.0 (is forbidden) [ OK ]
# TEST: assigning 255.255.255.255 (is forbidden) [ OK ]
# TEST: assign and ping inside 127/8 (is forbidden) [ OK ]
# TEST: assign and ping class D address (is forbidden) [ OK ]
# TEST: routing using class D (is forbidden) [ OK ]
# TEST: routing using 127/8 (is forbidden) [ OK ]
not ok 51 selftests: net: unicast_extensions.sh # exit=1
Link: https://lore.kernel.org/all/20231202020110.362433-1-liuhangbin@gmail.com/ [1]
Fixes: 0f4765d0b48d ("selftests/net: convert unicast_extensions.sh to run it in unique namespace")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Signed-off-by: Yujie Liu <yujie.liu(a)intel.com>
---
tools/testing/selftests/net/unicast_extensions.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/unicast_extensions.sh b/tools/testing/selftests/net/unicast_extensions.sh
index b7a2cb9e7477..2766990c2b78 100755
--- a/tools/testing/selftests/net/unicast_extensions.sh
+++ b/tools/testing/selftests/net/unicast_extensions.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# By Seth Schoen (c) 2021, for the IPv4 Unicast Extensions Project
--
2.34.1
This series attempts to reduce parsing overhead of IPv6 extension headers
in GRO and GSO, by removing extension header specific code and enabling
the frag0 fast path.
The following changes were made:
- Specific unnecessary HBH conditionals were removed by adding HBH offload
to inet6_offloads
- Added a utility function to support frag0 fast path in ipv6_gro_receive
- Added self-test for IPv6 packets with extension headers in GRO
Richard
Richard Gobert (3):
net: gso: add HBH extension header offload support
net: gro: parse ipv6 ext headers without frag0 invalidation
selftests/net: fix GRO coalesce test and add ext header coalesce test
net/ipv6/exthdrs_offload.c | 11 +++++
net/ipv6/ip6_offload.c | 76 ++++++++++++++++++++----------
tools/testing/selftests/net/gro.c | 78 ++++++++++++++++++++++++++++---
3 files changed, 134 insertions(+), 31 deletions(-)
--
2.36.1
Nested translation is a hardware feature supported by many modern
IOMMUs. It uses two stages of address translation (stage-1, stage-2)
to reach the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Accordingly, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
This series is based on the first part, which was merged [1]. It adds
the cache invalidation interface for userspace to invalidate the cache after
modifying the stage-1 page table. This includes both the iommufd changes and the
VT-d driver changes.
Complete code can be found in [2]; the QEMU code can be found in [3].
Finally, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks
to them for the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v7:
- Remove domain->ops->cache_invalidate_user check in hwpt alloc path due
to failure in bisect (Baolu)
- Remove out_driver_error_code from struct iommu_hwpt_invalidate after
discussion in v6. Should expect per-entry error code.
- Rework the selftest cache invalidation part to report a per-entry error
- Allow user to pass in an empty array to have a try-and-fail mechanism for
user to check if a given req_type is supported by the kernel (Jason)
- Define a separate enum type for cache invalidation data (Jason)
- Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before
returning (Nicolin)
- Merge the VT-d nesting part 2/2
https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
into this series to avoid defining empty enum in the middle of the series.
The major difference is adding the VT-d related invalidation uapi structures
together with the generic data structures in patch 02 of this series.
- VT-d driver was refined to report ICE/ITE errors from the bottom cache
invalidation submit helpers, hence the cache_invalidate_user op can
report such errors to the user via the per-entry error field. The VT-d driver
will not stop walking the invalidation array on ICE/ITE errors:
such errors are defined by the VT-d spec, so userspace should be able to
handle them and let the real user (say, a virtual machine) know about them.
Other errors, such as an invalid uapi data structure configuration or a
memory copy failure, should stop the array walk, as continuing
may cause more issues.
- Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
- Not much change, just a rebase on top of 6.7-rc1, as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (4):
iommu: Add cache_invalidate_user op
iommu/vt-d: Allow qi_submit_sync() to return the QI faults
iommu/vt-d: Convert pasid based cache invalidation to return QI fault
iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (1):
iommufd: Add IOMMU_HWPT_INVALIDATE
drivers/iommu/intel/dmar.c | 36 ++--
drivers/iommu/intel/iommu.c | 12 +-
drivers/iommu/intel/iommu.h | 8 +-
drivers/iommu/intel/irq_remapping.c | 2 +-
drivers/iommu/intel/nested.c | 116 ++++++++++++
drivers/iommu/intel/pasid.c | 14 +-
drivers/iommu/intel/svm.c | 14 +-
drivers/iommu/iommufd/hw_pagetable.c | 36 ++++
drivers/iommu/iommufd/iommufd_private.h | 10 ++
drivers/iommu/iommufd/iommufd_test.h | 39 ++++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 93 ++++++++++
include/linux/iommu.h | 101 +++++++++++
include/uapi/linux/iommufd.h | 100 +++++++++++
tools/testing/selftests/iommu/iommufd.c | 170 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 57 ++++++
16 files changed, 773 insertions(+), 38 deletions(-)
--
2.34.1
Patch 1 is a cleanup: the mptcp_is_tcpsk() helper was modifying sock_ops
in some cases, which is unexpected given its name.
Patches 2 to 4 add support for two socket options: IP_LOCAL_PORT_RANGE and
IP_BIND_ADDRESS_NO_PORT. The first one is a preparation patch, the
second one adds the support, and the last one modifies an existing
selftest to validate the new features.
Signed-off-by: Matthieu Baerts <matttbe(a)kernel.org>
---
Davide Caratti (1):
mptcp: don't overwrite sock_ops in mptcp_is_tcpsk()
Maxim Galaganov (3):
mptcp: rename mptcp_setsockopt_sol_ip_set_transparent()
mptcp: sockopt: support IP_LOCAL_PORT_RANGE and IP_BIND_ADDRESS_NO_PORT
selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE
net/mptcp/protocol.c | 108 +++++++++-------------
net/mptcp/sockopt.c | 27 +++++-
tools/testing/selftests/net/ip_local_port_range.c | 12 +++
3 files changed, 79 insertions(+), 68 deletions(-)
---
base-commit: 62ed78f3baff396bd928ee77077580c5aa940149
change-id: 20231219-upstream-net-next-20231219-mptcp-sockopts-ephemeral-ports-645522e83161
Best regards,
--
Matthieu Baerts <matttbe(a)kernel.org>
From: Maxim Mikityanskiy <maxim(a)isovalent.com>
The goal of this series is to extend the verifier's capabilities of
tracking scalars when they are spilled to stack, especially when the
spill or fill is narrowing. It also contains a fix by Eduard for
infinite loop detection and a state pruning optimization by Eduard that
compensates for a verification complexity regression introduced by
tracking unbounded scalars. These improvements reduce the surface of
false rejections that I saw while working on the Cilium codebase.
Patch 1 (Maxim): Fix for an existing test, it will matter later in the
series.
Patches 2-3 (Eduard): Fixes for false rejections in infinite loop
detection that happen in the selftests when my patches are applied.
Patches 4-5 (Maxim): Fix the inconsistency of find_equal_scalars that
was possible if 32-bit spills were made.
Patches 6-11 (Maxim): Support the case when boundary checks are first
performed after the register was spilled to the stack.
Patches 12-13 (Maxim): Support narrowing fills.
Patches 14-15 (Eduard): Optimization for state pruning in stacksafe() to
mitigate the verification complexity regression.
veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...
* Without patch 14:
File                  Program       States (A)  States (B)  States (DIFF)
--------------------  ------------  ----------  ----------  ----------------
bpf_xdp.o             tail_lb_ipv6        3877        2936    -941 (-24.27%)
pyperf180.bpf.o       on_event            8422       10456   +2034 (+24.15%)
pyperf600.bpf.o       on_event           22259       37319  +15060 (+67.66%)
pyperf600_iter.bpf.o  on_event             400         540    +140 (+35.00%)
strobemeta.bpf.o      on_event            4702       13435  +8733 (+185.73%)
* With patch 14:
File                  Program       States (A)  States (B)  States (DIFF)
--------------------  ------------  ----------  ----------  --------------
bpf_xdp.o             tail_lb_ipv6        3877        2937  -940 (-24.25%)
pyperf600_iter.bpf.o  on_event             400         500  +100 (+25.00%)
Eduard Zingerman (4):
bpf: make infinite loop detection in is_state_visited() exact
selftests/bpf: check if imprecise stack spills confuse infinite loop
detection
bpf: Optimize state pruning for spilled scalars
selftests/bpf: states pruning checks for scalar vs STACK_{MISC,ZERO}
Maxim Mikityanskiy (11):
selftests/bpf: Fix the u64_offset_to_skb_data test
bpf: Make bpf_for_each_spilled_reg consider narrow spills
selftests/bpf: Add a test case for 32-bit spill tracking
bpf: Add the assign_scalar_id_before_mov function
bpf: Add the get_reg_width function
bpf: Assign ID to scalars on spill
selftests/bpf: Test assigning ID to scalars on spill
bpf: Track spilled unbounded scalars
selftests/bpf: Test tracking spilled unbounded scalars
bpf: Preserve boundaries and track scalars on narrowing fill
selftests/bpf: Add test cases for narrowing fill
include/linux/bpf_verifier.h | 2 +-
kernel/bpf/verifier.c | 160 +++++-
.../bpf/progs/verifier_direct_packet_access.c | 2 +-
.../selftests/bpf/progs/verifier_loops1.c | 24 +
.../selftests/bpf/progs/verifier_spill_fill.c | 529 +++++++++++++++++-
.../testing/selftests/bpf/verifier/precise.c | 6 +-
6 files changed, 677 insertions(+), 46 deletions(-)
--
2.42.1
The KUnit device helpers are documented with kerneldoc in their header
file, but also have short comments over their implementation. These were
mistakenly formatted as kerneldoc comments, even though they're not
valid kerneldoc. It shouldn't cause any serious problems -- this file
isn't included in the docs -- but it could be confusing, and causes
warnings.
Remove the extra '*' so that these aren't treated as kerneldoc.
Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312181920.H4EPAH20-lkp@intel.com/
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/kunit/device.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/device.c b/lib/kunit/device.c
index 1db4305b615a..f5371287b375 100644
--- a/lib/kunit/device.c
+++ b/lib/kunit/device.c
@@ -60,7 +60,7 @@ static void kunit_device_release(struct device *d)
kfree(to_kunit_device(d));
}
-/**
+/*
* Create and register a KUnit-managed struct device_driver on the kunit_bus.
* Returns an error pointer on failure.
*/
@@ -124,7 +124,7 @@ static struct kunit_device *kunit_device_register_internal(struct kunit *test,
return kunit_dev;
}
-/**
+/*
* Create and register a new KUnit-managed device, using the user-supplied device_driver.
* On failure, returns an error pointer.
*/
@@ -141,7 +141,7 @@ struct device *kunit_device_register_with_driver(struct kunit *test,
}
EXPORT_SYMBOL_GPL(kunit_device_register_with_driver);
-/**
+/*
* Create and register a new KUnit-managed device, including a matching device_driver.
* On failure, returns an error pointer.
*/
--
2.43.0.472.g3155946c3a-goog
Here is the last part of converting net selftests to run in unique namespace.
This part converts all the remaining tests. After the conversion, we can run the net
selftests in parallel, e.g.
# ./run_kselftest.sh -n -t net:reuseport_bpf
TAP version 13
1..1
# selftests: net: reuseport_bpf
ok 1 selftests: net: reuseport_bpf
mod 10...
# Socket 0: 0
# Socket 1: 1
...
# Socket 4: 19
# Testing filter add without bind...
# SUCCESS
# ./run_kselftest.sh -p -n -t net:cmsg_so_mark.sh -t net:cmsg_time.sh -t net:cmsg_ipv6.sh
TAP version 13
1..3
# selftests: net: cmsg_so_mark.sh
ok 1 selftests: net: cmsg_so_mark.sh
# selftests: net: cmsg_time.sh
ok 2 selftests: net: cmsg_time.sh
# selftests: net: cmsg_ipv6.sh
ok 3 selftests: net: cmsg_ipv6.sh
# ./run_kselftest.sh -p -n -c net
TAP version 13
1..95
# selftests: net: reuseport_bpf_numa
ok 3 selftests: net: reuseport_bpf_numa
# selftests: net: reuseport_bpf_cpu
ok 2 selftests: net: reuseport_bpf_cpu
# selftests: net: sk_bind_sendto_listen
ok 9 selftests: net: sk_bind_sendto_listen
# selftests: net: reuseaddr_conflict
ok 5 selftests: net: reuseaddr_conflict
...
Here is the part 1 link:
https://lore.kernel.org/netdev/20231202020110.362433-1-liuhangbin@gmail.com
part 2 link:
https://lore.kernel.org/netdev/20231206070801.1691247-1-liuhangbin@gmail.com
part 3 link:
https://lore.kernel.org/netdev/20231213060856.4030084-1-liuhangbin@gmail.com
Hangbin Liu (8):
selftests/net: convert gre_gso.sh to run it in unique namespace
selftests/net: convert netns-name.sh to run it in unique namespace
selftests/net: convert rtnetlink.sh to run it in unique namespace
selftests/net: convert stress_reuseport_listen.sh to run it in unique
namespace
selftests/net: convert xfrm_policy.sh to run it in unique namespace
selftests/net: use unique netns name for setup_loopback.sh
setup_veth.sh
selftests/net: convert pmtu.sh to run it in unique namespace
kselftest/runner.sh: add netns support
tools/testing/selftests/kselftest/runner.sh | 38 ++++-
tools/testing/selftests/net/gre_gso.sh | 18 +--
tools/testing/selftests/net/gro.sh | 4 +-
tools/testing/selftests/net/netns-name.sh | 44 +++---
tools/testing/selftests/net/pmtu.sh | 27 ++--
tools/testing/selftests/net/rtnetlink.sh | 34 +++--
tools/testing/selftests/net/setup_loopback.sh | 8 +-
tools/testing/selftests/net/setup_veth.sh | 9 +-
.../selftests/net/stress_reuseport_listen.sh | 6 +-
tools/testing/selftests/net/toeplitz.sh | 14 +-
tools/testing/selftests/net/xfrm_policy.sh | 138 +++++++++---------
tools/testing/selftests/run_kselftest.sh | 10 +-
12 files changed, 193 insertions(+), 157 deletions(-)
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
As there were bugs found with the ownership of eventfs dynamic file
creation, add a test for it.
It will remount tracefs with a different gid and check the ownership of
the eventfs directory, as well as the system and event directories. It
will also check the event file directories.
It then does a chgrp on each of these as well to see if they all get
updated as expected.
Then it remounts the tracefs file system back to the original group and
makes sure that all the updated files and directories were reset back to
the original ownership.
It does the same for instances that change the ownership of the instance
directory.
Note, because the uid is not reset by a remount, it is tested for every
file by switching it to a new owner and then back again.
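The gid handling described above can be sketched as a small helper. This is
only an illustration under assumptions: the function name with_gid is
hypothetical, and the option strings are example data, not output from a real
tracefs mount.

```shell
#!/bin/sh
# Illustrative sketch only: with_gid is a hypothetical helper.
# Print the mount option string $1 with its gid set to $2: replace an
# existing gid=N option, or append gid=N when none is present.
with_gid() {
	opts=$1
	gid=$2
	case ",$opts," in
	*,gid=*)
		# An explicit gid option exists: swap its value.
		printf '%s\n' "$opts" | sed -e "s/gid=[0-9]*/gid=$gid/"
		;;
	*)
		# No gid option yet: append one.
		printf '%s\n' "$opts,gid=$gid"
		;;
	esac
}

with_gid "rw,nosuid,nodev,noexec,relatime,gid=1000" 2000
with_gid "rw,nosuid,nodev,noexec,relatime" 2000
```

The test in the patch reaches the same result by comparing the option string
before and after the sed substitution and appending gid= when nothing
changed; the case statement here is just one alternative way to make the
"replace or append" decision explicit.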
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v3: https://lore.kernel.org/linux-trace-kernel/20231221211229.13398ef3@gandalf.…
- Added missing SPDX and removed exec permission from file (Shuah Khan)
.../ftrace/test.d/00basic/test_ownership.tc | 114 ++++++++++++++++++
1 file changed, 114 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
new file mode 100644
index 000000000000..add7d5bf585d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -0,0 +1,114 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test file and directory owership changes for eventfs
+
+original_group=`stat -c "%g" .`
+original_owner=`stat -c "%u" .`
+
+mount_point=`stat -c '%m' .`
+mount_options=`mount | grep "$mount_point" | sed -e 's/.*(\(.*\)).*/\1/'`
+
+# find another owner and group that is not the original
+other_group=`tac /etc/group | grep -v ":$original_group:" | head -1 | cut -d: -f3`
+other_owner=`tac /etc/passwd | grep -v ":$original_owner:" | head -1 | cut -d: -f3`
+
+# Remove any group ownership already
+new_options=`echo "$mount_options" | sed -e "s/gid=[0-9]*/gid=$other_group/"`
+
+if [ "$new_options" = "$mount_options" ]; then
+ new_options="$mount_options,gid=$other_group"
+ mount_options="$mount_options,gid=$original_group"
+fi
+
+canary="events/timer events/timer/timer_cancel events/timer/timer_cancel/format"
+
+test() {
+ file=$1
+ test_group=$2
+
+ owner=`stat -c "%u" $file`
+ group=`stat -c "%g" $file`
+
+ echo "testing $file $owner=$original_owner and $group=$test_group"
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+ if [ $group -ne $test_group ]; then
+ exit_fail
+ fi
+
+ # Note, the remount does not update ownership so test going to and from owner
+ echo "test owner $file to $other_owner"
+ chown $other_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $other_owner ]; then
+ exit_fail
+ fi
+
+ chown $original_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+
+}
+
+run_tests() {
+ for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events
+ test "events" $original_group
+ for d in "." "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched
+ test "events/sched" $original_group
+ for d in "." "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch
+ test "events/sched/sched_switch" $original_group
+ for d in "." "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch/enable
+ test "events/sched/sched_switch/enable" $original_group
+ for d in "." $canary; do
+ test "$d" $other_group
+ done
+}
+
+mount -o remount,"$new_options" .
+
+run_tests
+
+mount -o remount,"$mount_options" .
+
+for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $original_group
+done
+
+# check instances as well
+
+chgrp $other_group instances
+
+instance="$(mktemp -u test-XXXXXX)"
+
+mkdir instances/$instance
+
+cd instances/$instance
+
+run_tests
+
+cd ../..
+
+rmdir instances/$instance
+
+chgrp $original_group instances
+
+exit 0
--
2.42.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
As there were bugs found with the ownership of eventfs dynamic file
creation, add a test to test it.
It will remount tracefs with a different gid and check the ownership of
the eventfs directory, as well as the system and event directories. It
will also check the event file directories.
It then does a chgrp on each of these as well to see if they all get
updated as expected.
Then it remounts the tracefs file system back to the original group and
makes sure that all the updated files and directories were reset back to
the original ownership.
It does the same for instances that change the ownership of the instance
directory.
Note, because the uid is not reset by a remount, it is tested for every
file by switching it to a new owner and then back again.
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v2: https://lore.kernel.org/linux-trace-kernel/20231221194516.53e1ee43@gandalf.…
- Changed the instance test name from "foo-$(mktemp -u XXXXX)" to
"$(mktemp -u test-XXXXXX)" as Masami reported that busybox mktemp only
works with 6 Xs and not 5. Also changed "foo" to "test" and placed it
into the mktemp format.
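For reference, the name generation described in the changelog entry above can be tried on its own. This is a minimal sketch, assuming a mktemp that supports the -u (dry-run) option, as the GNU coreutils and busybox versions do; the check on the result is illustrative and is not part of the patch:

```shell
#!/bin/sh
# Generate a unique-looking instance name without creating anything.
# Six X's are used because busybox mktemp was reported to reject five.
instance="$(mktemp -u test-XXXXXX)"

# Illustrative check (not in the patch): "test-" followed by six
# replaced characters.
case "$instance" in
    test-??????) echo "instance name: $instance" ;;
    *) echo "unexpected name: $instance"; exit 1 ;;
esac
```

Because the template carries no directory part, the result is a bare name that can be appended to instances/ as the test does.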
.../ftrace/test.d/00basic/test_ownership.tc | 113 ++++++++++++++++++
1 file changed, 113 insertions(+)
create mode 100755 tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
new file mode 100755
index 000000000000..4c20be3a714a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -0,0 +1,113 @@
+#!/bin/sh
+# description: Test file and directory ownership changes for eventfs
+
+original_group=`stat -c "%g" .`
+original_owner=`stat -c "%u" .`
+
+mount_point=`stat -c '%m' .`
+mount_options=`mount | grep "$mount_point" | sed -e 's/.*(\(.*\)).*/\1/'`
+
+# find another owner and group that is not the original
+other_group=`tac /etc/group | grep -v ":$original_group:" | head -1 | cut -d: -f3`
+other_owner=`tac /etc/passwd | grep -v ":$original_owner:" | head -1 | cut -d: -f3`
+
+# Remove any group ownership already
+new_options=`echo "$mount_options" | sed -e "s/gid=[0-9]*/gid=$other_group/"`
+
+if [ "$new_options" = "$mount_options" ]; then
+ new_options="$mount_options,gid=$other_group"
+ mount_options="$mount_options,gid=$original_group"
+fi
+
+canary="events/timer events/timer/timer_cancel events/timer/timer_cancel/format"
+
+test() {
+ file=$1
+ test_group=$2
+
+ owner=`stat -c "%u" $file`
+ group=`stat -c "%g" $file`
+
+ echo "testing $file $owner=$original_owner and $group=$test_group"
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+ if [ $group -ne $test_group ]; then
+ exit_fail
+ fi
+
+ # Note, the remount does not update ownership so test going to and from owner
+ echo "test owner $file to $other_owner"
+ chown $other_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $other_owner ]; then
+ exit_fail
+ fi
+
+ chown $original_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+
+}
+
+run_tests() {
+ for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events
+ test "events" $original_group
+ for d in "." "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched
+ test "events/sched" $original_group
+ for d in "." "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch
+ test "events/sched/sched_switch" $original_group
+ for d in "." "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch/enable
+ test "events/sched/sched_switch/enable" $original_group
+ for d in "." $canary; do
+ test "$d" $other_group
+ done
+}
+
+mount -o remount,"$new_options" .
+
+run_tests
+
+mount -o remount,"$mount_options" .
+
+for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $original_group
+done
+
+# check instances as well
+
+chgrp $other_group instances
+
+instance="$(mktemp -u test-XXXXXX)"
+
+mkdir instances/$instance
+
+cd instances/$instance
+
+run_tests
+
+cd ../..
+
+rmdir instances/$instance
+
+chgrp $original_group instances
+
+exit 0
--
2.42.0
This series implements support for SME use in non-protected KVM guests.
Much of this is very similar to SVE; the main additional challenge that
SME presents is that it introduces two new controls which change the
registers seen by guests:
- PSTATE.ZA enables the ZA matrix register and, if SME2 is supported,
the ZT0 LUT register.
- PSTATE.SM enables streaming mode, a new floating point mode which
uses the SVE register set with a separately configured vector length.
In streaming mode implementation of the FFR register is optional.
It is also permitted to build systems which support SME without SVE; in
this case when not in streaming mode no SVE registers or instructions
are available. Further, there is no requirement that there be any
overlap in the set of vector lengths supported by SVE and SME in a
system; this is expected to be a common situation in practical systems.
Since there is a new vector length to configure we introduce a new
feature parallel to the existing SVE one with a new pseudo register for
the streaming mode vector length. Due to the overlap with SVE caused by
streaming mode, rather than finalising SME as a separate feature we use
the existing SVE finalisation to also finalise SME; a new define
KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising
SVE and SME separately would introduce complication with register access
since finalising SVE makes the SVE registers writeable by userspace and
doing multiple finalisations results in an error being reported.
Dealing with a state where the SVE registers are writeable due to one of
SVE or SME being finalised but may have their VL changed by the other
being finalised seems like needless complexity with minimal practical
utility; it seems clearer to just express directly that only one
finalisation can be done in the ABI.
We represent the streaming mode registers to userspace by always using
the existing SVE registers to access the floating point state, using the
larger of the SME and (if enabled for the guest) SVE vector lengths.
There are a large number of subfeatures for SME, most of which only
offer additional instructions but some of which (SME2 and FA64) add
architectural state. The expectation is that these will be configured
via the ID registers but since the mechanism for doing this is still
unclear the current code enables SME2 and FA64 for the guest if the host
supports them regardless of what the ID registers say.
Since we do not yet have support for SVE in protected guests and SME is
very reliant on SVE, this series does not implement support for SME in
protected guests. This will be added separately once SVE support is
merged into mainline (or along with merging that); there is code for
protected guests using SVE in the Android tree.
The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been
added to the get-reg-list selftest; the idea of supporting additional
features there without restructuring the program to generate all
possible feature combinations has been rejected. I will post a separate
series which does that restructuring.
I am seeing some test failures currently which I've not got to the
bottom of; at this point I'm reasonably sure these are preexisting
issues in the kernel which are more apparent in a guest.
To: Marc Zyngier <maz(a)kernel.org>
To: Oliver Upton <oliver.upton(a)linux.dev>
To: James Morse <james.morse(a)arm.com>
To: Suzuki K Poulose <suzuki.poulose(a)arm.com>
To: Catalin Marinas <catalin.marinas(a)arm.com>
To: Will Deacon <will(a)kernel.org>
Cc: <linux-arm-kernel(a)lists.infradead.org>
Cc: <kvmarm(a)lists.linux.dev>
Cc: <linux-kernel(a)vger.kernel.org>
To: Paolo Bonzini <pbonzini(a)redhat.com>
To: Jonathan Corbet <corbet(a)lwn.net>
Cc: <kvm(a)vger.kernel.org>
Cc: <linux-doc(a)vger.kernel.org>
To: Shuah Khan <shuah(a)kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Changes in v2:
- Rebase onto v6.7-rc3.
- Configure subfeatures based on host system only.
- Complete nVHE support.
- There was some snafu with sending v1 out, it didn't make it to the
lists but in case it hit people's inboxes I'm sending as v2.
---
Mark Brown (22):
KVM: arm64: Document why we trap SVE access from the host
arm64/fpsimd: Make SVE<->FPSIMD rewriting available to KVM
KVM: arm64: Move SVE state access macros after feature test macros
KVM: arm64: Store vector lengths in an array
KVM: arm64: Document the KVM ABI for SME
KVM: arm64: Make FFR restore optional in __sve_restore_state()
KVM: arm64: Define guest flags for SME
KVM: arm64: Rename SVE finalization constants to be more general
KVM: arm64: Basic SME system register descriptions
KVM: arm64: Add support for TPIDR2_EL0
KVM: arm64: Make SMPRI_EL1 RES0 for SME guests
KVM: arm64: Make SVCR a normal system register
KVM: arm64: Context switch SME state for guest
KVM: arm64: Manage and handle SME traps
KVM: arm64: Implement SME vector length configuration
KVM: arm64: Rename sve_state_reg_region
KVM: arm64: Support userspace access to streaming mode SVE registers
KVM: arm64: Expose ZA to userspace
KVM: arm64: Provide userspace access to ZT0
KVM: arm64: Support SME version configuration via ID registers
KVM: arm64: Provide userspace ABI for enabling SME
KVM: arm64: selftests: Add SME system registers to get-reg-list
Documentation/virt/kvm/api.rst | 104 +++++---
arch/arm64/include/asm/fpsimd.h | 5 +
arch/arm64/include/asm/kvm_emulate.h | 13 +-
arch/arm64/include/asm/kvm_host.h | 99 +++++---
arch/arm64/include/asm/kvm_hyp.h | 3 +-
arch/arm64/include/uapi/asm/kvm.h | 33 +++
arch/arm64/kernel/fpsimd.c | 51 +++-
arch/arm64/kvm/arm.c | 16 +-
arch/arm64/kvm/fpsimd.c | 266 ++++++++++++++++++---
arch/arm64/kvm/guest.c | 230 +++++++++++++++---
arch/arm64/kvm/handle_exit.c | 11 +
arch/arm64/kvm/hyp/fpsimd.S | 11 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 86 ++++++-
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 16 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 60 ++++-
arch/arm64/kvm/hyp/nvhe/switch.c | 13 +-
arch/arm64/kvm/hyp/vhe/switch.c | 3 +
arch/arm64/kvm/reset.c | 150 +++++++++---
arch/arm64/kvm/sys_regs.c | 67 +++++-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 32 ++-
21 files changed, 1063 insertions(+), 207 deletions(-)
---
base-commit: 4ae6e89253b387476c2ba0202c3a80f2e1284e91
change-id: 20230301-kvm-arm64-sme-06a1246d3636
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Swap the arguments to typecheck_fn() in kunit_activate_static_stub()
so that real_fn_addr can be either the function itself or a pointer
to that function.
This is useful to simplify redirecting static functions in a module.
Having to pass the actual function meant that it must be exported
from the module. That meant either making the 'static' and
EXPORT_SYMBOL*() conditional (which makes the code messy), or changing
it to always be exported (which increases the export namespace and
prevents the compiler inlining a trivial stub function in non-test builds).
With the original definition of kunit_activate_static_stub() the
address of real_fn_addr was passed to typecheck_fn() as the type to
be passed. This meant that if real_fn_addr was a pointer-to-function
it would resolve to a ** instead of a *, giving an error like this:
error: initialization of ‘int (**)(int)’ from incompatible pointer
type ‘int (*)(int)’ [-Werror=incompatible-pointer-types]
kunit_activate_static_stub(test, add_one_fn_ptr, subtract_one);
| ^~~~~~~~~~~~
./include/linux/typecheck.h:21:25: note: in definition of macro
‘typecheck_fn’
21 | ({ typeof(type) __tmp = function; \
Swapping the arguments to typecheck_fn makes it take the type of a
pointer to the replacement function. Either a function or a pointer
to function can be assigned to that. For example:
static int some_function(int x)
{
/* whatever */
}
int (* some_function_ptr)(int) = some_function;
static int replacement(int x)
{
/* whatever */
}
Then:
kunit_activate_static_stub(test, some_function, replacement);
yields:
typecheck_fn(typeof(&replacement), some_function);
and:
kunit_activate_static_stub(test, some_function_ptr, replacement);
yields:
typecheck_fn(typeof(&replacement), some_function_ptr);
The two typecheck_fn() then resolve to:
int (*__tmp)(int) = some_function;
and
int (*__tmp)(int) = some_function_ptr;
Both of these are valid. In the first case the compiler inserts
an implicit '&' to take the address of the supplied function, and
in the second case the RHS is already a pointer to the same type.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Rae Moar <rmoar(a)google.com>
---
No changes since V1.
---
include/kunit/static_stub.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/kunit/static_stub.h b/include/kunit/static_stub.h
index 85315c80b303..bf940322dfc0 100644
--- a/include/kunit/static_stub.h
+++ b/include/kunit/static_stub.h
@@ -93,7 +93,7 @@ void __kunit_activate_static_stub(struct kunit *test,
* The redirection can be disabled again with kunit_deactivate_static_stub().
*/
#define kunit_activate_static_stub(test, real_fn_addr, replacement_addr) do { \
- typecheck_fn(typeof(&real_fn_addr), replacement_addr); \
+ typecheck_fn(typeof(&replacement_addr), real_fn_addr); \
__kunit_activate_static_stub(test, real_fn_addr, replacement_addr); \
} while (0)
--
2.30.2
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
As there were bugs found with the ownership of eventfs dynamic file
creation, add a test to test it.
It will remount tracefs with a different gid and check the ownership of
the eventfs directory, as well as the system and event directories. It
will also check the event file directories.
It then does a chgrp on each of these as well to see if they all get
updated as expected.
Then it remounts the tracefs file system back to the original group and
makes sure that all the updated files and directories were reset back to
the original ownership.
It does the same for instances that change the ownership of the instance
directory.
Note, because the uid is not reset by a remount, it is tested for every
file by switching it to a new owner and then back again.
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20231221193551.13a0b7bd@gandalf.…
- Fixed a cut and paste error of using $original_group for finding another uid
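The lookup that this changelog entry refers to can be sketched against sample data instead of the real /etc/passwd. The entries below are hypothetical, and "tail -1" is used in place of the script's "tac ... | head -1", which selects the same line:

```shell
#!/bin/sh
# Sample passwd-format data standing in for /etc/passwd (hypothetical entries).
passwd_sample="root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh"

original_owner=0

# Drop every line that mentions the original uid, take the last remaining
# entry and print its third field (the uid).
other_owner=$(echo "$passwd_sample" | grep -v ":$original_owner:" | tail -1 | cut -d: -f3)

echo "other_owner=$other_owner"
```

Note that the ":$original_owner:" pattern is not anchored to the uid column, so an account whose primary gid equals the original uid is skipped as well. That is harmless here, since any remaining uid will do.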
.../ftrace/test.d/00basic/test_ownership.tc | 113 ++++++++++++++++++
1 file changed, 113 insertions(+)
create mode 100755 tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
new file mode 100755
index 000000000000..83cbd116d06b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -0,0 +1,113 @@
+#!/bin/sh
+# description: Test file and directory ownership changes for eventfs
+
+original_group=`stat -c "%g" .`
+original_owner=`stat -c "%u" .`
+
+mount_point=`stat -c '%m' .`
+mount_options=`mount | grep "$mount_point" | sed -e 's/.*(\(.*\)).*/\1/'`
+
+# find another owner and group that is not the original
+other_group=`tac /etc/group | grep -v ":$original_group:" | head -1 | cut -d: -f3`
+other_owner=`tac /etc/passwd | grep -v ":$original_owner:" | head -1 | cut -d: -f3`
+
+# Remove any group ownership already
+new_options=`echo "$mount_options" | sed -e "s/gid=[0-9]*/gid=$other_group/"`
+
+if [ "$new_options" = "$mount_options" ]; then
+ new_options="$mount_options,gid=$other_group"
+ mount_options="$mount_options,gid=$original_group"
+fi
+
+canary="events/timer events/timer/timer_cancel events/timer/timer_cancel/format"
+
+test() {
+ file=$1
+ test_group=$2
+
+ owner=`stat -c "%u" $file`
+ group=`stat -c "%g" $file`
+
+ echo "testing $file $owner=$original_owner and $group=$test_group"
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+ if [ $group -ne $test_group ]; then
+ exit_fail
+ fi
+
+ # Note, the remount does not update ownership so test going to and from owner
+ echo "test owner $file to $other_owner"
+ chown $other_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $other_owner ]; then
+ exit_fail
+ fi
+
+ chown $original_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+
+}
+
+run_tests() {
+ for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events
+ test "events" $original_group
+ for d in "." "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched
+ test "events/sched" $original_group
+ for d in "." "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch
+ test "events/sched/sched_switch" $original_group
+ for d in "." "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch/enable
+ test "events/sched/sched_switch/enable" $original_group
+ for d in "." $canary; do
+ test "$d" $other_group
+ done
+}
+
+mount -o remount,"$new_options" .
+
+run_tests
+
+mount -o remount,"$mount_options" .
+
+for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $original_group
+done
+
+# check instances as well
+
+chgrp $other_group instances
+
+instance="foo-$(mktemp -u XXXXX)"
+
+mkdir instances/$instance
+
+cd instances/$instance
+
+run_tests
+
+cd ../..
+
+rmdir instances/$instance
+
+chgrp $original_group instances
+
+exit 0
--
2.42.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
As there were bugs found with the ownership of eventfs dynamic file
creation, add a test to test it.
It will remount tracefs with a different gid and check the ownership of
the eventfs directory, as well as the system and event directories. It
will also check the event file directories.
It then does a chgrp on each of these as well to see if they all get
updated as expected.
Then it remounts the tracefs file system back to the original group and
makes sure that all the updated files and directories were reset back to
the original ownership.
It does the same for instances that change the ownership of the instance
directory.
Note, because the uid is not reset by a remount, it is tested for every
file by switching it to a new owner and then back again.
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
.../ftrace/test.d/00basic/test_ownership.tc | 113 ++++++++++++++++++
1 file changed, 113 insertions(+)
create mode 100755 tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
new file mode 100755
index 000000000000..de8cdf6f207b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -0,0 +1,113 @@
+#!/bin/sh
+# description: Test file and directory ownership changes for eventfs
+
+original_group=`stat -c "%g" .`
+original_owner=`stat -c "%u" .`
+
+mount_point=`stat -c '%m' .`
+mount_options=`mount | grep "$mount_point" | sed -e 's/.*(\(.*\)).*/\1/'`
+
+# find another owner and group that is not the original
+other_group=`tac /etc/group | grep -v ":$original_group:" | head -1 | cut -d: -f3`
+other_owner=`tac /etc/passwd | grep -v ":$original_group:" | head -1 | cut -d: -f3`
+
+# Remove any group ownership already
+new_options=`echo "$mount_options" | sed -e "s/gid=[0-9]*/gid=$other_group/"`
+
+if [ "$new_options" = "$mount_options" ]; then
+ new_options="$mount_options,gid=$other_group"
+ mount_options="$mount_options,gid=$original_group"
+fi
+
+canary="events/timer events/timer/timer_cancel events/timer/timer_cancel/format"
+
+test() {
+ file=$1
+ test_group=$2
+
+ owner=`stat -c "%u" $file`
+ group=`stat -c "%g" $file`
+
+ echo "testing $file $owner=$original_owner and $group=$test_group"
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+ if [ $group -ne $test_group ]; then
+ exit_fail
+ fi
+
+ # Note, the remount does not update ownership so test going to and from owner
+ echo "test owner $file to $other_owner"
+ chown $other_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $other_owner ]; then
+ exit_fail
+ fi
+
+ chown $original_owner $file
+ owner=`stat -c "%u" $file`
+ if [ $owner -ne $original_owner ]; then
+ exit_fail
+ fi
+
+}
+
+run_tests() {
+ for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events
+ test "events" $original_group
+ for d in "." "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched
+ test "events/sched" $original_group
+ for d in "." "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch
+ test "events/sched/sched_switch" $original_group
+ for d in "." "events/sched/sched_switch/enable" $canary; do
+ test "$d" $other_group
+ done
+
+ chgrp $original_group events/sched/sched_switch/enable
+ test "events/sched/sched_switch/enable" $original_group
+ for d in "." $canary; do
+ test "$d" $other_group
+ done
+}
+
+mount -o remount,"$new_options" .
+
+run_tests
+
+mount -o remount,"$mount_options" .
+
+for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $original_group
+done
+
+# check instances as well
+
+chgrp $other_group instances
+
+instance="foo-$(mktemp -u XXXXX)"
+
+mkdir instances/$instance
+
+cd instances/$instance
+
+run_tests
+
+cd ../..
+
+rmdir instances/$instance
+
+chgrp $original_group instances
+
+exit 0
--
2.42.0
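The remount option handling in the test script above can be exercised on plain strings, without touching a real mount. This is a minimal sketch: swap_gid is a hypothetical helper that does not appear in the patch, and the option strings and gid values are made up:

```shell
#!/bin/sh
# swap_gid is a hypothetical helper, not a function from the patch.
# It mirrors the script's logic: replace an existing gid= option, or
# append one when the option string has none.
swap_gid() {
    opts=$1
    gid=$2
    new=$(echo "$opts" | sed -e "s/gid=[0-9]*/gid=$gid/")
    if [ "$new" = "$opts" ]; then
        new="$opts,gid=$gid"
    fi
    echo "$new"
}

with_gid=$(swap_gid "rw,nosuid,nodev,noexec,relatime,gid=1000" 1001)
without_gid=$(swap_gid "rw,nosuid,nodev,noexec,relatime" 1001)

echo "$with_gid"
echo "$without_gid"
```

Both calls end up with the same option string. That is why the script also appends gid=$original_group to the saved options when no gid= was present: the second remount then has an explicit group to restore.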
This makes the uevent selftests build not write to the source tree
unconditionally, as that breaks out of tree builds when the source tree
is read-only. It also avoids leaving a git repository in a dirty state
after a build.
Signed-off-by: Antonio Terceiro <antonio.terceiro(a)linaro.org>
---
tools/testing/selftests/uevent/Makefile | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/uevent/Makefile b/tools/testing/selftests/uevent/Makefile
index f7baa9aa2932..9d1ba09baa90 100644
--- a/tools/testing/selftests/uevent/Makefile
+++ b/tools/testing/selftests/uevent/Makefile
@@ -1,17 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
all:
-include ../lib.mk
-
-.PHONY: all clean
-
-BINARIES := uevent_filtering
-CFLAGS += -Wl,-no-as-needed -Wall
-
-uevent_filtering: uevent_filtering.c ../kselftest.h ../kselftest_harness.h
- $(CC) $(CFLAGS) $< -o $@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -Wl,-no-as-needed -Wall $(KHDR_INCLUDES)
-TEST_PROGS += $(BINARIES)
-EXTRA_CLEAN := $(BINARIES)
+TEST_GEN_PROGS = uevent_filtering
-all: $(BINARIES)
+include ../lib.mk
--
2.43.0
Currently the seccomp benchmark selftest produces non-standard output,
meaning that while it makes a number of checks of the performance it
observes, the results have to be parsed by humans. This means that automated
systems running this suite of tests are almost certainly ignoring the
results, which isn't ideal for spotting problems. Let's rework things so
that each check that the program does is reported as a test result to
the framework.
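To illustrate what reporting each check as a test result looks like, here is a minimal shell sketch of KTAP-style output. The check helper, the test names and the timings are all hypothetical; the actual benchmark is a C program that uses the kselftest output functions, and this is not code from the series:

```shell
#!/bin/sh
# Hypothetical sketch of KTAP-style reporting; not code from the series.
test_num=0
failures=0

# check NAME COMMAND...: run COMMAND and print one KTAP result line for it.
check() {
    name=$1
    shift
    test_num=$((test_num + 1))
    if "$@"; then
        echo "ok $test_num $name"
    else
        echo "not ok $test_num $name"
        failures=$((failures + 1))
    fi
}

echo "KTAP version 1"
echo "1..2"

# Made-up timings in nanoseconds, standing in for benchmark measurements.
native=100
bitmap=105
filter=400

check "bitmap_close_to_native" [ "$bitmap" -le $((native * 2)) ]
check "filter_not_slower_than_bitmap" [ "$filter" -le "$bitmap" ]

echo "# failures: $failures"
```

With one "ok"/"not ok" line per expectation, an automated runner can count failures without interpreting free-form text.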
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kselftest/seccomp: Use kselftest output functions for benchmark
kselftest/seccomp: Report each expectation we assert as a KTAP test
.../testing/selftests/seccomp/seccomp_benchmark.c | 105 +++++++++++++--------
1 file changed, 65 insertions(+), 40 deletions(-)
---
base-commit: 2cc14f52aeb78ce3f29677c2de1f06c0e91471ab
change-id: 20231219-b4-kselftest-seccomp-benchmark-ktap-357603823708
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi all,
The livepatch selftest somehow fails in -next on s390 due to what
appears to me to be a 'comm' usage issue. E.g. the removal of the
timestamp-less line "with link type OSD_10GIG." in the output below
forces 'comm' to produce the correct result in the check_result() function
of the tools/testing/selftests/livepatch/functions.sh script:
[ 11.229256] qeth 0.0.bd02: qdio: OSA on SC 2624 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W
[ 11.250189] systemd-journald[943]: Successfully sent stream file descriptor to service manager.
[ 11.258763] qeth 0.0.bd00: Device is a OSD Express card (level: 0165)
with link type OSD_10GIG.
[ 11.259261] qeth 0.0.bd00: The device represents a Bridge Capable Port
[ 11.262376] qeth 0.0.bd00: MAC address b2:96:9c:49:aa:e9 successfully registered
[ 11.269654] qeth 0.0.bd00: MAC address 06:c6:b5:7d:ee:63 successfully registered
By contrast, using 'diff' instead works like a charm. But it was
removed with commit 2f3f651f3756 ("selftests/livepatch: Use "comm"
instead of "diff" for dmesg").
I am attaching the contents of the "$expect" and "$result" script
variables and the output of 'dmesg' before and after the test run
(dmesg-saved.txt and dmesg.txt).
Another 'dmesg' output (dmesg-saved1.txt and dmesg1.txt) also
shows the same problem, which seems to have something to do with
sorting.
The minimal reproducer attached is dmesg-saved1-rep.txt and
dmesg1-rep.txt, which can be described as:
--- dmesg-saved1-rep.txt 2023-12-17 21:08:14.171014218 +0100
+++ dmesg1-rep.txt 2023-12-17 21:06:52.221014218 +0100
@@ -1,3 +1,3 @@
-[ 98.820331] livepatch: 'test_klp_state2': starting patching transition
[ 100.031067] livepatch: 'test_klp_state2': completing patching transition
[ 284.224335] livepatch: kernel.ftrace_enabled = 1
+[ 284.232921] ===== TEST: basic shadow variable API =====
The culprit is the extra space in the [ 98.820331] timestamp, which from
the script's point of view produces output with two extra lines:
[ 100.031067] livepatch: 'test_klp_state2': completing patching transition
[ 284.224335] livepatch: kernel.ftrace_enabled = 1
[ 284.232921] ===== TEST: basic shadow variable API =====
If the line with the [ 98.820331] timestamp is removed or changed to e.g.
[ 100.031066] (i.e. 1 us less), then the resulting output is as expected:
[ 284.232921] ===== TEST: basic shadow variable API =====
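One possible explanation, which this report does not confirm, is that 'comm' orders lines using the current locale's collation rules, under which the padding spaces in the timestamps may be weighted differently than in a plain byte comparison. The sketch below replays the minimal reproducer with the comparison pinned to the C locale. The padding inside the timestamps is an assumption (the whitespace is collapsed in this archive), the file names are taken from the diff header above, and LC_ALL=C is a suggestion rather than a fix adopted in this thread:

```shell
#!/bin/sh
# Sketch only: timestamp padding is assumed, and LC_ALL=C is a suggested
# workaround, not something taken from this thread.
tmpdir=$(mktemp -d)

cat > "$tmpdir/dmesg-saved1-rep.txt" <<'EOF'
[   98.820331] livepatch: 'test_klp_state2': starting patching transition
[  100.031067] livepatch: 'test_klp_state2': completing patching transition
[  284.224335] livepatch: kernel.ftrace_enabled = 1
EOF

cat > "$tmpdir/dmesg1-rep.txt" <<'EOF'
[  100.031067] livepatch: 'test_klp_state2': completing patching transition
[  284.224335] livepatch: kernel.ftrace_enabled = 1
[  284.232921] ===== TEST: basic shadow variable API =====
EOF

# -13 suppresses lines unique to the saved log and lines common to both,
# leaving only the lines that are new in the second file. LC_ALL=C makes
# comm compare lines byte by byte, so a space always sorts before a digit.
result=$(LC_ALL=C comm -13 "$tmpdir/dmesg-saved1-rep.txt" "$tmpdir/dmesg1-rep.txt")

echo "$result"
rm -rf "$tmpdir"
```

Under these assumptions only the "===== TEST" line is printed, which matches the expected output quoted above.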
Thanks!
This patchset moves the current kernel testing livepatch modules from
lib/livepatch to tools/testing/selftests/livepatch/test_modules, and compiles
them as out-of-tree modules before testing.
There is also a new test being added. This new test exercises multiple processes
calling a syscall while a livepatch patches that syscall.
Why this move is an improvement:
* The modules are now compiled as out-of-tree modules against the current
running kernel, making them capable of being tested on different systems with
newer or older kernels.
* This approach now needs the kernel-devel package to be installed, since they are
out-of-tree modules. The package can be generated by running "make rpm-pkg" in the
kernel source.
What needs to be solved:
* Currently gen_tar only packages the resulting binaries of the tests, and not
the sources. For the current approach, the newly added modules would be
compiled and then packaged. It works when testing on a system with the same
kernel version. But it will fail when running on a machine with a different kernel
version, since the module was compiled against the kernel currently running.
This is not a new problem, just aligning the expectations. For the current
approach to be truly system agnostic, gen_tar would need to include the module
and program sources to be compiled on the target systems.
I'm sending the patches now so it can be discussed before Plumbers.
Thanks in advance!
Marcos
To: Shuah Khan <shuah(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Heiko Carstens <hca(a)linux.ibm.com>
To: Vasily Gorbik <gor(a)linux.ibm.com>
To: Alexander Gordeev <agordeev(a)linux.ibm.com>
To: Christian Borntraeger <borntraeger(a)linux.ibm.com>
To: Sven Schnelle <svens(a)linux.ibm.com>
To: Josh Poimboeuf <jpoimboe(a)kernel.org>
To: Jiri Kosina <jikos(a)kernel.org>
To: Miroslav Benes <mbenes(a)suse.cz>
To: Petr Mladek <pmladek(a)suse.com>
To: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-s390(a)vger.kernel.org
Cc: live-patching(a)vger.kernel.org
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
Changes in v3:
* Rebased on top of v6.6-rc5
* The commit messages were improved (Thanks Petr!)
* Created TEST_GEN_MODS_DIR variable to point to a directory that contains kernel
modules, and adapted selftests to build it before running the test.
* Moved test_klp-call_getpid out of test_programs, since the gen_tar
would just copy the generated test programs to the livepatches dir,
and so scripts relying on test_programs/test_klp-call_getpid will fail.
* Added a module_param for klp_pids, describing its usage.
* Simplified the call_getpid program to ignore the return of getpid syscall,
since we only want to make sure the process transitions correctly to the
patched state
* The test-syscall.sh now prints a log message showing the number of remaining
processes to transition into the livepatched state, and check_output expects it
to be 0.
* Added MODULE_AUTHOR and MODULE_DESCRIPTION to test_klp_syscall.c
The v2 can be seen here:
https://lore.kernel.org/linux-kselftest/20220630141226.2802-1-mpdesouza@sus…
---
Marcos Paulo de Souza (3):
kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
Documentation/dev-tools/kselftest.rst | 4 +
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 ----
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 ---
tools/testing/selftests/lib.mk | 20 +++-
tools/testing/selftests/livepatch/Makefile | 5 +-
tools/testing/selftests/livepatch/README | 17 +--
tools/testing/selftests/livepatch/config | 1 -
tools/testing/selftests/livepatch/functions.sh | 34 +++---
.../testing/selftests/livepatch/test-callbacks.sh | 50 ++++-----
tools/testing/selftests/livepatch/test-ftrace.sh | 6 +-
.../testing/selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
tools/testing/selftests/livepatch/test-state.sh | 18 ++--
tools/testing/selftests/livepatch/test-syscall.sh | 53 ++++++++++
tools/testing/selftests/livepatch/test-sysfs.sh | 6 +-
.../selftests/livepatch/test_klp-call_getpid.c | 44 ++++++++
.../selftests/livepatch/test_modules/Makefile | 20 ++++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../livepatch/test_modules}/test_klp_livepatch.c | 0
.../livepatch/test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 116 +++++++++++++++++++++
31 files changed, 325 insertions(+), 121 deletions(-)
---
base-commit: 6489bf2e1df1c84e9bcd4694029ff35b39fd3397
change-id: 20231031-send-lp-kselftests-4c917dcd4565
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
From: Paul Durrant <pdurrant(a)amazon.com>
This series has some small fixes from what was in version 10 [1]:
* KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
This required a small fix to kvm_gpc_check() for an error that was
introduced in version 8.
* KVM: xen: separate initialization of shared_info cache and content
This accidentally regressed a fix in commit 5d6d6a7d7e66a ("KVM: x86:
Refine calculation of guest wall clock to use a single TSC read").
* KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
This mistakenly removed the initialization of shared_info from the code
setting the KVM_XEN_ATTR_TYPE_SHARED_INFO attribute, which broke the self-
tests.
* KVM: xen: split up kvm_xen_set_evtchn_fast()
This had a /32 and a /64 swapped in set_vcpu_info_evtchn_pending().
[1] https://lore.kernel.org/kvm/20231204144334.910-1-paul@xen.org/
Paul Durrant (19):
KVM: pfncache: Add a map helper function
KVM: pfncache: remove unnecessary exports
KVM: xen: mark guest pages dirty with the pfncache lock held
KVM: pfncache: add a mark-dirty helper
KVM: pfncache: remove KVM_GUEST_USES_PFN usage
KVM: pfncache: stop open-coding offset_in_page()
KVM: pfncache: include page offset in uhva and use it consistently
KVM: pfncache: allow a cache to be activated with a fixed (userspace)
HVA
KVM: xen: separate initialization of shared_info cache and content
KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
KVM: xen: allow shared_info to be mapped by fixed HVA
KVM: xen: allow vcpu_info to be mapped by fixed HVA
KVM: selftests / xen: map shared_info using HVA rather than GFN
KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA
KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
KVM: xen: split up kvm_xen_set_evtchn_fast()
KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
KVM: pfncache: check the need for invalidation under read lock first
KVM: xen: allow vcpu_info content to be 'safely' copied
Documentation/virt/kvm/api.rst | 53 ++-
arch/x86/kvm/x86.c | 7 +-
arch/x86/kvm/xen.c | 360 +++++++++++-------
include/linux/kvm_host.h | 40 +-
include/linux/kvm_types.h | 8 -
include/uapi/linux/kvm.h | 9 +-
.../selftests/kvm/x86_64/xen_shinfo_test.c | 59 ++-
virt/kvm/pfncache.c | 188 ++++-----
8 files changed, 466 insertions(+), 258 deletions(-)
base-commit: f2a3fb7234e52f72ff4a38364dbf639cf4c7d6c6
--
2.39.2
For now, the reg bounds are not handled for the BPF_JNE case, which can cause
the following case to fail:
/* The type of "a" is u32 */
if (a > 0 && a < 100) {
/* the range of the register for a is [0, 99], not [1, 99],
* and will cause the following error:
*
* invalid zero-sized read
*
* as a can be 0.
*/
bpf_skb_store_bytes(skb, xx, xx, a, 0);
}
In the code above, "a > 0" will be compiled to "if a == 0 goto xxx". In
the TRUE branch, the dst_reg will be marked as known to be 0. However, in the
fallthrough (FALSE) branch, the dst_reg will not be handled, which leaves
the [min, max] for a as [0, 99], not [1, 99].
In the 1st patch, we reduce the range of the dst reg if the src reg is a
const and is exactly the edge of the dst reg for BPF_JNE.
In the 2nd patch, we remove reduplicated s32 casting in "crafted_cases".
In the 3rd patch, we just activate the test case for this logic in
range_cond(), which was added by Andrii in
commit 8863238993e2 ("selftests/bpf: BPF register range bounds tester").
In the 4th patch, we convert the case above to a testcase and add it to
verifier_bounds.c.
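The narrowing described for the 1st patch can be sketched in plain userspace C.
This is only an illustration of the idea, not the verifier code: struct
u64_range and range_not_equal_const() are hypothetical names, and only a single
unsigned [min, max] pair is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a register's unsigned bounds (not a verifier type). */
struct u64_range {
	uint64_t min;
	uint64_t max;
};

/*
 * Narrow r given that "reg != val" is known to hold.
 * Only a constant sitting exactly on an edge can shrink the interval;
 * a constant in the middle would split the range, which a single
 * [min, max] pair cannot express, so the range is left untouched.
 */
static struct u64_range range_not_equal_const(struct u64_range r, uint64_t val)
{
	assert(r.min <= r.max);
	if (r.min == r.max)
		return r;	/* single value: nothing to narrow in this sketch */
	if (val == r.min)
		r.min++;	/* safe: min < max, so no overflow */
	else if (val == r.max)
		r.max--;	/* safe: max > min, so no underflow */
	return r;
}
```

With the range [0, 99] from the example above and val equal to 0, the sketch
yields [1, 99], which is the bound the fallthrough branch needs.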
Changes since v4:
- add the 2nd patch
- add "{U32, U32, {0, U32_MAX}, {U32_MAX, U32_MAX}}" that we missed in the
3rd patch
- add some comments to the function that we add in the 4th patch
- add reg_not_equal_const() in the 4th patch
Changes since v3:
- do some adjustment to the crafted cases that we added in the 2nd patch
- add the 3rd patch
Changes since v2:
- fix a typo in the subject of the 1st patch
- add some comments to the 1st patch, as Eduard advised
- add some cases to the "crafted_cases"
Changes since v1:
- simplify the code in the 1st patch
- introduce the 2nd patch for the testing
Menglong Dong (4):
bpf: make the verifier tracks the "not equal" for regs
selftests/bpf: remove reduplicated s32 casting in "crafted_cases"
selftests/bpf: activate the OP_NE logic in range_cond()
selftests/bpf: add testcase to verifier_bounds.c for BPF_JNE
kernel/bpf/verifier.c | 38 +++++++++++-
.../selftests/bpf/prog_tests/reg_bounds.c | 27 +++++---
.../selftests/bpf/progs/verifier_bounds.c | 62 +++++++++++++++++++
3 files changed, 116 insertions(+), 11 deletions(-)
--
2.39.2
Swap the arguments to typecheck_fn() in kunit_activate_static_stub()
so that real_fn_addr can be either the function itself or a pointer
to that function.
This is useful to simplify redirecting static functions in a module.
Having to pass the actual function meant that it had to be exported
from the module, either by making the 'static' and EXPORT_SYMBOL*()
conditional (which makes the code messy), or by changing it to always be
exported (which increases the export namespace and prevents the
compiler inlining a trivial stub function in non-test builds).
With the original definition of kunit_activate_static_stub() the
address of real_fn_addr was passed to typecheck_fn() as the type to
be passed. This meant that if real_fn_addr was a pointer-to-function
it would resolve to a ** instead of a *, giving an error like this:
error: initialization of ‘int (**)(int)’ from incompatible pointer
type ‘int (*)(int)’ [-Werror=incompatible-pointer-types]
kunit_activate_static_stub(test, add_one_fn_ptr, subtract_one);
| ^~~~~~~~~~~~
./include/linux/typecheck.h:21:25: note: in definition of macro
‘typecheck_fn’
21 | ({ typeof(type) __tmp = function; \
Swapping the arguments to typecheck_fn makes it take the type of a
pointer to the replacement function. Either a function or a pointer
to function can be assigned to that. For example:
static int some_function(int x)
{
/* whatever */
}
int (* some_function_ptr)(int) = some_function;
static int replacement(int x)
{
/* whatever */
}
Then:
kunit_activate_static_stub(test, some_function, replacement);
yields:
typecheck_fn(typeof(&replacement), some_function);
and:
kunit_activate_static_stub(test, some_function_ptr, replacement);
yields:
typecheck_fn(typeof(&replacement), some_function_ptr);
The two typecheck_fn() then resolve to:
int (*__tmp)(int) = some_function;
and
int (*__tmp)(int) = some_function_ptr;
Both of these are valid. In the first case the compiler inserts
an implicit '&' to take the address of the supplied function, and
in the second case the RHS is already a pointer to the same type.
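The same check can be tried outside the kernel. The following is a minimal
userspace sketch, assuming GCC or Clang (it relies on the GNU typeof and
statement-expression extensions). The typecheck_fn() here is a local copy of
the idea, and check_stub_types() and stub_types_ok() are hypothetical helpers
that mirror only the type check, not the stub activation itself.

```c
#include <assert.h>

/* Local copy of the typecheck_fn() idea: assigning with the wrong type is diagnosed at build time. */
#define typecheck_fn(type, function) \
({	__typeof__(type) __tmp = function; \
	(void)__tmp; \
})

static int some_function(int x) { return x + 1; }
static int (*some_function_ptr)(int) = some_function;
static int replacement(int x) { return x - 1; }

/* Hypothetical macro: the argument order used after this patch. */
#define check_stub_types(real_fn_addr, replacement_addr) \
	typecheck_fn(__typeof__(&replacement_addr), real_fn_addr)

static int stub_types_ok(int x)
{
	/* Both forms compile: a function name and a pointer to function. */
	check_stub_types(some_function, replacement);
	check_stub_types(some_function_ptr, replacement);
	assert(some_function_ptr == some_function);
	/* (x + 1) - 1, so the input comes back unchanged. */
	return replacement(some_function_ptr(x));
}
```

Swapping the two arguments inside check_stub_types() back to the old order
should make the some_function_ptr line fail to build with strict pointer
checks, which is the error quoted earlier in this message.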
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
include/kunit/static_stub.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/kunit/static_stub.h b/include/kunit/static_stub.h
index 85315c80b303..bf940322dfc0 100644
--- a/include/kunit/static_stub.h
+++ b/include/kunit/static_stub.h
@@ -93,7 +93,7 @@ void __kunit_activate_static_stub(struct kunit *test,
* The redirection can be disabled again with kunit_deactivate_static_stub().
*/
#define kunit_activate_static_stub(test, real_fn_addr, replacement_addr) do { \
- typecheck_fn(typeof(&real_fn_addr), replacement_addr); \
+ typecheck_fn(typeof(&replacement_addr), real_fn_addr); \
__kunit_activate_static_stub(test, real_fn_addr, replacement_addr); \
} while (0)
--
2.30.2
The seccomp benchmark runs five scenarios: one calibration run with no
seccomp filters enabled, then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
the same number of iterations.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform           Run 1  Run 2  Run 3  Run 4
---------          -----  -----  -----  -----
PowerEdge R200     16.6s  16.6s  31.6s  37.4s
BBB (arm)          20.4s  20.4s  54.5s
Synquacer (arm64)  20.7s  23.7s  40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail; for
the successful run reported here the timed portions of the run total
117.2s, leaving less than 3s of margin, which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
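The 117.2s figure is consistent with adding the 15s calibration to the four
PowerEdge timings. That reading is an assumption, since the sum is not spelled
out here. A small sketch of the arithmetic follows, with hypothetical helper
names and times held in tenths of a second to keep the sums exact:

```c
#include <assert.h>
#include <stddef.h>

/* Total timed portion: calibration plus every filter run, in tenths of a second. */
static int timed_total_ds(int calibration_ds, const int *runs_ds, size_t n)
{
	int total = calibration_ds;

	for (size_t i = 0; i < n; i++) {
		assert(runs_ds[i] >= 0);
		total += runs_ds[i];
	}
	return total;
}

/* Headroom left under a given timeout, in tenths of a second. */
static int margin_ds(int timeout_ds, int total_ds)
{
	return timeout_ds - total_ds;
}
```

For the PowerEdge row (166, 166, 316 and 374 tenths) plus 150 tenths of
calibration, the total is 1172 tenths, leaving 28 tenths under a 120s timeout
and 628 tenths under 180s.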
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s, which is only a 50% increase on the
current timeout and is itself not *too* large given that there are only two
tests in this suite.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/seccomp/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/seccomp/settings b/tools/testing/selftests/seccomp/settings
index 6091b45d226b..a953c96aa16e 100644
--- a/tools/testing/selftests/seccomp/settings
+++ b/tools/testing/selftests/seccomp/settings
@@ -1 +1 @@
-timeout=120
+timeout=180
---
base-commit: 2cc14f52aeb78ce3f29677c2de1f06c0e91471ab
change-id: 20231219-b4-kselftest-seccomp-benchmark-timeout-05b66e7d29d1
Best regards,
--
Mark Brown <broonie(a)kernel.org>