[ This series depends on the VFIO device cdev series ]
Changelog
v7:
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v7
Thank you
Nicolin Chen
Nicolin Chen (4):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 53 ++++++++++++++-----
drivers/iommu/iommufd/iommufd_test.h | 4 ++
drivers/iommu/iommufd/selftest.c | 19 +++++++
drivers/vfio/iommufd.c | 11 ++--
drivers/vfio/vfio_main.c | 4 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +++
tools/testing/selftests/iommu/iommufd.c | 29 +++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++++
9 files changed, 127 insertions(+), 19 deletions(-)
--
2.40.1
In the unlikely case that CLOCK_REALTIME is not defined, variable ret is
not initialized and further accumulation of return values to ret can leave
ret in an undefined state. Fix this by initialized ret to zero and changing
the assignment of ret to an accumulation for the CLOCK_REALTIME case.
Fixes: 03f55c7952c9 ("kselftest: Extend vDSO selftest to clock_getres")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/vDSO/vdso_test_clock_getres.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vDSO/vdso_test_clock_getres.c b/tools/testing/selftests/vDSO/vdso_test_clock_getres.c
index 15dcee16ff72..38d46a8bf7cb 100644
--- a/tools/testing/selftests/vDSO/vdso_test_clock_getres.c
+++ b/tools/testing/selftests/vDSO/vdso_test_clock_getres.c
@@ -84,12 +84,12 @@ static inline int vdso_test_clock(unsigned int clock_id)
int main(int argc, char **argv)
{
- int ret;
+ int ret = 0;
#if _POSIX_TIMERS > 0
#ifdef CLOCK_REALTIME
- ret = vdso_test_clock(CLOCK_REALTIME);
+ ret += vdso_test_clock(CLOCK_REALTIME);
#endif
#ifdef CLOCK_BOOTTIME
--
2.30.2
When we added fd based file streams we created references to STx_FILENO in
stdio.h but these constants are declared in unistd.h which is the last file
included by the top level nolibc.h meaning those constants are not defined
when we try to build stdio.h. This causes programs using nolibc.h to fail
to build.
Reorder the headers to avoid this issue.
Fixes: d449546c957f ("tools/nolibc: implement fd-based FILE streams")
Acked-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.4-rc1.
- This is now a fix for Linus' tree.
- Link to v1: https://lore.kernel.org/r/20230413-nolibc-stdio-fix-v1-1-fa05fc3ba1fe@kerne…
---
tools/include/nolibc/nolibc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 04739a6293c4..05a228a6ee78 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -99,11 +99,11 @@
#include "sys.h"
#include "ctype.h"
#include "signal.h"
+#include "unistd.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "time.h"
-#include "unistd.h"
#include "stackprotector.h"
/* Used by programs to avoid std includes */
---
base-commit: ac9a78681b921877518763ba0e89202254349d1b
change-id: 20230413-nolibc-stdio-fix-fb42de39d099
Best regards,
--
Mark Brown,,, <broonie(a)kernel.org>
The ftrace selftests do not currently produce KTAP output, they produce a
custom format much nicer for human consumption. This means that when run in
automated test systems we just get a single result for the suite as a whole
rather than recording results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default, users using ftracetest
directly will continue to get the existing output.
This is not the most elegant solution but it is simple and effective. I
did consider implementing this by post processing the existing output
format but that felt more complex and likely to result in all output being
lost if something goes seriously wrong during the run which would not be
helpful. I did also consider just writing a separate runner script but
there's enough going on with things like the signal handling for that to
seem like it would be duplicating too much.
Acked-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
It might make sense to merge this via the ftrace tree - kselftests often
get merged with the code they test.
Changes in v2:
- Rebase onto v6.4-rc1.
- Link to v1: https://lore.kernel.org/r/20230302-ftrace-kselftest-ktap-v1-1-a84a0765b7ad@…
---
tools/testing/selftests/ftrace/Makefile | 3 +-
tools/testing/selftests/ftrace/ftracetest | 63 ++++++++++++++++++++++++--
tools/testing/selftests/ftrace/ftracetest-ktap | 8 ++++
3 files changed, 70 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index d6e106fbce11..a1e955d2de4c 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
-TEST_PROGS := ftracetest
+TEST_PROGS_EXTENDED := ftracetest
+TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index c3311c8c4089..2506621e75df 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -13,6 +13,7 @@ echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
echo " Options:"
echo " -h|--help Show help message"
echo " -k|--keep Keep passed test logs"
+echo " -K|--ktap Output in KTAP format"
echo " -v|--verbose Increase verbosity of test messages"
echo " -vv Alias of -v -v (Show all results in stdout)"
echo " -vvv Alias of -v -v -v (Show all commands immediately)"
@@ -85,6 +86,10 @@ parse_opts() { # opts
KEEP_LOG=1
shift 1
;;
+ --ktap|-K)
+ KTAP=1
+ shift 1
+ ;;
--verbose|-v|-vv|-vvv)
if [ $VERBOSE -eq -1 ]; then
usage "--console can not use with --verbose"
@@ -178,6 +183,7 @@ TEST_DIR=$TOP_DIR/test.d
TEST_CASES=`find_testcases $TEST_DIR`
LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
KEEP_LOG=0
+KTAP=0
DEBUG=0
VERBOSE=0
UNSUPPORTED_RESULT=0
@@ -229,7 +235,7 @@ prlog() { # messages
newline=
shift
fi
- printf "$*$newline"
+ [ "$KTAP" != "1" ] && printf "$*$newline"
[ "$LOG_FILE" ] && printf "$*$newline" | strip_esc >> $LOG_FILE
}
catlog() { #file
@@ -260,11 +266,11 @@ TOTAL_RESULT=0
INSTANCE=
CASENO=0
+CASENAME=
testcase() { # testfile
CASENO=$((CASENO+1))
- desc=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
- prlog -n "[$CASENO]$INSTANCE$desc"
+ CASENAME=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
}
checkreq() { # testfile
@@ -277,40 +283,68 @@ test_on_instance() { # testfile
grep -q "^#[ \t]*flags:.*instance" $1
}
+ktaptest() { # result comment
+ if [ "$KTAP" != "1" ]; then
+ return
+ fi
+
+ local result=
+ if [ "$1" = "1" ]; then
+ result="ok"
+ else
+ result="not ok"
+ fi
+ shift
+
+ local comment=$*
+ if [ "$comment" != "" ]; then
+ comment="# $comment"
+ fi
+
+ echo $CASENO $result $INSTANCE$CASENAME $comment
+}
+
eval_result() { # sigval
case $1 in
$PASS)
prlog " [${color_green}PASS${color_reset}]"
+ ktaptest 1
PASSED_CASES="$PASSED_CASES $CASENO"
return 0
;;
$FAIL)
prlog " [${color_red}FAIL${color_reset}]"
+ ktaptest 0
FAILED_CASES="$FAILED_CASES $CASENO"
return 1 # this is a bug.
;;
$UNRESOLVED)
prlog " [${color_blue}UNRESOLVED${color_reset}]"
+ ktaptest 0 UNRESOLVED
UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
return $UNRESOLVED_RESULT # depends on use case
;;
$UNTESTED)
prlog " [${color_blue}UNTESTED${color_reset}]"
+ ktaptest 1 SKIP
UNTESTED_CASES="$UNTESTED_CASES $CASENO"
return 0
;;
$UNSUPPORTED)
prlog " [${color_blue}UNSUPPORTED${color_reset}]"
+ ktaptest 1 SKIP
UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
return $UNSUPPORTED_RESULT # depends on use case
;;
$XFAIL)
prlog " [${color_green}XFAIL${color_reset}]"
+ ktaptest 1 XFAIL
XFAILED_CASES="$XFAILED_CASES $CASENO"
return 0
;;
*)
prlog " [${color_blue}UNDEFINED${color_reset}]"
+ ktaptest 0 error
UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
return 1 # this must be a test bug
;;
@@ -371,6 +405,7 @@ __run_test() { # testfile
run_test() { # testfile
local testname=`basename $1`
testcase $1
+ prlog -n "[$CASENO]$INSTANCE$CASENAME"
if [ ! -z "$LOG_FILE" ] ; then
local testlog=`mktemp $LOG_DIR/${CASENO}-${testname}-log.XXXXXX`
else
@@ -405,6 +440,17 @@ run_test() { # testfile
# load in the helper functions
. $TEST_DIR/functions
+if [ "$KTAP" = "1" ]; then
+ echo "TAP version 13"
+
+ casecount=`echo $TEST_CASES | wc -w`
+ for t in $TEST_CASES; do
+ test_on_instance $t || continue
+ casecount=$((casecount+1))
+ done
+ echo "1..${casecount}"
+fi
+
# Main loop
for t in $TEST_CASES; do
run_test $t
@@ -439,6 +485,17 @@ prlog "# of unsupported: " `echo $UNSUPPORTED_CASES | wc -w`
prlog "# of xfailed: " `echo $XFAILED_CASES | wc -w`
prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
+if [ "$KTAP" = "1" ]; then
+ echo -n "# Totals:"
+ echo -n " pass:"`echo $PASSED_CASES | wc -w`
+ echo -n " faii:"`echo $FAILED_CASES | wc -w`
+ echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
+ echo -n " xpass:0"
+ echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
+ echo -n " error:"`echo $UNRESOLVED_CASES $UNDEFINED_CASES | wc -w`
+ echo
+fi
+
cleanup
# if no error, return 0
diff --git a/tools/testing/selftests/ftrace/ftracetest-ktap b/tools/testing/selftests/ftrace/ftracetest-ktap
new file mode 100755
index 000000000000..b3284679ef3a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/ftracetest-ktap
@@ -0,0 +1,8 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
+#
+# Copyright (C) Arm Ltd., 2023
+
+./ftracetest -K
---
base-commit: ac9a78681b921877518763ba0e89202254349d1b
change-id: 20230302-ftrace-kselftest-ktap-9d7878691557
Best regards,
--
Mark Brown,,, <broonie(a)kernel.org>
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx causes false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai(a)intel.com>
---
tools/testing/selftests/sgx/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 75af864e07b6..50aab6b57da3 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -17,6 +17,7 @@ ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
-fno-stack-protector -mrdrnd $(INCLUDES)
TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+TEST_FILES := $(OUTPUT)/test_encl.elf
ifeq ($(CAN_BUILD_X86_64), 1)
all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
--
2.25.1
Enabling a (modular) test must not silently enable additional kernel
functionality, as that may increase the attack vector of a product.
Fix this by:
1. making REGMAP visible if CONFIG_KUNIT_ALL_TESTS is enabled,
2. making REGMAP_KUNIT depend on REGMAP instead of selecting it.
After this, one can safely enable CONFIG_KUNIT_ALL_TESTS=m to build
modules for all appropriate tests for ones system, without pulling in
extra unwanted functionality, while still allowing a tester to manually
enable REGMAP and its test suite on a system where REGMAP is not enabled
by default.
Fixes: 2238959b6ad27040 ("regmap: Add some basic kunit tests")
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
v2:
- Make REGMAP visible if CONFIG_KUNIT_ALL_TESTS is enabled.
---
drivers/base/regmap/Kconfig | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/base/regmap/Kconfig b/drivers/base/regmap/Kconfig
index 33a8366e22a584a5..0db2021f7477f2ab 100644
--- a/drivers/base/regmap/Kconfig
+++ b/drivers/base/regmap/Kconfig
@@ -4,16 +4,23 @@
# subsystems should select the appropriate symbols.
config REGMAP
+ bool "Register Map support" if KUNIT_ALL_TESTS
default y if (REGMAP_I2C || REGMAP_SPI || REGMAP_SPMI || REGMAP_W1 || REGMAP_AC97 || REGMAP_MMIO || REGMAP_IRQ || REGMAP_SOUNDWIRE || REGMAP_SOUNDWIRE_MBQ || REGMAP_SCCB || REGMAP_I3C || REGMAP_SPI_AVMM || REGMAP_MDIO || REGMAP_FSI)
select IRQ_DOMAIN if REGMAP_IRQ
select MDIO_BUS if REGMAP_MDIO
- bool
+ help
+ Enable support for the Register Map (regmap) access API.
+
+ Usually, this option is automatically selected when needed.
+ However, you may want to enable it manually for running the regmap
+ KUnit tests.
+
+ If unsure, say N.
config REGMAP_KUNIT
tristate "KUnit tests for regmap"
- depends on KUNIT
+ depends on KUNIT && REGMAP
default KUNIT_ALL_TESTS
- select REGMAP
select REGMAP_RAM
config REGMAP_AC97
--
2.34.1
These patches relax a few verifier requirements around dynptrs.
Patches 1-3 are unchanged from v2, apart from rebasing
Patch 4 is the same as in v1, see
https://lore.kernel.org/bpf/CA+PiJmST4WUH061KaxJ4kRL=fqy3X6+Wgb2E2rrLT5OYjU…
Patch 5 adds a test for the change in Patch 4
Daniel Rosenberg (5):
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Test allowing NULL buffer in dynptr slice
selftests/bpf: Check overflow in optional buffer
bpf: verifier: Accept dynptr mem as mem in helpers
selftests/bpf: Accept mem from dynptr in helper funcs
Documentation/bpf/kfuncs.rst | 23 ++++++++++-
include/linux/skbuff.h | 2 +-
kernel/bpf/helpers.c | 30 +++++++++------
kernel/bpf/verifier.c | 21 ++++++++--
.../testing/selftests/bpf/prog_tests/dynptr.c | 2 +
.../testing/selftests/bpf/progs/dynptr_fail.c | 20 ++++++++++
.../selftests/bpf/progs/dynptr_success.c | 38 +++++++++++++++++++
7 files changed, 118 insertions(+), 18 deletions(-)
base-commit: f4dea9689c5fea3d07170c2cb0703e216f1a0922
--
2.40.1.521.gf1e218fcd8-goog
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v6->v7: Addressed comments from Hao Luo
- Get rid of unnecessary check
Details in here:
https://lore.kernel.org/all/20230505060818.60037-1-zhoufeng.zf@bytedance.co…
v5->v6: Addressed comments from Yonghong Song
- Some code format modifications.
- Add ack-by
Details in here:
https://lore.kernel.org/all/20230504031513.13749-1-zhoufeng.zf@bytedance.co…
v4->v5: Addressed comments from Yonghong Song
- Some code format modifications.
Details in here:
https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.co…
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 17 ++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 53 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 ++++++++++++++++++
4 files changed, 122 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
This patch series improves guest vCPUs performances on Arm during clearing
dirty log operations by taking MMU read lock instead of MMU write lock.
vCPUs write protection faults are fixed in Arm using MMU read locks.
However, when userspace is clearing dirty logs via KVM_CLEAR_DIRTY_LOG
ioctl, then kernel code takes MMU write lock. This will block vCPUs
write protection faults and degrade guest performance. This
degradation gets worse as guest VM size increases in terms of memory and
vCPU count.
In this series, MMU read lock adoption is made possible by using
KVM_PGTABLE_WALK_SHARED flag in page walker.
Patches 1 to 5:
These patches are modifying dirty_log_perf_test. Intent is to mimic
production scenarios where guest keeps on executing while userspace
threads collect and clear dirty logs independently.
Three new command line options are added:
1. j: Allows to run guest vCPUs and main thread collecting dirty logs
independently of each other after initialization is complete.
2. k: Allows to clear dirty logs in smaller chunks compared to existing
whole memslot clear in one call.
3. l: Allows to add customizable wait time between consecutive clear
dirty log calls to mimic sending dirty memory to destination.
Patch 7-8:
These patches refactor code to move MMU lock operations to arch specific
code, refactor Arm's page table walker APIs, and change MMU write lock
for clearing dirty logs to read lock. Patch 8 has results showing
improvements based on dirty_log_perf_test.
Vipin Sharma (9):
KVM: selftests: Allow dirty_log_perf_test to clear dirty memory in
chunks
KVM: selftests: Add optional delay between consecutive Clear-Dirty-Log
calls
KVM: selftests: Pass count of read and write accesses from guest to
host
KVM: selftests: Print read and write accesses of pages by vCPUs in
dirty_log_perf_test
KVM: selftests: Allow independent execution of vCPUs in
dirty_log_perf_test
KVM: arm64: Correct the kvm_pgtable_stage2_flush() documentation
KVM: mmu: Move mmu lock/unlock to arch code for clear dirty log
KMV: arm64: Allow stage2_apply_range_sched() to pass page table walker
flags
KVM: arm64: Run clear-dirty-log under MMU read lock
arch/arm64/include/asm/kvm_pgtable.h | 17 ++-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 16 ++-
arch/arm64/kvm/mmu.c | 36 ++++--
arch/mips/kvm/mmu.c | 2 +
arch/riscv/kvm/mmu.c | 2 +
arch/x86/kvm/mmu/mmu.c | 3 +
.../selftests/kvm/dirty_log_perf_test.c | 108 ++++++++++++++----
.../testing/selftests/kvm/include/memstress.h | 13 ++-
tools/testing/selftests/kvm/lib/memstress.c | 43 +++++--
virt/kvm/dirty_ring.c | 2 -
virt/kvm/kvm_main.c | 4 -
12 files changed, 185 insertions(+), 65 deletions(-)
base-commit: 95b9779c1758f03cf494e8550d6249a40089ed1c
--
2.40.0.634.g4ca3ef3211-goog
There is a spelling mistake in a message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/cachestat/test_cachestat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cachestat/test_cachestat.c b/tools/testing/selftests/cachestat/test_cachestat.c
index c3823b809c25..9be2262e5c17 100644
--- a/tools/testing/selftests/cachestat/test_cachestat.c
+++ b/tools/testing/selftests/cachestat/test_cachestat.c
@@ -191,7 +191,7 @@ bool test_cachestat_shmem(void)
}
if (ftruncate(fd, filesize)) {
- ksft_print_msg("Unable to trucate shmem file.\n");
+ ksft_print_msg("Unable to truncate shmem file.\n");
ret = false;
goto close_fd;
}
--
2.30.2
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v5->v6: Addressed comments from Yonghong Song
- Some code format modifications.
- Add ack-by
Details in here:
https://lore.kernel.org/all/20230504031513.13749-1-zhoufeng.zf@bytedance.co…
v4->v5: Addressed comments from Yonghong Song
- Some code format modifications.
Details in here:
https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.co…
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 +++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 53 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 ++++++++++++++++++
4 files changed, 125 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v4->v5: Addressed comments from Yonghong Song
- Some code format modifications.
Details in here:
https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.co…
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 +++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 54 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 ++++++++++++++++++
4 files changed, 126 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
Verify that calling clone3 with an exit signal (SIGCHLD) in flags will
fail.
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/clone3/clone3.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index e495f895a2cd..e60cf4da8fb0 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -129,7 +129,7 @@ int main(int argc, char *argv[])
uid_t uid = getuid();
ksft_print_header();
- ksft_set_plan(18);
+ ksft_set_plan(19);
test_clone3_supported();
/* Just a simple clone3() should return 0.*/
@@ -198,5 +198,8 @@ int main(int argc, char *argv[])
/* Do a clone3() in a new time namespace */
test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ /* Do a clone3() with exit signal (SIGCHLD) in flags */
+ test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
+
ksft_finished();
}
--
2.40.0
Hi all,
This patch series fixes a crash in the new input selftest, and makes the
test available when the KUnit framework is modular.
Unfortunately test 3 still fails for me (tested on Koelsch (R-Car M2-W)
and ARAnyM):
KTAP version 1
# Subtest: input_core
1..3
input: Test input device as /devices/virtual/input/input1
ok 1 input_test_polling
input: Test input device as /devices/virtual/input/input2
ok 2 input_test_timestamp
input: Test input device as /devices/virtual/input/input3
# input_test_match_device_id: ASSERTION FAILED at # drivers/input/tests/input_test.c:99
Expected input_match_device_id(input_dev, &id) to be true, but is false
not ok 3 input_test_match_device_id
# input_core: pass:2 fail:1 skip:0 total:3
# Totals: pass:2 fail:1 skip:0 total:3
not ok 1 input_core
Thanks!
Geert Uytterhoeven (2):
Input: tests - fix use-after-free and refcount underflow in
input_test_exit()
Input: tests - modular KUnit tests should not depend on KUNIT=y
drivers/input/Kconfig | 2 +-
drivers/input/tests/input_test.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
--
2.34.1
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
When the documentation was updated for modular tests, the dependency on
"KUNIT=y" was forgotten to be updated, now encouraging people to create
tests that cannot be enabled when the KUNIT framework itself is modular.
Fix this by changing the dependency to "KUNIT".
Document when it is appropriate (and required) to depend on "KUNIT=y".
Fixes: c9ef2d3e3f3b3e56 ("KUnit: Docs: make start.rst example Kconfig follow style.rst")
Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
---
Documentation/dev-tools/kunit/start.rst | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index c736613c9b199bff..9619a044093042ce 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -256,9 +256,12 @@ Now we are ready to write the test cases.
config MISC_EXAMPLE_TEST
tristate "Test for my example" if !KUNIT_ALL_TESTS
- depends on MISC_EXAMPLE && KUNIT=y
+ depends on MISC_EXAMPLE && KUNIT
default KUNIT_ALL_TESTS
+Note: If your test does not support being built as a loadable module (which is
+discouraged), replace tristate by bool, and depend on KUNIT=y instead of KUNIT.
+
3. Add the following lines to ``drivers/misc/Makefile``:
.. code-block:: make
--
2.34.1
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 +++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 55 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 +++++++++++++++++
4 files changed, 127 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
These patches relax a few verifier requirements around dynptrs.
I was unable to test the patch in 0003 due to unrelated issues compiling the
bpf selftests, but did run an equivalent local test program.
This is the issue I was running into:
progs/cgrp_ls_attach_cgroup.c:17:15: error: use of undeclared identifier 'BPF_MAP_TYPE_CGRP_STORAGE'; did you mean 'BPF_MAP_TYPE_CGROUP_STORAGE'?
__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
^~~~~~~~~~~~~~~~~~~~~~~~~
BPF_MAP_TYPE_CGROUP_STORAGE
/ssd/kernel/fuse-bpf/tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:13:39: note: expanded from macro '__uint'
#define __uint(name, val) int (*name)[val]
^
/ssd/kernel/fuse-bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:27892:2: note: 'BPF_MAP_TYPE_CGROUP_STORAGE' declared here
BPF_MAP_TYPE_CGROUP_STORAGE = 19,
^
1 error generated.
Daniel Rosenberg (3):
bpf: verifier: Accept dynptr mem as mem in helpers
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Test allowing NULL buffer in dynptr slice
Documentation/bpf/kfuncs.rst | 23 ++++++++++++-
kernel/bpf/helpers.c | 32 ++++++++++++-------
kernel/bpf/verifier.c | 21 ++++++++++++
.../testing/selftests/bpf/prog_tests/dynptr.c | 1 +
.../selftests/bpf/progs/dynptr_success.c | 21 ++++++++++++
5 files changed, 85 insertions(+), 13 deletions(-)
base-commit: 5af607a861d43ffff830fc1890033e579ec44799
--
2.40.0.577.gac1e443424-goog
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which factor out the tc helper's
logic which was shared with cg/sk_skb (which operate correctly).
This refactoring is needed in order to avoid affecting the cgroup/sk_skb
flows as there does not seem to be a strict criteria for discerning which
flow the helper is called from based on the net device or packet
information.
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v4: - Move dev_sdif() to include/linux/netdevice.h as suggested by Stanislav Fomichev
- Remove SYS and SYS_NOFAIL duplicate definitions
v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev
- Added xdp tests as suggested by Daniel Borkmann
- Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add vrf_socket_lookup tests
include/linux/netdevice.h | 9 +
net/core/filter.c | 123 +++++--
.../bpf/prog_tests/vrf_socket_lookup.c | 312 ++++++++++++++++++
.../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++
4 files changed, 511 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c
--
2.34.1
> This adds the general_profit KSM sysfs knob and the process profit metric
> knobs to ksm_stat.
>
> 1) expose general_profit metric
>
> The documentation mentions a general profit metric, however this
> metric is not calculated. In addition the formula depends on the size
> of internal structures, which makes it more difficult for an
> administrator to make the calculation. Adding the metric for a better
> user experience.
>
> 2) document general_profit sysfs knob
>
> 3) calculate ksm process profit metric
>
> The ksm documentation mentions the process profit metric and how to
> calculate it. This adds the calculation of the metric.
>
> 4) mm: expose ksm process profit metric in ksm_stat
>
> This exposes the ksm process profit metric in /proc/<pid>/ksm_stat.
> The documentation mentions the formula for the ksm process profit
> metric, however it does not calculate it. In addition the formula
> depends on the size of internal structures. So it makes sense to
> expose it.
>
Hi, Stefan, I think you should give some credits to me about my contributions on
the concept and formula of ksm profit (process wide and system wide), it's kind
of idea stealing.
Besides, the idea of Process control KSM was proposed by me last year although you use
prctl instead of /proc fs. you even didn't CC my email. I think you should CC my email
(xu.xin16(a)zte.com.cn) as least.
> 5) document new procfs ksm knobs
>
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Acked-by: David Hildenbrand <david(a)redhat.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
> Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +++++++
> Documentation/admin-guide/mm/ksm.rst | 5 ++++-
> fs/proc/base.c | 3 +++
> include/linux/ksm.h | 4 ++++
> mm/ksm.c | 21 +++++++++++++++++++
> 5 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-ksm b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> index d244674a9480..6041a025b65a 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> @@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes.
>
> When it is set to 0 only pages from the same node are merged,
> otherwise pages from all nodes can be merged together (default).
> +
> +What: /sys/kernel/mm/ksm/general_profit
> +Date: April 2023
> +KernelVersion: 6.4
> +Contact: Linux memory management mailing list <linux-mm(a)kvack.org>
> +Description: Measure how effective KSM is.
> + general_profit: how effective is KSM. The formula for the
> + calculation is in Documentation/admin-guide/mm/ksm.rst.
> diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
Fixes: 03a0b567a03d ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior")
Reported-by: Hangbin Liu <liuhangbin(a)gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer(a)uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
.../testing/selftests/net/srv6_end_dt46_l3vpn_test.sh | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
index aebaab8ce44c..441eededa031 100755
--- a/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
@@ -292,6 +292,11 @@ setup_hs()
ip netns exec ${hsname} sysctl -wq net.ipv6.conf.all.accept_dad=0
ip netns exec ${hsname} sysctl -wq net.ipv6.conf.default.accept_dad=0
+ # disable the rp_filter otherwise the kernel gets confused about how
+ # to route decap ipv4 packets.
+ ip netns exec ${rtname} sysctl -wq net.ipv4.conf.all.rp_filter=0
+ ip netns exec ${rtname} sysctl -wq net.ipv4.conf.default.rp_filter=0
+
ip -netns ${hsname} link add veth0 type veth peer name ${rtveth}
ip -netns ${hsname} link set ${rtveth} netns ${rtname}
ip -netns ${hsname} addr add ${IPv6_HS_NETWORK}::${hs}/64 dev veth0 nodad
@@ -316,11 +321,6 @@ setup_hs()
ip netns exec ${rtname} sysctl -wq net.ipv6.conf.${rtveth}.proxy_ndp=1
ip netns exec ${rtname} sysctl -wq net.ipv4.conf.${rtveth}.proxy_arp=1
- # disable the rp_filter otherwise the kernel gets confused about how
- # to route decap ipv4 packets.
- ip netns exec ${rtname} sysctl -wq net.ipv4.conf.all.rp_filter=0
- ip netns exec ${rtname} sysctl -wq net.ipv4.conf.${rtveth}.rp_filter=0
-
ip netns exec ${rtname} sh -c "echo 1 > /proc/sys/net/vrf/strict_mode"
}
--
2.20.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *s to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before s is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/powerpc/stringloops/strlen.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
index 9055ebc484d0..f9c1f9cc2d32 100644
--- a/tools/testing/selftests/powerpc/stringloops/strlen.c
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0
-#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
@@ -51,10 +50,11 @@ static void bench_test(char *s)
static int testcase(void)
{
char *s;
+ int ret;
unsigned long i;
- s = memalign(128, SIZE);
- if (!s) {
+ ret = posix_memalign((void **)&s, 128, SIZE);
+ if (ret < 0) {
perror("memalign");
exit(1);
}
--
2.27.0
For cases like IPv6 addresses, having a means to supply tracing
predicates for fields with more than 8 bytes would be convenient.
This series provides a simple way to support this by allowing
simple ==, != memory comparison with the predicate supplied when
the size of the field exceeds 8 bytes. For example, to trace
::1, the predicate
"dst == 0x00000000000000000000000000000001"
..could be used.
Patch 1 provides the support for > 8 byte fields via a memcmp()-style
predicate. Patch 2 adds tests for filter predicates, and patch 3
documents the fact that for > 8 bytes. only == and != are supported.
Changes since RFC [1]:
- originally a fix was intermixed with the new functionality as
patch 1 in series [1]; the fix landed separately
- small tweaks to how filter predicates are defined via fn_num as
opposed to via fn directly
[1] https://lore.kernel.org/lkml/1659910883-18223-1-git-send-email-alan.maguire…
Alan Maguire (3):
tracing: support > 8 byte array filter predicates
selftests/ftrace: add test coverage for filter predicates
tracing: document > 8 byte numeric filtering support
Documentation/trace/events.rst | 9 +++
kernel/trace/trace_events_filter.c | 55 +++++++++++++++-
.../selftests/ftrace/test.d/event/filter.tc | 62 +++++++++++++++++++
3 files changed, 125 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/event/filter.tc
--
2.31.1
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which factor out the tc helper's
logic which was shared with cg/sk_skb (which operate correctly).
This refactoring is needed in order to avoid affecting the cgroup/sk_skb
flows as there does not seem to be a strict criteria for discerning which
flow the helper is called from based on the net device or packet
information.
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev
- Added xdp tests as suggested by Daniel Borkmann
- Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add vrf_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/vrf_socket_lookup.c | 327 ++++++++++++++++++
.../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++
3 files changed, 526 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c
--
2.34.1
From: SeongJae Park <sjpark(a)amazon.de>
When running a test program, 'run_one()' checks if the program has the
execution permission and fails if it doesn't. However, it's easy to
mistakenly missing the permission, as some common tools like 'diff'
don't support the permission change well[1]. Compared to that, making
mistakes in the test program's path would only rare, as those are
explicitly listed in 'TEST_PROGS'. Therefore, it might make more sense
to resolve the situation on our own and run the program.
For the reason, this commit makes the test program runner function to
still print the warning message but try parsing the interpreter of the
program and explicitly run it with the interpreter, in the case.
[1] https://lore.kernel.org/mm-commits/YRJisBs9AunccCD4@kroah.com/
Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
---
Changes from v1
(https://lore.kernel.org/linux-kselftest/20210810140459.23990-1-sj38.park@gm…)
- Parse and use the interpreter instead of changing the file
tools/testing/selftests/kselftest/runner.sh | 28 +++++++++++++--------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index cc9c846585f0..a9ba782d8ca0 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -33,9 +33,9 @@ tap_timeout()
{
# Make sure tests will time out if utility is available.
if [ -x /usr/bin/timeout ] ; then
- /usr/bin/timeout --foreground "$kselftest_timeout" "$1"
+ /usr/bin/timeout --foreground "$kselftest_timeout" $1
else
- "$1"
+ $1
fi
}
@@ -65,17 +65,25 @@ run_one()
TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
echo "# $TEST_HDR_MSG"
- if [ ! -x "$TEST" ]; then
- echo -n "# Warning: file $TEST is "
- if [ ! -e "$TEST" ]; then
- echo "missing!"
- else
- echo "not executable, correct this."
- fi
+ if [ ! -e "$TEST" ]; then
+ echo "# Warning: file $TEST is missing!"
echo "not ok $test_num $TEST_HDR_MSG"
else
+ cmd="./$BASENAME_TEST"
+ if [ ! -x "$TEST" ]; then
+ echo "# Warning: file $TEST is not executable"
+
+ if [ $(head -n 1 "$TEST" | cut -c -2) = "#!" ]
+ then
+ interpreter=$(head -n 1 "$TEST" | cut -c 3-)
+ cmd="$interpreter ./$BASENAME_TEST"
+ else
+ echo "not ok $test_num $TEST_HDR_MSG"
+ return
+ fi
+ fi
cd `dirname $TEST` > /dev/null
- ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout "$cmd" 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
--
2.17.1
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 ++++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 47 +++++++++++++++++++
.../selftests/bpf/progs/cgrp_kfunc_common.h | 1 +
.../bpf/progs/test_task_under_cgroup.c | 37 +++++++++++++++
5 files changed, 106 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Goals
=====
Support new key and signature formats with the same kernel component.
Verify the authenticity of system data with newly supported data formats.
Mitigate the risk of parsing arbitrary data in the kernel.
Motivation
==========
Adding new functionality to the kernel comes with an increased risk of
introducing new bugs which, once exploited, can lead to a partial or full
system compromise.
Parsing arbitrary data is particularly critical, since it allows an
attacker to send a malicious sequence of bytes to exploit vulnerabilities
in the parser code. The attacker might be able to overwrite kernel memory
to bypass kernel protections, and obtain more privileges.
User Mode Drivers (UMDs) can effectively mitigate this risk. If the parser
runs in user space, even if it has a bug, it won't allow the attacker to
overwrite kernel memory.
The communication protocol between the UMD and the kernel should be simple
enough, that the kernel can immediately recognize malformed data sent by an
attacker controlling the UMD, and discard it.
Solution
========
Register a new parser of the asymmetric key type which, instead of parsing
the key blob, forwards it to a UMD, and populates the key fields from the
UMD response. That response contains the data for each field of the public
key structure, defined in the kernel, and possibly a key description.
Supporting new data formats can be achieved by simply extending the UMD. As
long as the UMD recognizes them, and provides the crypto material to the
kernel Crypto API in the expected format, the kernel does not need to be
aware of the UMD changes.
Add a new API to verify the authenticity of system data, similar to the one
for PKCS#7 signatures. As for the key parser, send the signature to a UMD,
and fill the public_key_signature structure from the UMD response.
The API still supports a very basic trust model, it accepts a key for
signature verification if it is in the supplied keyring. The API can be
extended later to support more sophisticated models.
Use cases
=========
eBPF
----
The eBPF infrastructure already offers to eBPF programs the ability to
verify PKCS#7 signatures, through the bpf_verify_pkcs7_signature() kfunc.
Add the new bpf_verify_umd_signature() kfunc, to allow eBPF programs verify
signatures in a data format that is not PKCS#7 (for example PGP).
IMA Appraisal
-------------
An alternative to appraising each file with its signature (Fedora 38) is to
build a repository of reference file digests from signed RPM headers, and
lookup the calculated digest of files being accessed in that repository
(DIGLIM[1]).
With this patch set, the kernel can verify the authenticity of RPM headers
from their PGP signature against the Linux distribution GPG keys. Once
verified, RPM headers can be parsed with a UMD to build the repository of
reference file digests.
With DIGLIM, Linux distributions are not required to change anything in
their building infrastructure (no extra data in the RPM header, no new PKI
for IMA signatures).
[1]: https://lore.kernel.org/linux-integrity/20210914163401.864635-1-roberto.sas…
UMD development
===============
The header file crypto/asymmetric_keys/umd_key_sig_umh.h contains the
details of the communication protocol between the kernel and the UMD
handler.
The UMD handler should implement the commands defined, CMD_KEY and CMD_SIG,
should set the result of the processing, and fill the key and
signature-specific structures umd_key_msg_out and umd_sig_msg_out.
The UMD handler should provide the key and signature blobs in a format that
is understood by the kernel. For example, for RSA keys, it should provide
them in ASN.1 format (SEQUENCE of INTEGER).
The auth IDs of the keys and signatures should match, for signature
verification. Auth ID matching can be partial.
Patch set dependencies
======================
This patch set depends on 'usermode_driver: Add management library and
API':
https://lore.kernel.org/bpf/20230317145240.363908-1-roberto.sassu@huaweiclo…
Patch set content
=================
Patch 1 introduces the new parser for the asymmetric key type.
Patch 2 introduces the parser for signatures and its API.
Patch 3 introduces the system-level API for signature verification.
Patch 4 extends eBPF to use the new system-level API.
Patch 5 adds a test for UMD-parser signatures (not executed until the UMD
supports PGP).
Patch 6 introduces the skeleton of the UMD handler.
PGP
===
A work in progress implementation of the PGP format (RFC 4880 and RFC 6637)
in the UMD handler is available at:
https://github.com/robertosassu/linux/commits/pgp-signatures-umd-v1-devel-v…
It is based on a previous work of David Howells, available at:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-modsign.git/…
The patches have been adapted for use in user space.
Roberto Sassu (6):
KEYS: asymmetric: Introduce UMD-based asymmetric key parser
KEYS: asymmetric: Introduce UMD-based asymmetric key signature parser
verification: Introduce verify_umd_signature() and
verify_umd_message_sig()
bpf: Introduce bpf_verify_umd_signature() kfunc
selftests/bpf: Prepare a test for UMD-parsed signatures
KEYS: asymmetric: Add UMD handler
.gitignore | 3 +
MAINTAINERS | 1 +
certs/system_keyring.c | 125 ++++++
crypto/asymmetric_keys/Kconfig | 32 ++
crypto/asymmetric_keys/Makefile | 23 +
crypto/asymmetric_keys/asymmetric_type.c | 3 +-
crypto/asymmetric_keys/umd_key.h | 28 ++
crypto/asymmetric_keys/umd_key_parser.c | 203 +++++++++
crypto/asymmetric_keys/umd_key_sig_loader.c | 32 ++
crypto/asymmetric_keys/umd_key_sig_umh.h | 71 +++
crypto/asymmetric_keys/umd_key_sig_umh_blob.S | 7 +
crypto/asymmetric_keys/umd_key_sig_umh_user.c | 84 ++++
crypto/asymmetric_keys/umd_sig_parser.c | 416 ++++++++++++++++++
include/crypto/umd_sig.h | 71 +++
include/keys/asymmetric-type.h | 1 +
include/linux/verification.h | 48 ++
kernel/trace/bpf_trace.c | 69 ++-
...ify_pkcs7_sig.c => verify_pkcs7_umd_sig.c} | 109 +++--
...kcs7_sig.c => test_verify_pkcs7_umd_sig.c} | 18 +-
.../testing/selftests/bpf/verify_sig_setup.sh | 82 +++-
20 files changed, 1378 insertions(+), 48 deletions(-)
create mode 100644 crypto/asymmetric_keys/umd_key.h
create mode 100644 crypto/asymmetric_keys/umd_key_parser.c
create mode 100644 crypto/asymmetric_keys/umd_key_sig_loader.c
create mode 100644 crypto/asymmetric_keys/umd_key_sig_umh.h
create mode 100644 crypto/asymmetric_keys/umd_key_sig_umh_blob.S
create mode 100644 crypto/asymmetric_keys/umd_key_sig_umh_user.c
create mode 100644 crypto/asymmetric_keys/umd_sig_parser.c
create mode 100644 include/crypto/umd_sig.h
rename tools/testing/selftests/bpf/prog_tests/{verify_pkcs7_sig.c => verify_pkcs7_umd_sig.c} (75%)
rename tools/testing/selftests/bpf/progs/{test_verify_pkcs7_sig.c => test_verify_pkcs7_umd_sig.c} (82%)
--
2.25.1
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereof causing a race
condition.
The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
---
lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 05ed84c2fc4c..35417e0af3f4 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -353,16 +353,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (kstrtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -373,7 +383,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+ const char *buf,
size_t size,
size_t *cfg)
{
@@ -384,9 +395,7 @@ static int test_dev_config_update_size_t(const char *buf,
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(size_t *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
@@ -402,7 +411,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
@@ -411,14 +420,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 val)
{
return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -471,10 +489,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -518,10 +536,10 @@ static ssize_t config_buf_size_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->buf_size);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->buf_size);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -548,10 +566,10 @@ static ssize_t config_file_offset_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->file_offset);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->file_offset);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.30.2
This patchset adds a stress test for kprobes and a test for checking
optimized probes.
The two tests are being added based on the below discussion:
https://lore.kernel.org/all/20230128101622.ce6f8e64d929e29d36b08b73@kernel.…
Changelog:
* Add an explicit fork after enabling the events ( echo "forked" )
* Remove the extended test from multiple_kprobe_types.tc which adds
multiple consecutive probes in a function and add it as a
separate test case.
* Add new test case which checks for optimized probes.
Akanksha J N (2):
selftests/ftrace: Add new test case which adds multiple consecutive
probes in a function
selftests/ftrace: Add new test case which checks for optimized probes
.../test.d/kprobe/kprobe_insn_boundary.tc | 19 +++++++++++
.../ftrace/test.d/kprobe/kprobe_opt_types.tc | 34 +++++++++++++++++++
2 files changed, 53 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_insn_boundary.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
--
2.31.1
This is v1 of the KUnit deferred actions API, which implements an
equivalent of devm_add_action[1] on top of KUnit managed resources. This
provides a simple way of scheduling a function to run when the test
terminates (whether successfully, or with an error). It's therefore very
useful for freeing resources, or otherwise cleaning up.
The notable changes since RFCv2[2] are:
- Got rid of the 'cancellation token' concept. It was overcomplicated,
and we can add it back if we need to.
- kunit_add_action() therefore now returns 0 on success, and an error
otherwise (like devm_add_action()). Though you may wish to use:
- Added kunit_add_action_or_reset(), which will call the deferred
function if an error occurs. (See devm_add_action_or_reset()). This
also returns an error on failure, which can be asserted safely.
- Got rid of the function pointer typedef. Personally, I liked it, but
it's more typedef-y than most kernel code.
- Got rid of the 'internal_gfp' argument: all internal state is now
allocated with GFP_KERNEL. The main KUnit resource API can be used
instead if this doesn't work for your use-case.
I'd love to hear any further thoughts!
Cheers,
-- David
[1]: https://docs.kernel.org/driver-api/basics.html#c.devm_add_action
[2]: https://patchwork.kernel.org/project/linux-kselftest/list/?series=735720
David Gow (3):
kunit: Add kunit_add_action() to defer a call until test exit
kunit: executor_test: Use kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
include/kunit/resource.h | 76 +++++++++++++++++++++++++++++++
include/kunit/test.h | 10 ++++-
lib/kunit/executor_test.c | 11 ++---
lib/kunit/kunit-test.c | 88 +++++++++++++++++++++++++++++++++++-
lib/kunit/resource.c | 95 +++++++++++++++++++++++++++++++++++++++
lib/kunit/test.c | 48 ++++----------------
6 files changed, 279 insertions(+), 49 deletions(-)
--
2.40.0.634.g4ca3ef3211-goog
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Instead, in the event test terminates early, run the test exit and
cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
and better isolates the rest of KUnit from issues which arise in test
cleanup.
If a test cleanup function itself aborts (e.g., due to an assertion
failing), there will be no further attempts to clean up: an error will
be logged and the test failed. For example:
# example_simple_test: test aborted during cleanup. continuing without cleaning up
This should also make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an updated version of / replacement of "kunit: Set the current
KUnit context when cleaning up", which instead creates a new kthread
for cleanup tasks if the original test kthread is aborted. This protects
us from failed assertions during cleanup, if the test exited early.
Changes since v2:
https://lore.kernel.org/linux-kselftest/20230419085426.1671703-1-davidgow@g…
- Always run cleanup in its own kthread
- Therefore, never attempt to re-run it if it exits
- Thanks, Benjamin.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@go…
- Move cleanup execution to another kthread
- (Thanks, Benjamin, for pointing out the assertion issues)
---
lib/kunit/test.c | 55 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 48 insertions(+), 7 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..2025e51941e6 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -419,10 +419,50 @@ static void kunit_try_run_case(void *data)
* thread will resume control and handle any necessary clean up.
*/
kunit_run_case_internal(test, suite, test_case);
- /* This line may never be reached. */
+}
+
+static void kunit_try_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ struct kunit_suite *suite = ctx->suite;
+
+ current->kunit_test = test;
+
kunit_run_case_cleanup(test, suite);
}
+static void kunit_catch_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
+
+ /* It is always a failure if cleanup aborts. */
+ kunit_set_failure(test);
+
+ if (try_exit_code) {
+ /*
+ * Test case could not finish, we have no idea what state it is
+ * in, so don't do clean up.
+ */
+ if (try_exit_code == -ETIMEDOUT) {
+ kunit_err(test, "test case cleanup timed out\n");
+ /*
+ * Unknown internal error occurred preventing test case from
+ * running, so there is nothing to clean up.
+ */
+ } else {
+ kunit_err(test, "internal error occurred during test case cleanup: %d\n",
+ try_exit_code);
+ }
+ return;
+ }
+
+ kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
+}
+
+
static void kunit_catch_run_case(void *data)
{
struct kunit_try_catch_context *ctx = data;
@@ -448,12 +488,6 @@ static void kunit_catch_run_case(void *data)
}
return;
}
-
- /*
- * Test case was run, but aborted. It is the test case's business as to
- * whether it failed or not, we just need to clean up.
- */
- kunit_run_case_cleanup(test, suite);
}
/*
@@ -478,6 +512,13 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
context.test_case = test_case;
kunit_try_catch_run(try_catch, &context);
+ /* Now run the cleanup */
+ kunit_try_catch_init(try_catch,
+ test,
+ kunit_try_run_case_cleanup,
+ kunit_catch_run_case_cleanup);
+ kunit_try_catch_run(try_catch, &context);
+
/* Propagate the parameter result to the test case. */
if (test->status == KUNIT_FAILURE)
test_case->status = KUNIT_FAILURE;
--
2.40.0.634.g4ca3ef3211-goog
The ftrace selftests do not currently produce KTAP output, they produce a
custom format much nicer for human consumption. This means that when run in
automated test systems we just get a single result for the suite as a whole
rather than recording results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default, users using ftracetest
directly will continue to get the existing output.
This is not the most elegant solution but it is simple and effective. I
did consider implementing this by post processing the existing output
format but that felt more complex and likely to result in all output being
lost if something goes seriously wrong during the run which would not be
helpful. I did also consider just writing a separate runner script but
there's enough going on with things like the signal handling for that to
seem like it would be duplicating too much.
Acked-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Update help text.
- Link to v1: https://lore.kernel.org/r/20230302-ftrace-kselftest-ktap-v1-1-a84a0765b7ad@…
---
tools/testing/selftests/ftrace/Makefile | 3 +-
tools/testing/selftests/ftrace/ftracetest | 63 ++++++++++++++++++++++++--
tools/testing/selftests/ftrace/ftracetest-ktap | 8 ++++
3 files changed, 70 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index d6e106fbce11..a1e955d2de4c 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
-TEST_PROGS := ftracetest
+TEST_PROGS_EXTENDED := ftracetest
+TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index c3311c8c4089..2506621e75df 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -13,6 +13,7 @@ echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
echo " Options:"
echo " -h|--help Show help message"
echo " -k|--keep Keep passed test logs"
+echo " -K|--ktap Output in KTAP format"
echo " -v|--verbose Increase verbosity of test messages"
echo " -vv Alias of -v -v (Show all results in stdout)"
echo " -vvv Alias of -v -v -v (Show all commands immediately)"
@@ -85,6 +86,10 @@ parse_opts() { # opts
KEEP_LOG=1
shift 1
;;
+ --ktap|-K)
+ KTAP=1
+ shift 1
+ ;;
--verbose|-v|-vv|-vvv)
if [ $VERBOSE -eq -1 ]; then
usage "--console can not use with --verbose"
@@ -178,6 +183,7 @@ TEST_DIR=$TOP_DIR/test.d
TEST_CASES=`find_testcases $TEST_DIR`
LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
KEEP_LOG=0
+KTAP=0
DEBUG=0
VERBOSE=0
UNSUPPORTED_RESULT=0
@@ -229,7 +235,7 @@ prlog() { # messages
newline=
shift
fi
- printf "$*$newline"
+ [ "$KTAP" != "1" ] && printf "$*$newline"
[ "$LOG_FILE" ] && printf "$*$newline" | strip_esc >> $LOG_FILE
}
catlog() { #file
@@ -260,11 +266,11 @@ TOTAL_RESULT=0
INSTANCE=
CASENO=0
+CASENAME=
testcase() { # testfile
CASENO=$((CASENO+1))
- desc=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
- prlog -n "[$CASENO]$INSTANCE$desc"
+ CASENAME=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
}
checkreq() { # testfile
@@ -277,40 +283,68 @@ test_on_instance() { # testfile
grep -q "^#[ \t]*flags:.*instance" $1
}
+ktaptest() { # result comment
+ if [ "$KTAP" != "1" ]; then
+ return
+ fi
+
+ local result=
+ if [ "$1" = "1" ]; then
+ result="ok"
+ else
+ result="not ok"
+ fi
+ shift
+
+ local comment=$*
+ if [ "$comment" != "" ]; then
+ comment="# $comment"
+ fi
+
+ echo $CASENO $result $INSTANCE$CASENAME $comment
+}
+
eval_result() { # sigval
case $1 in
$PASS)
prlog " [${color_green}PASS${color_reset}]"
+ ktaptest 1
PASSED_CASES="$PASSED_CASES $CASENO"
return 0
;;
$FAIL)
prlog " [${color_red}FAIL${color_reset}]"
+ ktaptest 0
FAILED_CASES="$FAILED_CASES $CASENO"
return 1 # this is a bug.
;;
$UNRESOLVED)
prlog " [${color_blue}UNRESOLVED${color_reset}]"
+ ktaptest 0 UNRESOLVED
UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
return $UNRESOLVED_RESULT # depends on use case
;;
$UNTESTED)
prlog " [${color_blue}UNTESTED${color_reset}]"
+ ktaptest 1 SKIP
UNTESTED_CASES="$UNTESTED_CASES $CASENO"
return 0
;;
$UNSUPPORTED)
prlog " [${color_blue}UNSUPPORTED${color_reset}]"
+ ktaptest 1 SKIP
UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
return $UNSUPPORTED_RESULT # depends on use case
;;
$XFAIL)
prlog " [${color_green}XFAIL${color_reset}]"
+ ktaptest 1 XFAIL
XFAILED_CASES="$XFAILED_CASES $CASENO"
return 0
;;
*)
prlog " [${color_blue}UNDEFINED${color_reset}]"
+ ktaptest 0 error
UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
return 1 # this must be a test bug
;;
@@ -371,6 +405,7 @@ __run_test() { # testfile
run_test() { # testfile
local testname=`basename $1`
testcase $1
+ prlog -n "[$CASENO]$INSTANCE$CASENAME"
if [ ! -z "$LOG_FILE" ] ; then
local testlog=`mktemp $LOG_DIR/${CASENO}-${testname}-log.XXXXXX`
else
@@ -405,6 +440,17 @@ run_test() { # testfile
# load in the helper functions
. $TEST_DIR/functions
+if [ "$KTAP" = "1" ]; then
+ echo "TAP version 13"
+
+ casecount=`echo $TEST_CASES | wc -w`
+ for t in $TEST_CASES; do
+ test_on_instance $t || continue
+ casecount=$((casecount+1))
+ done
+ echo "1..${casecount}"
+fi
+
# Main loop
for t in $TEST_CASES; do
run_test $t
@@ -439,6 +485,17 @@ prlog "# of unsupported: " `echo $UNSUPPORTED_CASES | wc -w`
prlog "# of xfailed: " `echo $XFAILED_CASES | wc -w`
prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
+if [ "$KTAP" = "1" ]; then
+ echo -n "# Totals:"
+ echo -n " pass:"`echo $PASSED_CASES | wc -w`
+ echo -n " faii:"`echo $FAILED_CASES | wc -w`
+ echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
+ echo -n " xpass:0"
+ echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
+ echo -n " error:"`echo $UNRESOLVED_CASES $UNDEFINED_CASES | wc -w`
+ echo
+fi
+
cleanup
# if no error, return 0
diff --git a/tools/testing/selftests/ftrace/ftracetest-ktap b/tools/testing/selftests/ftrace/ftracetest-ktap
new file mode 100755
index 000000000000..b3284679ef3a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/ftracetest-ktap
@@ -0,0 +1,8 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
+#
+# Copyright (C) Arm Ltd., 2023
+
+./ftracetest -K
---
base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4
change-id: 20230302-ftrace-kselftest-ktap-9d7878691557
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The ftrace selftests do not currently produce KTAP output, they produce a
custom format much nicer for human consumption. This means that when run in
automated test systems we just get a single result for the suite as a whole
rather than recording results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default, users using ftracetest
directly will continue to get the existing output.
This is not the most elegant solution but it is simple and effective. I
did consider implementing this by post processing the existing output
format but that felt more complex and likely to result in all output being
lost if something goes seriously wrong during the run which would not be
helpful. I did also consider just writing a separate runner script but
there's enough going on with things like the signal handling for that to
seem like it would be duplicating too much.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/ftrace/Makefile | 3 +-
tools/testing/selftests/ftrace/ftracetest | 63 ++++++++++++++++++++++++--
tools/testing/selftests/ftrace/ftracetest-ktap | 8 ++++
3 files changed, 70 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index d6e106fbce11..a1e955d2de4c 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
-TEST_PROGS := ftracetest
+TEST_PROGS_EXTENDED := ftracetest
+TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index c3311c8c4089..539c8d6d5d71 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -13,6 +13,7 @@ echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
echo " Options:"
echo " -h|--help Show help message"
echo " -k|--keep Keep passed test logs"
+echo " -K|--KTAP Output in KTAP format"
echo " -v|--verbose Increase verbosity of test messages"
echo " -vv Alias of -v -v (Show all results in stdout)"
echo " -vvv Alias of -v -v -v (Show all commands immediately)"
@@ -85,6 +86,10 @@ parse_opts() { # opts
KEEP_LOG=1
shift 1
;;
+ --ktap|-K)
+ KTAP=1
+ shift 1
+ ;;
--verbose|-v|-vv|-vvv)
if [ $VERBOSE -eq -1 ]; then
usage "--console can not use with --verbose"
@@ -178,6 +183,7 @@ TEST_DIR=$TOP_DIR/test.d
TEST_CASES=`find_testcases $TEST_DIR`
LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
KEEP_LOG=0
+KTAP=0
DEBUG=0
VERBOSE=0
UNSUPPORTED_RESULT=0
@@ -229,7 +235,7 @@ prlog() { # messages
newline=
shift
fi
- printf "$*$newline"
+ [ "$KTAP" != "1" ] && printf "$*$newline"
[ "$LOG_FILE" ] && printf "$*$newline" | strip_esc >> $LOG_FILE
}
catlog() { #file
@@ -260,11 +266,11 @@ TOTAL_RESULT=0
INSTANCE=
CASENO=0
+CASENAME=
testcase() { # testfile
CASENO=$((CASENO+1))
- desc=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
- prlog -n "[$CASENO]$INSTANCE$desc"
+ CASENAME=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
}
checkreq() { # testfile
@@ -277,40 +283,68 @@ test_on_instance() { # testfile
grep -q "^#[ \t]*flags:.*instance" $1
}
+ktaptest() { # result comment
+ if [ "$KTAP" != "1" ]; then
+ return
+ fi
+
+ local result=
+ if [ "$1" = "1" ]; then
+ result="ok"
+ else
+ result="not ok"
+ fi
+ shift
+
+ local comment=$*
+ if [ "$comment" != "" ]; then
+ comment="# $comment"
+ fi
+
+ echo $CASENO $result $INSTANCE$CASENAME $comment
+}
+
eval_result() { # sigval
case $1 in
$PASS)
prlog " [${color_green}PASS${color_reset}]"
+ ktaptest 1
PASSED_CASES="$PASSED_CASES $CASENO"
return 0
;;
$FAIL)
prlog " [${color_red}FAIL${color_reset}]"
+ ktaptest 0
FAILED_CASES="$FAILED_CASES $CASENO"
return 1 # this is a bug.
;;
$UNRESOLVED)
prlog " [${color_blue}UNRESOLVED${color_reset}]"
+ ktaptest 0 UNRESOLVED
UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
return $UNRESOLVED_RESULT # depends on use case
;;
$UNTESTED)
prlog " [${color_blue}UNTESTED${color_reset}]"
+ ktaptest 1 SKIP
UNTESTED_CASES="$UNTESTED_CASES $CASENO"
return 0
;;
$UNSUPPORTED)
prlog " [${color_blue}UNSUPPORTED${color_reset}]"
+ ktaptest 1 SKIP
UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
return $UNSUPPORTED_RESULT # depends on use case
;;
$XFAIL)
prlog " [${color_green}XFAIL${color_reset}]"
+ ktaptest 1 XFAIL
XFAILED_CASES="$XFAILED_CASES $CASENO"
return 0
;;
*)
prlog " [${color_blue}UNDEFINED${color_reset}]"
+ ktaptest 0 error
UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
return 1 # this must be a test bug
;;
@@ -371,6 +405,7 @@ __run_test() { # testfile
run_test() { # testfile
local testname=`basename $1`
testcase $1
+ prlog -n "[$CASENO]$INSTANCE$CASENAME"
if [ ! -z "$LOG_FILE" ] ; then
local testlog=`mktemp $LOG_DIR/${CASENO}-${testname}-log.XXXXXX`
else
@@ -405,6 +440,17 @@ run_test() { # testfile
# load in the helper functions
. $TEST_DIR/functions
+if [ "$KTAP" = "1" ]; then
+ echo "TAP version 13"
+
+ casecount=`echo $TEST_CASES | wc -w`
+ for t in $TEST_CASES; do
+ test_on_instance $t || continue
+ casecount=$((casecount+1))
+ done
+ echo "1..${casecount}"
+fi
+
# Main loop
for t in $TEST_CASES; do
run_test $t
@@ -439,6 +485,17 @@ prlog "# of unsupported: " `echo $UNSUPPORTED_CASES | wc -w`
prlog "# of xfailed: " `echo $XFAILED_CASES | wc -w`
prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
+if [ "$KTAP" = "1" ]; then
+ echo -n "# Totals:"
+ echo -n " pass:"`echo $PASSED_CASES | wc -w`
+ echo -n " faii:"`echo $FAILED_CASES | wc -w`
+ echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
+ echo -n " xpass:0"
+ echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
+ echo -n " error:"`echo $UNRESOLVED_CASES $UNDEFINED_CASES | wc -w`
+ echo
+fi
+
cleanup
# if no error, return 0
diff --git a/tools/testing/selftests/ftrace/ftracetest-ktap b/tools/testing/selftests/ftrace/ftracetest-ktap
new file mode 100755
index 000000000000..b3284679ef3a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/ftracetest-ktap
@@ -0,0 +1,8 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
+#
+# Copyright (C) Arm Ltd., 2023
+
+./ftracetest -K
---
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
change-id: 20230302-ftrace-kselftest-ktap-9d7878691557
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Dzień dobry,
czy rozważali Państwo rozwój kwalifikacji językowych swoich pracowników?
Opracowaliśmy kursy językowe dla różnych branż, w których koncentrujemy się na podniesieniu poziomu słownictwa i jakości komunikacji wykorzystując autorską metodę, stworzoną specjalnie dla wymagającego biznesu.
Niestandardowy kurs on-line, dopasowany do profilu firmy i obszarów świadczonych usług, w szybkim czasie przyniesie efekty, które zwiększą komfort i jakość pracy, rozwijając możliwości biznesowe.
Zdalne szkolenie językowe to m.in. zajęcia z native speakerami, które w szybkim czasie nauczą pracowników rozmawiać za pomocą jasnego i zwięzłego języka Business English.
Czy mógłbym przedstawić więcej szczegółów i opowiedzieć jak działamy?
Pozdrawiam
Krzysztof Maj
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which facilitate this fix by
factoring out the tc helper's logic which was shared with cg/sk_skb
(which operate correctly).
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add tc_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/tc_socket_lookup.c | 341 ++++++++++++++++++
.../selftests/bpf/progs/tc_socket_lookup.c | 73 ++++
3 files changed, 525 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/tc_socket_lookup.c
--
2.34.1
This is a follow-up to [1]:
[PATCH v9 0/3] mm: process/cgroup ksm support
which is now in mm-stable. Ideally we'd get at least patch #1 into the
same kernel release as [1], so the semantics of setting
PR_SET_MEMORY_MERGE=0 are unchanged between kernel versions.
(1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
does, (2) add a selftest for it and (3) factor out disabling of KSM from
s390/gmap code.
v1 -> v2:
- "mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0"
-> Cleanup one if/else
-> Add doc for ksm_disable_merge_any()
- Added ACKs
[1] https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
David Hildenbrand (3):
mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: move disabling KSM from s390/gmap code to KSM code
arch/s390/mm/gmap.c | 20 +-----
include/linux/ksm.h | 7 ++
kernel/sys.c | 12 +---
mm/ksm.c | 70 +++++++++++++++++++
.../selftests/mm/ksm_functional_tests.c | 46 ++++++++++--
5 files changed, 121 insertions(+), 34 deletions(-)
--
2.40.0
Hi Linus,
Please pull the following KUnit next update for Linux 6.4-rc1.
linux-kselftest-kunit-6.4-rc1
This KUnit update Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6:
Linux 6.3-rc1 (2023-03-05 14:52:03 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-6.4-rc1
for you to fetch changes up to a42077b787680cbc365a96446b30f32399fa3f6f:
kunit: add tests for using current KUnit test field (2023-04-05 12:51:30 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-6.4-rc1
This KUnit update Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
----------------------------------------------------------------
Andy Shevchenko (1):
.gitignore: Unignore .kunitconfig
Daniel Latypov (3):
kunit: tool: add subscripts for type annotations where appropriate
kunit: tool: remove unused imports and variables
kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py
Geert Uytterhoeven (3):
kunit: tool: Add support for m68k under QEMU
kunit: tool: Add support for overriding the QEMU serial port
kunit: tool: Add support for SH under QEMU
Heiko Carstens (1):
kunit: increase KUNIT_LOG_SIZE to 2048 bytes
Rae Moar (4):
kunit: fix bug in debugfs logs of parameterized tests
kunit: fix bug in the order of lines in debugfs logs
kunit: fix bug of extra newline characters in debugfs logs
kunit: add tests for using current KUnit test field
Sadiya Kazi (1):
list: test: Test the klist structure
Stephen Boyd (1):
kunit: Use gfp in kunit_alloc_resource() kernel-doc
.gitignore | 1 +
include/kunit/resource.h | 2 +-
include/kunit/test.h | 4 +-
lib/kunit/debugfs.c | 14 +-
lib/kunit/kunit-test.c | 77 ++++++--
lib/kunit/test.c | 57 ++++--
lib/list-test.c | 300 ++++++++++++++++++++++++++++++-
tools/testing/kunit/kunit.py | 26 +--
tools/testing/kunit/kunit_config.py | 4 +-
tools/testing/kunit/kunit_kernel.py | 39 ++--
tools/testing/kunit/kunit_parser.py | 1 -
tools/testing/kunit/kunit_printer.py | 2 +-
tools/testing/kunit/kunit_tool_test.py | 2 +-
tools/testing/kunit/qemu_config.py | 1 +
tools/testing/kunit/qemu_configs/m68k.py | 10 ++
tools/testing/kunit/qemu_configs/sh.py | 17 ++
tools/testing/kunit/run_checks.py | 6 +-
17 files changed, 491 insertions(+), 72 deletions(-)
create mode 100644 tools/testing/kunit/qemu_configs/m68k.py
create mode 100644 tools/testing/kunit/qemu_configs/sh.py
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 6.4-rc1.
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6:
Linux 6.3-rc1 (2023-03-05 14:52:03 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-next-6.4-rc1
for you to fetch changes up to 50ad2fb7ec2b18186b8a4fa1c0e00f78b3de5119:
selftests/resctrl: Fix incorrect error return on test complete (2023-04-14 11:13:18 -0600)
----------------------------------------------------------------
linux-kselftest-next-6.4-rc1
linux-kselftest-next-6.4-rc1
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
----------------------------------------------------------------
Fenghua Yu (1):
selftests/resctrl: Change name from CBM_MASK_PATH to INFO_PATH
Ilpo Järvinen (8):
selftests/resctrl: Return NULL if malloc_and_init_memory() did not alloc mem
selftests/resctrl: Move ->setup() call outside of test specific branches
selftests/resctrl: Allow ->setup() to return errors
selftests/resctrl: Check for return value after write_schemata()
selftests/resctrl: Replace obsolete memalign() with posix_memalign()
selftests/resctrl: Change initialize_llc_perf() return type to void
selftests/resctrl: Use remount_resctrlfs() consistently with boolean
selftests/resctrl: Correct get_llc_perf() param in function comment
Ivan Orlov (4):
selftests: Refactor 'peeksiginfo' ptrace test part
selftests: cgroup: Add 'malloc' failures checks in test_memcontrol
selftests: sched: Add more core schedule prctl calls
selftests: prctl: Add new prctl test for PR_SET_VMA action
Mark Brown (3):
tools/nolibc/stdio: Implement vprintf()
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
Peter Newman (1):
selftests/resctrl: Use correct exit code when tests fail
Reinette Chatre (1):
selftests/resctrl: Fix incorrect error return on test complete
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test
selftests/resctrl: Return MBA check result and make it to output message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister for all tests
selftests/resctrl: Remove duplicate codes that clear each test result file
Sukrut Bellary (1):
kselftest: amd-pstate: Fix spelling mistakes
tools/include/nolibc/stdio.h | 6 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/amd-pstate/gitsource.sh | 4 +-
tools/testing/selftests/amd-pstate/run.sh | 4 +-
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 ++++-------------
tools/testing/selftests/cgroup/test_memcontrol.c | 15 +++
tools/testing/selftests/kselftest.h | 2 +
tools/testing/selftests/prctl/.gitignore | 1 +
tools/testing/selftests/prctl/Makefile | 2 +-
tools/testing/selftests/prctl/config | 1 +
.../selftests/prctl/set-anon-vma-name-test.c | 104 +++++++++++++++++++++
tools/testing/selftests/ptrace/peeksiginfo.c | 14 +--
tools/testing/selftests/resctrl/cache.c | 17 ++--
tools/testing/selftests/resctrl/cat_test.c | 33 ++++---
tools/testing/selftests/resctrl/cmt_test.c | 16 ++--
tools/testing/selftests/resctrl/fill_buf.c | 21 +----
tools/testing/selftests/resctrl/mba_test.c | 34 ++++---
tools/testing/selftests/resctrl/mbm_test.c | 22 ++---
tools/testing/selftests/resctrl/resctrl.h | 8 +-
tools/testing/selftests/resctrl/resctrl_tests.c | 14 +--
tools/testing/selftests/resctrl/resctrl_val.c | 88 +++++++++++------
tools/testing/selftests/resctrl/resctrlfs.c | 7 +-
tools/testing/selftests/sched/cs_prctl_test.c | 6 ++
24 files changed, 306 insertions(+), 204 deletions(-)
create mode 100644 tools/testing/selftests/prctl/config
create mode 100644 tools/testing/selftests/prctl/set-anon-vma-name-test.c
----------------------------------------------------------------
From: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
The verification function of this test case is likely to encounter the
following error, which may confuse users. The problem is easily
reproducible in the latest kernel.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$IP_B"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 17664, a(97) != q(113)
If the packet is captured, you will see:
Environment A, the sender:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
Environment B, the receiver:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 7360
IP $IP_A.41025 > $IP_B.8000: UDP, length 14720
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
In one test, the verification data is printed as follows:
abcd...xyz | 1...
.. |
abcd...xyz |
abcd...opabcd...xyz | ...1472... Not xyzabcd, messages are merged
.. |
The issue is that the test on receive for expected payload pattern
{AB..Z}+ fail for GRO packets if segment payload does not end on a Z.
The issue still exists when using the GRO with -G, but not using the -S
to obtain gsosize. Therefore, a print has been added to remind users.
Changes in v3:
- Simplify description and adjust judgment order.
Changes in v2:
- Fix confusing descriptions.
Signed-off-by: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
Reviewed-by: Xu Xin (CGEL ZTE) <xu.xin16(a)zte.com.cn>
Reviewed-by: Yang Yang (CGEL ZTE) <yang.yang29(a)zte.com.cn>
Cc: Xuexin Jiang (CGEL ZTE) <jiang.xuexin(a)zte.com.cn>
---
tools/testing/selftests/net/udpgso_bench_rx.c | 34 +++++++++++++++++++++++----
1 file changed, 29 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index f35a924d4a30..3ad18cbc570d 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -189,26 +189,45 @@ static char sanitized_char(char val)
return (val >= 'a' && val <= 'z') ? val : '.';
}
-static void do_verify_udp(const char *data, int len)
+static void do_verify_udp(const char *data, int start, int len)
{
- char cur = data[0];
+ char cur = data[start];
int i;
/* verify contents */
if (cur < 'a' || cur > 'z')
error(1, 0, "data initial byte out of range");
- for (i = 1; i < len; i++) {
+ for (i = start + 1; i < start + len; i++) {
if (cur == 'z')
cur = 'a';
else
cur++;
- if (data[i] != cur)
+ if (data[i] != cur) {
+ if (cfg_gro_segment && !cfg_expected_gso_size)
+ error(0, 0, "Use -S to obtain gsosize to guide "
+ "splitting and verification.");
+
error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n",
i, len,
sanitized_char(data[i]), data[i],
sanitized_char(cur), cur);
+ }
+ }
+}
+
+static void do_verify_udp_gro(const char *data, int len, int segment_size)
+{
+ int start = 0;
+
+ while (len - start > 0) {
+ if (len - start > segment_size)
+ do_verify_udp(data, start, segment_size);
+ else
+ do_verify_udp(data, start, len - start);
+
+ start += segment_size;
}
}
@@ -268,7 +287,12 @@ static void do_flush_udp(int fd)
if (ret == 0)
error(1, errno, "recv: 0 byte datagram\n");
- do_verify_udp(rbuf, ret);
+ if (!cfg_gro_segment)
+ do_verify_udp(rbuf, 0, ret);
+ else if (gso_size > 0)
+ do_verify_udp_gro(rbuf, ret, gso_size);
+ else
+ do_verify_udp_gro(rbuf, ret, ret);
}
if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
error(1, 0, "recv: bad gso size, got %d, expected %d "
--
2.15.2
This is a follow-up to [1]:
[PATCH v9 0/3] mm: process/cgroup ksm support
which is not in mm-unstable yet (but soon? :) ). I'll be on vacation for
~2 weeks, so sending it out now as reply to [1].
(1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
does, (2) add a selftest for it and (3) factor out disabling of KSM from
s390/gmap code.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
[1] https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io
David Hildenbrand (3):
mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: move disabling KSM from s390/gmap code to KSM code
arch/s390/mm/gmap.c | 20 +------
include/linux/ksm.h | 7 +++
kernel/sys.c | 7 +--
mm/ksm.c | 58 +++++++++++++++++++
.../selftests/mm/ksm_functional_tests.c | 46 +++++++++++++--
5 files changed, 107 insertions(+), 31 deletions(-)
--
2.39.2
This series consolidates the behavior of the 2 drivers that implement
the ethtool MAC Merge layer by making NXP ENETC commit its preemptible
traffic classes to hardware only when MM TX is active (same as Ocelot).
Then, after resolving an issue with the ENETC driver, it restricts user
space from entering 2 states which don't make sense:
- pmac-enabled off tx-enabled on verify-enabled *
- pmac-enabled * tx-enabled off verify-enabled on
Then, it introduces a selftest (ethtool_mm.sh) which puts everything
together and tests all valid configurations known to me.
This is simultaneously the v2 of "[PATCH net-next 0/2] ethtool mm API
improvements":
https://lore.kernel.org/netdev/20230415173454.3970647-1-vladimir.oltean@nxp…
which had caused some problems to openlldp. Those were solved in the
meantime, see:
https://github.com/intel/openlldp/commit/11171b474f6f3cbccac5d608b7f26b32ff…
and of "[RFC PATCH net-next] selftests: forwarding: add a test for MAC
Merge layer":
https://lore.kernel.org/netdev/20230210221243.228932-1-vladimir.oltean@nxp.…
Petr Machata (2):
selftests: forwarding: sch_tbf_*: Add a pre-run hook
selftests: forwarding: generalize bail_on_lldpad from mlxsw
Vladimir Oltean (7):
net: enetc: fix MAC Merge layer remaining enabled until a link down
event
net: enetc: report mm tx-active based on tx-enabled and verify-status
net: enetc: only commit preemptible TCs to hardware when MM TX is
active
net: enetc: include MAC Merge / FP registers in register dump
net: ethtool: mm: sanitize some UAPI configurations
selftests: forwarding: introduce helper for standard ethtool counters
selftests: forwarding: add a test for MAC Merge layer
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 5 +-
.../ethernet/freescale/enetc/enetc_ethtool.c | 94 +++++-
.../net/ethernet/freescale/enetc/enetc_hw.h | 3 +
net/ethtool/mm.c | 10 +
.../drivers/net/mlxsw/qos_headroom.sh | 3 +-
.../selftests/drivers/net/mlxsw/qos_lib.sh | 28 --
.../selftests/drivers/net/mlxsw/qos_pfc.sh | 3 +-
.../selftests/drivers/net/mlxsw/sch_ets.sh | 3 +-
.../drivers/net/mlxsw/sch_red_core.sh | 1 -
.../drivers/net/mlxsw/sch_red_ets.sh | 2 +-
.../drivers/net/mlxsw/sch_red_root.sh | 2 +-
.../drivers/net/mlxsw/sch_tbf_ets.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_prio.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_root.sh | 6 +-
.../testing/selftests/net/forwarding/Makefile | 1 +
.../selftests/net/forwarding/ethtool_mm.sh | 288 ++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 60 ++++
.../net/forwarding/sch_tbf_etsprio.sh | 4 +
.../selftests/net/forwarding/sch_tbf_root.sh | 4 +
20 files changed, 486 insertions(+), 66 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/ethtool_mm.sh
--
2.34.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v6:
- Go back to the v4 locking arragnment with now both the attach/detach
igroup->locks inside the functions, Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have different abort than destroy
function, it allows hwpt abort to require the caller to continue
to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Compared to v3 the diff for the whole series looks like:
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 024ed8ee9939cd..2770087059ba73 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -341,14 +341,15 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
- lockdep_assert_held(&idev->igroup->lock);
-
+ mutex_lock(&idev->igroup->lock);
list_del(&idev->group_item);
if (list_empty(&idev->igroup->device_list)) {
iommu_detach_group(hwpt->domain, idev->igroup->group);
idev->igroup->hwpt = NULL;
}
iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+ mutex_unlock(&idev->igroup->lock);
+
/* Caller must destroy hwpt */
return hwpt;
}
@@ -515,8 +516,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
hwpt->auto_domain = true;
*pt_id = hwpt->obj.id;
- mutex_unlock(&ioas->mutex);
iommufd_object_finalize(idev->ictx, &hwpt->obj);
+ mutex_unlock(&ioas->mutex);
return destroy_hwpt;
out_abort:
@@ -610,7 +611,6 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
* This is the same as
* iommufd_device_detach();
* iommufd_device_attach();
- *
* If it fails then no change is made to the attachment. The iommu driver may
* implement this so there is no disruption in translation. This can only be
* called if iommufd_device_attach() has already succeeded.
@@ -633,10 +633,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt;
- mutex_lock(&idev->igroup->lock);
hwpt = iommufd_hw_pagetable_detach(idev);
- mutex_unlock(&idev->igroup->lock);
-
iommufd_hw_pagetable_put(idev->ictx, hwpt);
refcount_dec(&idev->obj.users);
}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 8aa9ac130b5960..655ed32144f62e 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -26,6 +26,21 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
refcount_dec(&hwpt->ioas->obj.users);
}
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+{
+ struct iommufd_hw_pagetable *hwpt =
+ container_of(obj, struct iommufd_hw_pagetable, obj);
+
+ /* The ioas->mutex must be held until finalize is called. */
+ lockdep_assert_held(&hwpt->ioas->mutex);
+
+ if (!list_empty(&hwpt->hwpt_item)) {
+ list_del_init(&hwpt->hwpt_item);
+ iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain);
+ }
+ iommufd_hw_pagetable_destroy(obj);
+}
+
int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->enforce_cache_coherency)
@@ -50,6 +65,10 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
* Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
* will be linked to the given ioas and upon return the underlying iommu_domain
* is fully popoulated.
+ *
+ * The caller must hold the ioas->mutex until after
+ * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
+ * the returned hwpt.
*/
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
@@ -93,9 +112,6 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
* directly allocate a domain. These drivers do not finish creating the
* domain until attach is completed. Thus we must have this call
* sequence. Once those drivers are fixed this should be removed.
- *
- * Note we hold the igroup->lock here which prevents any other thread
- * from observing igroup->hwpt until we finish setting it up.
*/
if (immediate_attach) {
rc = iommufd_hw_pagetable_attach(hwpt, idev);
@@ -140,10 +156,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
mutex_lock(&ioas->mutex);
hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
- mutex_unlock(&ioas->mutex);
if (IS_ERR(hwpt)) {
rc = PTR_ERR(hwpt);
- goto out_put_ioas;
+ goto out_unlock;
}
cmd->out_hwpt_id = hwpt->obj.id;
@@ -151,11 +166,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
if (rc)
goto out_hwpt;
iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
- goto out_put_ioas;
+ goto out_unlock;
out_hwpt:
iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
-out_put_ioas:
+out_unlock:
+ mutex_unlock(&ioas->mutex);
iommufd_put_object(&ioas->obj);
out_put_idev:
iommufd_put_object(&idev->obj);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f842768b2e250b..21052f64f95649 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -1172,6 +1172,9 @@ int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt,
unsigned int num_sw_msi = 0;
int rc;
+ if (iommufd_should_fail())
+ return -EINVAL;
+
down_write(&iopt->iova_rwsem);
/* FIXME: drivers allocate memory but there is no failure propogated */
iommu_get_resv_regions(dev, &resv_regions);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index cb693190bf51c5..ba50eb4661e217 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -261,6 +261,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_detach(struct iommufd_device *idev);
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 694da191e4b155..73a91e96896252 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -24,6 +24,7 @@
struct iommufd_object_ops {
void (*destroy)(struct iommufd_object *obj);
+ void (*abort)(struct iommufd_object *obj);
};
static const struct iommufd_object_ops iommufd_object_ops[];
static struct miscdevice vfio_misc_dev;
@@ -104,7 +105,10 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
struct iommufd_object *obj)
{
- iommufd_object_ops[obj->type].destroy(obj);
+ if (iommufd_object_ops[obj->type].abort)
+ iommufd_object_ops[obj->type].abort(obj);
+ else
+ iommufd_object_ops[obj->type].destroy(obj);
iommufd_object_abort(ictx, obj);
}
@@ -413,6 +417,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
},
[IOMMUFD_OBJ_HW_PAGETABLE] = {
.destroy = iommufd_hw_pagetable_destroy,
+ .abort = iommufd_hw_pagetable_abort,
},
#ifdef CONFIG_IOMMUFD_TEST
[IOMMUFD_OBJ_SELFTEST] = {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index c07252dbf62d72..8b2c18ac6a2864 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -9,9 +9,6 @@
#include "iommufd_utils.h"
-static void *buffer;
-
-static unsigned long PAGE_SIZE;
static unsigned long HUGEPAGE_SIZE;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 7e1afb6ff9bd8d..d4c552e5694812 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -41,6 +41,8 @@ static int writeat(int dfd, const char *fn, const char *val)
static __attribute__((constructor)) void setup_buffer(void)
{
+ PAGE_SIZE = sysconf(_SC_PAGE_SIZE);
+
BUFFER_SIZE = 2*1024*1024;
buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE,
@@ -579,6 +581,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
uint32_t stdev_id;
uint32_t idev_id;
uint32_t hwpt_id;
+ __u64 iova;
self->fd = open("/dev/iommu", O_RDWR);
if (self->fd == -1)
@@ -590,6 +593,18 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_ioctl_ioas_alloc(self->fd, &ioas_id2))
return -1;
+ iova = MOCK_APERTURE_START;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id2, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+
fail_nth_enable();
if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 9b6dcb921750b6..53b4d3f2d9fc6c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -19,6 +19,8 @@
static void *buffer;
static unsigned long BUFFER_SIZE;
+static unsigned long PAGE_SIZE;
+
/*
* Have the kernel check the refcount on pages. I don't know why a freshly
* mmap'd anon non-compound page starts out with a ref of 3
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 525 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 30 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++-
14 files changed, 853 insertions(+), 211 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup to a map.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup helper
selftests/bpf: Add testcase for bpf_task_under_cgroup
include/uapi/linux/bpf.h | 13 +++++
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 31 ++++++++++++
tools/include/uapi/linux/bpf.h | 13 +++++
.../bpf/prog_tests/task_under_cgroup.c | 49 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 31 ++++++++++++
6 files changed, 140 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like
sysfs. I created a small test program [2] and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: .0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
[2] https://pastebin.com/x84NEKaS
Changes in v6:
- Remove spurious blank line (Conorbot)
- Update copyrights (Paul)
- Update copyrights (Paul)
- Wrap init_hwprobe_vdso_data() in CONFIG_MMU to fix nommu build break
(Conorbot)
- Update copyrights (Paul)
Changes in v5:
- Added tags
- Fixed misuse of ISA_EXT_c as bitmap, changed to use
riscv_isa_extension_available() (Heiko, Conor)
- Document the alternatives approach in the commit message (Conor and
Heiko).
- Fix __init call warnings by making probe_vendor_features() and
thead_feature_probe_func() __init_or_module.
- Fixed compat vdso compilation failure (lkp).
Changes in v4:
- Used real types in syscall prototypes (Arnd)
- Fixed static line break in do_riscv_hwprobe() (Conor)
- Added newlines between documentation lists (Conor)
- Crispen up size types to size_t, and cpu indices to int (Joe)
- Fix copy_from_user() return logic bug (found via kselftests!)
- Add __user to SYSCALL_DEFINE() to fix warning
- More newlines in BASE_BEHAVIOR_IMA documentation (Conor)
- Add newlines to CPUPERF_0 documentation (Conor)
- Add UNSUPPORTED value (Conor)
- Switched from DT to alternatives-based probing (Rob)
- Crispen up cpu index type to always be int (Conor)
- Fixed selftests commit description, no more tiny libc (Mark Brown)
- Fixed selftest syscall prototype types to match v4.
- Added a prototype to fix -Wmissing-prototype warning (lkp(a)intel.com)
- Fixed rv32 build failure (lkp(a)intel.com)
- Make vdso prototype match syscall types update
Changes in v3:
- Updated copyright date in cpufeature.h
- Fixed typo in cpufeature.h comment (Conor)
- Refactored functions so that kernel mode can query too, in
preparation for the vDSO data population.
- Changed the vendor/arch/imp IDs to return a value of -1 on mismatch
rather than failing the whole call.
- Const cpumask pointer in hwprobe_mid()
- Embellished documentation WRT cpu_set and the returned values.
- Renamed hwprobe_mid() to hwprobe_arch_id() (Conor)
- Fixed machine ID doc warnings, changed elements to c:macro:.
- Completed dangling unistd.h comment (Conor)
- Fixed line breaks and minor logic optimization (Conor).
- Use riscv_cached_mxxxid() (Conor)
- Refactored base ISA behavior probe to allow kernel probing as well,
in prep for vDSO data initialization.
- Fixed doc warnings in IMA text list, use :c:macro:.
- Have hwprobe_misaligned return int instead of long.
- Constify cpumask pointer in hwprobe_misaligned()
- Fix warnings in _PERF_O list documentation, use :c:macro:.
- Move include cpufeature.h to misaligned patch.
- Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
- Use for_each_possible_cpu() instead of NR_CPUS (Conor)
- Break early in misaligned access iteration (Conor)
- Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
value (Conor)
- Introduced vDSO function
Changes in v2:
- Factored the move of struct riscv_cpuinfo to its own header
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.
Evan Green (6):
RISC-V: Move struct riscv_cpuinfo to new header
RISC-V: Add a syscall for HW probing
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: hwprobe: Support probing of misaligned access performance
selftests: Test the new RISC-V hwprobe interface
RISC-V: Add hwprobe vDSO function and data
Documentation/riscv/hwprobe.rst | 86 +++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/Kconfig | 1 +
arch/riscv/errata/thead/errata.c | 10 +
arch/riscv/include/asm/alternative.h | 5 +
arch/riscv/include/asm/cpufeature.h | 23 ++
arch/riscv/include/asm/hwprobe.h | 13 +
arch/riscv/include/asm/syscall.h | 4 +
arch/riscv/include/asm/vdso/data.h | 17 ++
arch/riscv/include/asm/vdso/gettimeofday.h | 8 +
arch/riscv/include/uapi/asm/hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/unistd.h | 9 +
arch/riscv/kernel/alternative.c | 19 ++
arch/riscv/kernel/compat_vdso/Makefile | 2 +-
arch/riscv/kernel/cpu.c | 8 +-
arch/riscv/kernel/cpufeature.c | 3 +
arch/riscv/kernel/smpboot.c | 1 +
arch/riscv/kernel/sys_riscv.c | 228 +++++++++++++++++-
arch/riscv/kernel/vdso.c | 6 -
arch/riscv/kernel/vdso/Makefile | 4 +
arch/riscv/kernel/vdso/hwprobe.c | 52 ++++
arch/riscv/kernel/vdso/sys_hwprobe.S | 15 ++
arch/riscv/kernel/vdso/vdso.lds.S | 3 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 +++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +
.../testing/selftests/riscv/hwprobe/hwprobe.c | 90 +++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +
28 files changed, 712 insertions(+), 14 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/cpufeature.h
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/asm/vdso/data.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
create mode 100644 arch/riscv/kernel/vdso/hwprobe.c
create mode 100644 arch/riscv/kernel/vdso/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
--
2.25.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
I have this on github:
https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v3:
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (15):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 512 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 96 +++-
drivers/iommu/iommufd/io_pagetable.c | 27 +-
drivers/iommu/iommufd/iommufd_private.h | 51 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 64 ++-
.../selftests/iommu/iommufd_fail_nth.c | 52 +-
tools/testing/selftests/iommu/iommufd_utils.h | 61 ++-
14 files changed, 804 insertions(+), 200 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Chuck Lever <chuck.lever(a)oracle.com>
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
Resending... It was not clear to me if this file has a specific
maintainer. I chose to send it to the most recent committer.
diff --git a/.gitignore b/.gitignore
index 70ec6037fa7a..51117ba29c88 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,6 +105,7 @@ modules.order
!.gitignore
!.mailmap
!.rustfmt.toml
+!.kunitconfig
#
# Generated include files
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which facilitate this fix by
factoring out the tc helper's logic which was shared with cg/sk_skb
(which operate correctly).
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add tc_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/tc_socket_lookup.c | 341 ++++++++++++++++++
.../selftests/bpf/progs/tc_socket_lookup.c | 73 ++++
3 files changed, 525 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/tc_socket_lookup.c
--
2.34.1
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Instead, in the event test terminates early, run the test exit and
cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
and better isolates the rest of KUnit from issues which arise in test
cleanup.
If a test cleanup function itself aborts (e.g., due to an assertion
failing), there will be no further attempts to clean up: an error will
be logged and the test failed.
This should also make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an updated version of / replacement of "kunit: Set the current
KUnit context when cleaning up", which instead creates a new kthread
for cleanup tasks if the original test kthread is aborted. This protects
us from failed assertions during cleanup, if the test exited early.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@go…
- Move cleanup execution to another kthread
- (Thanks, Benjamin, for pointing out the assertion issues)
---
lib/kunit/test.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..caeae0dfd82b 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -423,8 +423,51 @@ static void kunit_try_run_case(void *data)
kunit_run_case_cleanup(test, suite);
}
+static void kunit_try_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ struct kunit_suite *suite = ctx->suite;
+
+ current->kunit_test = test;
+
+ kunit_run_case_cleanup(test, suite);
+}
+
+static void kunit_catch_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
+
+ /* It is always a failure if cleanup aborts. */
+ kunit_set_failure(test);
+
+ if (try_exit_code) {
+ /*
+ * Test case could not finish, we have no idea what state it is
+ * in, so don't do clean up.
+ */
+ if (try_exit_code == -ETIMEDOUT) {
+ kunit_err(test, "test case cleanup timed out\n");
+ /*
+ * Unknown internal error occurred preventing test case from
+ * running, so there is nothing to clean up.
+ */
+ } else {
+ kunit_err(test, "internal error occurred during test case cleanup: %d\n",
+ try_exit_code);
+ }
+ return;
+ }
+
+ kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
+}
+
+
static void kunit_catch_run_case(void *data)
{
+ struct kunit_try_catch cleanup;
struct kunit_try_catch_context *ctx = data;
struct kunit *test = ctx->test;
struct kunit_suite *suite = ctx->suite;
@@ -451,9 +494,16 @@ static void kunit_catch_run_case(void *data)
/*
* Test case was run, but aborted. It is the test case's business as to
- * whether it failed or not, we just need to clean up.
+ * whether it failed or not, we just need to clean up. Do this in a new
+ * try / catch context, in case it asserts, too.
*/
- kunit_run_case_cleanup(test, suite);
+ kunit_try_catch_init(&cleanup,
+ test,
+ kunit_try_run_case_cleanup,
+ kunit_catch_run_case_cleanup);
+ ctx->test = test;
+ ctx->suite = suite;
+ kunit_try_catch_run(&cleanup, ctx);
}
/*
--
2.40.0.634.g4ca3ef3211-goog
*Changes in v15*
- Build fix
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirtyi feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 481 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2105 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
From: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
The verification function of this test case is likely to encounter the
following error, which may confuse users. The problem is easily
reproducible in the latest kernel.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$IP_B"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 17664, a(97) != q(113)
If the packet is captured, you will see:
Environment A, the sender:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
Environment B, the receiver:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 7360
IP $IP_A.41025 > $IP_B.8000: UDP, length 14720
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
In one test, the verification data is printed as follows:
abcd...xyz | 1...
.. |
abcd...xyz |
abcd...opabcd...xyz | ...1472... Not xyzabcd, messages are merged
.. |
This is because the sending buffer is buf[64K], and its content is a
loop of A-Z. But maybe only 1472 bytes per send, or more if UDP GSO is
used. The message content does not necessarily end with XYZ, but GRO
will merge these packets, and the -v parameter directly verifies the
entire GRO receive buffer. So we do the validation after the data is split
at the receiving end, just as the application actually uses this feature.
If the sender does not use GSO, each individual segment starts at A,
end at somewhere. Using GSO also has the same problem, and. The data
between each segment during transmission is continuous, but GRO is merged
in the order received, which is not necessarily the order of transmission.
Execution in the same environment does not cause problems, because the
lo device is not NAPI, and does not perform GRO processing. Perhaps it
could be worth supporting to reduce system calls.
bash# tcpdump -i lo host "$IP_self" &
bash# echo udp_gro_receive > /sys/kernel/debug/tracing/set_ftrace_filter
bash# echo function > /sys/kernel/debug/tracing/current_tracer
bash# udpgso_bench_rx -4 -G -S 1472 -v &
bash# udpgso_bench_tx -l 4 -4 -D "$IP_self"
The issue still exists when using the GRO with -G, but not using the -S
to obtain gsosize. Therefore, a print has been added to remind users.
After this issue is resolved, another issue will be encountered and will
be resolved in the next patch.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472
udp rx: 15 MB/s 256 calls/s
udp rx: 30 MB/s 512 calls/s
udpgso_bench_rx: recv: bad gso size, got -1, expected 1472
(-1 == no gso cmsg))
v2:
- Fix confusing descriptions
Signed-off-by: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
Reviewed-by: Xu Xin (CGEL ZTE) <xu.xin16(a)zte.com.cn>
Reviewed-by: Yang Yang (CGEL ZTE) <yang.yang29(a)zte.com.cn>
Cc: Xuexin Jiang (CGEL ZTE) <jiang.xuexin(a)zte.com.cn>
---
tools/testing/selftests/net/udpgso_bench_rx.c | 40 +++++++++++++++++++++------
1 file changed, 31 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index f35a924d4a30..6a2026494cdb 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -189,26 +189,44 @@ static char sanitized_char(char val)
return (val >= 'a' && val <= 'z') ? val : '.';
}
-static void do_verify_udp(const char *data, int len)
+static void do_verify_udp(const char *data, int start, int len)
{
- char cur = data[0];
+ char cur = data[start];
int i;
/* verify contents */
if (cur < 'a' || cur > 'z')
error(1, 0, "data initial byte out of range");
- for (i = 1; i < len; i++) {
+ for (i = start + 1; i < start + len; i++) {
if (cur == 'z')
cur = 'a';
else
cur++;
- if (data[i] != cur)
+ if (data[i] != cur) {
+ if (cfg_gro_segment && !cfg_expected_gso_size)
+ error(0, 0, "Use -S to obtain gsosize, to %s"
+ , "help guide split and verification.");
+
error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n",
i, len,
sanitized_char(data[i]), data[i],
sanitized_char(cur), cur);
+ }
+ }
+}
+
+static void do_verify_udp_gro(const char *data, int len, int gso_size)
+{
+ int start = 0;
+
+ while (len - start > 0) {
+ if (len - start > gso_size)
+ do_verify_udp(data, start, gso_size);
+ else
+ do_verify_udp(data, start, len - start);
+ start += gso_size;
}
}
@@ -264,16 +282,20 @@ static void do_flush_udp(int fd)
if (cfg_expected_pkt_len && ret != cfg_expected_pkt_len)
error(1, 0, "recv: bad packet len, got %d,"
" expected %d\n", ret, cfg_expected_pkt_len);
+ if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
+ error(1, 0, "recv: bad gso size, got %d, expected %d %s",
+ gso_size, cfg_expected_gso_size, "(-1 == no gso cmsg))\n");
if (len && cfg_verify) {
if (ret == 0)
error(1, errno, "recv: 0 byte datagram\n");
- do_verify_udp(rbuf, ret);
+ if (!cfg_gro_segment)
+ do_verify_udp(rbuf, 0, ret);
+ else if (gso_size > 0)
+ do_verify_udp_gro(rbuf, ret, gso_size);
+ else
+ do_verify_udp_gro(rbuf, ret, ret);
}
- if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
- error(1, 0, "recv: bad gso size, got %d, expected %d "
- "(-1 == no gso cmsg))\n", gso_size,
- cfg_expected_gso_size);
packets++;
bytes += ret;
--
2.15.2
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirtyi feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 478 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2102 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
From: zhang yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
1.Fix verifty exception
Executing the following command fails:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
bash# udpgso_bench_tx -l 4 -4 -D "$DST" -S 0
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 2944, a(97) != q(113)
This is because the sending buffers are not aligned by 26 bytes, and the
GRO is not merged sequentially, and the receiver does not judge this
situation. We think of the receiver to split the data and then validate
it, just as the application actually uses this feature.
2.Fix gsosize exception
Executing the following command fails:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
bash# udpgso_bench_tx -l 4 -4 -D "$DST" -S 0
bash# udpgso_bench_rx -4 -G -S 1472
udp rx: 15 MB/s 256 calls/s
udp rx: 30 MB/s 512 calls/s
udpgso_bench_rx: recv: bad gso size, got -1, expected 1472
(-1 == no gso cmsg))
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 7360
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 1472
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 1472
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 4416
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 11776
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 20608
recv: got one message len:1472, probably not an error.
recv: got one message len:1472, probably not an error.
This is due to network, NAPI, timer, etc., only one message being received.
We believe that this situation should be normal.
3.Fix packet number exception
bash# udpgso_bench_rx -4 -n 100
bash# udpgso_bench_tx -l 1 -4 -D "$DST"
udpgso_bench_rx: wrong packet number! got 0, expected 100
This is because the packets is cleared after print.
Zhang Yunkai (3):
selftests: net: udpgso_bench_rx: Fix verifty exceptions
selftests: net: udpgso_bench_rx: Fix gsosize exceptions
selftests: net: udpgso_bench_rx: Fix packet number exceptions
tools/testing/selftests/net/udpgso_bench_rx.c | 45 +++++++++++++++++++++------
1 file changed, 35 insertions(+), 10 deletions(-)
--
2.15.2
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Fix this by setting the active KUnit context for the duration of the
test shutdown procedure. When the test exits normally, this does
nothing. When run from the KUits previous value (probably NULL)
afterwards.
This should make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This becomes useful with the current kunit_add_action() implementation,
as actions do not get the KUnit context passed in by default:
https://lore.kernel.org/linux-kselftest/CABVgOSmjs0wLUa4=ErkB9tH8p6A1P6N33b…
I think it's probably correct anyway, though, so we should either do
this, or totally rule out using kunit_get_current_test() here at all, by
resetting current->kunit_test to NULL before running cleanup even in
the normal case.
I've only given this the most cursory testing so far (I'm not sure how
much of the executor innards I want to expose to be able to actually
write a proper test for it), so more eyes and/or suggestions are
welcome.
Cheers,
-- David
---
lib/kunit/test.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..2d7cad249863 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -392,10 +392,21 @@ static void kunit_case_internal_cleanup(struct kunit *test)
static void kunit_run_case_cleanup(struct kunit *test,
struct kunit_suite *suite)
{
+ /*
+ * If we're no-longer running from within the test kthread() because it failed
+ * or timed out, we still need the context to be okay when running exit and
+ * cleanup functions.
+ */
+ struct kunit *old_current = current->kunit_test;
+
+ current->kunit_test = test;
if (suite->exit)
suite->exit(test);
kunit_case_internal_cleanup(test);
+
+ /* Restore the thread's previous test context (probably NULL or test). */
+ current->kunit_test = old_current;
}
struct kunit_try_catch_context {
--
2.40.0.634.g4ca3ef3211-goog
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Add support for integer type of accessing variable length array.
Add a selftest to check it.
Feng Zhou (2):
bpf: support access variable length array of integer type
selftests/bpf: Add test to access integer type of variable array
kernel/bpf/btf.c | 8 +++++---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 20 +++++++++++++++++++
.../selftests/bpf/prog_tests/tracing_struct.c | 2 ++
.../selftests/bpf/progs/tracing_struct.c | 13 ++++++++++++
4 files changed, 40 insertions(+), 3 deletions(-)
--
2.20.1
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirtyi feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 478 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2102 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Add stackprotector support for all remaining architectures, except s390.
On s390 the stackprotectors are not supported in "global" mode; only
"sysreg" mode which is not suppored in nolibc.
The series also contains a small optimization to strace output during
execution of nolibc-test.
This series is based on the 20230415-nolibc-updates-4a branch of the
nolibc tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (6):
selftests/nolibc: reduce syscalls during space padding
tools/nolibc: riscv: add stackprotector support
tools/nolibc: aarch64: add stackprotector support
tools/nolibc: arm: add stackprotector support
tools/nolibc: loongarch: add stackprotector support
tools/nolibc: mips: add stackprotector support
tools/include/nolibc/arch-aarch64.h | 7 ++++++-
tools/include/nolibc/arch-arm.h | 7 ++++++-
tools/include/nolibc/arch-loongarch.h | 7 ++++++-
tools/include/nolibc/arch-mips.h | 8 +++++++-
tools/include/nolibc/arch-riscv.h | 7 ++++++-
tools/testing/selftests/nolibc/Makefile | 5 +++++
tools/testing/selftests/nolibc/nolibc-test.c | 15 +++++++++++----
7 files changed, 47 insertions(+), 9 deletions(-)
---
base-commit: e35214ea9db7477a02e67a8b412e8046534bb97c
change-id: 20230408-nolibc-stackprotector-archs-42244674616e
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
There was a report that the hardware breakpoints and watch points weren't
reporting the debug architecture version as expected, they were reporting
a version of 0 which is not defined in the architecture. This happens
when running in a KVM guest if the host has a debug architecture version
not supported by KVM, it in turn confuses GDB which rejects any debug
architecture version it does not know about.
Add a test that covers that situation and while we're at it reports the
debug architecture version and number of slots available to aid with
figuring out problems that may arise.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/abi/ptrace.c | 32 +++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/abi/ptrace.c b/tools/testing/selftests/arm64/abi/ptrace.c
index be952511af22..abe4d58d731d 100644
--- a/tools/testing/selftests/arm64/abi/ptrace.c
+++ b/tools/testing/selftests/arm64/abi/ptrace.c
@@ -20,7 +20,7 @@
#include "../../kselftest.h"
-#define EXPECTED_TESTS 7
+#define EXPECTED_TESTS 11
#define MAX_TPIDRS 2
@@ -132,6 +132,34 @@ static void test_tpidr(pid_t child)
}
}
+static void test_hw_debug(pid_t child, int type, const char *type_name)
+{
+ struct user_hwdebug_state state;
+ struct iovec iov;
+ int slots, arch, ret;
+
+ iov.iov_len = sizeof(state);
+ iov.iov_base = &state;
+
+ /* Should be able to read the values */
+ ret = ptrace(PTRACE_GETREGSET, child, type, &iov);
+ ksft_test_result(ret == 0, "read_%s\n", type_name);
+
+ if (ret == 0) {
+ /* Low 8 bits is the number of slots, next 4 bits the arch */
+ slots = state.dbg_info & 0xff;
+ arch = (state.dbg_info >> 8) & 0xf;
+
+ ksft_print_msg("%s version %d with %d slots\n", type_name,
+ arch, slots);
+
+ /* Zero is not currently architecturally valid */
+ ksft_test_result(arch, "%s_arch_set\n", type_name);
+ } else {
+ ksft_test_result_skip("%s_arch_set\n");
+ }
+}
+
static int do_child(void)
{
if (ptrace(PTRACE_TRACEME, -1, NULL, NULL))
@@ -207,6 +235,8 @@ static int do_parent(pid_t child)
ksft_print_msg("Parent is %d, child is %d\n", getpid(), child);
test_tpidr(child);
+ test_hw_debug(child, NT_ARM_HW_WATCH, "NT_ARM_HW_WATCH");
+ test_hw_debug(child, NT_ARM_HW_BREAK, "NT_ARM_HW_BREAK");
ret = EXIT_SUCCESS;
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230414-arm64-test-hw-breakpoint-83fe02f607fc
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is a follow-up to the kunit_defer() parts of 'KUnit device API
proposal'[1], with a number of changes suggested by Matti Vaittinen,
Maxime Ripard and Benjamin Berg.
Most notably, kunit_defer() has been renamed to kunit_add_action(), in
order to match the equivalent devres API[2]. Likewise:
kunit_defer_cancel() has become kunit_remove_action(), and
kunit_defer_trigger() has become kunit_release_action().
The _token() versions of these APIs remain, for the moment, even though
they're a bit more awkward and less useful, as they have two advantages:
1. They're faster, as the action doesn't need to be looked up.
2. They provide more flexibility in the ordering of actions in cases
where several identical actions are interleaved with other, different
actions.
Similarly, the internal_gfp argument remains for now, as this is useful
in implementing kunit_kalloc() and similar.
The implementation now uses a single allocation for both the
kunit_resource and the kunit_action_ctx (previously kunit_defer_ctx).
The 'cancellation token' is now of type 'struct kunit_action_ctx',
instead of void*.
Tests have been added to the kunit-resource-test suite which exercise
this functionality. Similarly, the kunit executor tests and
kunit allocation functions have been updated to make use of this API.
I'd love to hear any further thoughts!
Cheers,
-- David
[1]: https://lore.kernel.org/linux-kselftest/20230325043104.3761770-1-davidgow@g…
[2]: https://docs.kernel.org/driver-api/basics.html#c.devm_add_action
David Gow (3):
kunit: Add kunit_add_action() to defer a call until test exit
kunit: executor_test: Use kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
include/kunit/resource.h | 89 +++++++++++++++++++++++++++
lib/kunit/executor_test.c | 12 ++--
lib/kunit/kunit-test.c | 123 +++++++++++++++++++++++++++++++++++++-
lib/kunit/resource.c | 99 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 48 +++------------
5 files changed, 323 insertions(+), 48 deletions(-)
--
2.40.0.348.gf938b09366-goog
On 16.04.23 00:59, Stefan Roesch wrote:
> This adds three new tests to the selftests for KSM. These tests use the
> new prctl API's to enable and disable KSM.
>
> 1) add new prctl flags to prctl header file in tools dir
>
> This adds the new prctl flags to the include file prct.h in the
> tools directory. This makes sure they are available for testing.
>
> 2) add KSM prctl merge test to ksm_tests
>
> This adds the -t option to the ksm_tests program. The -t flag
> allows to specify if it should use madvise or prctl ksm merging.
>
> 3) add two functions for debugging merge outcome for ksm_tests
>
> This adds two functions to report the metrics in /proc/self/ksm_stat
> and /sys/kernel/debug/mm/ksm. The debug output is enabled with the
> -d option.
>
> 4) add KSM prctl test to ksm_functional_tests
>
> This adds a test to the ksm_functional_test that verifies that the
> prctl system call to enable / disable KSM works.
>
> 5) add KSM fork test to ksm_functional_test
>
> Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
> by the child process.
>
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
Thanks!
Acked-by: David Hildenbrand <david(a)redhat.com>
--
Thanks,
David / dhildenb
Thanks for moving the functional tests. Some more feedback forksm_functional_tests change. Writing tests in the
ksft testing framework can be a bit "special".
I'm seeing some weird test failures due to
prctl(PR_GET_MEMORY_MERGE, 0)
Apparently, these go away when using
prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0)
to explicitly force the other values to 0. Most probably, we should do that
for PR_SET_MEMORY_MERGE as well (especially if we check for the arguments as
well).
[...]
> @@ -15,8 +15,10 @@
> #include <errno.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> +#include <sys/prctl.h>
> #include <sys/syscall.h>
> #include <sys/ioctl.h>
> +#include <sys/wait.h>
> #include <linux/userfaultfd.h>
>
> #include "../kselftest.h"
> @@ -326,9 +328,80 @@ static void test_unmerge_uffd_wp(void)
> }
> #endif
>
> +/* Verify that KSM can be enabled / queried with prctl. */
> +static void test_ksm_prctl(void)
Maybe call this "test_prctl", because after all, these are all KSM tests.
> +{
> + bool ret = false;
> + int is_on;
> + int is_off;
> +
> + ksft_print_msg("[RUN] %s\n", __func__);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl set");
> + goto out;
> + }
> +
> + is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl set");
> + goto out;
> + }
> +
> + is_off = prctl(PR_GET_MEMORY_MERGE, 0);
> + if (is_on && is_off)
> + ret = true;
> +
> +out:
> + ksft_test_result(ret, "prctl get / set\n");
The test fails if the kernel does not support PR_SET_MEMORY_MERGE.
I'd modify this test to:
(1) skip if the first PR_SET_MEMORY_MERGE=1 failed with EINVAL.
(2) distinguish for PR_GET_MEMORY_MERGE whether it returned an error or
whether it returned a wrong value. Feel free to keep that as is, whatever
you prefer.
(3) exit early for all failures, you get exactly one expected skip/pass/fail for the
test and use specific test failure messages.
(4) Pass "0" for all other arguments of prctl.
Something like:
static void test_prctl(void)
{
int ret;
ksft_print_msg("[RUN] %s\n", __func__);
ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
if (ret < 0 && errno == EINVAL){
ksft_test_result_skip("PR_SET_MEMORY_MERGE not supported\n");
return;
} else if (ret) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 failed\n");
return;
}
ret = prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0);
if (ret < 0) {
ksft_test_result_fail("PR_GET_MEMORY_MERGE failed\n");
return;
} else if (ret != 1) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 not effective\n");
return;
}
ret = prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0);
if (ret){
ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 failed\n");
return;
}
ret = prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0);
if (ret < 0) {
ksft_test_result_fail("PR_GET_MEMORY_MERGE failed\n");
return;
} else if (ret != 0) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 not effective\n");
return;
}
ksft_test_result_pass("Setting/clearing PR_SET_MEMORY_MERGE works\n");
}
> +}
> +
> +/* Verify that prctl ksm flag is inherited. */
> +static void test_ksm_fork(void)
Maybe call it "test_prctl_fork"
> +{
> + int status;
> + bool ret = false;
> + pid_t child_pid;
> +
> + ksft_print_msg("[RUN] %s\n", __func__);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + ksft_test_result_fail("prctl failed\n");
> + goto out;
> + }
> +
> + child_pid = fork();
> + if (child_pid == 0) {
> + int is_on =
> +
> + if (!is_on)
> + exit(-1);
> +
> + exit(0);
> + }
> +
> + if (child_pid < 0) {
> + ksft_test_result_fail("child pid < 0\n");
> + goto out;> +
> + if (waitpid(child_pid, &status, 0) < 0 || WEXITSTATUS(status) != 0) {
> + ksft_test_result_fail("wait pid < 0\n");
> + goto out;
> + }
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0))
> + ksft_test_result_fail("prctl 2 failed\n");
> + else
> + ret = true;
> +
> +out:
> + ksft_test_result(ret, "ksm_flag is inherited\n");
> +}
Again, test fails if kernel support is not around.
I'd modify this test to:
(1) skip if the first PR_SET_MEMORY_MERGE=1 failed with EINVAL just as in the other test.
(2) Use a simple exit(prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0)); in the child.
(3) exit early for all failures, you get exactly one expected skip/pass/fail for the
test and use specific test failure messages.
(4) Split up the waitpid() check to test what failed.
(5) Pass "0" for all other arguments of prctl.
Something like:
static void test_prctl_fork(void)
{
int ret, status;
pid_t child_pid;
ksft_print_msg("[RUN] %s\n", __func__);
ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
if (ret < 0 && errno == EINVAL){
ksft_test_result_skip("PR_SET_MEMORY_MERGE not supported\n");
return;
} else if (ret) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 failed\n");
return;
}
child_pid = fork();
if (!child_pid) {
exit(prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0));
} else if (child_pid < 0) {
ksft_test_result_fail("fork() failed\n");
return;
}
if (waitpid(child_pid, &status, 0) < 0) {
ksft_test_result_fail("waitpid() failed\n");
return;
} else if (WEXITSTATUS(status) != 1) {
ksft_test_result_fail("unexpected PR_GET_MEMORY_MERGE result in child\n");
return;
}
if (prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0)) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 failed\n");
return;
}
ksft_test_result_pass("PR_SET_MEMORY_MERGE value is inherited\n");
}
> +
> int main(int argc, char **argv)
> {
> - unsigned int tests = 2;
> + unsigned int tests = 6;
Assuming you execute exactly one ksft_test_result_skip/fail/pass on every path of your two
test, this would become "4".
> int err;
>
> #ifdef __NR_userfaultfd
> @@ -358,6 +431,8 @@ int main(int argc, char **argv)
> #ifdef __NR_userfaultfd
> test_unmerge_uffd_wp();
> #endif
> + test_ksm_prctl();
> + test_ksm_fork();
>
With above outlined changes (feel free to integrate what you consider valuable),
on an older kernel I get:
$ sudo ./ksm_functional_tests
TAP version 13
1..5
# [RUN] test_unmerge
ok 1 Pages were unmerged
# [RUN] test_unmerge_discarded
ok 2 Pages were unmerged
# [RUN] test_unmerge_uffd_wp
ok 3 Pages were unmerged
# [RUN] test_prctl
ok 4 # SKIP PR_SET_MEMORY_MERGE not supported
# [RUN] test_prctl_fork
ok 5 # SKIP PR_SET_MEMORY_MERGE not supported
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:2 error:0
On a kernel with your patch #1:
# ./ksm_functional_tests
TAP version 13
1..5
# [RUN] test_unmerge
ok 1 Pages were unmerged
# [RUN] test_unmerge_discarded
ok 2 Pages were unmerged
# [RUN] test_unmerge_uffd_wp
ok 3 Pages were unmerged
# [RUN] test_prctl
ok 4 Setting/clearing PR_SET_MEMORY_MERGE works
# [RUN] test_prctl_fork
ok 5 PR_SET_MEMORY_MERGE value is inherited
# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
> err = ksft_get_fail_cnt();
> if (err)
> diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
> index f9eb4d67e0dd..35b3828d44b4 100644
> --- a/tools/testing/selftests/mm/ksm_tests.c
> +++ b/tools/testing/selftests/mm/ksm_tests.c
> @@ -1,6 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0
[...]
Changes to ksm_tests mostly look good. Two comments:
> - if (ksm_merge_pages(map_ptr, page_size * page_count, start_time, timeout))
> + if (ksm_merge_pages(merge_type, map_ptr, page_size * page_count, start_time, timeout))
> goto err_out;
>
> /* verify that the right number of pages are merged */
> if (assert_ksm_pages_count(page_count)) {
> printf("OK\n");
> - munmap(map_ptr, page_size * page_count);
> + if (merge_type == KSM_MERGE_MADVISE)
> + munmap(map_ptr, page_size * page_count);
> + else if (merge_type == KSM_MERGE_PRCTL)
> + prctl(PR_SET_MEMORY_MERGE, 0);
Are you sure that we don't want to unmap here? I'd assume we want to unmap in either way.
[...]
> + case 'd':
> + debug = 1;
> + break;
> case 's':
> size_MB = atoi(optarg);
> if (size_MB <= 0) {
> printf("Size must be greater than 0\n");
> return KSFT_FAIL;
> }
> + case 't':
> + {
> + int tmp = atoi(optarg);
> +
> + if (tmp < 0 || tmp > KSM_MERGE_LAST) {
> + printf("Invalid merge type\n");
> + return KSFT_FAIL;
> + }
> + merge_type = atoi(optarg);
You can simply reuse tmp
merge_type = tmp;
--
Thanks,
David / dhildenb
Patch 1 makes a function static because it is only used in one file.
Patch 2 adds info about the git trees we use to help occasional devs.
Patch 3 removes an unused variable.
Patch 4 removes duplicated entries from the help menu of a tool used in
MPTCP selftests.
Patch 5 removes some ShellCheck warnings in mptcp_join.sh selftest.
Only very minor improvements then.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Geliang Tang (1):
mptcp: make userspace_pm_append_new_local_addr static
Matthieu Baerts (4):
MAINTAINERS: add git trees for MPTCP
mptcp: remove unused 'remaining' variable
selftests: mptcp: remove duplicated entries in usage
selftests: mptcp: join: fix ShellCheck warnings
MAINTAINERS | 2 ++
net/mptcp/options.c | 7 ++-----
net/mptcp/pm_userspace.c | 4 ++--
net/mptcp/protocol.h | 2 --
tools/testing/selftests/net/mptcp/mptcp_connect.c | 8 ++++----
tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 ++++++++--
6 files changed, 18 insertions(+), 15 deletions(-)
---
base-commit: c11d2e718c792468e67389b506451eddf26c2dac
change-id: 20230414-upstream-net-next-20230414-mptcp-small-cleanups-1cba986990b1
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
The existing selftest suite for openvswitch will work for regression
testing the datapath feature bits, but won't test things like adding
interfaces, or the upcall interface. Here, we add some additional
test facilities.
First, extend the ovs-dpctl.py python module to support the OVS_FLOW
and OVS_PACKET netlink families, with some associated messages. These
can be extended over time, but the initial support is for more well
known cases (output, userspace, and CT).
Next, extend the test suite to test upcalls by adding a datapath,
monitoring the upcall socket associated with the datapath, and then
dumping any upcalls that are received. Compare with expected ARP
upcall via arping.
Aaron Conole (3):
selftests: openvswitch: add interface support
selftests: openvswitch: add flow dump support
selftests: openvswitch: add support for upcall testing
.../selftests/net/openvswitch/openvswitch.sh | 89 +-
.../selftests/net/openvswitch/ovs-dpctl.py | 1276 ++++++++++++++++-
2 files changed, 1349 insertions(+), 16 deletions(-)
--
2.39.2
From: Chuck Lever <chuck.lever(a)oracle.com>
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
diff --git a/.gitignore b/.gitignore
index 70ec6037fa7a..51117ba29c88 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,6 +105,7 @@ modules.order
!.gitignore
!.mailmap
!.rustfmt.toml
+!.kunitconfig
#
# Generated include files
From: Jinrong Liang <cloudliang(a)tencent.com>
Hi,
This patch set adds some tests to ensure consistent PMU performance event
filter behavior. Specifically, the patches aim to improve KVM's PMU event
filter by strengthening the test coverage, adding documentation, and making
other small changes.
The first patch replaces int with uint32_t for nevents to ensure consistency
and readability in the code. The second patch adds fixed_counter_bitmap to
create_pmu_event_filter() to support the use of the same creator to control
the use of guest fixed counters. The third patch adds test cases for
unsupported input values in PMU filter, including unsupported "action"
values, unsupported "flags" values, and unsupported "nevents" values. Also,
it tests setting non-existent fixed counters in the fixed bitmap doesn't
fail.
The fourth patch updates the documentation for KVM_SET_PMU_EVENT_FILTER ioctl
to include a detailed description of how fixed performance events are handled
in the pmu filter. The fifth patch adds tests to cover that pmu_event_filter
works as expected when applied to fixed performance counters, even if there
is no fixed counter exists. The sixth patch adds a test to ensure that setting
both generic and fixed performance event filters does not affect the consistency
of the fixed performance filter behavior in KVM. The seventh patch adds a test
to verify the behavior of the pmu event filter when an incomplete
kvm_pmu_event_filter structure is used.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases. These patches have been tested and verified to
function properly.
Thanks for your review and feedback.
Sincerely,
Jinrong Liang
Jinrong Liang (7):
KVM: selftests: Replace int with uint32_t for nevents
KVM: selftests: Apply create_pmu_event_filter() to fixed ctrs
KVM: selftests: Test unavailable event filters are rejected
KVM: x86/pmu: Add documentation for fixed ctr on PMU filter
KVM: selftests: Check if pmu_event_filter meets expectations on fixed
ctrs
KVM: selftests: Check gp event filters without affecting fixed event
filters
KVM: selftests: Test pmu event filter with incompatible
kvm_pmu_event_filter
Documentation/virt/kvm/api.rst | 21 ++
.../kvm/x86_64/pmu_event_filter_test.c | 239 ++++++++++++++++--
2 files changed, 243 insertions(+), 17 deletions(-)
base-commit: a25497a280bbd7bbcc08c87ddb2b3909affc8402
--
2.31.1
This series replaces the C99 compatibility patch. (See v1 link below).
After the discussion about support C99 and/or GNU89 I came to the
conclusion supporting straight C89 is not very hard.
Instead of validating both C99 and GNU89 in some awkward way only for
somebody requesting true C89 support let's just do it this way.
Feel free to squash all the comment syntax patches together if you
prefer.
All changes in this series are cosmetic only.
To: Willy Tarreau <w(a)1wt.eu>
To: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
This series is based on the "dev" branch of the RCU tree.
---
Changes in v2:
- Target C89 instead of C99
- Link to v1: https://lore.kernel.org/r/20230328-nolibc-c99-v1-1-a8302fb19f19@weissschuh.…
---
Thomas Weißschuh (11):
tools/nolibc: use standard __asm__ statements
tools/nolibc: use __inline__ syntax
tools/nolibc: i386: use C89 comment syntax
tools/nolibc: x86_64: use C89 comment syntax
tools/nolibc: riscv: use C89 comment syntax
tools/nolibc: aarch64: use C89 comment syntax
tools/nolibc: arm: use C89 comment syntax
tools/nolibc: mips: use C89 comment syntax
tools/nolibc: loongarch: use C89 comment syntax
tools/nolibc: use C89 comment syntax
tools/nolibc: validate C89 compatibility
tools/include/nolibc/arch-aarch64.h | 32 ++++++++--------
tools/include/nolibc/arch-arm.h | 42 ++++++++++-----------
tools/include/nolibc/arch-i386.h | 40 ++++++++++----------
tools/include/nolibc/arch-loongarch.h | 38 +++++++++----------
tools/include/nolibc/arch-mips.h | 56 ++++++++++++++--------------
tools/include/nolibc/arch-riscv.h | 40 ++++++++++----------
tools/include/nolibc/arch-x86_64.h | 34 ++++++++---------
tools/include/nolibc/stackprotector.h | 4 +-
tools/include/nolibc/stdlib.h | 18 ++++-----
tools/include/nolibc/string.h | 4 +-
tools/include/nolibc/sys.h | 8 ++--
tools/testing/selftests/nolibc/Makefile | 2 +-
tools/testing/selftests/nolibc/nolibc-test.c | 14 +++----
13 files changed, 166 insertions(+), 166 deletions(-)
---
base-commit: bd5b341f0f69eb4c958ffd48699213c5b9af8145
change-id: 20230328-nolibc-c99-59f44ea45636
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Due to the lack of the SKIP directive in the output, if any of the
parameterized test was skipped, the parser could not recognize that
correctly and was marking the test as PASSED.
This can easily be seen by running the new subtest from patch 1:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [PASSED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 3
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params* \
--raw_output
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
After adding the SKIP directive, the report looks as expected:
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [SKIPPED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 2, skipped: 1
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0 # SKIP unsupported param value
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
v2: better align with future support for arbitrary levels of testing
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Michal Wajdeczko (3):
kunit/test: Add example test showing parameterized testing
kunit: Fix reporting of the skipped parameterized tests
kunit: Update kunit_print_ok_not_ok function
include/kunit/test.h | 1 +
lib/kunit/kunit-example-test.c | 34 +++++++++++++++++++++++++++
lib/kunit/test.c | 43 ++++++++++++++++++++++------------
3 files changed, 63 insertions(+), 15 deletions(-)
--
2.25.1
Building bpf selftests with clang can trigger errors like the following:
CLNG-BPF [test_maps] bpf_iter_netlink.bpf.o
progs/bpf_iter_netlink.c:32:4: error: incompatible pointer types assigning to 'struct sock *' from 'struct sock___17 *' [-Werror,-Wincompatible-pointer-types]
s = &nlk->sk;
^ ~~~~~~~~
1 error generated.
This is due to the fact that bpftool emits duplicate data types with
different names in vmlinux.h (i.e., `struct sock` in this case) and
these types, despite having a different name, represent in fact the same
object.
Add -Wno-incompatible-pointer-types to CLANG_CLAGS to prevent these
errors.
Signed-off-by: Andrea Righi <andrea.righi(a)canonical.com>
---
tools/testing/selftests/bpf/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index b677dcd0b77a..0d9ef819a065 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -356,7 +356,8 @@ BPF_CFLAGS = -g -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
-I$(abspath $(OUTPUT)/../usr/include)
CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
- -Wno-compare-distinct-pointer-types
+ -Wno-compare-distinct-pointer-types \
+ -Wno-incompatible-pointer-types
$(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
$(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
--
2.39.2
An error snuck in between two recent conflicting changes:
Until recently ->setup() used negative values to indicate
normal test termination. This was changed in
commit fa10366cc6f4 ("selftests/resctrl: Allow ->setup() to return
errors") that transitioned ->setup() to use negative values
to indicate errors and a new END_OF_TESTS to indicate normal
termination.
commit 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100%
allocation on first run in MBM test") continued to use
negative return to indicate normal test termination.
Fix mbm_setup() to use the new END_OF_TESTS to indicate
error-free test termination.
Fixes: 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test")
Reported-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Link: https://lore.kernel.org/lkml/bb65cce8-54d7-68c5-ef19-3364ec95392a@linux.int…
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
Hi Shuah,
Apologies, this error snuck in between the two series
merged into kselftest's next this week.
tools/testing/selftests/resctrl/mbm_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 146132fa986d..538d35a6485a 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -98,7 +98,7 @@ static int mbm_setup(int num, ...)
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
- return -1;
+ return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
if (p->num_of_runs == 0)
--
2.34.1
Hello Reinette,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing will be executed
and test results will be confusing.
There is no behavior change in test themselves.
[patch 1] Make write_schemata() run to set up shemata with 100% allocation
on first run in MBM test.
[patch 2] The MBA test result message is always output as "ok",
make output message to be "not ok" if MBA check result is failed.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] An error occurs whether in parents process or child process,
the parents process always kills child process and runs
umount_resctrlfs(), and the child process always waits to be
killed by the parent process.
[patch 5] If a signal received, to cleanup properly before exiting the
parent process, commonize the signal handler registered for
CMT/MBM/MBA tests and reuse it in CAT, also unregister the
signal handler at the end of each test.
[patch 6] Before exiting each test CMT/CAT/MBM/MBA, clear test result
files function cat/cmt/mbm/mba_test_cleanup() are called
twice. Delete once.
This patch series is based on based on the "next" branch of
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Pervious versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…
[v8] https://lore.kernel.org/lkml/20230215083230.3155897-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister
for all tests
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
tools/testing/selftests/resctrl/cmt_test.c | 7 +-
tools/testing/selftests/resctrl/fill_buf.c | 14 ----
tools/testing/selftests/resctrl/mba_test.c | 23 +++----
tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
tools/testing/selftests/resctrl/resctrl.h | 2 +
.../testing/selftests/resctrl/resctrl_tests.c | 4 --
tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
9 files changed, 96 insertions(+), 75 deletions(-)
--
2.27.0
There is a spelling mistake in an error message string. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/mm/uffd-common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 61c6250adf93..54dfd92d86cf 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -311,7 +311,7 @@ int uffd_test_ctx_init(uint64_t features, const char **errmsg)
ret = userfaultfd_open(&features);
if (ret) {
if (errmsg)
- *errmsg = "possible lack of priviledge";
+ *errmsg = "possible lack of privilege";
return ret;
}
--
2.30.2
Due to the lack of the SKIP directive in the output, if any of the
parameterized test was skipped, the parser could not recognize that
correctly and was marking the test as PASSED.
This can easily be seen by running the new subtest from patch 1:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [PASSED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 3
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params* \
--raw_output
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
After adding the SKIP directive, the report looks as expected:
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [SKIPPED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 2, skipped: 1
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0 # SKIP unsupported param value
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
Cc: David Gow <davidgow(a)google.com>
Michal Wajdeczko (3):
kunit/test: Add example test showing parameterized testing
kunit: Fix reporting of the skipped parameterized tests
kunit: Update reporting function to support results from subtests
lib/kunit/kunit-example-test.c | 34 ++++++++++++++++++++++++++++++++++
lib/kunit/test.c | 26 +++++++++++++++++---------
2 files changed, 51 insertions(+), 9 deletions(-)
--
2.25.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(),initialize *map to
NULL,to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the
error is properly checked before map is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret < 0)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
When we added fd based file streams we created references to STx_FILENO in
stdio.h but these constants are declared in unistd.h which is the last file
included by the top level nolibc.h meaning those constants are not defined
when we try to build stdio.h. This causes programs using nolibc.h to fail
to build.
Reorder the headers to avoid this issue.
Fixes: d449546c957f ("tools/nolibc: implement fd-based FILE streams")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/include/nolibc/nolibc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 04739a6293c4..05a228a6ee78 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -99,11 +99,11 @@
#include "sys.h"
#include "ctype.h"
#include "signal.h"
+#include "unistd.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "time.h"
-#include "unistd.h"
#include "stackprotector.h"
/* Used by programs to avoid std includes */
---
base-commit: 7d8214bba44c1aa6a75921a09a691945d26a8d43
change-id: 20230413-nolibc-stdio-fix-fb42de39d099
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On 12.04.23 05:16, Stefan Roesch wrote:
> This adds three new tests to the selftests for KSM. These tests use the
> new prctl API's to enable and disable KSM.
>
> 1) add new prctl flags to prctl header file in tools dir
>
> This adds the new prctl flags to the include file prct.h in the
> tools directory. This makes sure they are available for testing.
>
> 2) add KSM prctl merge test
>
> This adds the -t option to the ksm_tests program. The -t flag
> allows to specify if it should use madvise or prctl ksm merging.
>
> 3) add KSM get merge type test
>
> This adds the -G flag to the ksm_tests program to query the KSM
> status with prctl after KSM has been enabled with prctl.
>
> 4) add KSM fork test
>
> Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
> by the child process.
>
> 5) add two functions for debugging merge outcome
>
> This adds two functions to report the metrics in /proc/self/ksm_stat
> and /sys/kernel/debug/mm/ksm.
>
> The debugging can be enabled with the following command line:
> make -C tools/testing/selftests TARGETS="mm" --keep-going \
> EXTRA_CFLAGS=-DDEBUG=1
Would it make sense to instead have a "-D" (if still unused) runtime
options to print this data? Dead code that's not compiled is a bit
unfortunate as it can easily bit-rot.
This patch essentially does two things
1) Add the option to run all tests/benchmarks with the PRCTL instead of
MADVISE
2) Add some functional KSM tests for the new PRCTL (fork, enabling
works, disabling works).
The latter should rather go into ksm_functional_tests().
[...]
>
> -static int check_ksm_unmerge(int mapping, int prot, int timeout, size_t page_size)
> +/* Verify that prctl ksm flag is inherited. */
> +static int check_ksm_fork(void)
> +{
> + int rc = KSFT_FAIL;
> + pid_t child_pid;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl");
> + return KSFT_FAIL;
> + }
> +
> + child_pid = fork();
> + if (child_pid == 0) {
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (!is_on)
> + exit(KSFT_FAIL);
> +
> + exit(KSFT_PASS);
> + }
> +
> + if (child_pid < 0)
> + goto out;
> +
> + if (waitpid(child_pid, &rc, 0) < 0)
> + rc = KSFT_FAIL;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl");
> + rc = KSFT_FAIL;
> + }
> +
> +out:
> + if (rc == KSFT_PASS)
> + printf("OK\n");
> + else
> + printf("Not OK\n");
> +
> + return rc;
> +}
> +
> +static int check_ksm_get_merge_type(void)
> +{
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_off = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (is_on && is_off) {
> + printf("OK\n");
> + return KSFT_PASS;
> + }
> +
> + printf("Not OK\n");
> + return KSFT_FAIL;
> +}
Yes, these two are better located in ksm_functional_tests() to just run
them both automatically when the test is executed.
--
Thanks,
David / dhildenb
Hi Shuah and kselftest team,
There are a couple of resctrl selftest patches that are ready for inclusion. They have been percolating on the list for a while without expecting more feedback. All have "Reviewed-by" tags from at least one reviewer. Could you please consider including them into the kselftest repo? There is one minor merge conflict between two of the series for which the snippet below shows resolution.
[PATCH v8 0/6] Some improvements of resctrl selftest
https://lore.kernel.org/lkml/20230215083230.3155897-1-tan.shaopeng@jp.fujit…
[PATCH v2 0/9] selftests/resctrl: Fixes to error handling logic and cleanups
https://lore.kernel.org/lkml/20230215130605.31583-1-ilpo.jarvinen@linux.int…
[PATCH] selftests/resctrl: Use correct exit code when tests fail
https://lore.kernel.org/lkml/20230309145757.2280518-1-peternewman@google.co…
The snippet below shows resolution of the merge conflict between the
first and second series:
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 040ca1f9c173..775f9e542ff6 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -98,7 +98,7 @@ static int mbm_setup(int num, ...)
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
- return -1;
+ return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
if (p->num_of_runs == 0)
Thank you very much.
Reinette
Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
edge cases. It fixes issues that can be visible from v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
visible from v5.11.
Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.
Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (1):
selftests: mptcp: userspace pm: uniform verify events
Paolo Abeni (3):
mptcp: use mptcp_schedule_work instead of open-coding it
mptcp: stricter state check in mptcp_worker
mptcp: fix NULL pointer dereference on fastopen early fallback
net/mptcp/fastopen.c | 11 +++++++++--
net/mptcp/options.c | 5 ++---
net/mptcp/protocol.c | 2 +-
net/mptcp/subflow.c | 18 ++++++------------
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 ++
5 files changed, 20 insertions(+), 18 deletions(-)
---
base-commit: a4506722dc39ca840593f14e3faa4c9ba9408211
change-id: 20230411-upstream-net-20230411-mptcp-fixes-db47f50c2688
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Nested translation is a hardware feature that is supported by many modern
IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
to get access to the physical address. stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
to stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example, the stage-1 translation table is I/O page
table. As the below diagram shows, guest I/O page table pointer in GPA
(guest physical address) is passed to host and be used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed with an IOTLB invalidation.
.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from iommu driver. So in this series
hw_pagetable and iommu_domain means the same thing if no special note.
IOMMUFD has already supported allocating hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable
with driver specific parameters and interface to sync stage-1 IOTLB as user
owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first
introduces new iommu op for allocating domains with user data and the op
for syncing stage-1 IOTLB, and then extend the IOMMUFD internal infrastructure
to accept user_data and parent hwpt, then relay the data to iommu core to
allocate iommu_domain. After it, extend the ioctl IOMMU_HWPT_ALLOC to accept
user data and stage-2 hwpt ID to allocate hwpt. Along with it, ioctl
IOMMU_HWPT_INVALIDATE is added to invalidate stage-1 IOTLB. This is needed
for user-managed hwpts. ioctl IOMMU_DEVICE_GET_HW_INFO is extended to report
the supported hwpt types bitmap to user. Selftest is added as well to cover
the new ioctls.
Complete code can be found in [2], QEMU could can be found in [3].
At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks
them for the help. ^_^. Look forward to your feedbacks.
base-commit: 3dfe670c94c7fc4af42e5c08cdd8a110b594e18e
[1] https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add new iommu op to create domains owned by userspace
iommu: Add nested domain support
Nicolin Chen (5):
iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
iommufd/selftest: Add domain_alloc_user() support in iommu mock
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (5):
iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
iommufd: Pass parent hwpt and user_data to
iommufd_hw_pagetable_alloc()
iommufd: IOMMU_HWPT_ALLOC allocation with user data
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd/device: Report supported hwpt_types
drivers/iommu/iommufd/device.c | 9 +-
drivers/iommu/iommufd/hw_pagetable.c | 242 +++++++++++++++++-
drivers/iommu/iommufd/iommufd_private.h | 16 +-
drivers/iommu/iommufd/iommufd_test.h | 30 +++
drivers/iommu/iommufd/main.c | 7 +-
drivers/iommu/iommufd/selftest.c | 104 +++++++-
include/linux/iommu.h | 11 +
include/uapi/linux/iommufd.h | 65 +++++
tools/testing/selftests/iommu/iommufd.c | 126 ++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 71 +++++
10 files changed, 654 insertions(+), 27 deletions(-)
--
2.34.1
vfprintf() is complex and so far did not have proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v3:
- also provide and use fflush/fclose.
- reject fileno(NULL).
- provide compatability with buffered streams from glibc.
- Link to v2: https://lore.kernel.org/r/20230328-nolibc-printf-test-v2-0-f72bdf210190@wei…
Changes in v2:
- Include <sys/mman.h> for tests.
- Implement FILE* in terms of integer pointers.
- Provide fdopen() and fileno().
- Link to v1: https://lore.kernel.org/lkml/20230328-nolibc-printf-test-v1-0-d7290ec893dd@…
---
Thomas Weißschuh (4):
tools/nolibc: add libc-test binary
tools/nolibc: add wrapper for memfd_create
tools/nolibc: implement fd-based FILE streams
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 95 ++++++++++++++++++++--------
tools/include/nolibc/sys.h | 23 +++++++
tools/testing/selftests/nolibc/.gitignore | 1 +
tools/testing/selftests/nolibc/Makefile | 5 ++
tools/testing/selftests/nolibc/nolibc-test.c | 86 +++++++++++++++++++++++++
5 files changed, 183 insertions(+), 27 deletions(-)
---
base-commit: a63baab5f60110f3631c98b55d59066f1c68c4f7
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign()
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the error
is properly checked before one_page is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret < 0) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
Here is a series with some fixes and cleanups to resctrl selftests and
rewrite of CAT test into something that really tests CAT working or not
condition.
I know that this series will conflict with some of patches from
Shaopeng Tan that so far have not made it into the kselftest tree. Due
to CAT test rewrite done in this series, some of those patches would no
longer be relevant anyway but some of them are still very valid (I've
not tried to reinvent the fixes in Shaopeng's series in this series).
Ilpo Järvinen (22):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Check also too low values for CBM bits
selftests/resctrl: Make span unsigned long everywhere
selftests/resctrl: Express span in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Split run_fill_buf() to alloc, work, and dealloc
helpers
selftests/resctrl: Remove start_buf local variable from buffer alloc
func
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Add flush_buffer() to fill_buf
selftests/resctrl: Remove test type checks from cat_val()
selftests/resctrl: Refactor get_cbm_mask()
selftests/resctrl: Create cache_alloc_size() helper
selftests/resctrl: Replace count_bits with count_consecutive_bits()
selftests/resctrl: Exclude shareable bits from schemata in CAT test
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Move CAT/CMT test global vars to func they are used
selftests/resctrl: Read in less obvious order to defeat prefetch
optimizations
selftests/resctrl: Split measure_cache_vals() function
selftests/resctrl: Split show_cache_info() to test specific and
generic parts
selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 154 ++++++------
tools/testing/selftests/resctrl/cat_test.c | 221 +++++++++---------
tools/testing/selftests/resctrl/cmt_test.c | 60 +++--
tools/testing/selftests/resctrl/fill_buf.c | 107 +++++----
tools/testing/selftests/resctrl/mba_test.c | 8 +-
tools/testing/selftests/resctrl/mbm_test.c | 16 +-
tools/testing/selftests/resctrl/resctrl.h | 28 ++-
.../testing/selftests/resctrl/resctrl_tests.c | 34 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 4 +-
tools/testing/selftests/resctrl/resctrlfs.c | 160 ++++++++++---
11 files changed, 447 insertions(+), 347 deletions(-)
--
2.30.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cced everyone to all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they almost all
depend on each other.
Changes from v2 (https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org):
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1 (https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org):
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
Stephen Boyd (11):
of: Add KUnit test to confirm DTB is loaded
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: kunit: Add fixed rate clk consumer test
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add KUnit clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 22 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../bindings/clock/test,clk-parent-data.yaml | 47 ++
.../bindings/test/test,clk-fixed-rate.yaml | 35 ++
.../devicetree/bindings/test/test,empty.yaml | 30 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform_kunit-test.c | 140 ++++++
drivers/base/test/platform_kunit.c | 215 ++++++++
drivers/clk/.kunitconfig | 3 +
drivers/clk/Kconfig | 7 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 374 ++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit.c | 224 +++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 459 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 5 +
drivers/of/Kconfig | 19 +
drivers/of/Makefile | 4 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit.c | 125 +++++
drivers/of/of_test.c | 34 ++
drivers/of/overlay_test.c | 110 +++++
include/kunit/clk.h | 28 ++
include/kunit/of.h | 94 ++++
include/kunit/platform_device.h | 15 +
31 files changed, 2109 insertions(+), 2 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,clk-fixed-rate.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,empty.yaml
create mode 100644 drivers/base/test/platform_kunit-test.c
create mode 100644 drivers/base/test/platform_kunit.c
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/.kunitconfig
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit.c
create mode 100644 drivers/of/of_test.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
prerequisite-patch-id: 33517b96dd0768ab9c265f5721629786354ee320
prerequisite-patch-id: 909221815eeca0a2b0cdd385c76f57e185fb9e33
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
This patch set adds support for using FOU or GUE encapsulation with
an ipip device operating in collect-metadata mode and a set of kfuncs
for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses)
in the ingress path of an externally controlled tunnel interface via
the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be
redirected to the same or a different externally controlled tunnel
interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt}
helpers and a call to bpf_redirect. This enables us to redirect packets
between tunnel interfaces - and potentially change the encapsulation
type - using only a single BPF program.
Today this approach works fine for a couple of tunnel combinations.
For example: redirecting packets between Geneve and GRE interfaces or
GRE and plain ipip interfaces. However, redirecting using FOU or GUE is
not supported today. The ip_tunnel module does not allow us to egress
packets using additional UDP encapsulation from an ipip device in
collect-metadata mode.
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to
the tunnel metadata. It can be filled by a new BPF kfunc introduced
in Patch 2 and evaluated by the ip_tunnel egress path. This will allow
us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap.
These helpers can be used to set and get UDP encap parameters from the
BPF tc-hook doing the packet redirect.
Patch 3 adds BPF tunnel selftests using the two kfuncs.
---
v3:
- Integrate selftest into test_progs (Alexei)
v2:
- Fixes for checkpatch.pl
- Fixes for kernel test robot
Christian Ehrig (3):
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip
devices
bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
include/net/fou.h | 2 +
include/net/ip_tunnels.h | 28 ++--
net/ipv4/Makefile | 2 +-
net/ipv4/fou_bpf.c | 119 ++++++++++++++
net/ipv4/fou_core.c | 5 +
net/ipv4/ip_tunnel.c | 22 ++-
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
.../selftests/bpf/prog_tests/test_tunnel.c | 153 +++++++++++++++++-
.../selftests/bpf/progs/test_tunnel_kern.c | 117 ++++++++++++++
10 files changed, 432 insertions(+), 19 deletions(-)
create mode 100644 net/ipv4/fou_bpf.c
--
2.39.2
This is the follow-up on [1], adding selftests (testing for known issues
we added workarounds for and other issues that haven't been fixed yet),
fixing sparc64, reverting the workarounds, and perform one cleanup.
The patch from [1] was modified slightly (updated/extended patch
description, dropped one unnecessary NOP instruction from the ASM in
__pte_mkhwwrite()).
Retested on x86_64 and sparc64 (sun4u in QEMU).
I scanned most architectures to make sure their (pte|pmd)_mkdirty()
handling is correct. To be sure, we can run the selftests and find out if
other architectures are still affectes (loongarch was fixed recently as
well).
Based on master for now. I don't expect surprises regarding mm-tress, but
I can rebase if there are any problems.
[1] https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
David Hildenbrand (6):
selftests/mm: reuse read_pmd_pagesize() in COW selftest
selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs
without write permissions
sparc/mm: don't unconditionally set HW writable bit when setting PTE
dirty on 64bit
mm/migrate: revert "mm/migrate: fix wrongly apply write bit after
mkdirty on sparc64"
mm/huge_memory: revert "Partly revert "mm/thp: carry over dirty bit
when thp splits on pmd""
mm/huge_memory: conditionally call maybe_mkwrite() and drop
pte_wrprotect() in __split_huge_pmd_locked()
arch/sparc/include/asm/pgtable_64.h | 116 +++---
mm/huge_memory.c | 16 +-
mm/migrate.c | 2 -
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/cow.c | 33 +-
tools/testing/selftests/mm/khugepaged.c | 4 +
tools/testing/selftests/mm/mkdirty.c | 379 ++++++++++++++++++
tools/testing/selftests/mm/soft-dirty.c | 3 +
.../selftests/mm/split_huge_page_test.c | 4 +
tools/testing/selftests/mm/vm_util.c | 4 +-
10 files changed, 468 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/selftests/mm/mkdirty.c
--
2.39.2
This patch series introduces a new "isolcpus" partition type to the
existing list of {member, root, isolated} types. The primary reason
of adding this new "isolcpus" partition is to facilitate the
distribution of isolated CPUs down the cgroup v2 hierarchy.
The other non-member partition types have the limitation that their
parents have to be valid partitions too. It will be hard to create a
partition a few layers down the hierarchy.
It is relatively rare to have applications that require creation of
a separate scheduling domain (root). However, it is more common to
have applications that require the use of isolated CPUs (isolated),
e.g. DPDK. One can use the "isolcpus" or "nohz_full" boot command options
to get that statically. Of course, the "isolated" partition is another
way to achieve that dynamically.
Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers. If a container needs to use
isolated CPUs, it is hard to get those with existing set of cpuset
partition types. With this patch series, a new "isolcpus" partition
can be created to hold a set of isolated CPUs that can be pull into
other "isolated" partitions.
The "isolcpus" partition is special that there can have at most one
instance of this in a system. It serves as a pool for isolated CPUs
and cannot hold tasks or sub-cpusets underneath it. It is also not
cpu-exclusive so that the isolated CPUs can be distributed down the
sibling hierarchies, though those isolated CPUs will not be useable
until the partition type becomes "isolated".
Once isolated CPUs are needed in a cgroup, the administrator can write
a list of isolated CPUs into its "cpuset.cpus" and change its partition
type to "isolated" to pull in those isolated CPUs from the "isolcpus"
partition and use them in that cgroup. That will make the distribution
of isolated CPUs to cgroups that need them much easier.
In the future, we may be able to extend this special "isolcpus" partition
type to support other isolation attributes like those that can be
specified with the "isolcpus" boot command line and related options.
Waiman Long (5):
cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE
handling
cgroup/cpuset: Add a new "isolcpus" paritition root state
cgroup/cpuset: Make isolated partition pull CPUs from isolcpus
partition
cgroup/cpuset: Documentation update for the new "isolcpus" partition
cgroup/cpuset: Extend test_cpuset_prs.sh to test isolcpus partition
Documentation/admin-guide/cgroup-v2.rst | 89 ++-
kernel/cgroup/cpuset.c | 548 +++++++++++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 376 ++++++++----
3 files changed, 789 insertions(+), 224 deletions(-)
--
2.31.1
Hello,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing will be executed
and test results will be confusing.
There is no behavior change in test themselves.
[patch 1] Make write_schemata() run to set up shemata with 100% allocation
on first run in MBM test.
[patch 2] The MBA test result message is always output as "ok",
make output message to be "not ok" if MBA check result is failed.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] An error occurs whether in parents process or child process,
the parents process always kills child process and runs
umount_resctrlfs(), and the child process always waits to be
killed by the parent process.
[patch 5] If a signal received, to cleanup properly before exiting the
parent process, commonize the signal handler registered for
CMT/MBM/MBA tests and reuse it in CAT, also unregister the
signal handler at the end of each test.
[patch 6] Before exiting each test CMT/CAT/MBM/MBA, clear test result
files function cat/cmt/mbm/mba_test_cleanup() are called
twice. Delete once.
This patch series is based on Linux v6.2-rc7.
Difference from v7:
[patch 4]
- Fix commitlog.
[patch 5]
- Fix commitlog.
Pervious versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister
for all tests
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
tools/testing/selftests/resctrl/cmt_test.c | 7 +-
tools/testing/selftests/resctrl/fill_buf.c | 14 ----
tools/testing/selftests/resctrl/mba_test.c | 23 +++----
tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
tools/testing/selftests/resctrl/resctrl.h | 2 +
.../testing/selftests/resctrl/resctrl_tests.c | 4 --
tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
9 files changed, 96 insertions(+), 75 deletions(-)
--
2.27.0
On Tue, Apr 11, 2023 at 08:16:46PM -0700, Stefan Roesch wrote:
> case PR_SET_VMA:
> error = prctl_set_vma(arg2, arg3, arg4, arg5);
> break;
> +#ifdef CONFIG_KSM
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> +
> + if (!err)
> + ksm_add_vmas(me->mm);
in the last version of this patch, you reported the error. Now you
swallow the error. I have no idea which is correct, but you've
changed the behaviour without explaining it, so I assume it's wrong.
> + } else {
> + clear_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + }
> + mmap_write_unlock(me->mm);
> + break;
> + case PR_GET_MEMORY_MERGE:
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> +
> + error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + break;
Why do we need a GET? Just for symmetry, or is there an actual need for
it?
This patch updates the cgroup-v2.rst file to include information about
the new "isolcpus" partition type.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 89 +++++++++++++++++++------
1 file changed, 70 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index f67c0829350b..352a02849fa7 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2225,7 +2225,8 @@ Cpuset Interface Files
========== =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
========== =====================================
The root cgroup is always a partition root and its state
@@ -2237,24 +2238,41 @@ Cpuset Interface Files
its descendants except those that are separate partition roots
themselves and their descendants.
+ When set to "isolcpus", the CPUs in that partition root will
+ be in an isolated state without any load balancing from the
+ scheduler. This partition root is special as there can be at
+ most one instance of it in a system and no task or child cpuset
+ is allowed in this cgroup. It acts as a pool of isolated CPUs to
+ be pulled into other "isolated" partitions. The "cpuset.cpus"
+ of an "isolcpus" partition root contains the list of isolated
+ CPUs it holds, where "cpuset.cpus.effective" contains the list
+ of freely available isolated CPUs that are ready to be pull
+ into other "isolated" partition.
+
When set to "isolated", the CPUs in that partition root will
be in an isolated state without any load balancing from the
scheduler. Tasks placed in such a partition with multiple
CPUs should be carefully distributed and bound to each of the
- individual CPUs for optimal performance.
-
- The value shown in "cpuset.cpus.effective" of a partition root
- is the CPUs that the partition root can dedicate to a potential
- new child partition root. The new child subtracts available
- CPUs from its parent "cpuset.cpus.effective".
-
- A partition root ("root" or "isolated") can be in one of the
- two possible states - valid or invalid. An invalid partition
- root is in a degraded state where some state information may
- be retained, but behaves more like a "member".
-
- All possible state transitions among "member", "root" and
- "isolated" are allowed.
+ individual CPUs for optimal performance. The isolated CPUs can
+ come from either the parent partition root or from an "isolcpus"
+ partition if the parent cannot satisfy its request.
+
+ The value shown in "cpuset.cpus.effective" of a partition root is
+ the CPUs that the partition root can dedicate to a potential new
+ child partition root. The new child partition subtracts available
+ CPUs from its parent "cpuset.cpus.effective". An exception is
+ an "isolated" partition that pulls its isolated CPUs from the
+ "isolcpus" partition root that is not its direct parent.
+
+ A partition root can be in one of the two possible states -
+ valid or invalid. An invalid partition root is in a degraded
+ state where some state information may be retained, but behaves
+ more like a "member".
+
+ All possible state transitions among "member", "root", "isolcpus"
+ and "isolated" are allowed. However, the partition root may
+ not be valid if the corresponding prerequisite conditions are
+ not met.
On read, the "cpuset.cpus.partition" file can show the following
values.
@@ -2262,16 +2280,18 @@ Cpuset Interface Files
============================= =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
"root invalid (<reason>)" Invalid partition root
+ "isolcpus invalid (<reason>)" Invalid isolcpus partition root
"isolated invalid (<reason>)" Invalid isolated partition root
============================= =====================================
In the case of an invalid partition root, a descriptive string on
- why the partition is invalid is included within parentheses.
+ why the partition is invalid may be included within parentheses.
- For a partition root to become valid, the following conditions
- must be met.
+ For a "root" partition root to become valid, the following
+ conditions must be met.
1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
are not shared by any of its siblings (exclusivity rule).
@@ -2281,6 +2301,37 @@ Cpuset Interface Files
4) The "cpuset.cpus.effective" cannot be empty unless there is
no task associated with this partition.
+ A valid "isolcpus" partition root requires the following
+ conditions.
+
+ 1) The parent cgroup is a valid partition root.
+ 2) The "cpuset.cpus" must be a subset of parent's "cpuset.cpus"
+ including an empty cpu list.
+ 3) There can be no more than one valid "isolcpus" partition.
+ 4) No task or child cpuset is allowed.
+
+ Note that an "isolcpus" partition is not exclusive and its
+ isolated CPUs can be distributed down sibling cgroups even
+ though they may not appear in their "cpuset.cpus.effective".
+
+ A valid "isolated" partition root can pull isolated CPUs from
+ either its parent partition or from the "isolcpus" partition.
+ It also requires the following conditions to be met.
+
+ 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
+ are not shared by any of its siblings (exclusivity rule).
+ 2) The "cpuset.cpus" is not empty and must be a subset of
+ parent's "cpuset.cpus".
+ 3) The "cpuset.cpus.effective" cannot be empty unless there is
+ no task associated with this partition.
+
+ If pulling isolated CPUS from "isolcpus" partition,
+ the "cpuset.cpus" must also be a subset of "isolcpus"
+ partition's "cpuset.cpus" and all the requested CPUs must
+ be available for pulling, i.e. in "isolcpus" partition's
+ "cpuset.cpus.effective". In this case, its hierarchical parent
+ does not need to be a valid partition root.
+
External events like hotplug or changes to "cpuset.cpus" can
cause a valid partition root to become invalid and vice versa.
Note that a task cannot be moved to a cgroup with empty
--
2.31.1
With the addition of a new "isolcpus" partition in a previous patch,
this patch adds the capability for a privileged user to pull isolated
CPUs from the "isolcpus" partition to an "isolated" partition if its
parent cannot satisfy its request directly.
The following conditions must be true for the pulling of isolated CPUs
from "isolcpus" partition to be successful.
(1) The value of "cpuset.cpus" must still be a subset of its parent's
"cpuset.cpus" to ensure proper inheritance even though these CPUs
cannot be used until the cpuset becomes an "isolated" partition.
(2) All the CPUs in "cpuset.cpus" are freely available in the
"isolcpus" partition, i.e. in its "cpuset.cpus.effective" and not
yet claimed by other isolated partitions.
With this change, the CPUs in an "isolated" partition can either
come from the "isolcpus" partition or from its direct parent, but not
both. Now the parent of an isolated partition does not need to be a
partition root anymore.
Because of the cpu exclusive nature of an "isolated" partition, these
isolated CPUs cannot be distributed to other siblings of that isolated
partition.
Changes to "cpuset.cpus" of such an isolated partition is allowed as
long as all the newly requested CPUs can be granted from the "isolcpus"
partition. Otherwise, the partition will become invalid.
This makes the management and distribution of isolated CPUs to those
applications that require them much easier.
An "isolated" partition that pulls CPUs from the special "isolcpus"
partition can now have 2 parents - the "isolcpus" partition where
it gets its isolated CPUs and its hierarchical parent where it gets
all the other resources. However, such an "isolated" partition cannot
have subpartitions as all the CPUs from "isolcpus" must be in the same
isolated state.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 282 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 264 insertions(+), 18 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 444eae3a9a6b..a5bbd43ed46e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -101,6 +101,7 @@ enum prs_errcode {
PERR_ISOLCPUS,
PERR_ISOLTASK,
PERR_ISOLCHILD,
+ PERR_ISOPARENT,
};
static const char * const perr_strings[] = {
@@ -114,6 +115,7 @@ static const char * const perr_strings[] = {
[PERR_ISOLCPUS] = "An isolcpus partition is already present",
[PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
[PERR_ISOLCHILD] = "Isolcpus partition can't have children",
+ [PERR_ISOPARENT] = "Isolated/isolcpus parent can't have subpartition",
};
struct cpuset {
@@ -1333,6 +1335,195 @@ static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
rebuild_sched_domains_locked();
}
+/*
+ * isolcpus_pull - Enable or disable pulling of isolated cpus from isolcpus
+ * @cs: the cpuset to update
+ * @cmd: the command code (only partcmd_enable or partcmd_disable)
+ * Return: 1 if successful, 0 if error
+ *
+ * Note that pulling isolated cpus from isolcpus or cpus from parent does
+ * not require rebuilding sched domains. So we can change the flags directly.
+ */
+static int isolcpus_pull(struct cpuset *cs, enum subparts_cmd cmd)
+{
+ struct cpuset *parent = parent_cs(cs);
+
+ if (!isolcpus_cs)
+ return 0;
+
+ /*
+ * To enable pulling of isolated CPUs from isolcpus, cpus_allowed
+ * must be a subset of both its parent's cpus_allowed and isolcpus_cs's
+ * effective_cpus and the user has sysadmin privilege.
+ */
+ if ((cmd == partcmd_enable) && capable(CAP_SYS_ADMIN) &&
+ cpumask_subset(cs->cpus_allowed, isolcpus_cs->effective_cpus) &&
+ cpumask_subset(cs->cpus_allowed, parent->cpus_allowed)) {
+ /*
+ * Move cpus from effective_cpus to subparts_cpus & make
+ * cs a child of isolcpus partition.
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_copy(cs->effective_cpus, cs->cpus_allowed);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (cs->use_parent_ecpus) {
+ cs->use_parent_ecpus = false;
+ parent->child_ecpus_count--;
+ }
+ list_add(&cs->isol_sibling, &isol_children);
+ clear_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+
+ if ((cmd == partcmd_disable) && !list_empty(&cs->isol_sibling)) {
+ /*
+ * This can be called after isolcpus shrinks its cpu list.
+ * So not all the cpus should be returned back to isolcpus.
+ */
+ WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED);
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->effective_cpus);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus,
+ isolcpus_cs->cpus_allowed);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (!cpumask_and(cs->effective_cpus, parent->effective_cpus,
+ cs->cpus_allowed)) {
+ cs->use_parent_ecpus = true;
+ parent->child_ecpus_count++;
+ cpumask_copy(cs->effective_cpus,
+ parent->effective_cpus);
+ }
+ list_del_init(&cs->isol_sibling);
+ cs->partition_root_state = PRS_INVALID_ISOLATED;
+ cs->prs_err = PERR_INVCPUS;
+
+ set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ clear_bit(CS_CPU_EXCLUSIVE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+ return 0;
+}
+
+static void isolcpus_disable(void)
+{
+ struct cpuset *child, *next;
+
+ list_for_each_entry_safe(child, next, &isol_children, isol_sibling)
+ WARN_ON_ONCE(isolcpus_pull(child, partcmd_disable));
+
+ isolcpus_cs = NULL;
+}
+
+/*
+ * isolcpus_cpus_update - cpuset.cpus change in isolcpus partition
+ */
+static void isolcpus_cpus_update(struct cpuset *cs)
+{
+ struct cpuset *child, *next;
+
+ if (WARN_ON_ONCE(isolcpus_cs != cs))
+ return;
+
+ if (list_empty(&isol_children))
+ return;
+
+ /*
+ * Remove child isolated partitions that are not fully covered by
+ * subparts_cpus.
+ */
+ list_for_each_entry_safe(child, next, &isol_children,
+ isol_sibling) {
+ if (cpumask_subset(child->cpus_allowed,
+ cs->subparts_cpus))
+ continue;
+
+ isolcpus_pull(child, partcmd_disable);
+ }
+}
+
+/*
+ * isolated_cpus_update - cpuset.cpus change in isolated partition
+ *
+ * Return: 1 if no further action needs, 0 otherwise
+ */
+static int isolated_cpus_update(struct cpuset *cs, struct cpumask *newmask,
+ struct tmpmasks *tmp)
+{
+ struct cpumask *addmask = tmp->addmask;
+ struct cpumask *delmask = tmp->delmask;
+
+ if (WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED) ||
+ list_empty(&cs->isol_sibling))
+ return 0;
+
+ if (WARN_ON_ONCE(!isolcpus_cs) || cpumask_empty(newmask)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ if (cpumask_andnot(addmask, newmask, cs->cpus_allowed)) {
+ /*
+ * Check if isolcpus partition can provide the new CPUs
+ */
+ if (!cpumask_subset(addmask, isolcpus_cs->cpus_allowed) ||
+ cpumask_intersects(addmask, isolcpus_cs->subparts_cpus)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ /*
+ * Pull addmask isolated CPUs from isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, addmask);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, addmask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ if (cpumask_andnot(tmp->delmask, cs->cpus_allowed, newmask)) {
+ /*
+ * Return isolated CPUs back to isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, delmask);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, delmask);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ spin_lock_irq(&callback_lock);
+ cpumask_copy(cs->cpus_allowed, newmask);
+ cpumask_andnot(cs->effective_cpus, newmask, cs->subparts_cpus);
+ cpumask_and(cs->effective_cpus, cs->effective_cpus, cpu_active_mask);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+}
+
/**
* update_parent_subparts_cpumask - update subparts_cpus mask of parent cpuset
* @cs: The cpuset that requests change in partition root state
@@ -1579,7 +1770,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
- isolcpus_cs = NULL;
+ isolcpus_disable();
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1625,6 +1816,12 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
struct cpuset *parent = parent_cs(cp);
bool update_parent = false;
+ /*
+ * Skip isolated cpuset that pull isolated CPUs from isolcpus
+ */
+ if (!list_empty(&cp->isol_sibling))
+ continue;
+
compute_effective_cpumask(tmp->new_cpus, cp, parent);
/*
@@ -1742,7 +1939,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
WARN_ON(!is_in_v2_mode() &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
- update_tasks_cpumask(cp, tmp->new_cpus);
+ update_tasks_cpumask(cp, cp->effective_cpus);
/*
* On legacy hierarchy, if the effective cpumask of any non-
@@ -1888,6 +2085,10 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
return retval;
if (cs->partition_root_state) {
+ if (!list_empty(&cs->isol_sibling) &&
+ isolated_cpus_update(cs, trialcs->cpus_allowed, &tmp))
+ goto update_hier; /* CPUs update done */
+
if (invalidate)
update_parent_subparts_cpumask(cs, partcmd_invalidate,
NULL, &tmp);
@@ -1920,6 +2121,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
}
spin_unlock_irq(&callback_lock);
+update_hier:
#ifdef CONFIG_CPUMASK_OFFSTACK
/* Now trialcs->cpus_allowed is available */
tmp.new_cpus = trialcs->cpus_allowed;
@@ -1928,8 +2130,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
/* effective_cpus will be updated here */
update_cpumasks_hier(cs, &tmp, false);
- if (cs->partition_root_state) {
- bool force = (cs->partition_root_state == PRS_ISOLCPUS);
+ if (cs->partition_root_state && list_empty(&cs->isol_sibling)) {
struct cpuset *parent = parent_cs(cs);
/*
@@ -1937,8 +2138,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
* cpusets if they use parent's effective_cpus or when
* the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count || force)
- update_sibling_cpumasks(parent, cs, &tmp, force);
+ if (cs->partition_root_state == PRS_ISOLCPUS) {
+ update_sibling_cpumasks(parent, cs, &tmp, true);
+ isolcpus_cpus_update(cs);
+ } else if (parent->child_ecpus_count) {
+ update_sibling_cpumasks(parent, cs, &tmp, false);
+ }
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2307,7 +2512,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
return err;
}
-/**
+/*
* update_prstate - update partition_root_state
* @cs: the cpuset to update
* @new_prs: new partition root state
@@ -2325,13 +2530,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
return 0;
/*
- * For a previously invalid partition root, leave it at being
- * invalid if new_prs is not "member".
+ * For a previously invalid partition root, treat it like a "member".
*/
- if (new_prs && is_prs_invalid(old_prs)) {
- cs->partition_root_state = -new_prs;
- return 0;
- }
+ if (new_prs && is_prs_invalid(old_prs))
+ old_prs = PRS_MEMBER;
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
@@ -2371,6 +2573,21 @@ static int update_prstate(struct cpuset *cs, int new_prs)
}
}
+ /*
+ * A parent isolated partition that gets its isolated CPUs from
+ * isolcpus cannot have subpartition.
+ */
+ if (new_prs && !list_empty(&parent->isol_sibling)) {
+ err = PERR_ISOPARENT;
+ goto out;
+ }
+
+ if ((old_prs == PRS_ISOLATED) && !list_empty(&cs->isol_sibling)) {
+ isolcpus_pull(cs, partcmd_disable);
+ old_prs = 0;
+ }
+ WARN_ON_ONCE(!list_empty(&cs->isol_sibling));
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2386,6 +2603,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
err = update_parent_subparts_cpumask(cs, partcmd_enable,
NULL, &tmpmask);
+ if (err && (new_prs == PRS_ISOLATED) &&
+ isolcpus_pull(cs, partcmd_enable))
+ err = 0; /* Successful isolcpus pull */
+
if (err)
goto out;
} else if (old_prs && new_prs) {
@@ -2445,7 +2666,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (new_prs == PRS_ISOLCPUS)
isolcpus_cs = cs;
else if (cs == isolcpus_cs)
- isolcpus_cs = NULL;
+ isolcpus_disable();
/*
* Update child cpusets, if present.
@@ -3674,8 +3895,31 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
}
parent = parent_cs(cs);
- compute_effective_cpumask(&new_cpus, cs, parent);
nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
+ /*
+ * In the special case of a valid isolated cpuset pulling isolated
+ * cpus from isolcpus. We just need to mask offline cpus from
+ * cpus_allowed unless all the isolated cpus are gone.
+ */
+ if (!list_empty(&cs->isol_sibling)) {
+ if (!cpumask_and(&new_cpus, cs->cpus_allowed, cpu_active_mask))
+ isolcpus_pull(cs, partcmd_disable);
+ } else if ((cs->partition_root_state == PRS_ISOLCPUS) &&
+ cpumask_empty(cs->cpus_allowed)) {
+ /*
+ * For isolcpus with empty cpus_allowed, just update
+ * effective_mems and be done with it.
+ */
+ spin_lock_irq(&callback_lock);
+ if (nodes_empty(new_mems))
+ cs->effective_mems = parent->effective_mems;
+ else
+ cs->effective_mems = new_mems;
+ spin_unlock_irq(&callback_lock);
+ goto unlock;
+ } else {
+ compute_effective_cpumask(&new_cpus, cs, parent);
+ }
if (cs->nr_subparts_cpus)
/*
@@ -3707,10 +3951,12 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
* the following conditions hold:
* 1) empty effective cpus but not valid empty partition.
* 2) parent is invalid or doesn't grant any cpus to child
- * partitions.
+ * partitions and not an isolated cpuset pulling cpus from
+ * isolcpus.
*/
- if (is_partition_valid(cs) && (!parent->nr_subparts_cpus ||
- (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
+ if (is_partition_valid(cs) &&
+ ((!parent->nr_subparts_cpus && list_empty(&cs->isol_sibling)) ||
+ (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
int old_prs, parent_prs;
update_parent_subparts_cpumask(cs, partcmd_disable, NULL, tmp);
--
2.31.1
One can use "cpuset.cpus.partition" to create multiple scheduling domains
or to produce a set of isolated CPUs where load balancing is disabled.
The former use case is less common but the latter one can be frequently
used especially for the Telco use cases like DPDK.
The existing "isolated" partition can be used to produce isolated
CPUs if the applications have full control of a system. However, in a
containerized environment where all the apps are run in a container,
it is hard to distribute out isolated CPUs from the root down given
the unified hierarchy nature of cgroup v2.
The container running on isolated CPUs can be several layers down from
the root. The current partition feature requires that all the ancestors
of a leaf partition root must be parititon roots themselves. This can
be hard to manage.
This patch introduces a new special partition root state called
"isolcpus" that serves as a pool of isolated CPUs to be pulled into other
"isolated" partitions. At most one instance of the "isolcpus" partition
is allowed in a system preferrably as a child of the top cpuset.
In a valid "isolcpus" partition, "cpuset.cpus" contains the set of
isolated CPUs and "cpuset.cpus.effective" contains the set of freely
available isolated CPUs that have not yet been pulled into other
"isolated" cpusets.
The special "isolcpus" partition cannot have normal cpuset children. So
we are not allowed to enable child cpuset in its "cgroup.subtree_control"
file if it has children. Tasks are also not allowed in the "cgroup.procs"
of the "isolcpus" partition. Unlike other partition roots, empty
"cpuset.cpus" is allowed in the "isolcpus" partition as this special
cpuset is not designed to hold tasks.
The CPUs in the "isolcpus" partition are not exclusive so that those
isolated CPUs can be distributed down sibling hierarchies as usual even
though they will not show up in their "cpuset.cpus.effective".
Right now, an "isolcpus" partition only disable load balancing of
the isolated CPUs. In the near future, it may be extended to support
additional isolation attributes like those currently supported by the
"isolcpus" or related kernel boot command line options.
In a subsequent patch, a privileged user can change a "member" cpuset
to an "isolated" partition root by pulling isolated CPUs from the
"isolcpus" partition if its parent is not a partition root that can
directly satisfy the request.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 158 ++++++++++++++++++++++++++++++++++-------
1 file changed, 133 insertions(+), 25 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 83a7193e0f2c..444eae3a9a6b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -98,6 +98,9 @@ enum prs_errcode {
PERR_NOCPUS,
PERR_HOTPLUG,
PERR_CPUSEMPTY,
+ PERR_ISOLCPUS,
+ PERR_ISOLTASK,
+ PERR_ISOLCHILD,
};
static const char * const perr_strings[] = {
@@ -108,6 +111,9 @@ static const char * const perr_strings[] = {
[PERR_NOCPUS] = "Parent unable to distribute cpu downstream",
[PERR_HOTPLUG] = "No cpu available due to hotplug",
[PERR_CPUSEMPTY] = "cpuset.cpus is empty",
+ [PERR_ISOLCPUS] = "An isolcpus partition is already present",
+ [PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
+ [PERR_ISOLCHILD] = "Isolcpus partition can't have children",
};
struct cpuset {
@@ -198,6 +204,9 @@ struct cpuset {
/* Handle for cpuset.cpus.partition */
struct cgroup_file partition_file;
+
+ /* siblings list anchored at isol_children */
+ struct list_head isol_sibling;
};
/*
@@ -206,14 +215,26 @@ struct cpuset {
* 0 - member (not a partition root)
* 1 - partition root
* 2 - partition root without load balancing (isolated)
+ * 3 - isolated cpu pool (isolcpus)
* -1 - invalid partition root
* -2 - invalid isolated partition root
+ * -3 - invalid isolated cpu pool
+ *
+ * An isolated cpu pool is a special isolated partition root. At most one
+ * instance of it is allowed in a system. It provides a pool of isolated
+ * cpus that a normal isolated partition root can pull from, if privileged,
+ * in case its parent cannot fulfill its request.
*/
#define PRS_MEMBER 0
#define PRS_ROOT 1
#define PRS_ISOLATED 2
+#define PRS_ISOLCPUS 3
#define PRS_INVALID_ROOT -1
#define PRS_INVALID_ISOLATED -2
+#define PRS_INVALID_ISOLCPUS -3
+
+static struct cpuset *isolcpus_cs; /* System isolcpus partition root */
+static struct list_head isol_children; /* Children that pull isolated cpus */
static inline bool is_prs_invalid(int prs_state)
{
@@ -335,6 +356,7 @@ static struct cpuset top_cpuset = {
.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
(1 << CS_MEM_EXCLUSIVE)),
.partition_root_state = PRS_ROOT,
+ .isol_sibling = LIST_HEAD_INIT(top_cpuset.isol_sibling),
};
/**
@@ -1282,7 +1304,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
*/
static int update_partition_exclusive(struct cpuset *cs, int new_prs)
{
- bool exclusive = (new_prs > 0);
+ bool exclusive = (new_prs == PRS_ROOT) || (new_prs == PRS_ISOLATED);
if (exclusive && !is_cpu_exclusive(cs)) {
if (update_flag(CS_CPU_EXCLUSIVE, cs, 1))
@@ -1303,7 +1325,7 @@ static int update_partition_exclusive(struct cpuset *cs, int new_prs)
static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
{
int new_prs = cs->partition_root_state;
- bool new_lb = (new_prs != PRS_ISOLATED);
+ bool new_lb = (new_prs != PRS_ISOLATED) && (new_prs != PRS_ISOLCPUS);
if (new_lb != !!is_sched_load_balance(cs))
update_flag(CS_SCHED_LOAD_BALANCE, cs, new_lb);
@@ -1360,18 +1382,20 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
int part_error = PERR_NONE; /* Partition error? */
percpu_rwsem_assert_held(&cpuset_rwsem);
+ old_prs = new_prs = cs->partition_root_state;
/*
* The parent must be a partition root.
* The new cpumask, if present, or the current cpus_allowed must
- * not be empty.
+ * not be empty except for isolcpus partition.
*/
if (!is_partition_valid(parent)) {
return is_partition_invalid(parent)
? PERR_INVPARENT : PERR_NOTPART;
}
- if ((newmask && cpumask_empty(newmask)) ||
- (!newmask && cpumask_empty(cs->cpus_allowed)))
+ if ((new_prs != PRS_ISOLCPUS) &&
+ ((newmask && cpumask_empty(newmask)) ||
+ (!newmask && cpumask_empty(cs->cpus_allowed))))
return PERR_CPUSEMPTY;
/*
@@ -1379,7 +1403,6 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
* partcmd_invalidate commands.
*/
adding = deleting = false;
- old_prs = new_prs = cs->partition_root_state;
if (cmd == partcmd_enable) {
/*
* Enabling partition root is not allowed if cpus_allowed
@@ -1498,11 +1521,13 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
switch (cs->partition_root_state) {
case PRS_ROOT:
case PRS_ISOLATED:
+ case PRS_ISOLCPUS:
if (part_error)
new_prs = -old_prs;
break;
case PRS_INVALID_ROOT:
case PRS_INVALID_ISOLATED:
+ case PRS_INVALID_ISOLCPUS:
if (!part_error)
new_prs = -old_prs;
break;
@@ -1553,6 +1578,9 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
+ if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
+ isolcpus_cs = NULL;
+
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1640,7 +1668,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
*/
old_prs = new_prs = cp->partition_root_state;
if ((cp != cs) && old_prs) {
- switch (parent->partition_root_state) {
+ int parent_prs = parent->partition_root_state;
+
+ /*
+ * isolcpus partition parent can't have children
+ */
+ WARN_ON_ONCE(parent_prs == PRS_ISOLCPUS);
+
+ switch (parent_prs) {
case PRS_ROOT:
case PRS_ISOLATED:
update_parent = true;
@@ -1735,9 +1770,10 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
* @parent: Parent cpuset
* @cs: Current cpuset
* @tmp: Temp variables
+ * @force: Force update if set
*/
static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
- struct tmpmasks *tmp)
+ struct tmpmasks *tmp, bool force)
{
struct cpuset *sibling;
struct cgroup_subsys_state *pos_css;
@@ -1756,7 +1792,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
cpuset_for_each_child(sibling, pos_css, parent) {
if (sibling == cs)
continue;
- if (!sibling->use_parent_ecpus)
+ if (!sibling->use_parent_ecpus && !force)
continue;
if (!css_tryget_online(&sibling->css))
continue;
@@ -1893,14 +1929,16 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
update_cpumasks_hier(cs, &tmp, false);
if (cs->partition_root_state) {
+ bool force = (cs->partition_root_state == PRS_ISOLCPUS);
struct cpuset *parent = parent_cs(cs);
/*
* For partition root, update the cpumasks of sibling
- * cpusets if they use parent's effective_cpus.
+ * cpusets if they use parent's effective_cpus or when
+ * the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmp);
+ if (parent->child_ecpus_count || force)
+ update_sibling_cpumasks(parent, cs, &tmp, force);
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2298,6 +2336,41 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
+ /*
+ * Only one isolcpus partition is allowed and it can't have children
+ * or tasks in it. The isolcpus partition is also not exclusive so
+ * that the isolated but unused cpus can be distributed down the
+ * hierarchy.
+ */
+ if (new_prs == PRS_ISOLCPUS) {
+ if (isolcpus_cs)
+ err = PERR_ISOLCPUS;
+ else if (!list_empty(&cs->css.children))
+ err = PERR_ISOLCHILD;
+ else if (cs->css.cgroup->nr_populated_csets)
+ err = PERR_ISOLTASK;
+
+ if (err && old_prs) {
+ /*
+ * A previous valid partition root is now invalid
+ */
+ goto disable_partition;
+ } else if (err) {
+ goto out;
+ }
+
+ /*
+ * Unlike other partition types, an isolated cpu pool can
+ * be empty as it is essentially a place holder for isolated
+ * CPUs.
+ */
+ if (!old_prs && cpumask_empty(cs->cpus_allowed)) {
+ /* Force effective_cpus to be empty too */
+ cpumask_clear(cs->effective_cpus);
+ goto out;
+ }
+ }
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2316,11 +2389,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (err)
goto out;
} else if (old_prs && new_prs) {
- /*
- * A change in load balance state only, no change in cpumasks.
- */
- goto out;
+ goto out; /* Skip cpuset and sibling task update */
} else {
+disable_partition:
/*
* Switching back to member is always allowed even if it
* disables child partitions.
@@ -2342,8 +2413,13 @@ static int update_prstate(struct cpuset *cs, int new_prs)
update_tasks_cpumask(parent, tmpmask.new_cpus);
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmpmask);
+ /*
+ * Since isolcpus partition is not exclusive, we have to update
+ * sibling hierarchies as well.
+ */
+ if ((new_prs == PRS_ISOLCPUS) || parent->child_ecpus_count)
+ update_sibling_cpumasks(parent, cs, &tmpmask,
+ new_prs == PRS_ISOLCPUS);
out:
/*
@@ -2363,6 +2439,14 @@ static int update_prstate(struct cpuset *cs, int new_prs)
/* Update sched domains and load balance flag */
update_partition_sd_lb(cs, old_prs);
+ /*
+ * Check isolcpus_cs state
+ */
+ if (new_prs == PRS_ISOLCPUS)
+ isolcpus_cs = cs;
+ else if (cs == isolcpus_cs)
+ isolcpus_cs = NULL;
+
/*
* Update child cpusets, if present.
* Force update if switching back to member.
@@ -2486,7 +2570,12 @@ static struct cpuset *cpuset_attach_old_cs;
*/
static int cpuset_can_attach_check(struct cpuset *cs)
{
+ /*
+ * Task cannot be moved to a cpuset with empty effective cpus or
+ * is an isolcpus partition.
+ */
if (cpumask_empty(cs->effective_cpus) ||
+ (cs->partition_root_state == PRS_ISOLCPUS) ||
(!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
return -ENOSPC;
return 0;
@@ -2902,24 +2991,30 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
static int sched_partition_show(struct seq_file *seq, void *v)
{
struct cpuset *cs = css_cs(seq_css(seq));
+ int prs = cs->partition_root_state;
const char *err, *type = NULL;
- switch (cs->partition_root_state) {
+ switch (prs) {
case PRS_ROOT:
seq_puts(seq, "root\n");
break;
case PRS_ISOLATED:
seq_puts(seq, "isolated\n");
break;
+ case PRS_ISOLCPUS:
+ seq_puts(seq, "isolcpus\n");
+ break;
case PRS_MEMBER:
seq_puts(seq, "member\n");
break;
- case PRS_INVALID_ROOT:
- type = "root";
- fallthrough;
- case PRS_INVALID_ISOLATED:
- if (!type)
+ default:
+ if (prs == PRS_INVALID_ROOT)
+ type = "root";
+ else if (prs == PRS_INVALID_ISOLATED)
type = "isolated";
+ else
+ type = "isolcpus";
+
err = perr_strings[READ_ONCE(cs->prs_err)];
if (err)
seq_printf(seq, "%s invalid (%s)\n", type, err);
@@ -2948,6 +3043,8 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
val = PRS_MEMBER;
else if (!strcmp(buf, "isolated"))
val = PRS_ISOLATED;
+ else if (!strcmp(buf, "isolcpus"))
+ val = PRS_ISOLCPUS;
else
return -EINVAL;
@@ -3157,6 +3254,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
nodes_clear(cs->effective_mems);
fmeter_init(&cs->fmeter);
cs->relax_domain_level = -1;
+ INIT_LIST_HEAD(&cs->isol_sibling);
/* Set CS_MEMORY_MIGRATE for default hierarchy */
if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
@@ -3171,6 +3269,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
struct cpuset *parent = parent_cs(cs);
struct cpuset *tmp_cs;
struct cgroup_subsys_state *pos_css;
+ int err = 0;
if (!parent)
return 0;
@@ -3178,6 +3277,14 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cpus_read_lock();
percpu_down_write(&cpuset_rwsem);
+ /*
+ * An isolcpus partition cannot have direct children.
+ */
+ if (parent->partition_root_state == PRS_ISOLCPUS) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
set_bit(CS_ONLINE, &cs->flags);
if (is_spread_page(parent))
set_bit(CS_SPREAD_PAGE, &cs->flags);
@@ -3229,7 +3336,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
out_unlock:
percpu_up_write(&cpuset_rwsem);
cpus_read_unlock();
- return 0;
+ return err;
}
/*
@@ -3434,6 +3541,7 @@ int __init cpuset_init(void)
fmeter_init(&top_cpuset.fmeter);
set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
top_cpuset.relax_domain_level = -1;
+ INIT_LIST_HEAD(&isol_children);
BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
--
2.31.1
I have been running hid-tools for a while, but it was in its own
separate repository for multiple reasons. And the past few weeks
I finally managed to make the kernel tests in that repo in a
state where we can merge them in the kernel tree directly:
- the tests run in ~2 to 3 minutes
- the tests are way more reliable than previously
- the tests are mostly self-contained now (to the exception
of the Sony ones)
To be able to run the tests we need to use the latest release
of hid-tools, as this project still keeps the HID parsing logic
and is capable of generating the HID events.
The series also ensures we can run the tests with vmtest.sh,
allowing for a quick development and test in the tree itself.
This should allow us to require tests to be added to a series
when we see fit and keep them alive properly instead of having
to deal with 2 repositories.
In Cc are all of the people who participated in the elaboration
of those tests, so please send back a signed-off-by for each
commit you are part of.
This series applies on top of the for-6.3/hid-bpf branch, which
is the one that added the tools/testing/selftests/hid directory.
Given that this is unlikely this series will make the cut for
6.3, we might just consider this series to be based on top of
the future 6.3-rc1.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
Benjamin Tissoires (11):
selftests: hid: make vmtest rely on make
selftests: hid: import hid-tools hid-core tests
selftests: hid: import hid-tools hid-gamepad tests
selftests: hid: import hid-tools hid-keyboards tests
selftests: hid: import hid-tools hid-mouse tests
selftests: hid: import hid-tools hid-multitouch and hid-tablets tests
selftests: hid: import hid-tools wacom tests
selftests: hid: import hid-tools hid-apple tests
selftests: hid: import hid-tools hid-ite tests
selftests: hid: import hid-tools hid-sony and hid-playstation tests
selftests: hid: import hid-tools usb-crash tests
tools/testing/selftests/hid/Makefile | 12 +
tools/testing/selftests/hid/config | 11 +
tools/testing/selftests/hid/hid-apple.sh | 7 +
tools/testing/selftests/hid/hid-core.sh | 7 +
tools/testing/selftests/hid/hid-gamepad.sh | 7 +
tools/testing/selftests/hid/hid-ite.sh | 7 +
tools/testing/selftests/hid/hid-keyboard.sh | 7 +
tools/testing/selftests/hid/hid-mouse.sh | 7 +
tools/testing/selftests/hid/hid-multitouch.sh | 7 +
tools/testing/selftests/hid/hid-sony.sh | 7 +
tools/testing/selftests/hid/hid-tablet.sh | 7 +
tools/testing/selftests/hid/hid-usb_crash.sh | 7 +
tools/testing/selftests/hid/hid-wacom.sh | 7 +
tools/testing/selftests/hid/run-hid-tools-tests.sh | 28 +
tools/testing/selftests/hid/settings | 3 +
tools/testing/selftests/hid/tests/__init__.py | 2 +
tools/testing/selftests/hid/tests/base.py | 345 ++++
tools/testing/selftests/hid/tests/conftest.py | 81 +
.../selftests/hid/tests/descriptors_wacom.py | 1360 +++++++++++++
.../selftests/hid/tests/test_apple_keyboard.py | 440 +++++
tools/testing/selftests/hid/tests/test_gamepad.py | 209 ++
tools/testing/selftests/hid/tests/test_hid_core.py | 154 ++
.../selftests/hid/tests/test_ite_keyboard.py | 166 ++
tools/testing/selftests/hid/tests/test_keyboard.py | 485 +++++
tools/testing/selftests/hid/tests/test_mouse.py | 977 +++++++++
.../testing/selftests/hid/tests/test_multitouch.py | 2088 ++++++++++++++++++++
tools/testing/selftests/hid/tests/test_sony.py | 282 +++
tools/testing/selftests/hid/tests/test_tablet.py | 872 ++++++++
.../testing/selftests/hid/tests/test_usb_crash.py | 103 +
.../selftests/hid/tests/test_wacom_generic.py | 844 ++++++++
tools/testing/selftests/hid/vmtest.sh | 25 +-
31 files changed, 8554 insertions(+), 10 deletions(-)
---
base-commit: 2f7f4efb9411770b4ad99eb314d6418e980248b4
change-id: 20230217-import-hid-tools-tests-dc0cd4f3c8a8
Best regards,
--
Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign()
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the error
is properly checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret < 0) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *s to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/powerpc/stringloops/strlen.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
index 9055ebc484d0..f9c1f9cc2d32 100644
--- a/tools/testing/selftests/powerpc/stringloops/strlen.c
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0
-#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
@@ -51,10 +50,11 @@ static void bench_test(char *s)
static int testcase(void)
{
char *s;
+ int ret;
unsigned long i;
- s = memalign(128, SIZE);
- if (!s) {
+ ret = posix_memalign((void **)&s, 128, SIZE);
+ if (ret < 0) {
perror("memalign");
exit(1);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(),initialize *map to
NULL,to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the
error is properly checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret < 0)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx causes false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai(a)intel.com>
---
tools/testing/selftests/sgx/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 75af864e07b6..50aab6b57da3 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -17,6 +17,7 @@ ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
-fno-stack-protector -mrdrnd $(INCLUDES)
TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+TEST_FILES := $(OUTPUT)/test_encl.elf
ifeq ($(CAN_BUILD_X86_64), 1)
all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
--
2.25.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *p to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..8f48f07bc821 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)(&one_page), pmd_pagesize, len);
+ if (ret < 0) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *p to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..4bb7421141a2 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,8 +80,8 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
+ ret = posix_memalign((void *)(&map), hpage_len, hpage_len);
+ if (ret < 0)
ksft_exit_fail_msg("memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
--
2.27.0
Dzień dobry,
zapoznałem się z Państwa ofertą i z przyjemnością przyznaję, że przyciąga uwagę i zachęca do dalszych rozmów.
Pomyślałem, że może mógłbym mieć swój wkład w Państwa rozwój i pomóc dotrzeć z tą ofertą do większego grona odbiorców. Pozycjonuję strony www, dzięki czemu generują świetny ruch w sieci.
Możemy porozmawiać w najbliższym czasie?
Pozdrawiam
Adam Charachuta
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *p to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..4bb7421141a2 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,8 +80,8 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
+ ret = posix_memalign((void *)(&map), hpage_len, hpage_len)
+ if (ret < 0)
ksft_exit_fail_msg("memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
--
2.27.0
On Thu, Apr 06, 2023 at 09:53:37AM -0700, Stefan Roesch wrote:
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> + if (err)
> + return err;
You'll return to userspace with the mutex held, no?
On 06.04.23 18:53, Stefan Roesch wrote:
> This adds the general_profit KSM sysfs knob and the process profit metric
> and process merge type knobs to ksm_stat.
>
> 1) expose general_profit metric
>
> The documentation mentions a general profit metric, however this
> metric is not calculated. In addition the formula depends on the size
> of internal structures, which makes it more difficult for an
> administrator to make the calculation. Adding the metric for a better
> user experience.
>
> 2) document general_profit sysfs knob
>
> 3) calculate ksm process profit metric
>
> The ksm documentation mentions the process profit metric and how to
> calculate it. This adds the calculation of the metric.
>
> 4) add ksm_merge_type() function
>
> This adds the ksm_merge_type function. The function returns the
> merge type for the process. For madvise it returns "madvise", for
> prctl it returns "process" and otherwise it returns "none".
I'm curious, why exactly is this change required in this context? It
might be sufficient to observe if the prctl is set for a process. If
not, the ksm stats can reveal whether KSM is still active for that
process -> madvise.
For your use case, I'd assume it's pretty unnecessary to expose that.
If there is no compelling reason, I'd suggest to drop this and limit
this patch to exposing the general/per-mm profit, which I can understand
why it's desirable when fine-tuning a workload.
[...]
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
> Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +++++
> Documentation/admin-guide/mm/ksm.rst | 8 ++++-
> fs/proc/base.c | 5 +++
> include/linux/ksm.h | 5 +++
> mm/ksm.c | 32 +++++++++++++++++++
> 5 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-ksm b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> index d244674a9480..7768e90f7a8f 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> @@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes.
>
> When it is set to 0 only pages from the same node are merged,
> otherwise pages from all nodes can be merged together (default).
> +
> +What: /sys/kernel/mm/ksm/general_profit
> +Date: January 2023
^ No
> +KernelVersion: 6.1
^ Outdated
(kind of weird having to come up with the right numbers before getting
it merged)
[...]
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 07463ad4a70a..c74450318e05 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -96,6 +96,7 @@
> #include <linux/time_namespace.h>
> #include <linux/resctrl.h>
> #include <linux/cn_proc.h>
> +#include <linux/ksm.h>
> #include <trace/events/oom.h>
> #include "internal.h"
> #include "fd.h"
> @@ -3199,6 +3200,7 @@ static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace *
>
> return 0;
> }
> +
^ unrelated change
> static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns,
> struct pid *pid, struct task_struct *task)
> {
> @@ -3208,6 +3210,9 @@ static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns,
> if (mm) {
> seq_printf(m, "ksm_rmap_items %lu\n", mm->ksm_rmap_items);
> seq_printf(m, "zero_pages_sharing %lu\n", mm->ksm_zero_pages_sharing);
> + seq_printf(m, "ksm_merging_pages %lu\n", mm->ksm_merging_pages);
> + seq_printf(m, "ksm_merge_type %s\n", ksm_merge_type(mm));
> + seq_printf(m, "ksm_process_profit %ld\n", ksm_process_profit(mm));
> mmput(mm);
> }
>
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index c65455bf124c..4c32f9bca723 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -60,6 +60,11 @@ struct page *ksm_might_need_to_copy(struct page *page,
> void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc);
> void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
>
> +#ifdef CONFIG_PROC_FS
> +long ksm_process_profit(struct mm_struct *);
> +const char *ksm_merge_type(struct mm_struct *mm);
> +#endif /* CONFIG_PROC_FS */
> +
> #else /* !CONFIG_KSM */
>
> static inline int ksm_add_mm(struct mm_struct *mm)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index ab95ae0f9def..76b10ff840ac 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -3042,6 +3042,25 @@ static void wait_while_offlining(void)
> }
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>
> +#ifdef CONFIG_PROC_FS
> +long ksm_process_profit(struct mm_struct *mm)
> +{
> + return (long)mm->ksm_merging_pages * PAGE_SIZE -
Do we really need the cast to long? mm->ksm_merging_pages is defined as
"unsigned long". Just like "ksm_pages_sharing" below.
> + mm->ksm_rmap_items * sizeof(struct ksm_rmap_item);
> +}
> +
> +/* Return merge type name as string. */
> +const char *ksm_merge_type(struct mm_struct *mm)
> +{
> + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> + return "process";
> + else if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> + return "madvise";
> + else
> + return "none";
> +}
> +#endif /* CONFIG_PROC_FS */
> +
Apart from these nits, LGTM (again, I don't see why the merge type
should belong into this patch, and why there is a real need to expose it
like that).
Acked-by: David Hildenbrand <david(a)redhat.com>
--
Thanks,
David / dhildenb
At present the kselftest header can't be used with nolibc since it makes
use of vprintf() which is not available in nolibc. Fortunately nolibc
has a vfprintf() so we can just wrap that in order to allow kselftests
to be built with nolibc and still use the standard kselftest headers
with a small change to prevent the inclusion of the standard libc
headers. This allows us to avoid open coding of KTAP output for
selftests that need to use nolibc in order to test interfaces that are
controlled by libc, we've got several open coded examples of this in the
tree already.
As an example of using this the existing za-fork test is converted to
use kselftest.h. The changes to kselftest and nolibc don't have any
interaction until they are used by a test so could be merged separately
if desired.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Turns out nolibc has a vfprintf() already which we can use so do that.
- Link to v1: https://lore.kernel.org/r/20230405-kselftest-nolibc-v1-0-63fbcd70b202@kerne…
---
Mark Brown (3):
tools/nolibc/stdio: Implement vprintf()
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
tools/include/nolibc/stdio.h | 6 ++
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 ++++++------------------------
tools/testing/selftests/kselftest.h | 2 +
4 files changed, 25 insertions(+), 73 deletions(-)
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230405-kselftest-nolibc-cb2ce0446d09
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When access traced function arguments with type is u32*, bpf verifier failed.
Because u32 have typedef, needs to skip modifier. Add btf_type_is_modifier in
is_int_ptr. Add a selftest to check it.
Feng Zhou (2):
bpf/btf: Fix is_int_ptr()
selftests/bpf: Add test to access u32 ptr argument in tracing program
Changelog:
v2->v3: Addressed comments from jirka
- Fix an issue that caused other test items to fail
Details in here:
https://lore.kernel.org/all/20230407084608.62296-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Martin KaFai Lau
- Add a selftest.
- use btf_type_skip_modifiers.
Some details in here:
https://lore.kernel.org/all/20221012125815.76120-1-zhouchengming@bytedance.…
kernel/bpf/btf.c | 8 ++------
net/bpf/test_run.c | 8 +++++++-
.../testing/selftests/bpf/verifier/btf_ctx_access.c | 13 +++++++++++++
3 files changed, 22 insertions(+), 7 deletions(-)
--
2.20.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
I'll update the linux-next branch with this version and keep it on a side
branch and accumulate the following series when they are ready so we can have
a stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v5:
- Got back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (15):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 516 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 96 +++-
drivers/iommu/iommufd/io_pagetable.c | 27 +-
drivers/iommu/iommufd/iommufd_private.h | 51 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++-
14 files changed, 825 insertions(+), 203 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Benjamin Berg <benjamin.berg(a)intel.com>
The existing KUNIT_ARRAY_PARAM macro requires a separate function to
get the description. However, in a lot of cases the description can
just be copied directly from the array. Add a second macro that
avoids having to write a static function just for a single strscpy.
Signed-off-by: Benjamin Berg <benjamin.berg(a)intel.com>
---
include/kunit/test.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 08d3559dd703..519b90261c72 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -1414,6 +1414,25 @@ do { \
return NULL; \
}
+/**
+ * KUNIT_ARRAY_PARAM_DESC() - Define test parameter generator from an array.
+ * @name: prefix for the test parameter generator function.
+ * @array: array of test parameters.
+ * @desc_member: structure member from array element to use as description
+ *
+ * Define function @name_gen_params which uses @array to generate parameters.
+ */
+#define KUNIT_ARRAY_PARAM_DESC(name, array, desc_member) \
+ static const void *name##_gen_params(const void *prev, char *desc) \
+ { \
+ typeof((array)[0]) *__next = prev ? ((typeof(__next)) prev) + 1 : (array); \
+ if (__next - (array) < ARRAY_SIZE((array))) { \
+ strscpy(desc, __next->desc_member, KUNIT_PARAM_DESC_SIZE); \
+ return __next; \
+ } \
+ return NULL; \
+ }
+
// TODO(dlatypov(a)google.com): consider eventually migrating users to explicitly
// include resource.h themselves if they need it.
#include <kunit/resource.h>
--
2.39.2
No need to call maybe_mkwrite() to then wrprotect if the source PMD was not
writable.
It's worth nothing that this now allows for PTEs to be writable even if
the source PMD was not writable: if vma->vm_page_prot includes write
permissions.
As documented in commit 931298e103c2 ("mm/userfaultfd: rely on
vma->vm_page_prot in uffd_wp_range()"), any mechanism that intends to
have pages wrprotected (COW, writenotify, mprotect, uffd-wp, softdirty,
...) has to properly adjust vma->vm_page_prot upfront, to not include
write permissions. If vma->vm_page_prot includes write permissions, the
PTE/PMD can be writable as default.
This now mimics the handling in mm/migrate.c:remove_migration_pte() and in
mm/huge_memory.c:remove_migration_pmd(), which has been in place for a
long time (except that 96a9c287e25d ("mm/migrate: fix wrongly apply write
bit after mkdirty on sparc64") temporarily changed it).
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
mm/huge_memory.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6f3af65435c8..8332e16ac97b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2235,11 +2235,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_swp_mkuffd_wp(entry);
} else {
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
- entry = maybe_mkwrite(entry, vma);
+ if (write)
+ entry = maybe_mkwrite(entry, vma);
if (anon_exclusive)
SetPageAnonExclusive(page + i);
- if (!write)
- entry = pte_wrprotect(entry);
if (!young)
entry = pte_mkold(entry);
/* NOTE: this may set soft-dirty too on some archs */
--
2.39.2
[ This series depends on the VFIO device cdev series ]
Changelog
v6:
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/linux-iommu/cover.1679559476.git.nicolinc@nvidia.co…
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/linux-iommu/cover.1678284812.git.nicolinc@nvidia.co…
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/linux-iommu/0-v2-51b9896e7862+8a8c-iommufd_alloc_jg…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/linux-iommu/cover.1677288789.git.nicolinc@nvidia.co…
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/linux-iommu/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@n…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/linux-iommu/cover.1675802050.git.nicolinc@nvidia.co…
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/linux-iommu/cover.1675320212.git.nicolinc@nvidia.co…
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v6
Thank you
Nicolin Chen
Nicolin Chen (4):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 53 ++++++++++++++-----
drivers/iommu/iommufd/iommufd_test.h | 4 ++
drivers/iommu/iommufd/selftest.c | 19 +++++++
drivers/vfio/iommufd.c | 11 ++--
drivers/vfio/vfio_main.c | 4 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +++
tools/testing/selftests/iommu/iommfd*.c | 0
tools/testing/selftests/iommu/iommufd.c | 29 +++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++++
10 files changed, 127 insertions(+), 19 deletions(-)
create mode 100644 tools/testing/selftests/iommu/iommfd*.c
--
2.40.0
From: Rong Tao <rongtao(a)cestc.cn>
When the number of symbols is greater than MAX_SYMS (300000), the access
array struct ksym syms[MAX_SYMS] goes out of bounds, which will result in
a segfault.
Resolve this issue by judging the maximum number and exiting the loop, and
increasing the default size appropriately. (6.2.9 = 329839 below)
$ cat /proc/kallsyms | wc -l
329839
GDB debugging:
$ cd linux/samples/bpf
$ sudo gdb ./sampleip
...
(gdb) r
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7e2debf in malloc () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install
elfutils-libelf-0.189-1.fc37.x86_64 glibc-2.36-9.fc37.x86_64
libzstd-1.5.4-1.fc37.x86_64 zlib-1.2.12-5.fc37.x86_64
(gdb) bt
#0 0x00007ffff7e2debf in malloc () from /lib64/libc.so.6
#1 0x00007ffff7e33f8e in strdup () from /lib64/libc.so.6
#2 0x0000000000403fb0 in load_kallsyms_refresh() from trace_helpers.c
#3 0x00000000004038b2 in main ()
Signed-off-by: Rong Tao <rongtao(a)cestc.cn>
---
tools/testing/selftests/bpf/trace_helpers.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 09a16a77bae4..a9d589c560d2 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -14,7 +14,7 @@
#define DEBUGFS "/sys/kernel/debug/tracing/"
-#define MAX_SYMS 300000
+#define MAX_SYMS 400000
static struct ksym syms[MAX_SYMS];
static int sym_cnt;
@@ -44,7 +44,8 @@ int load_kallsyms_refresh(void)
continue;
syms[i].addr = (long) addr;
syms[i].name = strdup(func);
- i++;
+ if (++i >= MAX_SYMS)
+ break;
}
fclose(f);
sym_cnt = i;
--
2.39.2
There is a spelling mistake in a test assert message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/kvm/aarch64/smccc_filter.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/aarch64/smccc_filter.c b/tools/testing/selftests/kvm/aarch64/smccc_filter.c
index 0f9db0641847..82650313451a 100644
--- a/tools/testing/selftests/kvm/aarch64/smccc_filter.c
+++ b/tools/testing/selftests/kvm/aarch64/smccc_filter.c
@@ -211,7 +211,7 @@ static void expect_call_fwd_to_user(struct kvm_vcpu *vcpu, uint32_t func_id,
"KVM_HYPERCALL_EXIT_SMC is not set");
else
TEST_ASSERT(!(run->hypercall.flags & KVM_HYPERCALL_EXIT_SMC),
- "KVM_HYPERCAL_EXIT_SMC is set");
+ "KVM_HYPERCALL_EXIT_SMC is set");
}
/* SMCCC calls forwarded to userspace cause KVM_EXIT_HYPERCALL exits */
--
2.30.2
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When access traced function arguments with type is u32*, bpf verifier failed.
Because u32 have typedef, needs to skip modifier. Add btf_type_is_modifier in
is_int_ptr. Add a selftest to check it.
Feng Zhou (2):
bpf/btf: Fix is_int_ptr()
selftests/bpf: Add test to access u32 ptr argument in tracing program
Changelog:
v1->v2: Addressed comments from Martin KaFai Lau
- Add a selftest.
- use btf_type_skip_modifiers.
Some details in here:
https://lore.kernel.org/all/20221012125815.76120-1-zhouchengming@bytedance.…
kernel/bpf/btf.c | 5 ++---
net/bpf/test_run.c | 8 +++++++-
.../testing/selftests/bpf/verifier/btf_ctx_access.c | 13 +++++++++++++
3 files changed, 22 insertions(+), 4 deletions(-)
--
2.20.1
On Thu, 6 Apr 2023 09:53:37 -0700 Stefan Roesch <shr(a)devkernel.io> wrote:
> So far KSM can only be enabled by calling madvise for memory regions. To
> be able to use KSM for more workloads, KSM needs to have the ability to be
> enabled / disabled at the process / cgroup level.
>
> ...
>
> @@ -53,6 +62,18 @@ void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
>
> #else /* !CONFIG_KSM */
>
> +static inline int ksm_add_mm(struct mm_struct *mm)
> +{
> +}
The compiler doesn't like the lack of a return value.
I queued up a patch to simply delete the above function - seems that
ksm_add_mm() has no callers if CONFIG_KSM=n.
The same might be true of the ksm_add_vma()...ksm_exit() stubs also,
Perhaps some kind soul could take a look at whether we can simply clean
those out.
This patch set adds support for using FOU or GUE encapsulation with
an ipip device operating in collect-metadata mode and a set of kfuncs
for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses)
in the ingress path of an externally controlled tunnel interface via
the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be
redirected to the same or a different externally controlled tunnel
interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt}
helpers and a call to bpf_redirect. This enables us to redirect packets
between tunnel interfaces - and potentially change the encapsulation
type - using only a single BPF program.
Today this approach works fine for a couple of tunnel combinations.
For example: redirecting packets between Geneve and GRE interfaces or
GRE and plain ipip interfaces. However, redirecting using FOU or GUE is
not supported today. The ip_tunnel module does not allow us to egress
packets using additional UDP encapsulation from an ipip device in
collect-metadata mode.
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to
the tunnel metadata. It can be filled by a new BPF kfunc introduced
in Patch 2 and evaluated by the ip_tunnel egress path. This will allow
us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap.
These helpers can be used to set and get UDP encap parameters from the
BPF tc-hook doing the packet redirect.
Patch 3 adds BPF tunnel selftests using the two kfuncs.
---
v2:
- Fixes for checkpatch.pl
- Fixes for kernel test robot
Christian Ehrig (3):
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip
devices
bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
include/net/fou.h | 2 +
include/net/ip_tunnels.h | 28 +++--
net/ipv4/Makefile | 2 +-
net/ipv4/fou_bpf.c | 119 ++++++++++++++++++
net/ipv4/fou_core.c | 5 +
net/ipv4/ip_tunnel.c | 22 +++-
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
.../selftests/bpf/progs/test_tunnel_kern.c | 117 +++++++++++++++++
tools/testing/selftests/bpf/test_tunnel.sh | 81 ++++++++++++
10 files changed, 362 insertions(+), 17 deletions(-)
create mode 100644 net/ipv4/fou_bpf.c
--
2.39.2
Hi Christophe Leroy and gpio experts,
Greeting!
Platform: Tigerlake-H and so on x86 platforms
All detailed info is in link: https://github.com/xupengfe/syzkaller_logs/tree/main/issue_bisect/230406_gp…
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/issue_bisect/230406_gp…
gpio-mockup.sh kslef-test failed in v6.3-rc5 kernel.
gpio-mockup.sh(gpio overflow test) in kself-test could reproduce this issue:
cd linux/tools/testing/selftests
1. ./kselftest_install.sh
2. cd linux/tools/testing/selftests/kselftest_install/gpio
# ./gpio-mockup.sh
1. Module load tests
1.1. dynamic allocation of gpio
2. Module load error tests
2.1 gpio overflow
test failed: unexpected chip - gpiochip1
GPIO gpio-mockup test FAIL
And the simplified steps to reproduce this issue are as follow:
"
# Load gpio_mockup with overflow ranges -1,1024:
modprobe -q gpio_mockup gpio_mockup_ranges="-1,1024"
# Check is there some Call Trace generated in dmesg
dmesg | grep -C 5 Call
# Should not generate any gpiochip folder like /sys/kernel/debug/gpio-mockup/gpiochip1
# Because load gpio_mockup with overflow ranges -1,1024
find "/sys/kernel/debug/gpio-mockup/" -name gpiochip* -type d | sort
# Unload the gpio_mockup module
modprobe -r gpio_mockup
# Check is there "Call Trace" generated in dmesg
dmesg | grep -C 5 Call
"
Actually the judgement "gpio-mockup.sh" test/bisect judgement point is that:
Should not generate any gpiochip folder like
/sys/kernel/debug/gpio-mockup/gpiochip1 after load gpio_mockup with overflow
ranges -1,1024.
I met gpio-mockup.sh test failed but there is no any "Call Trace" dmesg info
sometimes.
So the shortest check steps are as follow:
"
1. modprobe -q gpio_mockup gpio_mockup_ranges="-1,1024"
After above gpio_mockup module loaded with overflow range "-1,1024":
Correct behavior as previous v6.1 or older kernel:"gpio should not load "gpiochip1" due to overflow range -1,1024";
Wrong behavior in v6.3-rc5 kernel: "gpio *load* "gpiochip1" with overflow range -1,1024 and "gpiochip1" should not be loaded".
The underlying problem was already buried here.
2. Could use below command to check if "gpiochip1" generated:
As before v6.1, there was no "/sys/kernel/debug/gpio-mockup/gpiochip1" sysfs folder due to overflow range -1,1024";
Wrong behavior in v6.3-rc5 kernel: "/sys/kernel/debug/gpio-mockup/gpiochip1" sysfs folder generated as follow command check:
# find "/sys/kernel/debug/gpio-mockup/" -name gpiochip* -type d | sort
/sys/kernel/debug/gpio-mockup/gpiochip1
If there is gpiochip* generated, gpio-mockup.sh kself-test would be failed also.
"
Bisected and found the bad commit was:
"
7b61212f2a07a5afd213c8876e52b5c9946441e2
gpiolib: Get rid of ARCH_NR_GPIOS
"
And after reverted the above commit on top of v6.3-rc5 kernel, above
gpio-mockup.sh kself-test could pass and this issue was gone.
Now gpio-mockup.sh kself-test is failed on almost all x86 platform from
v6.2 cycle mainline kernel.
I hope above info is helpful to solve the "gpio-mockup.sh kself-test failed"
problem.
Thanks!
BR.
Hi Linus,
Please pull the following Kselftest fixes update for Linux 6.3-rc6.
This Kselftest fixes update for Linux 6.3-rc6 consists of one single
fix to mount_setattr_test build failure.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 05107edc910135d27fe557267dc45be9630bf3dd:
selftests: sigaltstack: fix -Wuninitialized (2023-03-20 17:28:31 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-fixes-6.3-rc6
for you to fetch changes up to f1594bc676579133a3cd906d7d27733289edfb86:
selftests mount: Fix mount_setattr_test builds failed (2023-03-31 09:18:45 -0600)
----------------------------------------------------------------
linux-kselftest-fixes-6.3-rc6
This Kselftest fixes update for Linux 6.3-rc6 consists of one single
fix to mount_setattr_test build failure.
----------------------------------------------------------------
Anh Tuan Phan (1):
selftests mount: Fix mount_setattr_test builds failed
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------
So far KSM can only be enabled by calling madvise for memory regions. To
be able to use KSM for more workloads, KSM needs to have the ability to be
enabled / disabled at the process / cgroup level.
Use case 1:
The madvise call is not available in the programming language. An example for
this are programs with forked workloads using a garbage collected language without
pointers. In such a language madvise cannot be made available.
In addition the addresses of objects get moved around as they are garbage
collected. KSM sharing needs to be enabled "from the outside" for these type of
workloads.
Use case 2:
The same interpreter can also be used for workloads where KSM brings no
benefit or even has overhead. We'd like to be able to enable KSM on a workload
by workload basis.
Use case 3:
With the madvise call sharing opportunities are only enabled for the current
process: it is a workload-local decision. A considerable number of sharing
opportuniites may exist across multiple workloads or jobs. Only a higler level
entity like a job scheduler or container can know for certain if its running
one or more instances of a job. That job scheduler however doesn't have
the necessary internal worklaod knowledge to make targeted madvise calls.
Security concerns:
In previous discussions security concerns have been brought up. The problem is
that an individual workload does not have the knowledge about what else is
running on a machine. Therefore it has to be very conservative in what memory
areas can be shared or not. However, if the system is dedicated to running
multiple jobs within the same security domain, its the job scheduler that has
the knowledge that sharing can be safely enabled and is even desirable.
Performance:
Experiments with using UKSM have shown a capacity increase of around 20%.
1. New options for prctl system command
This patch series adds two new options to the prctl system call. The first
one allows to enable KSM at the process level and the second one to query the
setting.
The setting will be inherited by child processes.
With the above setting, KSM can be enabled for the seed process of a cgroup
and all processes in the cgroup will inherit the setting.
2. Changes to KSM processing
When KSM is enabled at the process level, the KSM code will iterate over all
the VMA's and enable KSM for the eligible VMA's.
When forking a process that has KSM enabled, the setting will be inherited by
the new child process.
In addition when KSM is disabled for a process, KSM will be disabled for the
VMA's where KSM has been enabled.
3. Add general_profit metric
The general_profit metric of KSM is specified in the documentation, but not
calculated. This adds the general profit metric to /sys/kernel/debug/mm/ksm.
4. Add more metrics to ksm_stat
This adds the process profit and ksm type metric to /proc/<pid>/ksm_stat.
5. Add more tests to ksm_tests
This adds an option to specify the merge type to the ksm_tests. This allows to
test madvise and prctl KSM. It also adds a new option to query if prctl KSM has
been enabled. It adds a fork test to verify that the KSM process setting is
inherited by client processes.
Changes:
- V4:
- removing check in prctl for MMF_VM_MERGEABLE in PR_SET_MEMORY_MERGE
handling
- Checking for VM_MERGEABLE AND MMF_VM_MERGE_ANY to avoid chaning vm_flags
- This requires also checking that the vma is compatible. The
compatibility check is provided by a new helper
- processes which have set MMF_VM_MERGE_ANY, only need to call the
helper and not madvise.
- removed unmerge_vmas function, this function is no longer necessary,
clearing the MMF_VM_MERGE_ANY bit is sufficient
- V3:
- folded patch 1 - 6
- folded patch 7 - 14
- folded patch 15 - 19
- Expanded on the use cases in the cover letter
- Added a section on security concerns to the cover letter
- V2:
- Added use cases to the cover letter
- Removed the tracing patch from the patch series and posted it as an
individual patch
- Refreshed repo
Stefan Roesch (3):
mm: add new api to enable ksm per process
mm: add new KSM process and sysfs knobs
selftests/mm: add new selftests for KSM
Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +
Documentation/admin-guide/mm/ksm.rst | 8 +-
fs/proc/base.c | 5 +
include/linux/ksm.h | 19 +-
include/linux/sched/coredump.h | 1 +
include/uapi/linux/prctl.h | 2 +
kernel/sys.c | 27 ++
mm/ksm.c | 137 +++++++---
tools/include/uapi/linux/prctl.h | 2 +
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/ksm_tests.c | 254 +++++++++++++++---
11 files changed, 388 insertions(+), 77 deletions(-)
--
2.34.1
So far KSM can only be enabled by calling madvise for memory regions. To
be able to use KSM for more workloads, KSM needs to have the ability to be
enabled / disabled at the process / cgroup level.
Use case 1:
The madvise call is not available in the programming language. An example for
this are programs with forked workloads using a garbage collected language without
pointers. In such a language madvise cannot be made available.
In addition the addresses of objects get moved around as they are garbage
collected. KSM sharing needs to be enabled "from the outside" for these type of
workloads.
Use case 2:
The same interpreter can also be used for workloads where KSM brings no
benefit or even has overhead. We'd like to be able to enable KSM on a workload
by workload basis.
Use case 3:
With the madvise call sharing opportunities are only enabled for the current
process: it is a workload-local decision. A considerable number of sharing
opportunities may exist across multiple workloads or jobs (if they are part
of the same security domain). Only a higler level entity like a job scheduler
or container can know for certain if its running one or more instances of a
job. That job scheduler however doesn't have the necessary internal workload
knowledge to make targeted madvise calls.
Security concerns:
In previous discussions security concerns have been brought up. The problem is
that an individual workload does not have the knowledge about what else is
running on a machine. Therefore it has to be very conservative in what memory
areas can be shared or not. However, if the system is dedicated to running
multiple jobs within the same security domain, its the job scheduler that has
the knowledge that sharing can be safely enabled and is even desirable.
Performance:
Experiments with using UKSM have shown a capacity increase of around 20%.
1. New options for prctl system command
This patch series adds two new options to the prctl system call. The first
one allows to enable KSM at the process level and the second one to query the
setting.
The setting will be inherited by child processes.
With the above setting, KSM can be enabled for the seed process of a cgroup
and all processes in the cgroup will inherit the setting.
2. Changes to KSM processing
When KSM is enabled at the process level, the KSM code will iterate over all
the VMA's and enable KSM for the eligible VMA's.
When forking a process that has KSM enabled, the setting will be inherited by
the new child process.
In addition when KSM is disabled for a process, KSM will be disabled for the
VMA's where KSM has been enabled.
3. Add general_profit metric
The general_profit metric of KSM is specified in the documentation, but not
calculated. This adds the general profit metric to /sys/kernel/debug/mm/ksm.
4. Add more metrics to ksm_stat
This adds the process profit and ksm type metric to /proc/<pid>/ksm_stat.
5. Add more tests to ksm_tests
This adds an option to specify the merge type to the ksm_tests. This allows to
test madvise and prctl KSM. It also adds a new option to query if prctl KSM has
been enabled. It adds a fork test to verify that the KSM process setting is
inherited by client processes.
Changes:
- V5:
- When the prctl system call is invoked, mark all compatible VMA
as mergeable
- Instead of checcking during scan if VMA is mergeable, mark the VMA
mergeable when the VMA is created (in case the VMA is compatible)
- Remove earlier changes, they are no longer necessary
- Unset the flag MMF_VM_MERGE_ANY in gmap_mark_unmergeable().
- When unsetting the MMF_VM_MERGE_ANY flag with prctl, only unset the
flag
- Remove pages_volatile function (with the simplar general_profit calculation,
the function is no longer needed)
- Use simpler formula for calculation of general_profit
- V4:
- removing check in prctl for MMF_VM_MERGEABLE in PR_SET_MEMORY_MERGE
handling
- Checking for VM_MERGEABLE AND MMF_VM_MERGE_ANY to avoid chaning vm_flags
- This requires also checking that the vma is compatible. The
compatibility check is provided by a new helper
- processes which have set MMF_VM_MERGE_ANY, only need to call the
helper and not madvise.
- removed unmerge_vmas function, this function is no longer necessary,
clearing the MMF_VM_MERGE_ANY bit is sufficient
- V3:
- folded patch 1 - 6
- folded patch 7 - 14
- folded patch 15 - 19
- Expanded on the use cases in the cover letter
- Added a section on security concerns to the cover letter
- V2:
- Added use cases to the cover letter
- Removed the tracing patch from the patch series and posted it as an
individual patch
- Refreshed repo
Stefan Roesch (3):
mm: add new api to enable ksm per process
mm: add new KSM process and sysfs knobs
selftests/mm: add new selftests for KSM
Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +
Documentation/admin-guide/mm/ksm.rst | 8 +-
arch/s390/mm/gmap.c | 1 +
fs/proc/base.c | 5 +
include/linux/ksm.h | 36 ++-
include/linux/sched/coredump.h | 1 +
include/uapi/linux/prctl.h | 2 +
kernel/fork.c | 1 +
kernel/sys.c | 24 ++
mm/ksm.c | 143 ++++++++--
mm/mmap.c | 7 +
tools/include/uapi/linux/prctl.h | 2 +
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/ksm_tests.c | 254 +++++++++++++++---
14 files changed, 426 insertions(+), 68 deletions(-)
--
2.34.1
This patch set makes it possible to have synchronized dynamic ATU and FDB
entries on locked ports. As locked ports are not able to automatically
learn, they depend on userspace added entries, where userspace can add
static or dynamic entries. The lifetime of static entries are completely
dependent on userspace intervention, and thus not of interest here. We
are only concerned with dynamic entries, which can be added with a
command like:
bridge fdb replace ADDR dev <DEV> master dynamic
We choose only to support this feature on locked ports, as it involves
utilizing the CPU to handle ATU related switchcore events (typically
interrupts) and thus can result in significant performance loss if
exposed to heavy traffic.
On locked ports it is important for userspace to know when an authorized
station has become silent, hence not breaking the communication of a
station that has been authorized based on the MAC-Authentication Bypass
(MAB) scheme. Thus if the station keeps being active after authorization,
it will continue to have an open port as long as it is active. Only after
a silent period will it have to be reauthorized. As the ageing process in
the ATU is dependent on incoming traffic to the switchcore port, it is
necessary for the ATU to signal that an entry has aged out, so that the
FDB can be updated at the correct time.
This patch set includes a solution for the Marvell mv88e6xxx driver, where
for this driver we use the Hold-At-One feature so that an age-out
violation interrupt occurs when a station has been silent for the
system-set age time. The age out violation interrupt allows the switchcore
driver to remove both the ATU and the FDB entry at the same time.
It is up to the maintainers of other switchcore drivers to implement the
feature for their specific driver.
LOG:
V2: Ensure the port is locked when using the feature as we
must ensure that learning is enabled at all times for
the interrupts to occur. This was missed in the previous
version.
Instead of ignoring unsupported flags, ensure that
drivers are only called when supporting the feature.
As 'dynamic' flag is legacy, all drivers support it at
least by their previous handling.
Hans J. Schultz (6):
net: bridge: add dynamic flag to switchdev notifier
net: dsa: propagate flags down towards drivers
drivers: net: dsa: add fdb entry flags incoming to switchcore drivers
net: bridge: ensure FDB offloaded flag is handled as needed
net: dsa: mv88e6xxx: implementation of dynamic ATU entries
selftests: forwarding: add dynamic FDB test
drivers/net/dsa/b53/b53_common.c | 4 +-
drivers/net/dsa/b53/b53_priv.h | 4 +-
drivers/net/dsa/hirschmann/hellcreek.c | 4 +-
drivers/net/dsa/lan9303-core.c | 4 +-
drivers/net/dsa/lantiq_gswip.c | 4 +-
drivers/net/dsa/microchip/ksz_common.c | 6 +-
drivers/net/dsa/mt7530.c | 4 +-
drivers/net/dsa/mv88e6xxx/chip.c | 20 ++++--
drivers/net/dsa/mv88e6xxx/chip.h | 9 ++-
drivers/net/dsa/mv88e6xxx/global1_atu.c | 21 +++++++
drivers/net/dsa/mv88e6xxx/port.c | 6 +-
drivers/net/dsa/mv88e6xxx/switchdev.c | 61 +++++++++++++++++++
drivers/net/dsa/mv88e6xxx/switchdev.h | 5 ++
drivers/net/dsa/mv88e6xxx/trace.h | 5 ++
drivers/net/dsa/ocelot/felix.c | 4 +-
drivers/net/dsa/qca/qca8k-common.c | 4 +-
drivers/net/dsa/qca/qca8k.h | 4 +-
drivers/net/dsa/rzn1_a5psw.c | 4 +-
drivers/net/dsa/sja1105/sja1105_main.c | 11 ++--
include/net/dsa.h | 9 ++-
include/net/switchdev.h | 1 +
net/bridge/br_fdb.c | 5 +-
net/bridge/br_switchdev.c | 1 +
net/dsa/dsa.c | 6 ++
net/dsa/port.c | 28 +++++----
net/dsa/port.h | 8 +--
net/dsa/slave.c | 20 ++++--
net/dsa/switch.c | 26 +++++---
net/dsa/switch.h | 1 +
.../net/forwarding/bridge_locked_port.sh | 36 +++++++++++
30 files changed, 258 insertions(+), 67 deletions(-)
--
2.34.1
At present the kselftest header can't be used with nolibc since it makes
use of vprintf() which is not available in nolibc and seems like it would
be inappropriate to implement given the minimal system requirements and
environment intended for nolibc. This has resulted in some open coded
kselftests which use nolibc to test features that are supposed to be
controlled via libc and therefore better exercised in an environment with
no libc.
Rather than continue this let's factor out the I/O routines in kselftest.h
into a separate header file and provide a nolibc implementation which only
allows simple strings to be provided rather than full printf() support.
This is limiting but a great improvement on sharing no code at all.
As an example of using this I've updated the arm64 za-fork test to use
the standard kselftest.h.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 +++--------------
tools/testing/selftests/kselftest-nolibc.h | 93 ++++++++++++++++++
tools/testing/selftests/kselftest-std.h | 151 +++++++++++++++++++++++++++++
tools/testing/selftests/kselftest.h | 149 +++-------------------------
5 files changed, 272 insertions(+), 211 deletions(-)
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230405-kselftest-nolibc-cb2ce0446d09
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On 10.03.23 19:28, Stefan Roesch wrote:
> This adds the general_profit KSM sysfs knob and the process profit metric
> and process merge type knobs to ksm_stat.
>
> 1) split off pages_volatile function
>
> This splits off the pages_volatile function. The next patch will
> use this function.
>
> 2) expose general_profit metric
>
> The documentation mentions a general profit metric, however this
> metric is not calculated. In addition the formula depends on the size
> of internal structures, which makes it more difficult for an
> administrator to make the calculation. Adding the metric for a better
> user experience.
>
> 3) document general_profit sysfs knob
>
> 4) calculate ksm process profit metric
>
> The ksm documentation mentions the process profit metric and how to
> calculate it. This adds the calculation of the metric.
>
> 5) add ksm_merge_type() function
>
> This adds the ksm_merge_type function. The function returns the
> merge type for the process. For madvise it returns "madvise", for
> prctl it returns "process" and otherwise it returns "none".
>
> 6) mm: expose ksm process profit metric and merge type in ksm_stat
>
> This exposes the ksm process profit metric in /proc/<pid>/ksm_stat.
> The name of the value is ksm_merge_type. The documentation mentions
> the formula for the ksm process profit metric, however it does not
> calculate it. In addition the formula depends on the size of internal
> structures. So it makes sense to expose it.
>
> 7) document new procfs ksm knobs
>
Often, when you have to start making a list of things that a patch does,
it might make sense to split some of the items into separate patches
such that you can avoid lists and just explain in list-free text how the
pieces in the patch fit together.
I'd suggest splitting this patch into logical pieces. For example,
separating the general profit calculation/exposure from the per-mm
profit and the per-mm ksm type indication.
> Link: https://lkml.kernel.org/r/20230224044000.3084046-3-shr@devkernel.io
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
[...]
> KSM_ATTR_RO(pages_volatile);
>
> @@ -3280,6 +3305,21 @@ static ssize_t zero_pages_sharing_show(struct kobject *kobj,
> }
> KSM_ATTR_RO(zero_pages_sharing);
>
> +static ssize_t general_profit_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + long general_profit;
> + long all_rmap_items;
> +
> + all_rmap_items = ksm_max_page_sharing + ksm_pages_shared +
> + ksm_pages_unshared + pages_volatile();
Are you sure you want to count a config knob (ksm_max_page_sharing) into
that formula? I yet have to digest what this calculation implies, but it
does feel odd.
Further, maybe just avoid pages_volatile(). Expanding the formula
(excluding ksm_max_page_sharing for now):
all_rmap = ksm_pages_shared + ksm_pages_unshared + pages_volatile();
-> expand pages_volatile() (ignoring the < 0 case)
all_rmap = ksm_pages_shared + ksm_pages_unshared + ksm_rmap_items -
ksm_pages_shared - ksm_pages_sharing - ksm_pages_unshared;
-> simplify
all_rmap = ksm_rmap_items + ksm_pages_sharing;
Or is the < 0 case relevant here?
> + general_profit = ksm_pages_sharing * PAGE_SIZE -
> + all_rmap_items * sizeof(struct ksm_rmap_item);
> +
> + return sysfs_emit(buf, "%ld\n", general_profit);
> +}
> +KSM_ATTR_RO(general_profit);
> +
> static ssize_t stable_node_dups_show(struct kobject *kobj,
> struct kobj_attribute *attr, char *buf)
> {
> @@ -3345,6 +3385,7 @@ static struct attribute *ksm_attrs[] = {
> &stable_node_dups_attr.attr,
> &stable_node_chains_prune_millisecs_attr.attr,
> &use_zero_pages_attr.attr,
> + &general_profit_attr.attr,
> NULL,
> };
>
The calculations (profit) don't include when KSM places the shared
zeropage I guess. Accounting that per MM (and eventually globally) is in
the works. [1]
[1]
https://lore.kernel.org/lkml/20230328153852.26c2577e4bd921c371c47a7e@linux-…
--
Thanks,
David / dhildenb
Commit 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
added an additional test, so the number passed to ksft_set_plan needs to
be bumped accordingly.
Also use ksft_finish to print results and exit. This will catch future
mismatches between ksft_set_plan and the number of tests being run.
Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Cc: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/clone3/clone3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 4fce46afe6db..7b45c9854202 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -129,7 +129,7 @@ int main(int argc, char *argv[])
uid_t uid = getuid();
ksft_print_header();
- ksft_set_plan(17);
+ ksft_set_plan(18);
test_clone3_supported();
/* Just a simple clone3() should return 0.*/
--
2.39.1