Hi, Willy
Most of the suggestions of v2 [1] have been applied in this v3 revision,
except the local menuconfig and mrproper targets, as explained in [2].
A fresh run with tinyconfig for ppc, ppc64 and ppc64le:
$ for arch in ppc ppc64 ppc64le; do \
mkdir -p $PWD/kernel-$arch;
time make defconfig run DEFCONFIG=tinyconfig ARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out;
done
rerun for ppc, ppc64 and ppc64le:
$ for arch in ppc ppc64 ppc64le; do \
make rerun ARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out;
done
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc/vmlinux on qemu-system-ppc
>> [ppc] Kernel command line: console=ttyS0 panic=-1
printk: console [ttyS0] enabled
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
qemu-system-ppc: terminating on signal 15 from pid 190248 ()
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc.out
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc64/vmlinux on qemu-system-ppc64
Linux version 6.4.0+ (ubuntu@linux-lab) (powerpc64le-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #2 SMP Fri Jul 28 01:40:55 CST 2023
Kernel command line: console=hvc0 panic=-1
printk: console [hvc0] enabled
printk: console [hvc0] enabled
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc64.out
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc64le/arch/powerpc/boot/zImage on qemu-system-ppc64le
Linux version 6.4.0+ (ubuntu@linux-lab) (powerpc64le-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #2 SMP Fri Jul 28 01:41:12 CST 2023
Kernel command line: console=hvc0 panic=-1
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc64le.out
A fast report on existing test logs:
$ for arch in ppc ppc64 ppc64le; do \
make report ARCH=$arch RUN_OUT=$PWD/run.$arch.out | grep status; \
done
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
Changes from v2 --> v3:
* selftests/nolibc: allow report with existing test log
selftests/nolibc: fix up O= option support
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: tinyconfig: add extra common options
No Change.
* selftests/nolibc: add macros to reduce duplicated changes
Remove REPORT_RUN_OUT and LOG_OUT.
* selftests/nolibc: string the core targets
Removed extconfig target from our v3 powerpc patchset [3], the
operations have been merged into the defconfig target.
Let kernel depends on $(KERNEL_CONFIG) instead of the removed
extconfig.
* selftests/nolibc: add menuconfig and mrproper for development
like the other local nolibc targets, still require local menuconfig
and mrproper targets for consistent usage with the same ARCH and no -C
/path/to/srctree
Merge them together to reduce duplicated entries.
* selftests/nolibc: allow quit qemu-system when poweroff fails
Enhance timeout logic with more expected strings print and detection
about the booting of bios, kernel, init and test.
Add a default 10 seconds of QEMU_TIMEOUT for every architecture to
detect all of the potential boog hang or failed poweroff.
* selftests/nolibc: customize QEMU_TIMEOUT for ppc64/ppc64le
Reduce QEMU_TIMEOUT from 60 seconds to a more normal 15 and 20
seconds for ppc64 and ppc64le respectively. the main time cost is
the slow bios used.
* selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
Rename the file names to shorter ones as suggestions from the powerpc
patchset.
* selftests/nolibc: speed up some targets with multiple jobs
New to speed up with -j<N> by default.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1689759351.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/20230727132418.117924-1-falcon@tinylab.org/
[3]: https://lore.kernel.org/lkml/8e9e5ac6283c6ec2ecf10a70ce55b219028497c1.16904…
Zhangjin Wu (12):
selftests/nolibc: allow report with existing test log
selftests/nolibc: add macros to reduce duplicated changes
selftests/nolibc: fix up O= option support
selftests/nolibc: string the core targets
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: add menuconfig and mrproper for development
selftests/nolibc: allow quit qemu-system when poweroff fails
selftests/nolibc: customize QEMU_TIMEOUT for ppc64/ppc64le
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
selftests/nolibc: speed up some targets with multiple jobs
tools/testing/selftests/nolibc/Makefile | 102 ++++++++++++++----
.../selftests/nolibc/configs/common.config | 4 +
.../selftests/nolibc/configs/ppc.config | 3 +
.../selftests/nolibc/configs/ppc64.config | 3 +
.../selftests/nolibc/configs/ppc64le.config | 4 +
5 files changed, 98 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/nolibc/configs/common.config
create mode 100644 tools/testing/selftests/nolibc/configs/ppc64.config
create mode 100644 tools/testing/selftests/nolibc/configs/ppc64le.config
--
2.25.1
Hi, Willy, Thomas
v3 here is to fix up two issues introduced in v2 powerpc patchset [1].
- One is restore the wrongly removed '\' for a '\$$ARCH'
- Another is add the missing $(ARCH).config for ppc, the default variant
for powerpc is renamed to ppc in v2 (as discussed with Willy in [2]), but
ppc.config is missing in v2 patchset, not sure why this happen, may a
'git clean -fdx .' is required to do a new test, just did it.
Btw, the v3 tinyconfig-part1 for powerpc is ready, I will send it out
soon.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1690373704.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/ZL9leVOI25S2+0+g@1wt.eu/
Zhangjin Wu (7):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add extra configs customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 46 +++-
.../selftests/nolibc/configs/ppc.config | 3 +
4 files changed, 246 insertions(+), 7 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
create mode 100644 tools/testing/selftests/nolibc/configs/ppc.config
--
2.25.1
Hi,
Zhangjin and I are working on a tiny shell with nolibc. This patch
enables the missing pipe() with its testcase.
Thanks.
Yuan Tan (2):
tools/nolibc: add pipe() support
selftests/nolibc: add testcase for pipe.
tools/include/nolibc/sys.h | 17 ++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 34 ++++++++++++++++++++
2 files changed, 51 insertions(+)
--
2.39.2
Hi,
This is the semi-friendly patch-bot of Greg Kroah-Hartman.
Markus, you seem to have sent a nonsensical or otherwise pointless
review comment to a patch submission on a Linux kernel developer mailing
list. I strongly suggest that you not do this anymore. Please do not
bother developers who are actively working to produce patches and
features with comments that, in the end, are a waste of time.
Patch submitter, please ignore Markus's suggestion; you do not need to
follow it at all. The person/bot/AI that sent it is being ignored by
almost all Linux kernel maintainers for having a persistent pattern of
behavior of producing distracting and pointless commentary, and
inability to adapt to feedback. Please feel free to also ignore emails
from them.
thanks,
greg k-h's patch email bot
damos_new_filter() is returning a damos_filter struct without
initializing its ->list field. And the users of the function uses the
struct without initializing the field. As a result, uninitialized
memory access error is possible. Actually, a kernel NULL pointer
dereference BUG can be triggered using DAMON user-space tool, like
below.
# damo start --damos_action stat --damos_filter anon matching
# damo tune --damos_action stat --damos_filter anon matching --damos_filter anon nomatching
# dmesg
[...]
[ 36.908136] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 36.910483] #PF: supervisor write access in kernel mode
[ 36.912238] #PF: error_code(0x0002) - not-present page
[ 36.913415] PGD 0 P4D 0
[ 36.913978] Oops: 0002 [#1] PREEMPT SMP PTI
[ 36.914878] CPU: 32 PID: 1335 Comm: kdamond.0 Not tainted 6.5.0-rc3-mm-unstable-damon+ #1
[ 36.916621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 36.919051] RIP: 0010:damos_destroy_filter (include/linux/list.h:114 include/linux/list.h:137 include/linux/list.h:148 mm/damon/core.c:345 mm/damon/core.c:355)
[...]
[ 36.938247] Call Trace:
[ 36.938721] <TASK>
[...]
[ 36.950064] ? damos_destroy_filter (include/linux/list.h:114 include/linux/list.h:137 include/linux/list.h:148 mm/damon/core.c:345 mm/damon/core.c:355)
[ 36.950883] ? damon_sysfs_set_scheme_filters.isra.0 (mm/damon/sysfs-schemes.c:1573)
[ 36.952019] damon_sysfs_set_schemes (mm/damon/sysfs-schemes.c:1674 mm/damon/sysfs-schemes.c:1686)
[ 36.952875] damon_sysfs_apply_inputs (mm/damon/sysfs.c:1312 mm/damon/sysfs.c:1298)
[ 36.953757] ? damon_pa_check_accesses (mm/damon/paddr.c:168 mm/damon/paddr.c:179)
[ 36.954648] damon_sysfs_cmd_request_callback (mm/damon/sysfs.c:1329 mm/damon/sysfs.c:1359)
[...]
The first patch of this patchset fixes the bug by initializing the field in
damos_new_filter(). The second patch adds a unit test for the problem.
Note that the second patch Cc stable@ without Fixes: tag, since it would
be better to be ingested together for avoiding any future regression.
SeongJae Park (2):
mm/damon/core: initialize damo_filter->list from damos_new_filter()
mm/damon/core-test: add a test for damos_new_filter()
mm/damon/core-test.h | 13 +++++++++++++
mm/damon/core.c | 1 +
2 files changed, 14 insertions(+)
--
2.25.1
Hi, Willy, Thomas
Here is the first part of v2 of our tinyconfig support for nolibc-test
[1], the patchset subject is reserved as before.
As discussed in v1 thread [1], to easier the review progress, the whole
tinyconfig support is divided into several parts, mainly by
architecture, here is the first part, include basic preparation and
powerpc example.
This patchset should be applied after the 32/64-bit powerpc support [2],
exactly these two are required by us:
* selftests/nolibc: add extra config file customize support
* selftests/nolibc: add XARCH and ARCH mapping support
In this patchset, we firstly add some misc preparations and at last add
the tinyconfig target and use powerpc as the first example.
Tests:
// powerpc run-user
$ for arch in powerpc powerpc64 powerpc64le; do \
rm -rf $PWD/kernel-$arch; \
mkdir -p $PWD/kernel-$arch; \
make run-user XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out | grep "status: "; \
done
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// powerpc run
$ for arch in powerpc powerpc64 powerpc64le; do \
rm -rf $PWD/kernel-$arch; \
mkdir -p $PWD/kernel-$arch; \
make tinyconfig run XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out; \
done
$ for arch in powerpc powerpc64 powerpc64le; do \
make report XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out | grep "status: "; \
done
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
// the others, randomly choose some
$ make run-user XARCH=arm O=$PWD/kernel-arm RUN_OUT=$PWD/run.arm.out CROSS_COMPILE=arm-linux-gnueabi- | grep status:
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
$ make run-user XARCH=x86_64 O=$PWD/kernel-arm RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status:
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-libc-test | grep status:
165 test(s): 153 passed, 12 skipped, 0 failed => status: warning
// x86_64, require noapic kernel command line option for old qemu-system-x86_64 (v4.2.1)
$ make run XARCH=x86_64 O=$PWD/kernel-x86_64 RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status
$ make rerun XARCH=x86_64 O=$PWD/kernel-x86_64 RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status
165 test(s): 159 passed, 6 skipped, 0 failed => status: warning
tinyconfig mainly targets as a time-saver, the misc preparations service
for the same goal, let's take a look:
* selftests/nolibc: allow report with existing test log
Like rerun without rebuild, Add report (without rerun) to summarize
the existing test log, this may work perfectly with the 'grep status'
* selftests/nolibc: add macros to enhance maintainability
Several macros are added to dedup the shared code to shrink lines
and easier the maintainability
The macros are added just before the using area to avoid code change
conflicts in the future.
* selftests/nolibc: print running log to screen
Enable logging to let developers learn what is happening at the
first glance, without the need to edit the Makefile and rerun it.
These helps a lot when there is a long-time running, a failed
poweroff or even a forever hang.
For test summmary, the 'grep status' can be used together with the
standalone report target.
* selftests/nolibc: fix up O= option support
With objtree instead srctree for .config and IMAGE, now, O= works.
Using O=$PWD/kernel-$arch avoid the mrproer for every build.
* selftests/nolibc: add menuconfig for development
Allow manually tuning some options, mainly for a new architecture
porting.
* selftests/nolibc: add mrproper for development
selftests/nolibc: defconfig: remove mrproper target
Split the mrproper target out of defconfig, when with O=, mrproper is not
required by defconfig, but reserve it for the other use scenes.
* selftests/nolibc: string the core targets
Allow simply 'make run' instead of 'make defconfig; make extconfig;
make kernel; make run'.
* selftests/nolibc: allow quit qemu-system when poweroff fails
When poweroff fails, allow exit while detects the power off string
from output or the wait time is too long (specified by QEMU_TIMEOUT).
This helps the boards who have no poweroff support or the kernel not
enable the poweroff options (mainly for tinyconfig).
* selftests/nolibc: allow customize CROSS_COMPILE by architecture
* selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
This further saves a CROSS_COMPILE option for 'make run', it is very
important when iterates all of the supported architectures and the
compilers are not just prefixed with the XARCH variable.
For example, binary of big endian powerpc64 can be compiled with
powerpc64le-linux-gnu-, but the prefix is powerpc64le.
Even if the pre-customized compiler not exist, we can configure
CROSS_COMPILE_<ARCH> before the test loop to use the code.
* selftests/nolibc: add tinyconfig target
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
Here is the first architecture(and its variants) support tinyconfig.
powerpc is actually a very good architecture, for it has 'various'
variants for test.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1687706332.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/cover.1689713175.git.falcon@tinylab.org/
Zhangjin Wu (14):
selftests/nolibc: allow report with existing test log
selftests/nolibc: add macros to enhance maintainability
selftests/nolibc: print running log to screen
selftests/nolibc: fix up O= option support
selftests/nolibc: add menuconfig for development
selftests/nolibc: add mrproper for development
selftests/nolibc: defconfig: remove mrproper target
selftests/nolibc: string the core targets
selftests/nolibc: allow quit qemu-system when poweroff fails
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: add tinyconfig target
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
tools/testing/selftests/nolibc/Makefile | 102 ++++++++++++++----
.../selftests/nolibc/configs/common.config | 4 +
.../selftests/nolibc/configs/powerpc.config | 3 +
.../selftests/nolibc/configs/powerpc64.config | 3 +
.../nolibc/configs/powerpc64le.config | 4 +
5 files changed, 98 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/nolibc/configs/common.config
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc64.config
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc64le.config
--
2.25.1
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can make a better job on assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 14 +++++++++++
lib/kunit/executor.c | 51 ++++++++++++++++++++++-----------------
lib/kunit/test.c | 57 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 99 insertions(+), 23 deletions(-)
--
2.41.0
Hi, Willy
Here is the powerpc support, includes 32-bit big-endian powerpc, 64-bit
little endian and big endian powerpc.
All of them passes run-user with the default powerpc toolchain from
Ubuntu 20.04:
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc64 CROSS_COMPILE=powerpc64le-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc64le CROSS_COMPILE=powerpc64le-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
For the slow 'run' target, I have run with defconfig before, and just
verified them via the fast tinyconfig + run with a new patch from next
patchset, all of them passes:
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
Note, the big endian crosstool powerpc64-linux-gcc from
https://mirrors.edge.kernel.org/pub/tools/crosstool/ has been used to
test both little endian and big endian powerpc64 too, both passed.
Here simply explain what they are:
* tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
32-bit & 64-bit powerpc support of nolibc.
* selftests/nolibc: select_null: fix up for big endian powerpc64
fix up a test case for big endian powerpc64.
* selftests/nolibc: add extra config file customize support
add extconfig target to allow enable extra config options via
configs/<ARCH>.config
applied suggestion from Thomas to use config files instead of config
lines.
* selftests/nolibc: add XARCH and ARCH mapping support
applied suggestions from Willy, use XARCH as the input of our nolibc
test, use ARCH as the pure kernel input, at last build the mapping
between XARCH and ARCH.
Customize the variables via the input XARCH.
* selftests/nolibc: add test support for powerpc
Require to use extconfig to enable the console options specified in
configs/powerpc.config
currently, we should manually run extconfig after defconfig, in next
patchset, we will do this automatically.
* selftests/nolibc: add test support for powerpc64le
selftests/nolibc: add test support for powerpc64
Very simple, but customize CFLAGS carefully to let them work with
powerpc64le-linux-gnu-gcc (from Linux distributions) and
powerpc64-linux-gcc (from mirrors.edge.kernel.org)
The next patchset will not be tinyconfig, but some prepare patches, will
be sent out soon.
Best regards,
Zhangjin
---
Zhangjin Wu (8):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: select_null: fix up for big endian powerpc64
selftests/nolibc: add extra config file customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for powerpc
selftests/nolibc: add test support for powerpc64le
selftests/nolibc: add test support for powerpc64
tools/include/nolibc/arch-powerpc.h | 170 ++++++++++++++++++
tools/testing/selftests/nolibc/Makefile | 55 ++++--
.../selftests/nolibc/configs/powerpc.config | 3 +
tools/testing/selftests/nolibc/nolibc-test.c | 2 +-
4 files changed, 217 insertions(+), 13 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc.config
--
2.25.1
If the test description is longer than the status alignment the
parameter 'n' to putcharn() would lead to a signed underflow that then
gets converted to a very large unsigned value.
This in turn leads out-of-bound writes in memset() crashing the
application.
The failure case of EXPECT_PTRER() used in "mmap_bad" exhibits this
exact behavior.
Fixes: 8a27526f49f9 ("selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER")
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/nolibc/nolibc-test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 03b1d30f5507..9b76603e4ce3 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -151,7 +151,8 @@ static void result(int llen, enum RESULT r)
else
msg = "[FAIL]";
- putcharn(' ', 64 - llen);
+ if (llen < 64)
+ putcharn(' ', 64 - llen);
puts(msg);
}
---
base-commit: dfef4fc45d5713eb23d87f0863aff9c33bd4bfaf
change-id: 20230726-nolibc-result-width-1f4b0b4f3ca0
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Changelog ===
Changes from v5:
* Fix defrag disable codepaths
Changes from v4:
* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings
Changes from v3:
* Correctly initialize `addrlen` stack var for recvmsg()
Changes from v2:
* module_put() if ->enable() fails
* Fix CI build errors
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (5):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 10 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 123 +++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 283 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 718 insertions(+), 26 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
There are use cases that need to apply DAMOS schemes to specific address
ranges or DAMON monitoring targets. NUMA nodes in the physical address
space, special memory objects in the virtual address space, and
monitoring target specific efficient monitoring results snapshot
retrieval could be examples of such use cases. This patchset extends
DAMOS filters feature for such cases, by implementing two more filter
types, namely address ranges and DAMON monitoring types.
Patches sequence
----------------
The first seven patches are for the address ranges based DAMOS filter.
The first patch implements the filter feature and expose it via DAMON
kernel API. The second patch further expose the feature to users via
DAMON sysfs interface. The third and fourth patches implement unit
tests and selftests for the feature. Three patches (fifth to seventh)
updating the documents follow.
The following six patches are for the DAMON monitoring target based
DAMOS filter. The eighth patch implements the feature in the core layer
and expose it via DAMON's kernel API. The ninth patch further expose it
to users via DAMON sysfs interface. Tenth patch add a selftest, and two
patches (eleventh and twelfth) update documents.
SeongJae Park (13):
mm/damon/core: introduce address range type damos filter
mm/damon/sysfs-schemes: support address range type DAMOS filter
mm/damon/core-test: add a unit test for __damos_filter_out()
selftests/damon/sysfs: test address range damos filter
Docs/mm/damon/design: update for address range filters
Docs/ABI/damon: update for address range DAMOS filter
Docs/admin-guide/mm/damon/usage: update for address range type DAMOS
filter
mm/damon/core: implement target type damos filter
mm/damon/sysfs-schemes: support target damos filter
selftests/damon/sysfs: test damon_target filter
Docs/mm/damon/design: update for DAMON monitoring target type DAMOS
filter
Docs/ABI/damon: update for DAMON monitoring target type DAMOS filter
Docs/admin-guide/mm/damon/usage: update for DAMON monitoring target
type DAMOS filter
.../ABI/testing/sysfs-kernel-mm-damon | 27 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 34 +++++---
Documentation/mm/damon/design.rst | 24 ++++--
include/linux/damon.h | 28 +++++--
mm/damon/core-test.h | 61 ++++++++++++++
mm/damon/core.c | 62 ++++++++++++++
mm/damon/sysfs-schemes.c | 83 +++++++++++++++++++
tools/testing/selftests/damon/sysfs.sh | 5 ++
8 files changed, 299 insertions(+), 25 deletions(-)
--
2.25.1
The tried_regions directory of DAMON sysfs interface is useful for
retrieving monitoring results snapshot or DAMOS debugging. However, for
common use case that need to monitor only the total size of the scheme
tried regions (e.g., monitoring working set size), the kernel overhead
for directory construction and user overhead for reading the content
could be high if the number of monitoring region is not small. This
patchset implements DAMON sysfs files for efficient support of the use
case.
The first patch implements the sysfs file to reduce the user space
overhead, and the second patch implements a command for reducing the
kernel space overhead.
The third patch adds a selftest for the new file, and following two
patches update documents.
SeongJae Park (5):
mm/damon/sysfs-schemes: implement DAMOS tried total bytes file
mm/damon/sysfs: implement a command for updating only schemes tried
total bytes
selftests/damon/sysfs: test tried_regions/total_bytes file
Docs/ABI/damon: update for tried_regions/total_bytes
Docs/admin-guide/mm/damon/usage: update for tried_regions/total_bytes
.../ABI/testing/sysfs-kernel-mm-damon | 13 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 42 ++++++++++++-------
mm/damon/sysfs-common.h | 2 +-
mm/damon/sysfs-schemes.c | 24 ++++++++++-
mm/damon/sysfs.c | 26 +++++++++---
tools/testing/selftests/damon/sysfs.sh | 1 +
6 files changed, 83 insertions(+), 25 deletions(-)
--
2.25.1
Events Tracing infrastructure contains lot of files, directories
(internally in terms of inodes, dentries). And ends up by consuming
memory in MBs. We can have multiple events of Events Tracing, which
further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and
skip the creation of inodes/dentries. As and when require, eventfs will
create the inodes/dentries only for required files/directories.
Also eventfs would delete the inodes/dentries once no more requires
but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.
Diff from v5:
Patch 02: removed TRACEFS_EVENT_INODE enum.
Patch 04: added TRACEFS_EVENT_INODE enum.
Patch 06: removed WARN_ON_ONCE in eventfs_set_ef_status_free()
Patch 07: added WARN_ON_ONCE in create_dentry()
moved declaration of following to internal.h:
eventfs_start_creating()
eventfs_failed_creating()
eventfs_end_creating()
Patch 08: added WARN_ON_ONCE in eventfs_set_ef_status_free()
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 801 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 29 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 23 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1050 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
[ This series depends on the VFIO device cdev series ]
Changelog
v11:
* Added Reviewed-by from Kevin
* Dropped 'rc' in iommufd_access_detach()
* Dropped a duplicated IS_ERR check at new_ioas
* Separate into a new patch the change in iommufd_access_destroy_object
v10:
https://lore.kernel.org/all/cover.1690488745.git.nicolinc@nvidia.com/
* Added Reviewed-by from Jason
* Replaced the iommufd_ref_to_user call with refcount_inc
* Added a wrapper iommufd_access_change_ioas_id and used it in the
iommufd_access_attach() and iommufd_access_replace() APIs
v9:
https://lore.kernel.org/all/cover.1690440730.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in to iopt_remove_access
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v11
Thank you
Nicolin Chen
Nicolin Chen (7):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas(_id) helpers
iommufd: Use iommufd_access_change_ioas in
iommufd_access_destroy_object
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 128 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 179 insertions(+), 51 deletions(-)
--
2.41.0
A missing break in kms_tests leads to kselftest hang when the
parameter -s is used.
In current code flow because of missing break in -s, -t parses
args spilled from -s and as -t accepts only valid values as 0,1
so any arg in -s >1 or <0, gets in ksm_test failure
This went undetected since, before the addition of option -t,
the next case -M would immediately break out of the switch
statement but that is no longer the case
Add the missing break statement.
----Before----
./ksm_tests -H -s 100
Invalid merge type
----After----
./ksm_tests -H -s 100
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.401732682 s
Average speed: 248.922 MiB/s
Fixes: 9e7cb94ca218 ("selftests: vm: add KSM merging time test")
Signed-off-by: Ayush Jain <ayush.jain3(a)amd.com>
---
tools/testing/selftests/mm/ksm_tests.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
index 435acebdc325..380b691d3eb9 100644
--- a/tools/testing/selftests/mm/ksm_tests.c
+++ b/tools/testing/selftests/mm/ksm_tests.c
@@ -831,6 +831,7 @@ int main(int argc, char *argv[])
printf("Size must be greater than 0\n");
return KSFT_FAIL;
}
+ break;
case 't':
{
int tmp = atoi(optarg);
--
2.34.1
[ This series depends on the VFIO device cdev series ]
Changelog
v8:
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v8
Thank you
Nicolin Chen
Nicolin Chen (4):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 72 ++++++++++++++-----
drivers/iommu/iommufd/iommufd_test.h | 4 ++
drivers/iommu/iommufd/selftest.c | 19 +++++
drivers/vfio/iommufd.c | 11 +--
drivers/vfio/vfio_main.c | 4 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 ++
tools/testing/selftests/iommu/iommufd.c | 29 +++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++
9 files changed, 142 insertions(+), 23 deletions(-)
--
2.41.0
PTP_SYS_OFFSET_EXTENDED was added in November 2018 in
361800876f80 (" ptp: add PTP_SYS_OFFSET_EXTENDED ioctl")
and PTP_SYS_OFFSET_PRECISE was added in February 2016 in
719f1aa4a671 ("ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping")
The PTP selftest code is lacking support for these two IOCTLS.
This short series of patches adds support for them.
Changes in v2:
- Fixed rebase issues (v1 somehow ended up with patch 1 being from the
first manual split of my changes and patch 2 being from rebase 2 out
of 3)
- Rebased on top of net-next
Alex Maftei (2):
selftests/ptp: Add -x option for testing PTP_SYS_OFFSET_EXTENDED
selftests/ptp: Add -X option for testing PTP_SYS_OFFSET_PRECISE
tools/testing/selftests/ptp/testptp.c | 73 ++++++++++++++++++++++++++-
1 file changed, 71 insertions(+), 2 deletions(-)
--
2.25.1
[ This series depends on the VFIO device cdev series ]
Changelog
v10:
* Added Reviewed-by from Jason
* Replaced the iommufd_ref_to_user call with refcount_inc
* Added a wrapper iommufd_access_change_ioas_id and used it in the
iommufd_access_attach() and iommufd_access_replace() APIs
v9:
https://lore.kernel.org/linux-iommu/cover.1690440730.git.nicolinc@nvidia.co…
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in to iopt_remove_access
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v10
Thank you
Nicolin Chen
Nicolin Chen (6):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas(_id) helpers
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 132 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 184 insertions(+), 50 deletions(-)
--
2.41.0
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readible by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 21 ++++--
arch/riscv/include/asm/processor.h | 47 ++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 244 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.41.0
[ This series depends on the VFIO device cdev series ]
Changelog
v9:
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in to iopt_remove_access
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v9
Thank you
Nicolin Chen
Nicolin Chen (6):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas helper
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 123 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 175 insertions(+), 50 deletions(-)
--
2.41.0
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.
For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.
After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.
Reported-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Open code OPTIMISER_HIDE_VAR() instead of the memory clobber.
- Link to v2: https://lore.kernel.org/r/20230712-arm64-signal-memcpy-fix-v2-1-494f7025caf…
Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829…
---
.../selftests/arm64/signal/test_signals_utils.h | 25 +++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..c7f5627171dd 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,25 @@ static __always_inline bool get_current_context(struct tdescr *td,
size_t dest_sz)
{
static volatile bool seen_already;
+ int i;
+ char *uc = (char *)dest_uc;
assert(td && dest_uc);
/* it's a genuine invocation..reinit */
seen_already = 0;
td->live_uc_valid = 0;
td->live_sz = dest_sz;
- memset(dest_uc, 0x00, td->live_sz);
+
+ /*
+ * This is a memset() but we don't want the compiler to
+ * optimise it into either instructions or a library call
+ * which might be incompatible with streaming mode.
+ */
+ for (i = 0; i < td->live_sz; i++) {
+ uc[i] = 0;
+ __asm__ ("" : "=r" (uc[i]) : "0" (uc[i]));
+ }
+
td->live_uc = dest_uc;
/*
* Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +115,17 @@ static __always_inline bool get_current_context(struct tdescr *td,
:
: "memory");
+ /*
+ * If we were grabbing a streaming mode context then we may
+ * have entered streaming mode behind the system's back and
+ * libc or compiler generated code might decide to do
+ * something invalid in streaming mode, or potentially even
+ * the state of ZA. Issue a SMSTOP to exit both now we have
+ * grabbed the state.
+ */
+ if (td->feats_supported & FEAT_SME)
+ asm volatile("msr S0_3_C4_C6_3, xzr");
+
/*
* If we get here with seen_already==1 it implies the td->live_uc
* context has been used to get back here....this probably means
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This series fixes an issue which David Spickett found where if we change
the SVE VL while SME is in use we can end up attempting to save state to
an unallocated buffer and adds testing coverage for that plus a bit more
coverage of VL changes, just for paranioa.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Always reallocate the SVE state.
- Rebase onto v6.5-rc2.
- Link to v1: https://lore.kernel.org/r/20230713-arm64-fix-sve-sme-vl-change-v1-0-129dd86…
---
Mark Brown (3):
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
kselftest/arm64: Add a test case for SVE VL changes with SME active
kselftest/arm64: Validate that changing one VL type does not affect another
arch/arm64/kernel/fpsimd.c | 33 +++++--
tools/testing/selftests/arm64/fp/vec-syscfg.c | 127 +++++++++++++++++++++++++-
2 files changed, 148 insertions(+), 12 deletions(-)
---
base-commit: 06785562d1b99ff6dc1cd0af54be5e3ff999dc02
change-id: 20230713-arm64-fix-sve-sme-vl-change-60eb1fa6a707
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests. This is useful to validate
the control path.
Add the ability to program some of the more common flows with actions. This
can be improved overtime to include regression testing, etc.
Aaron Conole (4):
selftests: openvswitch: add an initial flow programming case
selftests: openvswitch: add a test for ipv4 forwarding
selftests: openvswitch: add basic ct test case parsing
selftests: openvswitch: add ct-nat test case with ipv4
.../selftests/net/openvswitch/openvswitch.sh | 223 ++++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 507 ++++++++++++++++++
2 files changed, 730 insertions(+)
--
2.40.1
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
With this proposal we're able to reach ~96.6% line rate speeds with data sent
and received directly from/to device memory.
* Patch overview:
** Part 1: struct paged device memory
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to understand a new memory type.
This proposal implements option #1. We implement a small framework to generate
struct pages for an sg_table returned from dma_buf_map_attachment(). The support
added here should be generic and easily extended to other use cases interested
in struct paged device memory. We use this framework to generate pages that can
be used in the networking stack.
** Part 2: recvmsg() & sendmsg() APIs
We define user APIs for the user to send and receive these dmabuf pages.
** part 3: support for unreadable skb frags
Dmabuf pages are not accessible by the host; we implement changes throughput the
networking stack to correctly handle skbs with unreadable frags.
** part 4: page pool support
We piggy back on Jakub's page pool memory providers idea:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory TCP
changes from the driver.
This is not strictly necessary, the driver can choose to allocate dmabuf pages
and use them directly without going through the page pool (if acceptable to
their maintainers).
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of v6.4-rc7 with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using dmabuf pages
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
Not included in this series is our devmem TCP benchmark, which
transfers data to/from GPU dmabufs directly.
With this implementation & benchmark we're able to reach ~96.6% line rate
speeds with 4 GPU/NIC pairs running bi-direction traffic, with all the
packet payloads going straight to the GPU memory (no host buffer bounce).
** Test Setup
Kernel: v6.4-rc7, with this RFC and Jakub's memory provider API
cherry-picked locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Benchmark: custom devmem TCP benchmark not yet open sourced.
Mina Almasry (10):
dma-buf: add support for paged attachment mappings
dma-buf: add support for NET_RX pages
dma-buf: add support for NET_TX pages
net: add support for skbs with unreadable frags
tcp: implement recvmsg() RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
tcp: implement sendmsg() TX path for for devmem tcp
selftests: add ncdevmem, netcat for devmem TCP
memory-provider: updates core provider API for devmem TCP
memory-provider: add dmabuf devmem provider
drivers/dma-buf/dma-buf.c | 444 ++++++++++++++++
include/linux/dma-buf.h | 142 +++++
include/linux/netdevice.h | 1 +
include/linux/skbuff.h | 34 +-
include/linux/socket.h | 1 +
include/net/page_pool.h | 21 +
include/net/sock.h | 4 +
include/net/tcp.h | 6 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/dma-buf.h | 12 +
include/uapi/linux/uio.h | 10 +
net/core/datagram.c | 3 +
net/core/page_pool.c | 111 +++-
net/core/skbuff.c | 81 ++-
net/core/sock.c | 47 ++
net/ipv4/tcp.c | 262 +++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 8 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ncdevmem.c | 693 +++++++++++++++++++++++++
23 files changed, 1868 insertions(+), 42 deletions(-)
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.41.0.390.g38632f3daf-goog
Events Tracing infrastructure contains lot of files, directories
(internally in terms of inodes, dentries). And ends up by consuming
memory in MBs. We can have multiple events of Events Tracing, which
further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and
skip the creation of inodes/dentries. As and when require, eventfs will
create the inodes/dentries only for required files/directories.
Also eventfs would delete the inodes/dentries once no more requires
but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 795 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 26 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 30 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1048 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
Hello,
This is v4 of the patch series for TDX selftests.
It has been updated for Intel’s v14 of the TDX host patches which was
proposed here:
https://lore.kernel.org/lkml/cover.1685333727.git.isaku.yamahata@intel.com/
The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/tdx-selftests-rfc-v4
Changes from RFC v3:
In v14, TDX can only run with UPM enabled so the necessary changes were
made to handle that.
td_vcpu_run() was added to handle TdVmCalls that are now handled in
userspace.
The comments under the patch "KVM: selftests: Require GCC to realign
stacks on function entry" were addressed with the following patch:
https://lore.kernel.org/lkml/Y%2FfHLdvKHlK6D%2F1v@google.com/T/
And other minor tweaks were made to integrate the selftest
infrastructure onto v14.
In RFCv4, TDX selftest code is organized into:
+ headers in tools/testing/selftests/kvm/include/x86_64/tdx/
+ common code in tools/testing/selftests/kvm/lib/x86_64/tdx/
+ selftests in tools/testing/selftests/kvm/x86_64/tdx_*
Dependencies
+ Peter’s patches, which provide functions for the host to allocate
and track protected memory in the
guest. https://lore.kernel.org/lkml/20221018205845.770121-1-pgonda@google.com/T/
Further work for this patch series/TODOs
+ Sean’s comments for the non-confidential UPM selftests patch series
at https://lore.kernel.org/lkml/Y8dC8WDwEmYixJqt@google.com/T/#u apply
here as well
+ Add ucall support for TDX selftests
I would also like to acknowledge the following people, who helped
review or test patches in RFCv1, RFCv2, and RFCv3:
+ Sean Christopherson <seanjc(a)google.com>
+ Zhenzhong Duan <zhenzhong.duan(a)intel.com>
+ Peter Gonda <pgonda(a)google.com>
+ Andrew Jones <drjones(a)redhat.com>
+ Maxim Levitsky <mlevitsk(a)redhat.com>
+ Xiaoyao Li <xiaoyao.li(a)intel.com>
+ David Matlack <dmatlack(a)google.com>
+ Marc Orr <marcorr(a)google.com>
+ Isaku Yamahata <isaku.yamahata(a)gmail.com>
+ Maciej S. Szmigiero <maciej.szmigiero(a)oracle.com>
Links to earlier patch series
+ RFC v1: https://lore.kernel.org/lkml/20210726183816.1343022-1-erdemaktas@google.com…
+ RFC v2: https://lore.kernel.org/lkml/20220830222000.709028-1-sagis@google.com/T/#u
+ RFC v3: https://lore.kernel.org/lkml/20230121001542.2472357-1-ackerleytng@google.co…
Ackerley Tng (12):
KVM: selftests: Add function to allow one-to-one GVA to GPA mappings
KVM: selftests: Expose function that sets up sregs based on VM's mode
KVM: selftests: Store initial stack address in struct kvm_vcpu
KVM: selftests: Refactor steps in vCPU descriptor table initialization
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
KVM: selftests: TDX: Update load_td_memory_region for VM memory backed
by guest memfd
KVM: selftests: Add functions to allow mapping as shared
KVM: selftests: Expose _vm_vaddr_alloc
KVM: selftests: TDX: Add support for TDG.MEM.PAGE.ACCEPT
KVM: selftests: TDX: Add support for TDG.VP.VEINFO.GET
KVM: selftests: TDX: Add TDX UPM selftest
KVM: selftests: TDX: Add TDX UPM selftests for implicit conversion
Erdem Aktas (3):
KVM: selftests: Add helper functions to create TDX VMs
KVM: selftests: TDX: Add TDX lifecycle test
KVM: selftests: TDX: Adding test case for TDX port IO
Roger Wang (1):
KVM: selftests: TDX: Add TDG.VP.INFO test
Ryan Afranji (2):
KVM: selftests: TDX: Verify the behavior when host consumes a TD
private memory
KVM: selftests: TDX: Add shared memory test
Sagi Shahar (10):
KVM: selftests: TDX: Add report_fatal_error test
KVM: selftests: TDX: Add basic TDX CPUID test
KVM: selftests: TDX: Add basic get_td_vmcall_info test
KVM: selftests: TDX: Add TDX IO writes test
KVM: selftests: TDX: Add TDX IO reads test
KVM: selftests: TDX: Add TDX MSR read/write tests
KVM: selftests: TDX: Add TDX HLT exit test
KVM: selftests: TDX: Add TDX MMIO reads test
KVM: selftests: TDX: Add TDX MMIO writes test
KVM: selftests: TDX: Add TDX CPUID TDVMCALL test
tools/testing/selftests/kvm/Makefile | 8 +
.../selftests/kvm/include/kvm_util_base.h | 35 +
.../selftests/kvm/include/x86_64/processor.h | 4 +
.../kvm/include/x86_64/tdx/td_boot.h | 82 +
.../kvm/include/x86_64/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86_64/tdx/tdcall.h | 59 +
.../selftests/kvm/include/x86_64/tdx/tdx.h | 65 +
.../kvm/include/x86_64/tdx/tdx_util.h | 19 +
.../kvm/include/x86_64/tdx/test_util.h | 164 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 115 +-
.../selftests/kvm/lib/x86_64/processor.c | 77 +-
.../selftests/kvm/lib/x86_64/tdx/td_boot.S | 101 ++
.../selftests/kvm/lib/x86_64/tdx/tdcall.S | 158 ++
.../selftests/kvm/lib/x86_64/tdx/tdx.c | 262 ++++
.../selftests/kvm/lib/x86_64/tdx/tdx_util.c | 565 +++++++
.../selftests/kvm/lib/x86_64/tdx/test_util.c | 101 ++
.../kvm/x86_64/tdx_shared_mem_test.c | 134 ++
.../selftests/kvm/x86_64/tdx_upm_test.c | 469 ++++++
.../selftests/kvm/x86_64/tdx_vm_tests.c | 1322 +++++++++++++++++
19 files changed, 3730 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/test_util.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_shared_mem_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_upm_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_vm_tests.c
--
2.41.0.487.g6d72f3e995-goog
[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]
On Fri, 21 Jul 2023 08:48:39 -0400
Steven Rostedt <rostedt(a)goodmis.org> wrote:
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4db048250cdb..2718de1533e6 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -36,16 +36,36 @@ struct eventfs_file {
> const struct file_operations *fop;
> const struct inode_operations *iop;
> union {
> + struct list_head del_list;
> struct rcu_head rcu;
> - struct llist_node llist; /* For freeing after RCU */
> + unsigned long is_freed; /* Freed if one of the above is set */
I changed the freeing around. The dentries are freed before returning from
eventfs_remove_dir().
I also added a "is_freed" field that is part of the union and is set if
list elements have content. Note, since the union was criticized before, I
will state the entire purpose of doing this patch set is to save memory.
This structure will be used for every event file. What's the point of
getting rid of dentries if we are replacing it with something just as big?
Anyway, struct dentry does the exact same thing!
> };
> void *data;
> umode_t mode;
> - bool created;
> + unsigned int flags;
Bah, I forgot to remove flags (one iteration replaced the created with
flags to set both created and freed). I removed the freed with the above
"is_freed" and noticed that created is set if and only if ef->dentry is
set. So instead of using the created boolean, just test ef->dentry.
The flags isn't used and can be removed. I just forgot to do so.
> };
>
> static DEFINE_MUTEX(eventfs_mutex);
> DEFINE_STATIC_SRCU(eventfs_srcu);
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> + struct dentry *dentry,
> + unsigned int flags);
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
> +static int eventfs_release(struct inode *inode, struct file *file);
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> + .lookup = eventfs_root_lookup,
> +};
> +
> +static const struct file_operations eventfs_file_operations = {
> + .open = dcache_dir_open_wrapper,
> + .read = generic_read_dir,
> + .iterate_shared = dcache_readdir,
> + .llseek = generic_file_llseek,
> + .release = eventfs_release,
> +};
> +
In preparing for getting rid of eventfs_file, I noticed that all
directories are set to the above ops. In create_dir() instead of passing in
ef->*ops, just use these directly. This does help with future work.
> /**
> * create_file - create a file in the tracefs filesystem
> * @name: the name of the file to create.
> @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
> * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> * returned.
> */
> -static struct dentry *create_dir(const char *name, umode_t mode,
> - struct dentry *parent, void *data,
> - const struct file_operations *fop,
> - const struct inode_operations *iop)
> +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
> {
As stated, the directories always used the same *op values, so I just hard
coded it.
> struct tracefs_inode *ti;
> struct dentry *dentry;
> struct inode *inode;
>
> - WARN_ON(!S_ISDIR(mode));
> -
> dentry = eventfs_start_creating(name, parent);
> if (IS_ERR(dentry))
> return dentry;
> @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
> if (unlikely(!inode))
> return eventfs_failed_creating(dentry);
>
> - inode->i_mode = mode;
> - inode->i_op = iop;
> - inode->i_fop = fop;
> + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> + inode->i_op = &eventfs_root_dir_inode_operations;
> + inode->i_fop = &eventfs_file_operations;
> inode->i_private = data;
>
> ti = get_tracefs(inode);
> @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
> struct tracefs_inode *ti_parent;
> struct eventfs_file *ef;
>
> + mutex_lock(&eventfs_mutex);
To synchronize with the removals, I needed to add locking here.
> ti_parent = get_tracefs(dentry->d_parent->d_inode);
> if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> - return;
> + goto out;
>
> ef = dentry->d_fsdata;
> if (!ef)
> - return;
> - ef->created = false;
> + goto out;
> + /*
> + * If ef was freed, then the LSB bit is set for d_fsdata.
> + * But this should not happen, as it should still have a
> + * ref count that prevents it. Warn in case it does.
> + */
> + if (WARN_ON_ONCE((unsigned long)ef & 1))
> + goto out;
During the remove, a dget() is done to keep the dentry from freeing. To
make sure that it doesn't get freed, I added this test.
> +
> + dentry->d_fsdata = NULL;
> +
> ef->dentry = NULL;
> + out:
> + mutex_unlock(&eventfs_mutex);
> }
>
> /**
> @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> ti->private = ef->ei;
> }
>
> +static struct dentry *
> +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> +{
Because both the lookup and the dir_open_wrapper did basically the same
thing, I created a helper function so that I didn't have to update both
locations.
> + bool invalidate = false;
> + struct dentry *dentry;
> +
> + mutex_lock(&eventfs_mutex);
> + if (ef->is_freed) {
> + mutex_unlock(&eventfs_mutex);
> + return NULL;
> + }
Ignore if the ef is on its way to be freed.
> + if (ef->dentry) {
> + dentry = ef->dentry;
If the ef already has a dentry (created) then use it.
> + /* On dir open, up the ref count */
> + if (!lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> + mutex_unlock(&eventfs_mutex);
> +
> + if (!lookup)
> + inode_lock(parent->d_inode);
> +
> + if (ef->ei)
> + dentry = create_dir(ef->name, parent, ef->data);
> + else
> + dentry = create_file(ef->name, ef->mode, parent,
> + ef->data, ef->fop);
> +
> + if (!lookup)
> + inode_unlock(parent->d_inode);
> +
> + mutex_lock(&eventfs_mutex);
> + if (IS_ERR_OR_NULL(dentry)) {
With the lock dropped, the dentry could have been created causing it to
fail. Check if the ef->dentry exists, and if so, use it instead.
Note, if the ef is freed, it should not have a dentry.
> + /* If the ef was already updated get it */
> + dentry = ef->dentry;
> + if (dentry && !lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> +
> + if (!ef->dentry && !ef->is_freed) {
With the lock dropped, the dentry could have been filled too. If so, drop
the created dentry and use the one owned by the ef->dentry.
> + ef->dentry = dentry;
> + if (ef->ei)
> + eventfs_post_create_dir(ef);
> + dentry->d_fsdata = ef;
> + } else {
> + /* A race here, should try again (unless freed) */
> + invalidate = true;
I had a WARN_ON() once here. Probably could add a:
WARN_ON_ONCE(!ef->is_freed);
> + }
> + mutex_unlock(&eventfs_mutex);
> + if (invalidate)
> + d_invalidate(dentry);
> +
> + if (lookup || invalidate)
> + dput(dentry);
> +
> + return invalidate ? NULL : dentry;
> +}
> +
> +static bool match_event_file(struct eventfs_file *ef, const char *name)
> +{
A bit of a paranoid helper function. I wanted to make sure to synchronize
with the removals.
> + bool ret;
> +
> + mutex_lock(&eventfs_mutex);
> + ret = !ef->is_freed && strcmp(ef->name, name) == 0;
> + mutex_unlock(&eventfs_mutex);
> +
> + return ret;
> +}
> +
> /**
> * eventfs_root_lookup - lookup routine to create file/dir
> * @dir: directory in which lookup to be done
> @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> * Used to create dynamic file/dir with-in @dir, search with-in ei
> * list, if @dentry found go ahead and create the file/dir
> */
> -
> static struct dentry *eventfs_root_lookup(struct inode *dir,
> struct dentry *dentry,
> unsigned int flags)
> @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (strcmp(ef->name, dentry->d_name.name))
> + if (!match_event_file(ef, dentry->d_name.name))
> continue;
> ret = simple_lookup(dir, dentry, flags);
> - if (ef->created)
> - continue;
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - mutex_unlock(&eventfs_mutex);
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - mutex_unlock(&eventfs_mutex);
> - dput(ef->dentry);
> - }
> + create_dentry(ef, ef->d_parent, true);
> break;
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
> struct tracefs_inode *ti;
> struct eventfs_inode *ei;
> struct eventfs_file *ef;
> + struct dentry *dentry;
> int idx;
>
> ti = get_tracefs(inode);
> @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (ef->created)
> - dput(ef->dentry);
> + mutex_lock(&eventfs_mutex);
> + dentry = ef->dentry;
> + mutex_unlock(&eventfs_mutex);
> + if (dentry)
> + dput(dentry);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_close(inode, file);
> @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
> ei = ti->private;
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> - if (ef->created) {
> - dget(ef->dentry);
> - continue;
> - }
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> -
> - inode_lock(dentry->d_inode);
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, dentry,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, dentry,
> - ef->data, ef->fop);
> - inode_unlock(dentry->d_inode);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - }
> - mutex_unlock(&eventfs_mutex);
> + create_dentry(ef, dentry, false);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_open(inode, file);
> }
>
> -static const struct file_operations eventfs_file_operations = {
> - .open = dcache_dir_open_wrapper,
> - .read = generic_read_dir,
> - .iterate_shared = dcache_readdir,
> - .llseek = generic_file_llseek,
> - .release = eventfs_release,
> -};
> -
> -static const struct inode_operations eventfs_root_dir_inode_operations = {
> - .lookup = eventfs_root_lookup,
> -};
> -
> /**
> * eventfs_prepare_ef - helper function to prepare eventfs_file
> * @name: the name of the file/directory to create.
> @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> ti_parent = get_tracefs(parent->d_inode);
> ei_parent = ti_parent->private;
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
For directories, just use the hard coded values.
> if (IS_ERR(ef))
> return ef;
>
> @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
> if (!ef_parent)
> return ERR_PTR(-EINVAL);
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
ditto.
> if (IS_ERR(ef))
> return ef;
>
> @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
> return 0;
> }
>
> -static LLIST_HEAD(free_list);
> -
> -static void eventfs_workfn(struct work_struct *work)
> -{
> - struct eventfs_file *ef, *tmp;
> - struct llist_node *llnode;
> -
> - llnode = llist_del_all(&free_list);
> - llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> - if (ef->created && ef->dentry)
> - dput(ef->dentry);
> - kfree(ef->name);
> - kfree(ef->ei);
> - kfree(ef);
> - }
> -}
> -
> -DECLARE_WORK(eventfs_work, eventfs_workfn);
> -
> static void free_ef(struct rcu_head *head)
> {
> struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
>
> - if (!llist_add(&ef->llist, &free_list))
> - return;
> -
> - queue_work(system_unbound_wq, &eventfs_work);
> + kfree(ef->name);
> + kfree(ef->ei);
> + kfree(ef);
Since I did not do the dput() or d_invalidate() here I don't need call this
from task context. This simplifies the process.
> }
>
> -
> -
> /**
> * eventfs_remove_rec - remove eventfs dir or file from list
> * @ef: eventfs_file to be removed.
> @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
> * This function recursively remove eventfs_file which
> * contains info of file or dir.
> */
> -static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
> {
> struct eventfs_file *ef_child;
>
> @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> /* search for nested folders or files */
> list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> lockdep_is_held(&eventfs_mutex)) {
> - eventfs_remove_rec(ef_child, level + 1);
> + eventfs_remove_rec(ef_child, head, level + 1);
> }
> }
>
> - if (ef->created && ef->dentry)
> - d_invalidate(ef->dentry);
> -
> list_del_rcu(&ef->list);
> - call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + list_add_tail(&ef->del_list, head);
Hold off on freeing the ef. Add it to a link list to do so later.
> }
>
> /**
> @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> */
> void eventfs_remove(struct eventfs_file *ef)
> {
> + struct eventfs_file *tmp;
> + LIST_HEAD(ef_del_list);
> + struct dentry *dentry_list = NULL;
> + struct dentry *dentry;
> +
> if (!ef)
> return;
>
> mutex_lock(&eventfs_mutex);
> - eventfs_remove_rec(ef, 0);
> + eventfs_remove_rec(ef, &ef_del_list, 0);
The above returns back with ef_del_list holding all the ef's to be freed.
I probably could have just passed the dentry_list down instead, but I
wanted the below complexity done in a non recursive function.
> +
> + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
> + if (ef->dentry) {
> + unsigned long ptr = (unsigned long)dentry_list;
> +
> + /* Keep the dentry from being freed yet */
> + dget(ef->dentry);
> +
> + /*
> + * Paranoid: The dget() above should prevent the dentry
> + * from being freed and calling eventfs_set_ef_status_free().
> + * But just in case, set the link list LSB pointer to 1
> + * and have eventfs_set_ef_status_free() check that to
> + * make sure that if it does happen, it will not think
> + * the d_fsdata is an event_file.
> + *
> + * For this to work, no event_file should be allocated
> + * on a odd space, as the ef should always be allocated
> + * to be at least word aligned. Check for that too.
> + */
> + WARN_ON_ONCE(ptr & 1);
> +
> + ef->dentry->d_fsdata = (void *)(ptr | 1);
Set the d_fsdata to be a link list. The comment above needs to say to say
struct eventfs_file and struct dentry should be word aligned. Anyway, while
the eventfs_mutex is held, set all the dentries belonging to eventfs_files
to the dentry_list and clear the ef->dentry.
> + dentry_list = ef->dentry;
> + ef->dentry = NULL;
> + }
> + call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + }
> mutex_unlock(&eventfs_mutex);
> +
> + while (dentry_list) {
> + unsigned long ptr;
> +
> + dentry = dentry_list;
> + ptr = (unsigned long)dentry->d_fsdata & ~1UL;
> + dentry_list = (struct dentry *)ptr;
> + dentry->d_fsdata = NULL;
With the mutex released, it is safe to free the dentries here. This also
must be done before returning from this function, as when I had it done in
the workqueue, it was failing some tests that would remove a dynamic event
and still see that the directory was still around!
> + d_invalidate(dentry);
> + mutex_lock(&eventfs_mutex);
> + /* dentry should now have at least a single reference */
> + WARN_ONCE((int)d_count(dentry) < 1,
> + "dentry %px less than one reference (%d) after invalidate\n",
I did update the above to:
WARN_ONCE((int)d_count(dentry) < 1,
"dentry %px (%s) less than one reference (%d) after invalidate\n",
dentry, dentry->d_name.name, d_count(dentry));
To include the name of the dentry (my current work is triggering this still).
> + dentry, d_count(dentry));
> + mutex_unlock(&eventfs_mutex);
> + dput(dentry);
> + }
> }
>
> /**
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index c443a0c32a8c..1b880b5cd29d 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
> struct dentry *tracefs_failed_creating(struct dentry *dentry);
> struct inode *tracefs_get_inode(struct super_block *sb);
>
> +void eventfs_set_ef_status_free(struct dentry *dentry);
> +
> #endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 4d30b0cafc5f..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
>
> void eventfs_remove_events_dir(struct dentry *dentry);
>
> -void eventfs_set_ef_status_free(struct dentry *dentry);
> -
Oh, and eventfs_set_ef_status_free() should not be exported to outside the
tracefs system.
-- Steve
> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);
Hi, Willy, Thomas
The suggestions of v1 nolibc powerpc patchset [1] from you have been applied,
here is v2.
Testing results:
- run with tinyconfig
arch/board | result
------------|------------
ppc/g3beige | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc/ppce500 | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/pseries | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/powernv | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/pseries | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/powernv | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
- run-user
(Tested with -Os, -O0 and -O2)
// for 32-bit PowerPC
$ for arch in powerpc ppc; do make run-user ARCH=$arch CROSS_COMPILE=powerpc-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// for 64-bit big-endian PowerPC and 64-bit little-endian PowerPC
$ for arch in ppc64 ppc64le; do make run-user ARCH=$arch CROSS_COMPILE=powerpc64le-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
Changes from v1 --> v2:
- tools/nolibc: add support for powerpc
Add missing arch-powerpc.h lines to arch.h
Align with the other arch-<ARCH>.h, naming the variables
with more meaningful words, such as _ret, _num, _arg1 ...
Clean up the syscall instructions
No line from musl now.
Suggestons from Thomas
* tools/nolibc: add support for pppc64
No change
* selftests/nolibc: add extra configs customize support
To reduce complexity, merge the commands from the new extraconfig
target to defconfig target and drop the extconfig target completely.
Derived from Willy's suggestion of the tinyconfig patchset
* selftests/nolibc: add XARCH and ARCH mapping support
To reduce complexity, let's use XARCH internally and only reserve
ARCH as the input variable.
Derived from Willy's suggestion
* selftests/nolibc: add test support for powerpc
Add ppc as the default 32-bit variant for powerpc target, allow pass
ARCH=ppc or ARCH=powerpc to test 32-bit powerpc
Derived from Willy's suggestion
* selftests/nolibc: add test support for pppc64le
Rename powerpc64le to ppc64le
Suggestion from Willy
* selftests/nolibc: add test support for pppc64
Rename powerpc64 to ppc64
Suggestion from Willy
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1689713175.git.falcon@tinylab.org/
Zhangjin Wu (7):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add extra configs customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 48 +++++-
3 files changed, 244 insertions(+), 8 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
--
2.25.1
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 20 ++-
arch/riscv/include/asm/processor.h | 46 +++++-
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 1 +
tools/testing/selftests/riscv/mm/Makefile | 21 +++
.../selftests/riscv/mm/testcases/mmap.c | 133 ++++++++++++++++++
8 files changed, 234 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
--
2.41.0
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v8:
- Rebase to v6.5-rc2, update to new behavior of __iommu_group_set_domain()
v7: https://lore.kernel.org/r/0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com
- Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()
v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
- Go back to the v4 locking arragnment with now both the attach/detach
igroup->locks inside the functions, Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have different abort than destroy
function, it allows hwpt abort to require the caller to continue
to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 38 +-
drivers/iommu/iommufd/device.c | 555 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 32 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 +-
14 files changed, 867 insertions(+), 226 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.41.0
Apologize for sending previous mail from a wrong app (not text mode).
Resending to keep the mailing list thread consistent.
On Wed, Jul 26, 2023 at 3:10 AM Markus Elfring <Markus.Elfring(a)web.de>
wrote:
>
> > Tests BPF redirect at the lwt xmit hook to ensure error handling are
> > safe, i.e. won't panic the kernel.
>
> Are imperative change descriptions still preferred?
Hi Markus,
I think you linked this to me yesterday that it should be described
imperatively:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
> See also:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
I don’t follow the purpose of this reference. This points to user impact
but this is a selftest, so I don’t see any user impact here. Or is there
anything I missed?
>
> Can remaining wording weaknesses be adjusted accordingly?
I am not following this question . Can you be more specific or provide an
example?
Yan
>
> Regards,
> Markus
>
Dzień dobry,
zapoznałem się z Państwa ofertą i z przyjemnością przyznaję, że przyciąga uwagę i zachęca do dalszych rozmów.
Pomyślałem, że może mógłbym mieć swój wkład w Państwa rozwój i pomóc dotrzeć z tą ofertą do większego grona odbiorców. Pozycjonuję strony www, dzięki czemu generują świetny ruch w sieci.
Możemy porozmawiać w najbliższym czasie?
Pozdrawiam
Adam Charachuta
Add specification for declaring test metadata to the KTAP v2 spec.
The purpose of test metadata is to allow for the declaration of essential
testing information in KTAP output. This information includes test
names, test configuration info, test attributes, and test files.
There have been similar ideas around the idea of test metadata such as test
prefixes and test name lines. However, I propose this specification as an
overall fix for these issues.
These test metadata lines are a form of diagnostic lines with the
format: "# <metadata_type>: <data>". As a type of diagnostic line, test
metadata lines are compliant with KTAP v1, which will help to not
interfere too much with current parsers.
Specifically the "# Subtest:" line is derived from the TAP 14 spec:
https://testanything.org/tap-version-14-specification.html.
The proposed location for test metadata is in the test header, between the
version line and the test plan line. Note including diagnostic lines in
the test header is a depature from KTAP v1.
This location provides two main benefits:
First, metadata will be printed prior to when subtests are run. Then if a
test fails, test metadata can help discern which test is causing the issue
and potentially why.
Second, this location ensures that the lines will not be accidentally
parsed as a subtest's diagnostic lines because the lines are bordered by
the version line and plan line.
Here is an example of test metadata:
KTAP version 2
# Config: CONFIG_TEST=y
1..1
KTAP version 2
# Subtest: test_suite
# File: /sys/kernel/...
# Attributes: slow
# Other: example_test
1..2
ok 1 test_1
ok 2 test_2
ok 1 test_suite
Here is a link to a version of the KUnit parser that is able to parse test
metadata lines for KTAP version 2. Note this includes test metadata
lines for the main level of KTAP.
Link: https://kunit-review.googlesource.com/c/linux/+/5809
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Hi everyone,
I would like to use this proposal similar to an RFC to gather ideas on the
topic of test metadata. Let me know what you think.
I am also interested in brainstorming a list of recognized metadata types.
Providing recognized metadata types would be helpful in parsing and
displaying test metadata in a useful way.
Current ideas:
- "# Subtest: <test_name>" to indicate test name (name must match
corresponding result line)
- "# Attributes: <attributes list>" to indicate test attributes (list
separated by commas)
- "# File: <file_path>" to indicate file used in testing
Any other ideas?
Note this proposal replaces two of my previous proposals: "ktap_v2: add
recognized test name line" and "ktap_v2: allow prefix to KTAP lines."
Thanks!
-Rae
Note: this patch is based on Frank's ktap_spec_version_2 branch.
Documentation/dev-tools/ktap.rst | 51 ++++++++++++++++++++++++++++++--
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..a2d0a196c115 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -17,7 +17,9 @@ KTAP test results describe a series of tests (which may be nested: i.e., test
can have subtests), each of which can contain both diagnostic data -- e.g., log
lines -- and a final result. The test structure and results are
machine-readable, whereas the diagnostic data is unstructured and is there to
-aid human debugging.
+aid human debugging. One exception to this is test metadata lines - a type
+of diagnostic lines. Test metadata is located between the version line and
+plan line of a test and can be machine-readable.
KTAP output is built from four different types of lines:
- Version lines
@@ -28,8 +30,7 @@ KTAP output is built from four different types of lines:
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.
Version lines
-------------
@@ -166,6 +167,45 @@ even if they do not start with a "#": this is to capture any other useful
kernel output which may help debug the test. It is nevertheless recommended
that tests always prefix any diagnostic output they have with a "#" character.
+Test metadata lines
+-------------------
+
+Test metadata lines are a type of diagnostic lines used to the declare the
+name of a test and other helpful testing information in the test header.
+These lines are often helpful for parsing and for providing context during
+crashes.
+
+Test metadata lines must follow the format: "# <metadata_type>: <data>".
+These lines must be located between the version line and the plan line
+within a test header.
+
+There are a few currently recognized metadata types:
+- "# Subtest: <test_name>" to indicate test name (name must match
+ corresponding result line)
+- "# Attributes: <attributes list>" to indicate test attributes (list
+ separated by commas)
+- "# File: <file_path>" to indicate file used in testing
+
+As a rule, the "# Subtest:" line is generally first to declare the test
+name. Note that metadata lines do not necessarily need to use a
+recognized metadata type.
+
+An example of using metadata lines:
+
+::
+
+ KTAP version 2
+ 1..1
+ # File: /sys/kernel/...
+ KTAP version 2
+ # Subtest: example
+ # Attributes: slow, example_test
+ 1..1
+ ok 1 test_1
+ # example passed
+ ok 1 example
+
+
Unknown lines
-------------
@@ -206,6 +246,7 @@ An example of a test with two nested subtests:
KTAP version 2
1..1
KTAP version 2
+ # Subtest: example
1..2
ok 1 test_1
not ok 2 test_2
@@ -219,6 +260,7 @@ An example format with multiple levels of nested testing:
KTAP version 2
1..2
KTAP version 2
+ # Subtest: example_test_1
1..2
KTAP version 2
1..2
@@ -254,6 +296,7 @@ Example KTAP output
KTAP version 2
1..1
KTAP version 2
+ # Subtest: main_test
1..3
KTAP version 2
1..1
@@ -261,11 +304,13 @@ Example KTAP output
ok 1 test_1
ok 1 example_test_1
KTAP version 2
+ # Attributes: slow
1..2
ok 1 test_1 # SKIP test_1 skipped
ok 2 test_2
ok 2 example_test_2
KTAP version 2
+ # Subtest: example_test_3
1..3
ok 1 test_1
# test_2: FAIL
base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
--
2.40.0.396.gfff15efe05-goog
lwt xmit hook does not expect positive return values in function
ip_finish_output2 and ip6_finish_output2. However, BPF redirect programs
can return positive values such like NET_XMIT_DROP, NET_RX_DROP, and etc
as errors. Such return values can panic the kernel unexpectedly:
https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48
This patch fixes the return values from BPF redirect, so the error
handling would be consistent at xmit hook. It also adds a few test cases
to prevent future regressions.
v2: https://lore.kernel.org/netdev/ZLdY6JkWRccunvu0@debian.debian/
v1: https://lore.kernel.org/bpf/ZLbYdpWC8zt9EJtq@debian.debian/
changes since v2:
* subject name changed
* also covered redirect to ingress case
* added selftests
changes since v1:
* minor code style changes
Yan Zhai (2):
bpf: fix skb_do_redirect return values
bpf: selftests: add lwt redirect regression test cases
net/core/filter.c | 12 +-
tools/testing/selftests/bpf/Makefile | 1 +
.../selftests/bpf/progs/test_lwt_redirect.c | 67 +++++++
.../selftests/bpf/test_lwt_redirect.sh | 165 ++++++++++++++++++
4 files changed, 244 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_redirect.sh
--
2.30.2
Remove clean target in Makefile to fix the following warning
and use the one in common lib.mk
Makefile:14: warning: overriding recipe for target 'clean'
../lib.mk:160: warning: ignoring old recipe for target 'clean'
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/prctl/Makefile | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/prctl/Makefile b/tools/testing/selftests/prctl/Makefile
index cfc35d29fc2e..01dc90fbb509 100644
--- a/tools/testing/selftests/prctl/Makefile
+++ b/tools/testing/selftests/prctl/Makefile
@@ -10,7 +10,5 @@ all: $(TEST_PROGS)
include ../lib.mk
-clean:
- rm -fr $(TEST_PROGS)
endif
endif
--
2.39.2
On Tue, Jul 25, 2023 at 12:11 AM Markus Elfring <Markus.Elfring(a)web.de> wrote:
>
> > … unexpected problems. This change
> > converts the positive status code to proper error code.
>
> Please choose a corresponding imperative change suggestion.
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
>
> Did you provide sufficient justification for a possible addition of the tag “Fixes”?
>
>
> …
> > v2: code style change suggested by Stanislav Fomichev
> > ---
> > net/core/filter.c | 12 +++++++++++-
> …
>
> How do you think about to replace this marker by a line break?
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
> Regards,
> Markus
Hi Markus,
Thanks for the suggestions, those are what I could use more help with.
Will address these in the next version.
Yan