Hi Shuah,
Here is the set with cleanup as suggested by Kees on v3.
Configured, built, and tested all modules loaded by
tools/testing/selftests/lib/*.sh
From previous cover letters ...
While doing the testing for strscpy_pad() it was noticed that there is
duplication in how test modules are being fed to kselftest and also in
the test modules themselves.
This set makes an attempt at adding a framework to kselftest for writing
kernel test modules. It also adds a script for use in creating script
test runners for kselftest. My macro-foo is not great, all criticism
and suggestions very much appreciated. The design is based on test
modules lib/test_printf.c, lib/test_bitmap.c, lib/test_xarray.c.
Changes since last version:
- Remove dependency on Bash (thanks Kees)
- Use oneliner to implement kselftest test runners (thanks Kees)
- Squash patch that adds kselftest script creator script with patch
that uses it.
- Fix typos (thanks Randy)
- Add Kees' Acked-by tags to all patches
thanks,
Tobin.
Tobin C. Harding (6):
lib/test_printf: Add empty module_exit function
kselftest: Add test runner creation script
kselftest: Add test module framework header
lib: Use new kselftest header
lib/string: Add strscpy_pad() function
lib: Add test module for strscpy_pad
Documentation/dev-tools/kselftest.rst | 94 +++++++++++-
include/linux/string.h | 4 +
lib/Kconfig.debug | 3 +
lib/Makefile | 1 +
lib/string.c | 47 +++++-
lib/test_bitmap.c | 20 +--
lib/test_printf.c | 17 +--
lib/test_strscpy.c | 150 +++++++++++++++++++
tools/testing/selftests/kselftest_module.h | 48 ++++++
tools/testing/selftests/kselftest_module.sh | 84 +++++++++++
tools/testing/selftests/lib/Makefile | 2 +-
tools/testing/selftests/lib/bitmap.sh | 18 +--
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/prime_numbers.sh | 17 +--
tools/testing/selftests/lib/printf.sh | 19 +--
tools/testing/selftests/lib/strscpy.sh | 3 +
16 files changed, 440 insertions(+), 88 deletions(-)
create mode 100644 lib/test_strscpy.c
create mode 100644 tools/testing/selftests/kselftest_module.h
create mode 100755 tools/testing/selftests/kselftest_module.sh
create mode 100755 tools/testing/selftests/lib/strscpy.sh
--
2.21.0
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily, solving the classic problem
of difficulty in exercising error handling code.
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
Additionally for convenience, I have applied these patches to a branch:
https://kunit.googlesource.com/linux/+/kunit/rfc/4.19/v3
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/4.19/v3 branch.
## Changes Since Last Version
- Changed namespace prefix from `test_*` to `kunit_*` as requested by
Shuah.
- Started converting/cleaning up the device tree unittest to use KUnit.
- Started adding KUnit expectations with custom messages.
--
2.20.0.rc0.387.gc7a69e6b6c-goog
vDSO (virtual dynamic shared object) is a mechanism that the Linux
kernel provides as an alternative to system calls in order to reduce,
where possible, their cost in terms of cycles.
This is possible because certain syscalls like gettimeofday() do
not write any data and return one or more values that are stored
in the kernel, which makes it relatively safe to call them directly
as library functions.
Even though the mechanism is pretty much standard, in the last few
years every architecture ended up implementing its own vDSO library
in the architectural code.
The purpose of this patch-set is to identify the commonalities
between the architectures and try to consolidate the common code
paths, starting with gettimeofday().
This implementation contains the following design choices:
* Every architecture defines the arch specific code in a header in
"asm/vdso/".
* The generic implementation includes the arch specific one and lives
in "lib/vdso".
* The arch specific code for gettimeofday lives in
"<arch path>/vdso/gettimeofday.c" and includes the generic code only.
* The generic implementation of update_vsyscall and update_vsyscall_tz
lives in kernel/vdso and provides the bindings that can be implemented
by each architecture.
* Each architecture provides its implementation of the bindings in
"asm/vdso/vsyscall.h".
* This approach allows the common code to be consolidated in a single
place, with the benefit of avoiding code duplication.
This implementation contains the portings to the common library for: arm64,
compat mode for arm64, arm, mips, x86_64, x32, compat mode for x86_64 and
i386.
The mips porting has been tested on qemu for mips32el. A configuration to
repeat the tests can be found at [4].
The x86_64 porting has been tested on an Intel Xeon 5120T based machine
running Ubuntu 18.04 and using the Ubuntu provided defconfig.
The i386 porting has been tested on qemu using the i386_defconfig
configuration.
Last but not least, from this porting arm64, compat arm64, arm and mips
gain support for:
* CLOCK_BOOTTIME that can be useful in certain scenarios since it keeps
track of the time during sleep as well.
* CLOCK_TAI that is like CLOCK_REALTIME, but uses the International
Atomic Time (TAI) reference instead of UTC to avoid jumping on leap
second updates.
for both clock_gettime and clock_getres.
The porting has been validated using the vdsotest test-suite [1] extended
to cover all the clock ids [2].
A new test has been added to the linux kselftest in order to validate the
newly added library.
The porting has been benchmarked and the performance results are
provided as part of this cover letter.
To simplify the testing, a copy of the patchset on top of a recent linux
tree can be found at [3] and [4].
[1] https://github.com/nathanlynch/vdsotest
[2] https://github.com/fvincenzo/vdsotest
[3] git://linux-arm.org/linux-vf.git vdso/v6
[4] git://linux-arm.org/linux-vf.git vdso-mips/v6
Changes:
--------
v6:
- Rebased on 5.2-rc2.
- Added performance numbers.
- Removed vdso_types.h.
- Unified update_vsyscall and update_vsyscall_tz.
- Reworked the kselftest included in this patchset.
- Addressed review comments.
v5:
- Rebased on 5.0-rc7.
- Added x86_64, compat mode for x86_64 and i386 portings.
- Extended vDSO kselftest.
- Addressed review comments.
v4:
- Rebased on 5.0-rc2.
- Addressed review comments.
- Disabled compat vdso on arm64 when the kernel is compiled with
clang.
v3:
- Ported the latest fixes and optimizations done on the x86
architecture to the generic library.
- Addressed review comments.
- Improved the documentation of the interfaces.
- Changed the HAVE_ARCH_TIMER config option to a more generic
HAVE_HW_COUNTER.
v2:
- Added -ffixed-x18 to arm64
- Replaced occurrences of timeval and timespec
- Modified datapage.h to be compliant with y2038 on all the architectures
- Removed __u_vdso type
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Paul Burton <paul.burton(a)mips.com>
Cc: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Mark Salyzyn <salyzyn(a)android.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Huw Davies <huw(a)codeweavers.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Performance Numbers: Linux 5.2.0-rc2 - Xeon Gold 5120T
======================================================
Unified vDSO:
-------------
clock-gettime-monotonic: syscall: 342 nsec/call
clock-gettime-monotonic: libc: 25 nsec/call
clock-gettime-monotonic: vdso: 24 nsec/call
clock-getres-monotonic: syscall: 296 nsec/call
clock-getres-monotonic: libc: 296 nsec/call
clock-getres-monotonic: vdso: 3 nsec/call
clock-gettime-monotonic-coarse: syscall: 294 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 5 nsec/call
clock-getres-monotonic-coarse: syscall: 295 nsec/call
clock-getres-monotonic-coarse: libc: 292 nsec/call
clock-getres-monotonic-coarse: vdso: 5 nsec/call
clock-gettime-monotonic-raw: syscall: 343 nsec/call
clock-gettime-monotonic-raw: libc: 25 nsec/call
clock-gettime-monotonic-raw: vdso: 23 nsec/call
clock-getres-monotonic-raw: syscall: 290 nsec/call
clock-getres-monotonic-raw: libc: 290 nsec/call
clock-getres-monotonic-raw: vdso: 4 nsec/call
clock-gettime-tai: syscall: 332 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 288 nsec/call
clock-getres-tai: libc: 288 nsec/call
clock-getres-tai: vdso: 3 nsec/call
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 24 nsec/call
clock-gettime-boottime: vdso: 23 nsec/call
clock-getres-boottime: syscall: 284 nsec/call
clock-getres-boottime: libc: 291 nsec/call
clock-getres-boottime: vdso: 3 nsec/call
clock-gettime-realtime: syscall: 337 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 23 nsec/call
clock-getres-realtime: syscall: 287 nsec/call
clock-getres-realtime: libc: 284 nsec/call
clock-getres-realtime: vdso: 3 nsec/call
clock-gettime-realtime-coarse: syscall: 307 nsec/call
clock-gettime-realtime-coarse: libc: 4 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 294 nsec/call
clock-getres-realtime-coarse: libc: 291 nsec/call
clock-getres-realtime-coarse: vdso: 4 nsec/call
getcpu: syscall: 246 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 26 nsec/call
gettimeofday: vdso: 25 nsec/call
Stock Kernel:
-------------
clock-gettime-monotonic: syscall: 338 nsec/call
clock-gettime-monotonic: libc: 24 nsec/call
clock-gettime-monotonic: vdso: 23 nsec/call
clock-getres-monotonic: syscall: 291 nsec/call
clock-getres-monotonic: libc: 304 nsec/call
clock-getres-monotonic: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-coarse: syscall: 297 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 4 nsec/call
clock-getres-monotonic-coarse: syscall: 281 nsec/call
clock-getres-monotonic-coarse: libc: 286 nsec/call
clock-getres-monotonic-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-raw: syscall: 336 nsec/call
clock-gettime-monotonic-raw: libc: 340 nsec/call
clock-gettime-monotonic-raw: vdso: 346 nsec/call
clock-getres-monotonic-raw: syscall: 297 nsec/call
clock-getres-monotonic-raw: libc: 301 nsec/call
clock-getres-monotonic-raw: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-tai: syscall: 351 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 298 nsec/call
clock-getres-tai: libc: 290 nsec/call
clock-getres-tai: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 347 nsec/call
clock-gettime-boottime: vdso: 355 nsec/call
clock-getres-boottime: syscall: 296 nsec/call
clock-getres-boottime: libc: 295 nsec/call
clock-getres-boottime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime: syscall: 346 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 22 nsec/call
clock-getres-realtime: syscall: 295 nsec/call
clock-getres-realtime: libc: 291 nsec/call
clock-getres-realtime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime-coarse: syscall: 292 nsec/call
clock-gettime-realtime-coarse: libc: 5 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 300 nsec/call
clock-getres-realtime-coarse: libc: 301 nsec/call
clock-getres-realtime-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
getcpu: syscall: 252 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 24 nsec/call
gettimeofday: vdso: 25 nsec/call
Peter Collingbourne (1):
arm64: Build vDSO with -ffixed-x18
Vincenzo Frascino (18):
kernel: Standardize vdso_datapage
kernel: Define gettimeofday vdso common code
kernel: Unify update_vsyscall implementation
arm64: Substitute gettimeofday with C implementation
arm64: compat: Add missing syscall numbers
arm64: compat: Expose signal related structures
arm64: compat: Generate asm offsets for signals
lib: vdso: Add compat support
arm64: compat: Add vDSO
arm64: Refactor vDSO code
arm64: compat: vDSO setup for compat layer
arm64: elf: vDSO code page discovery
arm64: compat: Get sigreturn trampolines from vDSO
arm64: Add vDSO compat support
arm: Add support for generic vDSO
mips: Add support for generic vDSO
x86: Add support for generic vDSO
kselftest: Extend vDSO selftest
arch/arm/Kconfig | 3 +
arch/arm/include/asm/vdso/gettimeofday.h | 96 +++++
arch/arm/include/asm/vdso/vsyscall.h | 71 ++++
arch/arm/include/asm/vdso_datapage.h | 29 +-
arch/arm/kernel/vdso.c | 87 +----
arch/arm/vdso/Makefile | 13 +-
arch/arm/vdso/note.c | 15 +
arch/arm/vdso/vdso.lds.S | 2 +
arch/arm/vdso/vgettimeofday.c | 268 +------------
arch/arm64/Kconfig | 3 +
arch/arm64/Makefile | 23 +-
arch/arm64/include/asm/elf.h | 14 +
arch/arm64/include/asm/signal32.h | 46 +++
arch/arm64/include/asm/unistd.h | 5 +
arch/arm64/include/asm/vdso.h | 3 +
arch/arm64/include/asm/vdso/compat_barrier.h | 51 +++
.../include/asm/vdso/compat_gettimeofday.h | 108 ++++++
arch/arm64/include/asm/vdso/gettimeofday.h | 84 +++++
arch/arm64/include/asm/vdso/vsyscall.h | 53 +++
arch/arm64/include/asm/vdso_datapage.h | 48 ---
arch/arm64/kernel/Makefile | 6 +-
arch/arm64/kernel/asm-offsets.c | 39 +-
arch/arm64/kernel/signal32.c | 72 ++--
arch/arm64/kernel/vdso.c | 356 ++++++++++++------
arch/arm64/kernel/vdso/Makefile | 34 +-
arch/arm64/kernel/vdso/gettimeofday.S | 334 ----------------
arch/arm64/kernel/vdso/vgettimeofday.c | 28 ++
arch/arm64/kernel/vdso32/.gitignore | 2 +
arch/arm64/kernel/vdso32/Makefile | 184 +++++++++
arch/arm64/kernel/vdso32/note.c | 15 +
arch/arm64/kernel/vdso32/sigreturn.S | 62 +++
arch/arm64/kernel/vdso32/vdso.S | 19 +
arch/arm64/kernel/vdso32/vdso.lds.S | 82 ++++
arch/arm64/kernel/vdso32/vgettimeofday.c | 59 +++
arch/mips/Kconfig | 2 +
arch/mips/include/asm/vdso.h | 78 +---
arch/mips/include/asm/vdso/gettimeofday.h | 175 +++++++++
arch/mips/{ => include/asm}/vdso/vdso.h | 6 +-
arch/mips/include/asm/vdso/vsyscall.h | 43 +++
arch/mips/kernel/vdso.c | 37 +-
arch/mips/vdso/Makefile | 25 +-
arch/mips/vdso/elf.S | 2 +-
arch/mips/vdso/gettimeofday.c | 273 --------------
arch/mips/vdso/sigreturn.S | 2 +-
arch/mips/vdso/vdso.lds.S | 4 +
arch/mips/vdso/vgettimeofday.c | 57 +++
arch/x86/Kconfig | 3 +
arch/x86/entry/vdso/Makefile | 9 +
arch/x86/entry/vdso/vclock_gettime.c | 251 +++---------
arch/x86/entry/vdso/vdso.lds.S | 2 +
arch/x86/entry/vdso/vdso32/vdso32.lds.S | 2 +
arch/x86/entry/vdso/vdsox32.lds.S | 1 +
arch/x86/entry/vsyscall/Makefile | 2 -
arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 ----
arch/x86/include/asm/mshyperv-tsc.h | 76 ++++
arch/x86/include/asm/mshyperv.h | 70 +---
arch/x86/include/asm/pvclock.h | 2 +-
arch/x86/include/asm/vdso/gettimeofday.h | 203 ++++++++++
arch/x86/include/asm/vdso/vsyscall.h | 44 +++
arch/x86/include/asm/vgtod.h | 75 +---
arch/x86/include/asm/vvar.h | 7 +-
arch/x86/kernel/pvclock.c | 1 +
include/asm-generic/vdso/vsyscall.h | 56 +++
include/linux/hrtimer.h | 15 +-
include/linux/hrtimer_defs.h | 25 ++
include/linux/timekeeper_internal.h | 9 +
include/vdso/datapage.h | 91 +++++
include/vdso/helpers.h | 56 +++
include/vdso/vsyscall.h | 11 +
kernel/Makefile | 1 +
kernel/vdso/Makefile | 2 +
kernel/vdso/vsyscall.c | 139 +++++++
lib/Kconfig | 5 +
lib/vdso/Kconfig | 36 ++
lib/vdso/Makefile | 22 ++
lib/vdso/gettimeofday.c | 229 +++++++++++
tools/testing/selftests/vDSO/Makefile | 2 +
tools/testing/selftests/vDSO/vdso_full_test.c | 261 +++++++++++++
78 files changed, 3042 insertions(+), 1767 deletions(-)
create mode 100644 arch/arm/include/asm/vdso/gettimeofday.h
create mode 100644 arch/arm/include/asm/vdso/vsyscall.h
create mode 100644 arch/arm/vdso/note.c
create mode 100644 arch/arm64/include/asm/vdso/compat_barrier.h
create mode 100644 arch/arm64/include/asm/vdso/compat_gettimeofday.h
create mode 100644 arch/arm64/include/asm/vdso/gettimeofday.h
create mode 100644 arch/arm64/include/asm/vdso/vsyscall.h
delete mode 100644 arch/arm64/include/asm/vdso_datapage.h
delete mode 100644 arch/arm64/kernel/vdso/gettimeofday.S
create mode 100644 arch/arm64/kernel/vdso/vgettimeofday.c
create mode 100644 arch/arm64/kernel/vdso32/.gitignore
create mode 100644 arch/arm64/kernel/vdso32/Makefile
create mode 100644 arch/arm64/kernel/vdso32/note.c
create mode 100644 arch/arm64/kernel/vdso32/sigreturn.S
create mode 100644 arch/arm64/kernel/vdso32/vdso.S
create mode 100644 arch/arm64/kernel/vdso32/vdso.lds.S
create mode 100644 arch/arm64/kernel/vdso32/vgettimeofday.c
create mode 100644 arch/mips/include/asm/vdso/gettimeofday.h
rename arch/mips/{ => include/asm}/vdso/vdso.h (90%)
create mode 100644 arch/mips/include/asm/vdso/vsyscall.h
delete mode 100644 arch/mips/vdso/gettimeofday.c
create mode 100644 arch/mips/vdso/vgettimeofday.c
delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
create mode 100644 arch/x86/include/asm/mshyperv-tsc.h
create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
create mode 100644 arch/x86/include/asm/vdso/vsyscall.h
create mode 100644 include/asm-generic/vdso/vsyscall.h
create mode 100644 include/linux/hrtimer_defs.h
create mode 100644 include/vdso/datapage.h
create mode 100644 include/vdso/helpers.h
create mode 100644 include/vdso/vsyscall.h
create mode 100644 kernel/vdso/Makefile
create mode 100644 kernel/vdso/vsyscall.c
create mode 100644 lib/vdso/Kconfig
create mode 100644 lib/vdso/Makefile
create mode 100644 lib/vdso/gettimeofday.c
create mode 100644 tools/testing/selftests/vDSO/vdso_full_test.c
--
2.21.0
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:
int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
With the introduction of pidfds through CLONE_PIDFD it is possible to
create pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this api. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes we enable them to adopt this api.
In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.
Signed-off-by: Christian Brauner <christian(a)brauner.io>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jann Horn <jannh(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Aleksa Sarai <cyphar(a)cyphar.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: linux-api(a)vger.kernel.org
---
v1:
- kbuild test robot <lkp(a)intel.com>:
- add missing entry for pidfd_open to arch/arm/tools/syscall.tbl
- Oleg Nesterov <oleg(a)redhat.com>:
- use simpler thread-group leader check
v2:
- Oleg Nesterov <oleg(a)redhat.com>:
- avoid using additional variable
- remove unneeded comment
- Arnd Bergmann <arnd(a)arndb.de>:
- switch from 428 to 434 since the new mount api has taken it
- bump syscall numbers in arch/arm64/include/asm/unistd.h
- Joel Fernandes (Google) <joel(a)joelfernandes.org>:
- switch from ESRCH to EINVAL when the passed-in pid does not refer to a
thread-group leader
- Christian Brauner <christian(a)brauner.io>:
- rebase on v5.2-rc1
- adapt syscall number to account for new mount api syscalls
v3:
- Arnd Bergmann <arnd(a)arndb.de>:
- add missing syscall entries for mips-o32 and mips-n64
---
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/pid.h | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/fork.c | 2 +-
kernel/pid.c | 43 +++++++++++++++++++++
23 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 9e7704e44f6d..1db9bbcfb84e 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -473,3 +473,4 @@
541 common fsconfig sys_fsconfig
542 common fsmount sys_fsmount
543 common fspick sys_fspick
+544 common pidfd_open sys_pidfd_open
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index aaf479a9e92d..81e6e1817c45 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -447,3 +447,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 70e6882853c0..e8f7d95a1481 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
-#define __NR_compat_syscalls 434
+#define __NR_compat_syscalls 435
#endif
#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index c39e90600bb3..7a3158ccd68e 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_pidfd_open 434
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index e01df3f2f80d..ecc44926737b 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -354,3 +354,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 7e3d0734b2f3..9a3eb2558568 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -433,3 +433,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 26339e417695..ad706f83c755 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 0e2dd68ade57..97035e19ad03 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -372,3 +372,4 @@
431 n32 fsconfig sys_fsconfig
432 n32 fsmount sys_fsmount
433 n32 fspick sys_fspick
+434 n32 pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 5eebfa0d155c..d7292722d3b0 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -348,3 +348,4 @@
431 n64 fsconfig sys_fsconfig
432 n64 fsmount sys_fsmount
433 n64 fspick sys_fspick
+434 n64 pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 3cc1374e02d0..dba084c92f14 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -421,3 +421,4 @@
431 o32 fsconfig sys_fsconfig
432 o32 fsmount sys_fsmount
433 o32 fspick sys_fspick
+434 o32 pidfd_open sys_pidfd_open
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index c9e377d59232..5022b9e179c2 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -430,3 +430,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 103655d84b4b..f2c3bda2d39f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -515,3 +515,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index e822b2964a83..6ebacfeaf853 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig sys_fsconfig
432 common fsmount sys_fsmount sys_fsmount
433 common fspick sys_fspick sys_fspick
+434 common pidfd_open sys_pidfd_open sys_pidfd_open
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 016a727d4357..834c9c7d79fa 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index e047480b1605..c58e71f21129 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ad968b7bac72..43e4429a5272 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig
432 i386 fsmount sys_fsmount __ia32_sys_fsmount
433 i386 fspick sys_fspick __ia32_sys_fspick
+434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b4e6f9e6204a..1bee0a77fdd3 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
431 common fsconfig __x64_sys_fsconfig
432 common fsmount __x64_sys_fsmount
433 common fspick __x64_sys_fspick
+434 common pidfd_open __x64_sys_pidfd_open
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 5fa0ee1c8e00..782b81945ccc 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -404,3 +404,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 3c8ef5a199ca..c938a92eab99 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -67,6 +67,7 @@ struct pid
extern struct pid init_struct_pid;
extern const struct file_operations pidfd_fops;
+extern int pidfd_create(struct pid *pid);
static inline struct pid *get_pid(struct pid *pid)
{
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..989055e0b501 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock,
struct old_timex32 __user *tx);
asmlinkage long sys_syncfs(int fd);
asmlinkage long sys_setns(int fd, int nstype);
+asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
unsigned int vlen, unsigned flags);
asmlinkage long sys_process_vm_readv(pid_t pid,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a87904daf103..e5684a4512c0 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_pidfd_open 434
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
#undef __NR_syscalls
-#define __NR_syscalls 434
+#define __NR_syscalls 435
/*
* 32 bit systems traditionally used different
diff --git a/kernel/fork.c b/kernel/fork.c
index b4cba953040a..c3df226f47a1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1724,7 +1724,7 @@ const struct file_operations pidfd_fops = {
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
-static int pidfd_create(struct pid *pid)
+int pidfd_create(struct pid *pid)
{
int fd;
diff --git a/kernel/pid.c b/kernel/pid.c
index 89548d35eefb..8fc9d94f6ac1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -37,6 +37,7 @@
#include <linux/syscalls.h>
#include <linux/proc_ns.h>
#include <linux/proc_fs.h>
+#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/idr.h>
@@ -450,6 +451,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
return idr_get_next(&ns->idr, &nr);
}
+/**
+ * pidfd_open() - Open new pid file descriptor.
+ *
+ * @pid: pid for which to retrieve a pidfd
+ * @flags: flags to pass
+ *
+ * This creates a new pid file descriptor with the O_CLOEXEC flag set for
+ * the process identified by @pid. Currently, the process identified by
+ * @pid must be a thread-group leader. This restriction currently exists
+ * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
+ * be used with CLONE_THREAD) and pidfd polling (only supports thread group
+ * leaders).
+ *
+ * Return: On success, a cloexec pidfd is returned.
+ * On error, a negative errno number will be returned.
+ */
+SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
+{
+ int fd, ret;
+ struct pid *p;
+
+ if (flags)
+ return -EINVAL;
+
+ if (pid <= 0)
+ return -EINVAL;
+
+ p = find_get_pid(pid);
+ if (!p)
+ return -ESRCH;
+
+ ret = 0;
+ rcu_read_lock();
+ if (!pid_task(p, PIDTYPE_TGID))
+ ret = -EINVAL;
+ rcu_read_unlock();
+
+ fd = ret ?: pidfd_create(p);
+ put_pid(p);
+ return fd;
+}
+
void __init pid_idr_init(void)
{
/* Verify no one has done anything silly: */
--
2.21.0
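For illustration, the new system call could be exercised from userspace roughly as follows. This is a minimal sketch, not part of the patch: `my_pidfd_open()` and `fd_is_cloexec()` are hypothetical helper names, and the fallback number 434 is taken from the syscall tables above (it is only used when the installed headers do not already define `__NR_pidfd_open`).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Number assigned by the syscall tables in this patch; assumption for old headers. */
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif

/*
 * Hypothetical wrapper around the raw system call.
 * Returns a new pidfd on success, or -1 with errno set
 * (ENOSYS on kernels that lack the call).
 */
int my_pidfd_open(pid_t pid, unsigned int flags)
{
	return (int)syscall(__NR_pidfd_open, pid, flags);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is not, -1 on error. */
int fd_is_cloexec(int fd)
{
	int fdflags;

	assert(fd >= 0);
	fdflags = fcntl(fd, F_GETFD);
	if (fdflags < 0)
		return -1;
	return (fdflags & FD_CLOEXEC) ? 1 : 0;
}
```

A caller would typically pass the pid of a thread-group leader and zero flags, check for a negative return value, and close() the descriptor once it is no longer needed.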
## TLDR
A quick follow up to yesterday's revision. I got some feedback that I
wanted to incorporate before anyone else read the update. For this
reason, I will leave a TLDR of the biggest changes since v2.
Biggest things to look out for (since v2):
- KUnit core now outputs results in TAP14.
- Heavily reworked tools/testing/kunit/kunit.py
- Changed how parsing works.
- Added testing.
- Greg, Logan, you might want to re-review this.
- Added documentation on how to use KUnit on non-UML kernels. You can
see the docs rendered here[1].
There is still some discussion going on on the [PATCH v2 00/17] thread,
but I wanted to get some of these updates out before they got too stale
(and too difficult for me to keep track of). I hope no one minds.
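For readers unfamiliar with TAP, the sketch below shows the general shape of the format: a version line, a plan line, and one "ok" or "not ok" line per test. It is a plain userspace illustration with hypothetical helper names, not the KUnit implementation, and the exact lines KUnit prints may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Print the TAP14 version line followed by the plan ("1..N"). */
void tap_print_header(unsigned int planned)
{
	printf("TAP version 14\n1..%u\n", planned);
}

/*
 * Format one result line into buf: "ok N - name" or "not ok N - name".
 * Returns the value of snprintf().
 */
int tap_format_result(char *buf, size_t len, unsigned int num,
		      bool passed, const char *name)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, len, "%sok %u - %s",
			passed ? "" : "not ", num, name);
}
```

A harness consuming such output only needs to compare the number of result lines against the plan and look for lines starting with "not ok".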
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in under a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily, solving the classic problem
of exercising error handling code.
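As a rough illustration of the last point, the sketch below tests an error path by handing the unit under test an allocator that always fails. It is ordinary userspace C rather than KUnit code, and every name in it (`dup_name`, `failing_alloc`, the test function) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The only external dependency of the unit is passed in explicitly. */
typedef void *(*alloc_fn)(size_t size);

/* Unit under test: duplicate src using the supplied allocator. */
int dup_name(alloc_fn alloc, const char *src, char **out)
{
	size_t len = strlen(src) + 1;
	char *copy = alloc(len);

	if (!copy)
		return -ENOMEM;	/* the error path we want to exercise */
	memcpy(copy, src, len);
	*out = copy;
	return 0;
}

/* Test double: an allocator that always reports failure. */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* The error path is reached deterministically, with no real memory pressure. */
void test_dup_name_reports_alloc_failure(void)
{
	char *out = NULL;

	assert(dup_name(failing_alloc, "kunit", &out) == -ENOMEM);
	assert(out == NULL);
}
```

Because the failure is injected by the test itself, the result does not depend on the state of the machine the test runs on.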
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[2].
Additionally for convenience, I have applied these patches to a
branch[3].
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.1/v4 branch.
## Changes Since Last Version
As I mentioned above, there are a significant number of updates since
v2:
- Converted KUnit core to print test results in TAP14 format as
suggested by Greg and Frank.
- Heavily reworked tools/testing/kunit/kunit.py
- Changed how parsing works.
- Added testing.
- Added documentation on how to use KUnit on non-UML kernels. You can
see the docs rendered here[1].
- Added a new set of EXPECTs and ASSERTs for pointer comparison.
- Removed more function indirection as suggested by Logan.
- Added a new patch that adds `kunit_try_catch_throw` to objtool's
noreturn list.
- Fixed a number of minorish issues pointed out by Shuah, Masahiro, and
kbuild bot.
Nevertheless, there are only a couple of minor updates since v3:
- Added more context to the changelog on the objtool patch, as per
Peter's request.
- Moved all KUnit documentation under the Documentation/dev-tools/
directory as per Jonathan's suggestion.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://google.github.io/kunit-docs/third_party/kernel/docs/
[3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.1/v4
--
2.21.0.1020.gf2820cf01a-goog
Updates to UDP GSO selftests to optionally stress test the CMSG
subsystem, and report the reliability and performance of both
TX Timestamping and ZEROCOPY messages.
Fred Klassen (3):
net/udpgso_bench_tx: options to exercise TX CMSG
net/udpgso_bench.sh add UDP GSO audit tests
net/udpgso_bench.sh test fails on error
tools/testing/selftests/net/udpgso_bench.sh | 51 +++-
tools/testing/selftests/net/udpgso_bench_tx.c | 324 ++++++++++++++++++++++++--
2 files changed, 357 insertions(+), 18 deletions(-)
--
2.11.0
This is another resend as there has been no feedback since v4.
Seems Jon has been MIA this past cycle so hopefully he appears on the
list soon.
I've addressed the feedback so far and rebased on the latest kernel
and would like this to be considered for merging this cycle.
The only outstanding issue I know of is that it still will not work
with IDT hardware, but ntb_transport doesn't work with IDT hardware
and there is still no sensible common infrastructure to support
ntb_peer_mw_set_trans(). Thus, I decline to consider that complication
in this patchset. However, I'll be happy to review work that adds this
feature in the future.
Also, as the port number and resource index stuff is a bit complicated,
I made a quick out of tree test fixture to ensure it's correct[1]. As
an exercise I also wrote some test code[2] using the upcoming KUnit
feature.
Logan
[1] https://repl.it/repls/ExcitingPresentFile
[2] https://github.com/sbates130272/linux-p2pmem/commits/ntb_kunit
--
Changes in v5:
* Rebased onto v5.2-rc1 (plus the patches in ntb-next)
--
Changes in v4:
* Rebased onto v5.1-rc6 (No changes)
* Numerous grammar and spelling mistakes spotted by Bjorn
--
Changes in v3:
* Rebased onto v5.1-rc1 (Dropped the first two patches as they have
been merged, and cleaned up some minor conflicts in the PCI tree)
* Added a new patch (#3) to calculate logical port numbers that
are port numbers from 0 to (number of ports - 1). This is
then used in ntb_peer_resource_idx() to fix the issues brought
up by Serge.
* Fixed missing __iomem and iowrite calls (as noticed by Serge)
* Added patch 10 which describes ntb_msi_test in the documentation
file (as requested by Serge)
* A couple other minor nits and documentation fixes
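The idea of a logical port number mentioned in the v3 notes above can be sketched as a simple rank computation: physical port numbers may be sparse, while logical numbers run from 0 to (number of ports - 1). The helper below is a hypothetical userspace illustration under that assumption; it is not the code from patch #3 and does not use the NTB API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the logical number of `port`, defined here
 * as the count of physical ports with a smaller number. `ports` lists all
 * physical port numbers (in any order). Returns -1 if `port` is not listed.
 */
int logical_port_number(const int *ports, size_t nports, int port)
{
	int rank = 0;
	int found = 0;
	size_t i;

	assert(ports != NULL);
	for (i = 0; i < nports; i++) {
		if (ports[i] < port)
			rank++;
		else if (ports[i] == port)
			found = 1;
	}
	return found ? rank : -1;
}
```

With physical ports 0, 2 and 5, for example, this yields logical numbers 0, 1 and 2, which can then be used as dense indexes into per-peer resources.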
--
Changes in v2:
* Cleaned up the changes in intel_irq_remapping.c to make them
less confusing and add a comment. (Per discussion with Jacob and
Joerg)
* Fixed a nit from Bjorn and collected his Ack
* Added a Kconfig dependency on CONFIG_PCI_MSI for CONFIG_NTB_MSI
as the Kbuild robot hit a random config that didn't build
without it.
* Worked in a callback for when the MSI descriptor changes so that
the clients can resend the new address and data values to the peer.
On my test system this was never necessary, but there may be
other platforms where this can occur. I tested this by hacking
in a path to rewrite the MSI descriptor when I change the cpu
affinity of an IRQ. There's a bit of uncertainty over the latency
of the change, but without hardware on which this can actually occur
we can't test it. This was the result of a discussion with Dave.
--
This patch series adds optional support for using MSI interrupts instead
of NTB doorbells in ntb_transport. This is desirable seeing doorbells on
current hardware are quite slow and therefore switching to MSI interrupts
provides a significant performance gain. On switchtec hardware, a simple
apples-to-apples comparison shows ntb_netdev/iperf numbers going from
3.88Gb/s to 14.1Gb/s when switching to MSI interrupts.
To do this, a couple changes are required outside of the NTB tree:
1) The IOMMU must know to accept MSI requests from aliased bus numbers
seeing NTB hardware typically sends proxied request IDs through
additional requester IDs. The first patch in this series adds support
for the Intel IOMMU. A quirk to add these aliases for switchtec hardware
was already accepted. See commit ad281ecf1c7d ("PCI: Add DMA alias quirk
for Microsemi Switchtec NTB") for a description of NTB proxy IDs and why
this is necessary.
2) NTB transport (and other clients) may often need more MSI interrupts
than the NTB hardware actually advertises support for. However, seeing
these interrupts will not be triggered by the hardware but through an
NTB memory window, the hardware does not actually need support or need
to know about them. Therefore we add the concept of Virtual MSI
interrupts which are allocated just like any other MSI interrupt but
are not programmed into the hardware's MSI table. This is done in
Patch 2 and then made use of in Patch 3.
The remaining patches in this series add a library for dealing with MSI
interrupts, a test client and finally support in ntb_transport.
The series is based off of v5.1-rc6 plus the patches in ntb-next.
A git repo is available here:
https://github.com/sbates130272/linux-p2pmem/ ntb_transport_msi_v4
Thanks,
Logan
--
Logan Gunthorpe (10):
PCI/MSI: Support allocating virtual MSI interrupts
PCI/switchtec: Add module parameter to request more interrupts
NTB: Introduce helper functions to calculate logical port number
NTB: Introduce functions to calculate multi-port resource index
NTB: Rename ntb.c to support multiple source files in the module
NTB: Introduce MSI library
NTB: Introduce NTB MSI Test Client
NTB: Add ntb_msi_test support to ntb_test
NTB: Add MSI interrupt support to ntb_transport
NTB: Describe the ntb_msi_test client in the documentation.
Documentation/ntb.txt | 27 ++
drivers/ntb/Kconfig | 11 +
drivers/ntb/Makefile | 3 +
drivers/ntb/{ntb.c => core.c} | 0
drivers/ntb/msi.c | 415 +++++++++++++++++++++++
drivers/ntb/ntb_transport.c | 169 ++++++++-
drivers/ntb/test/Kconfig | 9 +
drivers/ntb/test/Makefile | 1 +
drivers/ntb/test/ntb_msi_test.c | 433 ++++++++++++++++++++++++
drivers/pci/msi.c | 54 ++-
drivers/pci/switch/switchtec.c | 12 +-
include/linux/msi.h | 8 +
include/linux/ntb.h | 196 ++++++++++-
include/linux/pci.h | 9 +
tools/testing/selftests/ntb/ntb_test.sh | 54 ++-
15 files changed, 1386 insertions(+), 15 deletions(-)
rename drivers/ntb/{ntb.c => core.c} (100%)
create mode 100644 drivers/ntb/msi.c
create mode 100644 drivers/ntb/test/ntb_msi_test.c
--
2.20.1
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.
A possible fix is to change the vdso implementation of clock_getres,
keeping a copy of hrtimer_resolution in vdso data and using that
directly [1].
This patchset implements the proposed fix for arm64, powerpc, s390,
nds32 and adds a test to verify that the syscall and the vdso library
implementation of clock_getres return the same values.
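The kind of check such a test performs can be sketched as below. This is a simplified illustration, not the selftest added by the series: `compare_clock_getres()` is a hypothetical name, and whether libc's clock_getres() is actually served by the vDSO depends on the libc and the architecture. The raw syscall comparison also assumes that `struct timespec` matches the kernel layout for `SYS_clock_getres`, which holds on 64-bit targets.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns 0 if libc's clock_getres() and the raw system call report the
 * same resolution for clk, 1 if they differ, and -1 if either call fails.
 */
int compare_clock_getres(clockid_t clk)
{
	struct timespec via_libc, via_syscall;

	if (clock_getres(clk, &via_libc) != 0)
		return -1;
	if (syscall(SYS_clock_getres, clk, &via_syscall) != 0)
		return -1;

	return (via_libc.tv_sec == via_syscall.tv_sec &&
		via_libc.tv_nsec == via_syscall.tv_nsec) ? 0 : 1;
}

/* Convenience check over two commonly available clocks. */
void check_common_clocks(void)
{
	assert(compare_clock_getres(CLOCK_REALTIME) == 0);
	assert(compare_clock_getres(CLOCK_MONOTONIC) == 0);
}
```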
Even if these patches are unified by the same topic, there is no
dependency between them, hence they can be merged individually by each
arch maintainer.
Note: arm64 and nds32 respective fixes have been merged in 5.2-rc1,
hence they have been removed from this series.
[1] https://marc.info/?l=linux-arm-kernel&m=155110381930196&w=2
Changes:
--------
v4:
- Address review comments.
v3:
- Rebased on 5.2-rc1.
- Address review comments.
v2:
- Rebased on 5.1-rc5.
- Addressed review comments.
Cc: Christophe Leroy <christophe.leroy(a)c-s.fr>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Vincenzo Frascino (3):
powerpc: Fix vDSO clock_getres()
s390: Fix vDSO clock_getres()
kselftest: Extend vDSO selftest to clock_getres
arch/powerpc/include/asm/vdso_datapage.h | 2 +
arch/powerpc/kernel/asm-offsets.c | 2 +-
arch/powerpc/kernel/time.c | 1 +
arch/powerpc/kernel/vdso32/gettimeofday.S | 7 +-
arch/powerpc/kernel/vdso64/gettimeofday.S | 7 +-
arch/s390/include/asm/vdso.h | 1 +
arch/s390/kernel/asm-offsets.c | 2 +-
arch/s390/kernel/time.c | 1 +
arch/s390/kernel/vdso32/clock_getres.S | 12 +-
arch/s390/kernel/vdso64/clock_getres.S | 10 +-
tools/testing/selftests/vDSO/Makefile | 2 +
.../selftests/vDSO/vdso_clock_getres.c | 124 ++++++++++++++++++
12 files changed, 155 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c
--
2.21.0
Use udf as the guard instruction for the restartable sequence abort
handler.
Previously, the chosen signature was not a valid instruction, based
on the assumption that it could always sit in a literal pool. However,
there are compilation environments in which literal pools are not
available, for instance execute-only code. Therefore, we need to
choose a signature value that is also a valid instruction.
Handle compiling with -mbig-endian on ARMv6+, which generates binaries
with mixed code vs data endianness (little endian code, big endian
data).
Otherwise, a mismatch between code endianness for the generated
signatures and data endianness for the RSEQ_SIG parameter passed to the
rseq registration will trigger application segmentation faults when the
kernel tries to abort rseq critical sections.
Prior to ARMv6, -mbig-endian generates big-endian code and data, so
endianness should not be reversed in that case.
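The effect of mixed code and data endianness on the signature can be illustrated in portable C. The sketch below is not part of the patch and its function names are hypothetical: it stores the instruction word the way little-endian code is laid out in memory, then reads it back as a little-endian and as a big-endian data word.

```c
#include <assert.h>
#include <stdint.h>

#define RSEQ_SIG_CODE 0xe7f5def3u

/* Lay out an instruction word in memory as little-endian code. */
static void store_le_code(uint32_t insn, unsigned char bytes[4])
{
	bytes[0] = insn & 0xff;
	bytes[1] = (insn >> 8) & 0xff;
	bytes[2] = (insn >> 16) & 0xff;
	bytes[3] = (insn >> 24) & 0xff;
}

/* Value seen by a little-endian data load of that word. */
uint32_t sig_seen_by_le_load(uint32_t insn)
{
	unsigned char b[4];

	store_le_code(insn, b);
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Value seen by a big-endian data load (the mixed-endianness case). */
uint32_t sig_seen_by_be_load(uint32_t insn)
{
	unsigned char b[4];

	store_le_code(insn, b);
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

/* Self-check: the two views differ by a byte swap. */
void check_sig_views(void)
{
	assert(sig_seen_by_le_load(RSEQ_SIG_CODE) == 0xe7f5def3u);
	assert(sig_seen_by_be_load(RSEQ_SIG_CODE) == 0xf3def5e7u);
}
```

This is why the signature passed at registration has to be obtained through a data load of the emitted instruction rather than from a compile-time constant when the linker may select either layout.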
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
CC: Peter Zijlstra <peterz(a)infradead.org>
CC: Thomas Gleixner <tglx(a)linutronix.de>
CC: Joel Fernandes <joelaf(a)google.com>
CC: Catalin Marinas <catalin.marinas(a)arm.com>
CC: Dave Watson <davejwatson(a)fb.com>
CC: Will Deacon <will.deacon(a)arm.com>
CC: Shuah Khan <shuah(a)kernel.org>
CC: Andi Kleen <andi(a)firstfloor.org>
CC: linux-kselftest(a)vger.kernel.org
CC: "H . Peter Anvin" <hpa(a)zytor.com>
CC: Chris Lameter <cl(a)linux.com>
CC: Russell King <linux(a)arm.linux.org.uk>
CC: Michael Kerrisk <mtk.manpages(a)gmail.com>
CC: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
CC: Paul Turner <pjt(a)google.com>
CC: Boqun Feng <boqun.feng(a)gmail.com>
CC: Josh Triplett <josh(a)joshtriplett.org>
CC: Steven Rostedt <rostedt(a)goodmis.org>
CC: Ben Maurer <bmaurer(a)fb.com>
CC: linux-api(a)vger.kernel.org
CC: Andy Lutomirski <luto(a)amacapital.net>
CC: Andrew Morton <akpm(a)linux-foundation.org>
CC: Linus Torvalds <torvalds(a)linux-foundation.org>
---
tools/testing/selftests/rseq/rseq-arm.h | 52 +++++++++++++++++++++++++++++++--
1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
index 5f262c54364f..e8ccfc37d685 100644
--- a/tools/testing/selftests/rseq/rseq-arm.h
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -5,7 +5,54 @@
* (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
*/
-#define RSEQ_SIG 0x53053053
+/*
+ * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
+ * value 0x5de3. This traps if user-space reaches this instruction by mistake,
+ * and the uncommon operand ensures the kernel does not move the instruction
+ * pointer to attacker-controlled code on rseq abort.
+ *
+ * The instruction pattern in the A32 instruction set is:
+ *
+ * e7f5def3 udf #24035 ; 0x5de3
+ *
+ * This translates to the following instruction pattern in the T16 instruction
+ * set:
+ *
+ * little endian:
+ * def3 udf #243 ; 0xf3
+ * e7f5 b.n <7f5>
+ *
+ * pre-ARMv6 big endian code:
+ * e7f5 b.n <7f5>
+ * def3 udf #243 ; 0xf3
+ *
+ * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
+ * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
+ * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
+ * (which match), so there is no need to reverse the endianness of the data
+ * representation of the signature. However, the choice between BE32 and BE8
+ * is done by the linker, so we cannot know whether code and data endianness
+ * will be mixed before the linker is invoked.
+ */
+
+#define RSEQ_SIG_CODE 0xe7f5def3
+
+#ifndef __ASSEMBLER__
+
+#define RSEQ_SIG_DATA \
+ ({ \
+ int sig; \
+ asm volatile ( "b 2f\n\t" \
+ "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
+ "2:\n\t" \
+ "ldr %[sig], 1b\n\t" \
+ : [sig] "=r" (sig)); \
+ sig; \
+ })
+
+#define RSEQ_SIG RSEQ_SIG_DATA
+
+#endif
#define rseq_smp_mb() __asm__ __volatile__ ("dmb" ::: "memory", "cc")
#define rseq_smp_rmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc")
@@ -78,7 +125,8 @@ do { \
__rseq_str(table_label) ":\n\t" \
".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
- ".word " __rseq_str(RSEQ_SIG) "\n\t" \
+ ".arm\n\t" \
+ ".inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
__rseq_str(label) ":\n\t" \
teardown \
"b %l[" __rseq_str(abort_label) "]\n\t"
--
2.11.0