The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.
For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.
To better support process-wide synchronous self-monitoring, without
events propagating to children that do not share the current process's
shared environment, two pre-requisite patches are added to optionally
restrict inheritance to CLONE_THREAD, and remove events on exec (without
affecting the parent).
Examples how to use these features can be found in the tests added at
the end of the series. In addition to the tests added, the series has
also been subjected to syzkaller fuzzing (focus on 'kernel/events/'
coverage).
Motivation and Example Uses
---------------------------
1. Our immediate motivation is low-overhead sampling-based race
detection for user space [1]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).
2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Other ideas for use-cases we found interesting, but should only
illustrate the range of potential to further motivate the utility (we're
sure there are more):
3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint to entry to the patched routine, then
send signals to threads and check that they are not in the
routine, but without stopping them further. If any of the
threads will enter the routine, it will receive SIGTRAP and
pause.
4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This could be replaced with
a watchpoint, which does not require a whole page nor DTLB
shootdowns.
5. Threads receiving signals on performance events to
throttle/unthrottle themselves.
6. Tracking data flow globally.
Changelog
---------
v4:
* Fix for parent and child racing to exit in sync_child_event().
* Fix race between irq_work running and task's sighand being released by
release_task().
* Generalize setting si_perf and si_addr independent of event type;
introduces perf_event_attr::sig_data, which can be set by user space
to be propagated to si_perf.
* Warning in perf_sigtrap() if ctx->task and current mismatch; we expect
this on architectures that do not properly implement
arch_irq_work_raise().
* Require events that want sigtrap to be associated with a task.
* Dropped "perf: Add breakpoint information to siginfo on SIGTRAP"
in favor of more generic solution (perf_event_attr::sig_data).
v3:
* Add patch "perf: Rework perf_event_exit_event()" to beginning of
series, courtesy of Peter Zijlstra.
* Rework "perf: Add support for event removal on exec" based on
the added "perf: Rework perf_event_exit_event()".
* Fix kselftests to work with more recent libc, due to the way it forces
using the kernel's own siginfo_t.
* Add basic perf-tool built-in test.
v2/RFC: https://lkml.kernel.org/r/20210310104139.679618-1-elver@google.com
* Patch "Support only inheriting events if cloned with CLONE_THREAD"
added to series.
* Patch "Add support for event removal on exec" added to series.
* Patch "Add kselftest for process-wide sigtrap handling" added to
series.
* Patch "Add kselftest for remove_on_exec" added to series.
* Implicitly restrict inheriting events if sigtrap, but the child was
cloned with CLONE_CLEAR_SIGHAND, because it is not generally safe if
the child cleared all signal handlers to continue sending SIGTRAP.
* Various minor fixes (see details in patches).
v1/RFC: https://lkml.kernel.org/r/20210223143426.2412737-1-elver@google.com
Pre-series: The discussion at [2] led to the changes in this series. The
approach taken in "Add support for SIGTRAP on perf events" to trigger
the signal was suggested by Peter Zijlstra in [3].
[2] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZ…
[3] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.n…
Marco Elver (9):
perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
perf: Support only inheriting events if cloned with CLONE_THREAD
perf: Add support for event removal on exec
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf: Add support for SIGTRAP on perf events
selftests/perf_events: Add kselftest for process-wide sigtrap handling
selftests/perf_events: Add kselftest for remove_on_exec
tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
perf test: Add basic stress test for sigtrap handling
Peter Zijlstra (1):
perf: Rework perf_event_exit_event()
arch/m68k/kernel/signal.c | 3 +
arch/x86/kernel/signal_compat.c | 5 +-
fs/signalfd.c | 4 +
include/linux/compat.h | 2 +
include/linux/perf_event.h | 9 +-
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +-
include/uapi/linux/perf_event.h | 12 +-
include/uapi/linux/signalfd.h | 4 +-
kernel/events/core.c | 302 +++++++++++++-----
kernel/fork.c | 2 +-
kernel/signal.c | 11 +
tools/include/uapi/linux/perf_event.h | 12 +-
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 5 +
tools/perf/tests/sigtrap.c | 150 +++++++++
tools/perf/tests/tests.h | 1 +
.../testing/selftests/perf_events/.gitignore | 3 +
tools/testing/selftests/perf_events/Makefile | 6 +
tools/testing/selftests/perf_events/config | 1 +
.../selftests/perf_events/remove_on_exec.c | 260 +++++++++++++++
tools/testing/selftests/perf_events/settings | 1 +
.../selftests/perf_events/sigtrap_threads.c | 210 ++++++++++++
23 files changed, 924 insertions(+), 87 deletions(-)
create mode 100644 tools/perf/tests/sigtrap.c
create mode 100644 tools/testing/selftests/perf_events/.gitignore
create mode 100644 tools/testing/selftests/perf_events/Makefile
create mode 100644 tools/testing/selftests/perf_events/config
create mode 100644 tools/testing/selftests/perf_events/remove_on_exec.c
create mode 100644 tools/testing/selftests/perf_events/settings
create mode 100644 tools/testing/selftests/perf_events/sigtrap_threads.c
--
2.31.0.208.g409f899ff0-goog
From: SeongJae Park <sjpark(a)amazon.de>
When running a test program, 'run_one()' checks if the program has the
execution permission and fails if it doesn't. However, it's easy to
mistakenly missing the permission, as some common tools like 'diff'
don't support the permission change well[1]. Compared to that, making
mistakes in the test program's path would only rare, as those are
explicitly listed in 'TEST_PROGS'. Therefore, it might make more sense
to resolve the situation on our own and run the program.
For the reason, this commit makes the test program runner function to
still print the warning message but try parsing the interpreter of the
program and explicitly run it with the interpreter, in the case.
[1] https://lore.kernel.org/mm-commits/YRJisBs9AunccCD4@kroah.com/
Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
---
Changes from v1
(https://lore.kernel.org/linux-kselftest/20210810140459.23990-1-sj38.park@gm…)
- Parse and use the interpreter instead of changing the file
tools/testing/selftests/kselftest/runner.sh | 28 +++++++++++++--------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index cc9c846585f0..a9ba782d8ca0 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -33,9 +33,9 @@ tap_timeout()
{
# Make sure tests will time out if utility is available.
if [ -x /usr/bin/timeout ] ; then
- /usr/bin/timeout --foreground "$kselftest_timeout" "$1"
+ /usr/bin/timeout --foreground "$kselftest_timeout" $1
else
- "$1"
+ $1
fi
}
@@ -65,17 +65,25 @@ run_one()
TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
echo "# $TEST_HDR_MSG"
- if [ ! -x "$TEST" ]; then
- echo -n "# Warning: file $TEST is "
- if [ ! -e "$TEST" ]; then
- echo "missing!"
- else
- echo "not executable, correct this."
- fi
+ if [ ! -e "$TEST" ]; then
+ echo "# Warning: file $TEST is missing!"
echo "not ok $test_num $TEST_HDR_MSG"
else
+ cmd="./$BASENAME_TEST"
+ if [ ! -x "$TEST" ]; then
+ echo "# Warning: file $TEST is not executable"
+
+ if [ $(head -n 1 "$TEST" | cut -c -2) = "#!" ]
+ then
+ interpreter=$(head -n 1 "$TEST" | cut -c 3-)
+ cmd="$interpreter ./$BASENAME_TEST"
+ else
+ echo "not ok $test_num $TEST_HDR_MSG"
+ return
+ fi
+ fi
cd `dirname $TEST` > /dev/null
- ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout "$cmd" 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
--
2.17.1
Real-time setups try hard to ensure proper isolation between time
critical applications and e.g. network processing performed by the
network stack in softirq and RPS is used to move the softirq
activity away from the isolated core.
If the network configuration is dynamic, with netns and devices
routinely created at run-time, enforcing the correct RPS setting
on each newly created device allowing to transient bad configuration
became complex.
These series try to address the above, introducing a new
sysctl knob: rps_default_mask. The new sysctl entry allows
configuring a systemwide RPS mask, to be enforced since receive
queue creation time without any fourther per device configuration
required.
Additionally, a simple self-test is introduced to check the
rps_default_mask behavior.
v1 -> v2:
- fix sparse warning in patch 2/3
Paolo Abeni (3):
net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
net/core: introduce default_rps_mask netns attribute
self-tests: introduce self-tests for RPS default mask
Documentation/admin-guide/sysctl/net.rst | 6 ++
include/linux/netdevice.h | 1 +
net/core/net-sysfs.c | 73 +++++++++++--------
net/core/sysctl_net_core.c | 58 +++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 3 +
.../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
7 files changed, 169 insertions(+), 30 deletions(-)
create mode 100755 tools/testing/selftests/net/rps_default_mask.sh
--
2.26.2
From: Vincent Cheng <vincent.cheng.xh(a)renesas.com>
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
Vincent Cheng (3):
ptp: Add adjphase function to support phase offset control.
ptp: Add adjust_phase to ptp_clock_caps capability.
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
drivers/ptp/ptp_chardev.c | 1 +
drivers/ptp/ptp_clock.c | 3 ++
drivers/ptp/ptp_clockmatrix.c | 92 +++++++++++++++++++++++++++++++++++
drivers/ptp/ptp_clockmatrix.h | 8 ++-
include/linux/ptp_clock_kernel.h | 6 ++-
include/uapi/linux/ptp_clock.h | 4 +-
tools/testing/selftests/ptp/testptp.c | 6 ++-
7 files changed, 114 insertions(+), 6 deletions(-)
--
2.7.4
Hello Juergen,
Hello All,
Since the RC1 of kernel 5.13, -smp 2 and -smp 4 don't work with a
virtual e5500 QEMU KVM-HV machine anymore. [1]
I see in the serial console, that the uImage doesn't load. I use the
following QEMU command for booting:
qemu-system-ppc64 -M ppce500 -cpu e5500 -enable-kvm -m 1024 -kernel
uImage -drive format=raw,file=MintPPC32-X5000.img,index=0,if=virtio
-netdev user,id=mynet0 -device virtio-net,netdev=mynet0 -append "rw
root=/dev/vda" -device virtio-vga -device virtio-mouse-pci -device
virtio-keyboard-pci -device pci-ohci,id=newusb -device
usb-audio,bus=newusb.0 -smp 4
The kernels boot without KVM-HV.
Summary for KVM-HV:
-smp 1 -> works
-smp 2 -> doesn't work
-smp 3 -> works
-smp 4 -> doesn't work
I used -smp 4 before the RC1 of kernel 5.13 because my FSL P5040 BookE
machine [2] has 4 cores.
Does this patch solve this issue? [3]
Thanks,
Christian
[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-May/229103.html
[2] http://wiki.amiga.org/index.php?title=X5000
[3]
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-September/234152.html
Hi!
I would like to publish two debug features which were needed for other stuff
I work on.
One is the reworked lx-symbols script which now actually works on at least
gdb 9.1 (gdb 9.2 was reported to fail to load the debug symbols from the kernel
for some reason, not related to this patch) and upstream qemu.
The other feature is the ability to trap all guest exceptions (on SVM for now)
and see them in kvmtrace prior to potential merge to double/triple fault.
This can be very useful and I already had to manually patch KVM a few
times for this.
I will, once time permits, implement this feature on Intel as well.
V2:
* Some more refactoring and workarounds for lx-symbols script
* added KVM_GUESTDBG_BLOCKIRQ flag to enable 'block interrupts on
single step' together with KVM_CAP_SET_GUEST_DEBUG2 capability
to indicate which guest debug flags are supported.
This is a replacement for unconditional block of interrupts on single
step that was done in previous version of this patch set.
Patches to qemu to use that feature will be sent soon.
* Reworked the the 'intercept all exceptions for debug' feature according
to the review feedback:
- renamed the parameter that enables the feature and
moved it to common kvm module.
(only SVM part is currently implemented though)
- disable the feature for SEV guests as was suggested during the review
- made the vmexit table const again, as was suggested in the review as well.
V3:
* Modified a selftest to cover the KVM_GUESTDBG_BLOCKIRQ
* Rebased on kvm/queue
Best regards,
Maxim Levitsky
Maxim Levitsky (6):
KVM: SVM: split svm_handle_invalid_exit
KVM: x86: add force_intercept_exceptions_mask
KVM: SVM: implement force_intercept_exceptions_mask
scripts/gdb: rework lx-symbols gdb script
KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ
KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ
Documentation/virt/kvm/api.rst | 1 +
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/svm.c | 87 +++++++-
arch/x86/kvm/svm/svm.h | 6 +-
arch/x86/kvm/x86.c | 12 +-
arch/x86/kvm/x86.h | 2 +
kernel/module.c | 8 +-
scripts/gdb/linux/symbols.py | 203 ++++++++++++------
.../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++-
10 files changed, 266 insertions(+), 83 deletions(-)
--
2.26.3
We are looking to further standardise the output format used by kernel
test frameworks like kselftest and KUnit. Thus far we have used the
TAP (Test Anything Protocol) specification, but it has been extended
in many different ways, so we would like to agree on a common "Kernel
TAP" (KTAP) format to resolve these differences. Thus, below is a
draft of a specification of KTAP. Note that this specification is
largely based on the current format of test results for KUnit tests.
Additionally, this specification was heavily inspired by the KTAP
specification draft by Tim Bird
(https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD…).
However, there are some notable differences to his specification. One
such difference is the format of nested tests is more fully specified
in the following specification. However, they are specified in a way
which may not be compatible with many kselftest nested tests.
=====================
Specification of KTAP
=====================
TAP, or the Test Anything Protocol is a format for specifying test
results used by a number of projects. It's website and specification
are found at: https://testanything.org/. The Linux Kernel uses TAP
output for test results. However, KUnit (and other Kernel testing
frameworks such as kselftest) have some special needs for test results
which don't gel perfectly with the original TAP specification. Thus, a
"Kernel TAP" (KTAP) format is specified to extend and alter TAP to
support these use-cases.
KTAP Output consists of 5 major elements (all line-based):
- The version line
- Plan lines
- Test case result lines
- Diagnostic lines
- A bail out line
An important component in this specification of KTAP is the
specification of the format of nested tests. This can be found in the
section on nested tests below.
The version line
----------------
The first line of KTAP output must be the version line. As this
specification documents the first version of KTAP, the recommended
version line is "KTAP version 1". However, since all kernel testing
frameworks use TAP version lines, "TAP version 14" and "TAP version
13" are all acceptable version lines. Version lines with other
versions of TAP or KTAP will not cause the parsing of the test results
to fail but it will produce an error.
Plan lines
----------
Plan lines must follow the format of "1..N" where N is the number of
subtests. The second line of KTAP output must be a plan line, which
indicates the number of tests at the highest level, such that the
tests do not have a parent. Also, in the instance of a test having
subtests, the second line of the test after the subtest header must be
a plan line which indicates the number of subtests within that test.
Test case result lines
----------------------
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
The result can be either "ok", which indicates the test case passed,
or "not ok", which indicates that the test case failed.
The number represents the number of the test case or suite being
performed. The first test case or suite must have the number 1 and the
number must increase by 1 for each additional test case or result at
the same level and within the same testing suite.
The "-" character is optional.
The description is a description of the test, generally the name of
the test, and can be any string of words (can't include #). The
description is optional.
The directive is used to indicate if a test was skipped. The format
for the directive is: "# SKIP [<skip_description>]". The
skip_description is optional and can be any string of words to
describe why the test was skipped. The result of the test case result
line can be either "ok" or "not ok" if the skip directive is used.
Finally, note that TAP 14 specification includes TODO directives but
these are not supported for KTAP.
Examples of test case result lines:
Test passed:
ok 1 - test_case_name
Test was skipped:
not ok 1 - test_case_name # SKIP test_case_name should be skipped
Test failed:
not_ok 1 - test_case_name
Diagnostic lines
----------------
Diagnostic lines are used for description of testing operations.
Diagnostic lines are generally formatted as "#
<diagnostic_description>", where the description can be any string.
However, in practice, diagnostic lines are all lines that don't follow
the format of any other KTAP line format. Diagnostic lines can be
anywhere in the test output after the first two lines. There are a few
special diagnostic lines. Diagnostic lines of the format "# Subtest:
<test_name>" indicate the start of a test with subtests. Also,
diagnostic lines of the format "# <test_name>: <description>" refer to
a specific test and tend to occur before the test result line of that
test but are optional.
Bail out line
-------------
A bail out line can occur anywhere in the KTAP output and will
indicate that a test has crashed. The format of a bail out line is
"Bail out! [<description>]", where the description can give
information on why the bail out occurred and can be any string.
Nested tests
------------
The new specification for KTAP will support an arbitrary number of
nested subtests. Thus, tests can now have subtests and those subtests
can have subtests. This can be useful to further categorize tests and
organize test results.
The new required format for a test with subtests consists of: a
subtest header line, a plan line, all subtests, and a final test
result line.
The first line of the test must be the subtest header line with the
format: "# Subtest: <test_name>".
The second line of the test must be the plan line, which is formatted
as "1..N", where N is the number of subtests.
Following the plan line, all lines pertaining to the subtests will follow.
Finally, the last line of the test is a final test result line with
the format: "(ok|not ok) <number> [-] [<description>] [<directive>]
[<diagnostic data>]", which follows the same format as the general
test result lines described in this section. The result line should
indicate the result of the subtests. Thus, if one of the subtests
fail, the test should fail. The description in the final test result
line should match the name of the test in the subtest header.
An example format:
KTAP version 1
1..1
# Subtest: test_suite
1..2
ok 1 - test_1
ok 2 - test_2
ok 1 - test_suite
An example format with multiple levels of nested testing:
KTAP version 1
1..1
# Subtest: test_suite
1..2
# Subtest: sub_test_suite
1..2
ok 1 - test_1
ok 2 test_2
ok 1 - sub_test_suite
ok 2 - test
ok 1 - test_suite
In the instance that the plan line is missing, the end of the test
will be denoted by the final result line containing a description that
matches the name of the test given in the subtest header. Note that
thus, if the plan line is missing and one of the subtests have a
matching name to the test suite this will cause errors.
Lastly, indentation is also recommended for improved readability.
Major differences between TAP 14 and KTAP specification
-------------------------------------------------------
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
- TODO directive not allowed
- KTAP allows for an arbitrary number of tests to be nested with
specified nested test format
Example of KTAP
---------------
KTAP version 1
1..1
# Subtest: test_suite
1..1
# Subtest: sub_test_suite
1..2
ok 1 - test_1
ok 2 test_2
ok 1 - sub_test_suite
ok 1 - test_suite
=========================================
Note on incompatibilities with kselftests
=========================================
To my knowledge, the above specification seems to generally accept the
TAP format of many non-nested test results of kselftests.
An example of a common kselftests TAP format for non-nested test
results that are accepted by the above specification:
TAP version 13
1..2
# selftests: vDSO: vdso_test_gettimeofday
# The time is 1628024856.096879
ok 1 selftests: vDSO: vdso_test_gettimeofday
# selftests: vDSO: vdso_test_getcpu
# Could not find __vdso_getcpu
ok 2 selftests: vDSO: vdso_test_getcpu # SKIP
However, one major difference noted with kselftests is the use of more
directives than the "# SKIP" directive. kselftest also supports XPASS
and XFAIL directives. Some additional examples found in kselftests:
not ok 5 selftests: netfilter: nft_concat_range.sh # TIMEOUT 45 seconds
not ok 45 selftests: kvm: kvm_binary_stats_test # exit=127
Should the specification be expanded to include these directives?
However, the general format for kselftests with nested test results
seems to differ from the above specification. It seems that a general
format for nested tests is as follows:
TAP version 13
1..2
# selftests: membarrier: membarrier_test_single_thread
# TAP version 13
# 1..2
# ok 1 sys_membarrier available
# ok 2 sys membarrier invalid command test: command = -1, flags = 0,
errno = 22. Failed as expected
ok 1 selftests: membarrier: membarrier_test_single_thread
# selftests: membarrier: membarrier_test_multi_thread
# TAP version 13
# 1..2
# ok 1 sys_membarrier available
# ok 2 sys membarrier invalid command test: command = -1, flags = 0,
errno = 22. Failed as expected
ok 2 selftests: membarrier: membarrier_test_multi_thread
The major differences here, that do not match the above specification,
are use of "# " as an indentation and using a TAP version line to
denote a new test with subtests rather than the subtest header line
described above. If these are widely utilized formats in kselftests,
should we include both versions in the specification or should we
attempt to agree on a single format for nested tests? I personally
believe we should try to agree on a single format for nested tests.
This would allow for a cleaner specification of KTAP and would reduce
possible confusion.
====
So what do people think about the above specification?
How should we handle the differences with kselftests?
If this specification is accepted, where should the specification be documented?
User Interrupts Introduction
============================
User Interrupts (Uintr) is a hardware technology that enables delivering
interrupts directly to user space.
Today, virtually all communication across privilege boundaries happens by going
through the kernel. These include signals, pipes, remote procedure calls and
hardware interrupt based notifications. User interrupts provide the foundation
for more efficient (low latency and low CPU utilization) versions of these
common operations by avoiding transitions through the kernel.
In the User Interrupts hardware architecture, a receiver is always expected to
be a user space task. However, a user interrupt can be sent by another user
space task, kernel or an external source (like a device).
In addition to the general infrastructure to receive user interrupts, this
series introduces a single source: interrupts from another user task. These
are referred to as User IPIs.
The first implementation of User IPIs will be in the Intel processor code-named
Sapphire Rapids. Refer Chapter 11 of the Intel Architecture instruction set
extensions for details of the hardware architecture [1].
Series-reviewed-by: Tony Luck <tony.luck(a)intel.com>
Main goals of this RFC
======================
- Introduce this upcoming technology to the community.
This cover letter includes a hardware architecture summary along with the
software architecture and kernel design choices. This post is a bit long as a
result. Hopefully, it helps answer more questions than it creates :) I am also
planning to talk about User Interrupts next week at the LPC Kernel summit.
- Discuss potential use cases.
We are starting to look at actual usages and libraries (like libevent[2] and
liburing[3]) that can take advantage of this technology. Unfortunately, we
don't have much to share on this right now. We need some help from the
community to identify usages that can benefit from this. We would like to make
sure the proposed APIs work for the eventual consumers.
- Get early feedback on the software architecture.
We are hoping to get some feedback on the direction of overall software
architecture - starting with User IPI, extending it for kernel-to-user
interrupt notifications and external interrupts in the future.
- Discuss some of the main architecture opens.
There is lot of work that still needs to happen to enable this technology. We
are looking for some input on future patches that would be of interest. Here
are some of the big opens that we are looking to resolve.
* Should Uintr interrupt all blocking system calls like sleep(), read(),
poll(), etc? If so, should we implement an SA_RESTART type of mechanism
similar to signals? - Refer Blocking for interrupts section below.
* Should the User Interrupt Target table (UITT) be shared between threads of a
multi-threaded application or maybe even across processes? - Refer Sharing
the UITT section below.
Why care about this? - Micro benchmark performance
==================================================
There is a ~9x or higher performance improvement using User IPI over other IPC
mechanisms for event signaling.
Below is the average normalized latency for a 1M ping-pong IPC notifications
with message size=1.
+------------+-------------------------+
| IPC type | Relative Latency |
| |(normalized to User IPI) |
+------------+-------------------------+
| User IPI | 1.0 |
| Signal | 14.8 |
| Eventfd | 9.7 |
| Pipe | 16.3 |
| Domain | 17.3 |
+------------+-------------------------+
Results have been estimated based on tests on internal hardware with Linux
v5.14 + User IPI patches.
Original benchmark: https://github.com/goldsborough/ipc-bench
Updated benchmark: https://github.com/intel/uintr-ipc-bench/tree/linux-rfc-v1
*Performance varies by use, configuration and other factors.
How it works underneath? - Hardware Summary
===========================================
User Interrupts is a posted interrupt delivery mechanism. The interrupts are
first posted to a memory location and then delivered to the receiver when they
are running with CPL=3.
Kernel managed architectural data structures
--------------------------------------------
UPID: User Posted Interrupt Descriptor - Holds receiver interrupt vector
information and notification state (like an ongoing notification, suppressed
notifications).
UITT: User Interrupt Target Table - Stores UPID pointer and vector information
for interrupt routing on the sender side. Referred by the senduipi instruction.
The interrupt state of each task is referenced via MSRs which are saved and
restored by the kernel during context switch.
Instructions
------------
senduipi <index> - send a user IPI to a target task based on the UITT index.
clui - Mask user interrupts by clearing UIF (User Interrupt Flag).
stui - Unmask user interrupts by setting UIF.
testui - Test current value of UIF.
uiret - return from a user interrupt handler.
User IPI
--------
When a User IPI sender executes 'senduipi <index>', the hardware refers the
UITT table entry pointed by the index and posts the interrupt vector (63-0)
into the receiver's UPID.
If the receiver is running (CPL=3), the sender cpu would send a physical IPI to
the receiver's cpu. On the receiver side this IPI is detected as a User
Interrupt. The User Interrupt handler for the receiver is invoked and the
vector number (63-0) is pushed onto the stack.
Upon execution of 'uiret' in the interrupt handler, the control is transferred
back to instruction that was interrupted.
Refer Chapter 11 of the Intel Architecture instruction set extensions [1] for
more details.
Application interface - Software Architecture
=============================================
User Interrupts (Uintr) is an opt-in feature (unlike signals). Applications
wanting to use Uintr are expected to register themselves with the kernel using
the Uintr related system calls. A Uintr receiver is always a userspace task. A
Uintr sender can be another userspace task, kernel or a device.
1) A receiver can register/unregister an interrupt handler using the Uintr
receiver related syscalls.
uintr_register_handler(handler, flags)
uintr_unregister_handler(flags)
2) A syscall also allows a receiver to register a vector and create a user
interrupt file descriptor - uintr_fd.
uintr_fd = uintr_create_fd(vector, flags)
Uintr can be useful in some of the usages where eventfd or signals are used for
frequent userspace event notifications. The semantics of uintr_fd are somewhat
similar to an eventfd() or the write end of a pipe.
3) Any sender with access to uintr_fd can use it to deliver events (in this
case - interrupts) to a receiver. A sender task can manage its connection with
the receiver using the sender related syscalls based on uintr_fd.
uipi_index = uintr_register_sender(uintr_fd, flags)
Using an FD abstraction provides a secure mechanism to connect with a receiver.
The FD sharing and isolation mechanisms put in place by the kernel would extend
to Uintr as well.
4a) After the initial setup, a sender task can use the SENDUIPI instruction
along with the uipi_index to generate user IPIs without any kernel
intervention.
SENDUIPI <uipi_index>
If the receiver is running (CPL=3), then the user interrupt is delivered
directly without a kernel transition. If the receiver isn't running the
interrupt is delivered when the receiver gets context switched back. If the
receiver is blocked in the kernel, the user interrupt is delivered to the
kernel which then unblocks the intended receiver to deliver the interrupt.
4b) If the sender is the kernel or a device, the uintr_fd can be passed onto
the related kernel entity to allow them to setup a connection and then generate
a user interrupt for event delivery. <The exact details of this API are still
being worked upon.>
For details of the user interface and associated system calls refer the Uintr
man-pages draft:
https://github.com/intel/uintr-linux-kernel/tree/rfc-v1/tools/uintr/manpages.
We have also included the same content as patch 1 of this series to make it
easier to review.
Refer the Uintr compiler programming guide [4] for details on Uintr integration
with GCC and Binutils.
Kernel design choices
=====================
Here are some of the reasons and trade-offs for the current design of the APIs.
System call interface
---------------------
Why a system call interface?: The 2 options we considered are using a char
device at /dev or use system calls (current approach). A syscall approach
avoids exposing a core cpu feature through a driver model. Also, we want to
have a user interrupt FD per vector and share a single common interrupt handler
among all vectors. This seems easier for the kernel and userspace to accomplish
using a syscall based approach.
Data sharing using user interrupts: Uintr doesn't include a mechanism to
share/transmit data. The expectation is applications use existing data sharing
mechanisms to share data and use Uintr only for signaling.
An FD for each vector: A uintr_fd is assigned to each vector to allow fine
grained priority and event management by the receiver. The alternative we
considered was to allocate an FD to the interrupt handler and having that
shared with the sender. However, that approach relies on the sender selecting
the vector and moves the vector priority management to the sender. Also, if
multiple senders want to send unique user interrupts they would need to
coordinate the vector selection amongst them.
Extending the APIs: Currently, the system calls are only extendable using the
flags argument. We can add a variable size struct to some of the syscalls if
needed.
Extending existing mechanisms
-----------------------------
Uintr can be beneficial in some of the usages where eventfd() or signals are
used. Since Uintr is hardware-dependent, thread-specific and bypasses the
kernel in the fast path, it makes extending existing mechanisms harder.
Main issues with extending signals:
Signal handlers are defined significantly differently than a User interrupt
handler. An application needs to save/restore registers in a user interrupt
handler and call uiret to return from it. Also, signals can be process directed
(or thread directed) but user interrupts are always thread directed.
Comparison of signals with User Interrupts:
+=====================+===========================+===========================+
| | Signals | User Interrupts |
+=====================+===========================+===========================+
| Stacks | Has alt stacks | Uses application stack |
| | | (alternate stack option |
| | | not yet enabled) |
+---------------------+---------------------------+---------------------------+
| Registers state | Kernel manages incl. | App responsible (Use GCC |
| | FPU/XSTATE area | 'interrupt' attribute for |
| | | general purpose registers)|
+---------------------+---------------------------+---------------------------+
| Blocking/Masking | sigprocmask(2)/sa_mask | CLUI instruction (No per |
| | | vector masking) |
+---------------------+---------------------------+---------------------------+
| Direction | Uni-directional | Uni-directional |
+---------------------+---------------------------+---------------------------+
| Post event | kill(), signal(), | SENDUIPI <index> - index |
| | sigqueue(), etc. | derived from uintr_fd |
+---------------------+---------------------------+---------------------------+
| Target | Process-directed or | Thread-directed |
| | thread-directed | |
+---------------------+---------------------------+---------------------------+
| Fork/inheritance | Empty signal set | Nothing is inherited |
+---------------------+---------------------------+---------------------------+
| Execv | Pending signals preserved | Nothing is inherited |
+---------------------+---------------------------+---------------------------+
| Order of delivery | Undetermined | High to low vector numbers|
| for multiple signals| | |
+---------------------+---------------------------+---------------------------+
| Handler re-entry | All signals except the | No interrupts can cause |
| | one being handled | handler re-entry. |
+---------------------+---------------------------+---------------------------+
| Delivery feedback | 0 or -1 based on whether | No feedback on whether the|
| | the signal was sent | interrupt was sent or |
| | | received. |
+---------------------+---------------------------+---------------------------+
Main issues with extending eventfd():
eventfd() has a counter value that is core to the API. User interrupts can't
have an associated counter since the signaling happens at the user level and
the hardware doesn't have a memory counter mechanism. Also, eventfd can be used
for bi-directional signaling where as uintr_fd is uni-directional.
Comparison of eventfd with uintr_fd:
+====================+======================+==============================+
| | Eventfd | uintr_fd (User Interrupt FD) |
+====================+======================+==============================+
| Object | Counter - uint64 | Receiver vector information |
+--------------------+----------------------+------------------------------+
| Post event | write() to eventfd | SENDUIPI <index> - index |
| | | derived from uintr_fd |
+--------------------+----------------------+------------------------------+
| Receive event | read() on eventfd | Implicit - Handler is |
| | | invoked with associated |
| | | vector. |
+--------------------+----------------------+------------------------------+
| Direction | Bi-directional | Uni-directional |
+--------------------+----------------------+------------------------------+
| Data transmitted | Counter - uint64 | None |
+--------------------+----------------------+------------------------------+
| Waiting for events | Poll() family of | No per vector wait. |
| | syscalls | uintr_wait() allows waiting |
| | | for all user interrupts |
+--------------------+----------------------+------------------------------+
Security Model
==============
User Interrupts is designed as an opt-in feature (unlike signals). The security
model for user interrupts is intended to be similar to eventfd(). The general
idea is that any sender with access to uintr_fd would be able to generate the
associated interrupt vector for the receiver task that created the fd.
Untrusted processes
-------------------
The current implementation expects only trusted and cooperating processes to
communicate using user interrupts. Coordination is expected between processes
for a connection teardown. In situations where coordination doesn't happen
(say, due to abrupt process exit), the kernel would end up keeping shared
resources (like UPID) allocated to avoid faults.
Currently, a sender can easily cause a denial of service for the receiver by
generating a storm of user interrupts. A user interrupt handler is invoked with
interrupts disabled, but upon execution of uiret, interrupts get enabled again
by the hardware. This can lead to the handler being invoked again before normal
execution can resume. There isn't a hardware mechanism to mask specific
interrupt vectors.
To enable untrusted processes to communicate, we need to add a per-vector
masking option through another syscall (or maybe IOCTL). However, this can add
some complexity to the kernel code. A vector can only be masked by modifying
the UITT entries at the source. We need to be careful about races while
removing and restoring the UPID from the UITT.
Resource limits
---------------
The maximum number of receiver-sender connections would be limited by the
maximum number of open file descriptors and the size of the UITT.
The UITT size is chosen as 4kB fixed size arbitrarily right now. We plan to
make it dynamic and configurable in size. RLIMIT_MEMLOCK or ENOMEM should be
triggered when the size limits have been hit.
Main Opens
==========
Blocking for interrupts
-----------------------
User interrupts are delivered to applications immediately if they are running
in userspace. If a receiver task has blocked in the kernel using the placeholder
uintr_wait() syscall, the task would be woken up to deliver the user interrupt.
However, if the task is blocked due to any other blocking calls like read(),
sleep(), etc; the interrupt will only get delivered when the application gets
scheduled again. We need to consider if applications need to receive User
Interrupts as soon as they are posted (similar to signals) when they are
blocked due to some other reason. Adding this capability would likely make the
kernel implementation more complex.
Interrupting system calls using User Interrupts would also mean we need to
consider an SA_RESTART type of mechanism. We also need to evaluate if some of
the signal handler related semantics in the kernel can be reused for User
Interrupts.
Sharing the User Interrupt Target Table (UITT)
----------------------------------------------
The current implementation assigns a unique UITT to each task. This assumes
that User interrupts are used for point-to-point communication between 2 tasks.
Also, this keeps the kernel implementation relatively simple.
However, there are of benefits to sharing the UITT between threads of a
multi-threaded application. One, they would see a consistent view of the UITT.
i.e. SENDUIPI <index> would mean the same on all threads of the application.
Also, each thread doesn't have to register itself using the common uintr_fd.
This would simplify the userspace setup and make efficient use of kernel
memory. The potential downside is that the kernel implementation to allocate,
modify, expand and free the UITT would be more complex.
A similar argument can be made for a set of processes that do a lot of IPC
amongst them. They would prefer to have a shared UITT that lets them target any
process from any process. With the current file descriptor based approach, the
connection setup can be time consuming and somewhat cumbersome. We need to
evaluate if this can be made simpler as well.
Kernel page table isolation (KPTI)
----------------------------------
SENDUIPI is a special ring-3 instruction that makes a supervisor mode memory
access to the UPID and UITT memory. The current patches need KPTI to be
disabled for User IPIs to work. To make User IPI work with KPTI, we need to
allocate these structures from a special memory region that has supervisor
access but it is mapped into userspace. The plan is to implement a mechanism
similar to LDT.
Processors that support user interrupts are not affected by Meltdown so the
auto mode of KPTI will default to off. Users who want to force enable KPTI will
need to wait for a later version of this patch series to use user interrupts.
Please let us know if you want the development of these patches to be
prioritized (or deprioritized).
FAQs
====
Q: What happens if a process is "surprised" by a user interrupt?
A: For tasks that haven't registered with the kernel and requested for user
interrupts aren't expected or able to receive to user interrupts.
Q: Do user interrupts affect kernel scheduling?
A: No. If a task is blocked waiting for user interrupts, when the kernel
receives a notification on behalf of that task we only put it back on the
runqueue. Delivery of a user interrupt in no way changes the scheduling
priorities of a task.
Q: Does the sender get to know if the interrupt was delivered?
A: No. User interrupts only provides a posted interrupt delivery mechanism. If
applications need to rely on whether the interrupt was delivered they should
consider a userspace mechanism for feedback (like a shared memory counter or a
user interrupt back to the sender).
Q: Why is there no feedback on interrupt delivery?
A: Being a posted interrupt delivery mechanism, the interrupt delivery
happens in 2 steps:
1) The interrupt information is stored in a memory location (UPID).
2) The physical interrupt is delivered to the interrupt receiver.
The 2nd step could happen immediately, after an extended period, or it might
never happen based on the state of the receiver after step 1. (The receiver
could have disabled interrupts, have been context switched out or it might have
crashed during that time.) This makes it very hard for the hardware to reliably
provide feedback upon execution of SENDUIPI.
Q: Can user interrupts be nested?
A: Yes. Using STUI instruction in the interrupt handler would allow new user
interrupts to be delivered. However, there no TPR(thread priority register)
like mechanism to allow only higher priority interrupts. Any user interrupt can
be taken when nesting is enabled.
Q: Can a task receive all pending user interrupts in one go?
A: No. The hardware allows only one vector to be processed at a time. If a task
is interested in knowing all the interrupts that are pending then we could add
a syscall that provides the pending interrupts information.
Q: Do the processes need to be pinned to a cpu?
A: No. User interrupts will be routed correctly to whichever cpu the receiver
is running on. The kernel updates the cpu information in the UPID during
context switch.
Q: Why are UPID and UITT allocated by the kernel?
A: If allocated by user space, applications could misuse the UPID and UITT to
write to unauthorized memory and generate interrupts on any cpu. The UPID and
UITT are allocated by the kernel and accessed by the hardware with supervisor
privilege.
Patch structure for this series
===============================
- Man-pages and Kernel documentation (patch 1,2)
- Hardware enumeration (patch 3, 4)
- User IPI kernel vector reservation (patch 5)
- Syscall interface for interrupt receiver, sender and vector
management(uintr_fd) (patch 6-12)
- Basic selftests (patch 13)
Along with the patches in this RFC, there are additional tests and samples that
are available at:
https://github.com/intel/uintr-linux-kernel/tree/rfc-v1
Links
=====
[1]: https://software.intel.com/content/www/us/en/develop/download/intel-archite…
[2]: https://libevent.org/
[3]: https://github.com/axboe/liburing
[4]: https://github.com/intel/uintr-compiler-guide/blob/uintr-gcc-11.1/UINTR-com…
Sohil Mehta (13):
x86/uintr/man-page: Include man pages draft for reference
Documentation/x86: Add documentation for User Interrupts
x86/cpu: Enumerate User Interrupts support
x86/fpu/xstate: Enumerate User Interrupts supervisor state
x86/irq: Reserve a user IPI notification vector
x86/uintr: Introduce uintr receiver syscalls
x86/process/64: Add uintr task context switch support
x86/process/64: Clean up uintr task fork and exit paths
x86/uintr: Introduce vector registration and uintr_fd syscall
x86/uintr: Introduce user IPI sender syscalls
x86/uintr: Introduce uintr_wait() syscall
x86/uintr: Wire up the user interrupt syscalls
selftests/x86: Add basic tests for User IPI
.../admin-guide/kernel-parameters.txt | 2 +
Documentation/x86/index.rst | 1 +
Documentation/x86/user-interrupts.rst | 107 +++
arch/x86/Kconfig | 12 +
arch/x86/entry/syscalls/syscall_32.tbl | 6 +
arch/x86/entry/syscalls/syscall_64.tbl | 6 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/entry-common.h | 4 +
arch/x86/include/asm/fpu/types.h | 20 +-
arch/x86/include/asm/fpu/xstate.h | 3 +-
arch/x86/include/asm/hardirq.h | 4 +
arch/x86/include/asm/idtentry.h | 5 +
arch/x86/include/asm/irq_vectors.h | 6 +-
arch/x86/include/asm/msr-index.h | 8 +
arch/x86/include/asm/processor.h | 8 +
arch/x86/include/asm/uintr.h | 76 ++
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 61 ++
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/fpu/core.c | 17 +
arch/x86/kernel/fpu/xstate.c | 20 +-
arch/x86/kernel/idt.c | 4 +
arch/x86/kernel/irq.c | 51 +
arch/x86/kernel/process.c | 10 +
arch/x86/kernel/process_64.c | 4 +
arch/x86/kernel/uintr_core.c | 880 ++++++++++++++++++
arch/x86/kernel/uintr_fd.c | 300 ++++++
include/linux/syscalls.h | 8 +
include/uapi/asm-generic/unistd.h | 15 +-
kernel/sys_ni.c | 8 +
scripts/checksyscalls.sh | 6 +
tools/testing/selftests/x86/Makefile | 10 +
tools/testing/selftests/x86/uintr.c | 147 +++
tools/uintr/manpages/0_overview.txt | 265 ++++++
tools/uintr/manpages/1_register_receiver.txt | 122 +++
.../uintr/manpages/2_unregister_receiver.txt | 62 ++
tools/uintr/manpages/3_create_fd.txt | 104 +++
tools/uintr/manpages/4_register_sender.txt | 121 +++
tools/uintr/manpages/5_unregister_sender.txt | 79 ++
tools/uintr/manpages/6_wait.txt | 59 ++
42 files changed, 2626 insertions(+), 8 deletions(-)
create mode 100644 Documentation/x86/user-interrupts.rst
create mode 100644 arch/x86/include/asm/uintr.h
create mode 100644 arch/x86/kernel/uintr_core.c
create mode 100644 arch/x86/kernel/uintr_fd.c
create mode 100644 tools/testing/selftests/x86/uintr.c
create mode 100644 tools/uintr/manpages/0_overview.txt
create mode 100644 tools/uintr/manpages/1_register_receiver.txt
create mode 100644 tools/uintr/manpages/2_unregister_receiver.txt
create mode 100644 tools/uintr/manpages/3_create_fd.txt
create mode 100644 tools/uintr/manpages/4_register_sender.txt
create mode 100644 tools/uintr/manpages/5_unregister_sender.txt
create mode 100644 tools/uintr/manpages/6_wait.txt
base-commit: 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f
--
2.33.0
This is a follow up to my v7 series of fixes for the zram driver [0]
which ended up uncovering a generic deadlock issue with sysfs and module
removal. I've reported this issue and proposed a few patches first since
March 2021 [1]. At the end of this email you will find an itemized list
of changes since that v1 series, you can also find these changes on my
branch 20210927-sysfs-generic-deadlock-fix [4] which is based on
linux-next tag next-20210927.
Just a heads up, I'm goin on vacation in two days, won't be back until
Monday October 11th.
On this v8 I incorporate feedback from the v7 series, namely:
- Tejun requested I move the struct module to the last attribute when
extending functions
- As per discussion with Tejun, trimmed and clarified the commit log
and documentation on the generic fix on patch 7
- As requested by Bart Van Assche, I simplied the setting of the
struct test_config *config into one line instead of two on many
places on patch 3 which adds the new sysfs selftest
- Dan Williams had some questions about patch 7, and so clarified these
questions using a more elaborate example on the commit log to show
where the lock call was happening.
- Trimmed the Cc list considerably as it was way too long before
- Rebased onto linux-next tag next-20210927
Below a list of changes of this patch set since its inception:
On v1:
- Open coded the sysfs deadlock race to only be localized by the zram
driver
Changes on v2:
- used bdgrab() as well for another race which was speculated by
Minchan
- improved documentation of fixes
Changes on v3:
- used a localized zram macros for the sysfs attributes instead of
open coding on each routine
- replaced bdget() stuff for a generic get_device() and bus_get() on
dev_attr_show() / dev_attr_store() for the issue speculated by
Michan
Changes on v4:
- Cosmetic fixes on the zram fixes as requested by Greg
- Split out the driver core fix as requested by Greg for the
issue speculated by Michan. This fix ended up getting up to its 4th
patch iteration [2] and eventually hit linux-next. We got a 0day
0day suspend stres fail for this patch [3]
Changes on v5:
- I ended up writing a test_sysfs driver and with it I ended up
proving that the issue speculated by Michen was not possible and
so I asked Greg to drop the patch from his queue titled
"sysfs: fix kobject refcount to address races with kobject removal"
- checkpatch fixes for the zram changes
Changes on v6:
- I submitted my test_sysfs driver for inclusion upstream which easily
abstracted the deadlock issue in a driver generically [4]
- I rebased the zram fixes and added also a new patch for zram to use
ATTRIBUTE_GROUPS As per Minchen I sent the patches to be merged
through Andrew Morton.
- Greg ended up NACK'ing the patchset because he was not sure the fix
was correct still
Changes on v7:
- Formalizes the original proposed generic sysfs fix intead of using
macro helpers to work around the issue
- I decided it is best to merge all the effort together into
one patch set because communication was being lost when I split the
patches up. This was not helping in any way to either fix the zram
issues or come to consensus on a generic solution. The patches are
also merged now because they are all related now.
- Running checkpatch exposed that S_IRWXUGO and S_IRWXU|S_IRUGO|S_IXUGO
should be replaced, so I did that in this series in two new patches
- Adds a try_module_get() documentation extension with tribal
knowledge and new information I don't think some folks still believe
in. The new test_sysfs selftest however proves this information to
be correct, the same selftest can be used to try to prove that
documentation incorrect
- Because the fix is now generic zram's deadlock can easily be fixed
now by just making it use ATTRIBUTE_GROUPS().
[0] https://lkml.kernel.org/r/YUjLAbnEB5qPfnL8@slm.duckdns.org
[1] https://lkml.kernel.org/r/20210306022035.11266-1-mcgrof@kernel.org
[2] https://lkml.kernel.org/r/20210623215007.862787-1-mcgrof@kernel.org
[3] https://lkml.kernel.org/r/20210701022737.GC21279@xsang-OptiPlex-9020
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?…
Luis Chamberlain (12):
LICENSES: Add the copyleft-next-0.3.1 license
testing: use the copyleft-next-0.3.1 SPDX tag
selftests: add tests_sysfs module
kernfs: add initial failure injection support
test_sysfs: add support to use kernfs failure injection
kernel/module: add documentation for try_module_get()
fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on
kernfs_create_link()
fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755
sysfs_create_dir_ns()
sysfs: fix deadlock race with module removal
test_sysfs: enable deadlock tests by default
zram: fix crashes with cpu hotplug multistate
zram: use ATTRIBUTE_GROUPS to fix sysfs deadlock module removal
.../fault-injection/fault-injection.rst | 22 +
LICENSES/dual/copyleft-next-0.3.1 | 237 +++
MAINTAINERS | 9 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 +-
drivers/block/zram/zram_drv.c | 74 +-
fs/kernfs/Makefile | 1 +
fs/kernfs/dir.c | 44 +-
fs/kernfs/failure-injection.c | 91 ++
fs/kernfs/file.c | 19 +-
fs/kernfs/kernfs-internal.h | 75 +-
fs/kernfs/symlink.c | 4 +-
fs/sysfs/dir.c | 5 +-
fs/sysfs/file.c | 6 +-
fs/sysfs/group.c | 3 +-
include/linux/kernfs.h | 19 +-
include/linux/module.h | 34 +-
include/linux/sysfs.h | 52 +-
kernel/cgroup/cgroup.c | 2 +-
lib/Kconfig.debug | 25 +
lib/Makefile | 1 +
lib/test_kmod.c | 12 +-
lib/test_sysctl.c | 12 +-
lib/test_sysfs.c | 952 ++++++++++++
tools/testing/selftests/kmod/kmod.sh | 13 +-
tools/testing/selftests/sysctl/sysctl.sh | 12 +-
tools/testing/selftests/sysfs/Makefile | 12 +
tools/testing/selftests/sysfs/config | 5 +
tools/testing/selftests/sysfs/sysfs.sh | 1383 +++++++++++++++++
28 files changed, 3026 insertions(+), 102 deletions(-)
create mode 100644 LICENSES/dual/copyleft-next-0.3.1
create mode 100644 fs/kernfs/failure-injection.c
create mode 100644 lib/test_sysfs.c
create mode 100644 tools/testing/selftests/sysfs/Makefile
create mode 100644 tools/testing/selftests/sysfs/config
create mode 100755 tools/testing/selftests/sysfs/sysfs.sh
--
2.30.2
The Testing & Fuzzing Micro-Conference[1] at Linux Plumbers 2021 will
remain open to new proposals for talks and discussion topics until the
end of next week (Friday 10th Sept). Please feel free to submit yours
with the "Submit new proposal" form on this page:
https://linuxplumbersconf.org/event/11/abstracts/
The MC is currently scheduled for Wednesday 22nd. This is where the
timetable will appear as submissions get accepted:
https://linuxplumbersconf.org/event/11/sessions/110/#20210922
Last year's edition was very effective in spite of being fully online
rather than in-person. Topics around testing were mentioned in many
other tracks too, such as real-time and toolchains. See also the
related KernelCI blog post with community notes[2]. We're looking
forward to having an equally good virtual experience this time again.
Best wishes,
Guillaume
[1] https://www.linuxplumbersconf.org/blog/2021/index.php/2021/07/09/testing-an…
The Testing and Fuzzing microconference focuses on advancing the current
state of testing of the Linux kernel. We aim to create connections
between folks working on similar projects, and help individual projects
make progress.
We ask that any topic discussions will focus on issues/problems they are
facing and possible alternatives to resolving them. The Microconference
is open to all topics related to testing & fuzzing on Linux, not
necessarily in the kernel space.
Suggested topics:
KernelCI: Extending coverage and improving user experience.
Growing KCIDB, integrating more sources.
Better sanitizers: KFENCE, improving KCSAN.
Using Clang for better testing coverage.
How to spread KUnit throughout the kernel?
Testing in-kernel Rust code.
MC leads:
Sasha Levin <sashal(a)kernel.org>
Guillaume Tucker <guillaume.tucker(a)collabora.com>
[2] https://foundation.kernelci.org/blog/2020/09/23/kernelci-notes-from-plumber…
We refactored the lib/test_hash.c file into KUnit as part of the student
group LKCAMP [1] introductory hackathon for kernel development.
This test was pointed to our group by Daniel Latypov [2], so its full
conversion into a pure KUnit test was our goal in this patch series, but
we ran into many problems relating to it not being split as unit tests,
which complicated matters a bit, as the reasoning behind the original
tests is quite cryptic for those unfamiliar with hash implementations.
Some interesting developments we'd like to highlight are:
- In patch 1/5 we noticed that there was an unused define directive that
could be removed.
- In patch 4/5 we noticed how stringhash and hash tests are all under
the lib/test_hash.c file, which might cause some confusion, and we
also broke those kernel config entries up.
Overall KUnit developments have been made in the other patches in this
series:
In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c
file so as to make it more compatible with the KUnit style, whilst
preserving the original idea of the maintainer who designed it (i.e.
George Spelvin), which might be undesirable for unit tests, but we
assume it is enough for a first patch.
This is our first patch series so we hope our contributions are
interesting and also hope to get some useful criticism from the
community. :)
Changes since V1:
- Fixed compilation on parisc and m68k.
- Fixed whitespace mistakes.
- Renamed a few functions.
- Refactored globals into struct for test function params, thus removing
a patch.
- Reworded some commit messages.
[1] - https://lkcamp.dev/
[2] - https://lore.kernel.org/linux-kselftest/CAGS_qxojszgM19u=3HLwFgKX5bm5Khywvs…
Isabella Basso (5):
hash.h: remove unused define directive
test_hash.c: split test_int_hash into arch-specific functions
test_hash.c: split test_hash_init
lib/Kconfig.debug: properly split hash test kernel entries
test_hash.c: refactor into kunit
include/linux/hash.h | 5 +-
lib/Kconfig.debug | 28 ++++-
lib/Makefile | 3 +-
lib/test_hash.c | 247 +++++++++++++++++--------------------
tools/include/linux/hash.h | 5 +-
5 files changed, 139 insertions(+), 149 deletions(-)
--
2.33.0
Status
======
This version of the patch set implements the suggestions received for
version 2. Apart from one patch added for the IMA API and few fixes, there
are no substantial changes. It has been tested on: x86_64, UML (x86_64),
s390x (big endian).
The long term goal is to boot a system with appraisal enabled and with
DIGLIM as repository for reference values, taken from the RPM database.
Changes required:
- new execution policies in IMA
(https://lore.kernel.org/linux-integrity/20210409114313.4073-1-roberto.sassu…)
- support for the euid policy keyword for critical data
(https://lore.kernel.org/linux-integrity/20210705115650.3373599-1-roberto.sa…)
- basic DIGLIM
(this patch set)
- additional DIGLIM features (loader, LSM, user space utilities)
- support for DIGLIM in IMA
- support for PGP keys and signatures
(from David Howells)
- support for PGP appended signatures in IMA
Introduction
============
Digest Lists Integrity Module (DIGLIM) is a component of the integrity
subsystem in the kernel, primarily aiming to aid Integrity Measurement
Architecture (IMA) in the process of checking the integrity of file
content and metadata. It accomplishes this task by storing reference
values coming from software vendors and by reporting whether or not the
digest of file content or metadata calculated by IMA (or EVM) is found
among those values. In this way, IMA can decide, depending on the result
of a query, if a measurement should be taken or access to the file
should be granted. The Security Assumptions section explains more in
detail why this component has been placed in the kernel.
The main benefits of using IMA in conjunction with DIGLIM are the
ability to implement advanced remote attestation schemes based on the
usage of a TPM key for establishing a TLS secure channel[1][2], and to
reduce the burden on Linux distribution vendors to extend secure boot at
OS level to applications.
DIGLIM does not have the complexity of feature-rich databases. In fact,
its main functionality comes from the hash table primitives already in
the kernel. It does not have an ad-hoc storage module, it just indexes
data in a fixed format (digest lists, a set of concatenated digests
preceded by a header), copied to kernel memory as they are. Lastly, it
does not support database-oriented languages such as SQL, but only
accepts a digest and its algorithm as a query.
The only digest list format supported by DIGLIM is called compact.
However, Linux distribution vendors don't have to generate new digest
lists in this format for the packages they release, as already available
information, such as RPM headers and DEB package metadata, can be used
as a source for reference values (they include file digests), with a
user space parser taking care of the conversion to the compact format.
Although one might perceive that storing file or metadata digests for a
Linux distribution would significantly increase the memory usage, this
does not seem to be the case. As an anticipation of the evaluation done
in the Preliminary Performance Evaluation section, protecting binaries
and shared libraries of a minimal Fedora 33 installation requires 208K
of memory for the digest lists plus 556K for indexing.
In exchange for a slightly increased memory usage, DIGLIM improves the
performance of the integrity subsystem. In the considered scenario, IMA
measurement and appraisal of 5896 files with digest lists requires
respectively less than one quarter and less than half the time, compared
to the current solution.
DIGLIM also keeps track of whether digest lists have been processed in
some way (e.g. measured or appraised by IMA). This is important for
example for remote attestation, so that remote verifiers understand what
has been uploaded to the kernel.
Operations in DIGLIM are atomic: if an error occurs during the addition
of a digest list, DIGLIM rolls back the entire insert operation;
deletions instead always succeed. This capability has been tested with
an ad-hoc fault injection mechanism capable of simulating failures
during the operations.
Finally, DIGLIM exposes to user space, through securityfs, the digest
lists currently loaded, the number of digests added, a query interface
and an interface to set digest list labels.
Binary Integrity
Integrity is a fundamental security property in information systems.
Integrity could be described as the condition in which a generic
component is just after it has been released by the entity that created
it.
One way to check whether a component is in this condition (called binary
integrity) is to calculate its digest and to compare it with a reference
value (i.e. the digest calculated in controlled conditions, when the
component is released).
IMA, a software part of the integrity subsystem, can perform such
evaluation and execute different actions:
- store the digest in an integrity-protected measurement list, so that
it can be sent to a remote verifier for analysis;
- compare the calculated digest with a reference value (usually
protected with a signature) and deny operations if the file is found
corrupted;
- store the digest in the system log.
Benefits
DIGLIM further enhances the capabilities offered by IMA-based solutions
and, at the same time, makes them more practical to adopt by reusing
existing sources as reference values for integrity decisions.
Possible sources for digest lists are:
- RPM headers;
- Debian repository metadata.
Benefits for IMA Measurement
One of the issues that arises when files are measured by the OS is that,
due to parallel execution, the order in which file accesses happen
cannot be predicted. Since the TPM Platform Configuration Register (PCR)
extend operation, executed after each file measurement,
cryptographically binds the current measurement to the previous ones,
the PCR value at the end of a workload cannot be predicted too.
Thus, even if the usage of a TPM key, bound to a PCR value, should be
allowed when only good files were accessed, the TPM could unexpectedly
deny an operation on that key if files accesses did not happen as stated
by the key policy (which allows only one of the possible sequences).
DIGLIM solves this issue by making the PCR value stable over the time
and not dependent on file accesses. The following figure depicts the
current and the new approaches:
IMA measurement list (current)
entry# 1st boot 2nd boot 3rd boot
+----+---------------+ +----+---------------+ +----+---------------+
1: | 10 | file1 measur. | | 10 | file3 measur. | | 10 | file2 measur. |
+----+---------------+ +----+---------------+ +----+---------------+
2: | 10 | file2 measur. | | 10 | file2 measur. | | 10 | file3 measur. |
+----+---------------+ +----+---------------+ +----+---------------+
3: | 10 | file3 measur. | | 10 | file1 measur. | | 10 | file4 measur. |
+----+---------------+ +----+---------------+ +----+---------------+
PCR: Extend != Extend != Extend
file1, file2, file3 file3, file2, file1 file2, file3, file4
PCR Extend definition:
PCR(new value) = Hash(Hash(meas. entry), PCR(previous value))
A new entry in the measurement list is created by IMA for each file
access. Assuming that file1, file2 and file3 are files provided by the
software vendor, file4 is an unknown file, the first two PCR values
above represent a good system state, the third a bad system state. The
PCR values are the result of the PCR extend operation performed for each
measurement entry with the digest of the measurement entry as an input.
IMA measurement list (with DIGLIM)
dlist
+--------------+
| header |
+--------------+
| file1 digest |
| file2 digest |
| file3 digest |
+--------------+
dlist is a digest list containing the digest of file1, file2 and file3.
In the intended scenario, it is generated by a software vendor at the
end of the building process, and retrieved by the administrator of the
system where the digest list is loaded.
entry# 1st boot 2nd boot 3rd boot
+----+---------------+ +----+---------------+ +----+---------------+
0: | 11 | dlist measur. | | 11 | dlist measur. | | 11 | dlist measur. |
+----+---------------+ +----+---------------+ +----+---------------+
1: < file1 measur. skip > < file3 measur. skip > < file2 measur. skip >
2: < file2 measur. skip > < file2 measur. skip > < file3 measur. skip >
+----+---------------+
3: < file3 measur. skip > < file1 measur. skip > | 11 | file4 measur. |
+----+---------------+
PCR: Extend = Extend != Extend
dlist dlist dlist, file4
The first entry in the measurement list contains the digest of the
digest list uploaded to the kernel at kernel initialization time.
When a file is accessed, IMA queries DIGLIM with the calculated file
digest and, if it is found, IMA skips the measurement.
Thus, the only information sent to remote verifiers are: the list of
files that could possibly be accessed (from the digest list), but not if
they were accessed and when; the measurement of unknown files.
Despite providing less information, this solution has the advantage that
the good system state (i.e. when only file1, file2 and file3 are
accessed) now can be represented with a deterministic PCR value (the PCR
is extended only with the measurement of the digest list). Also, the bad
system state can still be distinguished from the good state (the PCR is
extended also with the measurement of file4).
If a TPM key is bound to the good PCR value, the TPM would allow the key
to be used if file1, file2 or file3 are accessed, regardless of the
sequence in which they are accessed (the PCR value does not change), and
would revoke the permission when the unknown file4 is accessed (the PCR
value changes). If a system is able to establish a TLS connection with a
peer, this implicitly means that the system was in a good state (i.e.
file4 was not accessed, otherwise the TPM would have denied the usage of
the TPM key due to the key policy).
Benefits for IMA Appraisal
Extending secure boot to applications means being able to verify the
provenance of files accessed. IMA does it by verifying file signatures
with a key that it trusts, which requires Linux distribution vendors to
additionally include in the package header a signature for each file
that must be verified (there is the dedicated RPMTAG_FILESIGNATURES
section in the RPM header).
The proposed approach would be instead to verify data provenance from
already available metadata (file digests) in existing packages. IMA
would verify the signature of package metadata and search file digests
extracted from package metadata and added to the hash table in the
kernel.
For RPMs, file digests can be found in the RPMTAG_FILEDIGESTS section of
RPMTAG_IMMUTABLE, whose signature is in RPMTAG_RSAHEADER. For DEBs, file
digests (unsafe to use due to a weak digest algorithm) can be found in
the md5sum file, which can be indirectly verified from Release.gpg.
The following figure highlights the differences between the current and
the proposed approach.
IMA appraisal (current solution, with file signatures):
appraise
+-----------+
V |
+-------------------------+-----+ +-------+-----+ |
| RPM header | | ima rpm | file1 | sig | |
| ... | | plugin +-------+-----+ +-----+
| file1 sig [to be added] | sig |--------> ... | IMA |
| ... | | +-------+-----+ +-----+
| fileN sig [to be added] | | | fileN | sig |
+-------------------------+-----+ +-------+-----+
In this case, file signatures must be added to the RPM header, so that
the ima rpm plugin can extract them together with the file content. The
RPM header signature is not used.
IMA appraisal (with DIGLIM):
kernel hash table
with RPM header content
+---+ +--------------+
| |--->| file1 digest |
+---+ +--------------+
...
+---+ appraise (file1)
| | <--------------+
+----------------+-----+ +---+ |
| RPM header | | ^ |
| ... | | digest_list | |
| file1 digest | sig | rpm plugin | +-------+ +-----+
| ... | |-------------+--->| file1 | | IMA |
| fileN digest | | +-------+ +-----+
+----------------+-----+ |
^ |
+------------------------------------+
appraise (RPM header)
In this case, the RPM header is used as it is, and its signature is used
for IMA appraisal. Then, the digest_list rpm plugin executes the user
space parser to parse the RPM header and add the extracted digests to an
hash table in the kernel. IMA appraisal of the files in the RPM package
consists in searching their digest in the hash table.
Other than reusing available information as digest list, another
advantage is the lower computational overhead compared to the solution
with file signatures (only one signature verification for many files and
digest lookup, instead of per file signature verification, see
Preliminary Performance Evaluation for more details).
Lifecycle
The lifecycle of DIGLIM is represented in the following figure:
Vendor premises (release process with modifications):
+------------+ +-----------------------+ +------------------------+
| 1. build a | | 2. generate and sign | | 3. publish the package |
| package |-->| a digest list from |-->| and digest list in |
| | | packaged files | | a repository |
+------------+ +-----------------------+ +------------------------+
|
|
User premises: |
V
+---------------------+ +------------------------+ +-----------------+
| 6. use digest lists | | 5. download the digest | | 4. download and |
| for measurement |<--| list and upload to |<--| install the |
| and/or appraisal | | the kernel | | package |
+---------------------+ +------------------------+ +-----------------+
The figure above represents all the steps when a digest list is
generated separately. However, as mentioned in Benefits, in most cases
existing packages can be already used as a source for digest lists,
limiting the effort for software vendors.
If, for example, RPMs are used as a source for digest lists, the figure
above becomes:
Vendor premises (release process without modifications):
+------------+ +------------------------+
| 1. build a | | 2. publish the package |
| package |-->| in a repository |---------------------+
| | | | |
+------------+ +------------------------+ |
|
|
User premises: |
V
+---------------------+ +------------------------+ +-----------------+
| 5. use digest lists | | 4. extract digest list | | 3. download and |
| for measurement |<--| from the package |<--| install the |
| and/or appraisal | | and upload to the | | package |
| | | kernel | | |
+---------------------+ +------------------------+ +-----------------+
Step 4 can be performed with the digest_list rpm plugin and the user
space parser, without changes to rpm itself.
Security Assumptions
As mentioned in the Introduction, DIGLIM will be primarily used in
conjunction with IMA to enforce a mandatory policy on all user space
processes, including those owned by root. Even root, in a system with a
locked-down kernel, cannot affect the enforcement of the mandatory
policy or, if changes are permitted, it cannot do so without being
detected.
Given that the target of the enforcement are user space processes,
DIGLIM cannot be placed in the target, as a Mandatory Access Control
(MAC) design is required to have the components responsible to enforce
the mandatory policy separated from the target.
While locking-down a system and limiting actions with a mandatory policy
is generally perceived by users as an obstacle, it has noteworthy
benefits for the users themselves.
First, it would timely block attempts by malicious software to steal or
misuse user assets. Although users could query the package managers to
detect them, detection would happen after the fact, or it wouldn't
happen at all if the malicious software tampered with package managers.
With a mandatory policy enforced by the kernel, users would still be
able to decide which software they want to be executed except that,
unlike package managers, the kernel is not affected by user space
processes or root.
Second, it might make systems more easily verifiable from outside, due
to the limited actions the system allows. When users connect to a
server, not only they would be able to verify the server identity, which
is already possible with communication protocols like TLS, but also if
the software running on that server can be trusted to handle their
sensitive data.
Adoption
A former version of DIGLIM is used in the following OSes:
- openEuler 20.09
https://github.com/openeuler-mirror/kernel/tree/openEuler-20.09
- openEuler 21.03
https://github.com/openeuler-mirror/kernel/tree/openEuler-21.03
Originally, DIGLIM was part of IMA (known as IMA Digest Lists). In this
version, it has been redesigned as a standalone module with an API that
makes its functionality accessible by IMA and, eventually, other
subsystems.
User Space Support
Digest lists can be generated and managed with digest-list-tools:
https://github.com/openeuler-mirror/digest-list-tools
It includes two main applications:
- gen_digest_lists: generates digest lists from files in the
filesystem or from the RPM database (more digest list sources can be
supported);
- manage_digest_lists: converts and uploads digest lists to the
kernel.
Integration with rpm is done with the digest_list plugin:
https://gitee.com/src-openeuler/rpm/blob/master/Add-digest-list-plugin.patch
This plugin writes the RPM header and its signature to a file, so that
the file is ready to be appraised by IMA, and calls the user space
parser to convert and upload the digest list to the kernel.
Simple Usage Example (Tested with Fedora 33)
1. Digest list generation (RPM headers and their signature are copied
to the specified directory):
# mkdir /etc/digest_lists
# gen_digest_lists -t file -f rpm+db -d /etc/digest_lists -o add
2. Digest list upload with the user space parser:
# manage_digest_lists -p add-digest -d /etc/digest_lists
3. First digest list query:
# echo sha256-$(sha256sum /bin/cat) > /sys/kernel/security/integrity/diglim/digest_query
# cat /sys/kernel/security/integrity/diglim/digest_query
sha256-[...]-0-file_list-rpm-coreutils-8.32-18.fc33.x86_64 (actions: 0): version: 1, algo: sha256, type: 2, modifiers: 1, count: 106, datalen: 3392
4. Second digest list query:
# echo sha256-$(sha256sum /bin/zip) > /sys/kernel/security/integrity/diglim/digest_query
# cat /sys/kernel/security/integrity/diglim/digest_query
sha256-[...]-0-file_list-rpm-zip-3.0-27.fc33.x86_64 (actions: 0): version: 1, algo: sha256, type: 2, modifiers: 1, count: 4, datalen: 128
Preliminary Performance Evaluation
This section provides an initial estimation of the overhead introduced
by DIGLIM. The estimation has been performed on a Fedora 33 virtual
machine with 1447 packages installed. The virtual machine has 16 vCPU
(host CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores) and 2G of RAM
(host memory: 64G). The virtual machine also has a vTPM with libtpms and
swtpm as backend.
After writing the RPM headers to files, the size of the directory
containing them is 36M.
After converting the RPM headers to the compact digest list, the size of
the data being uploaded to the kernel is 3.6M.
The time to load the entire RPM database is 0.628s.
After loading the digest lists to the kernel, the slab usage due to
indexing is (obtained with slab_nomerge in the kernel command line):
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
118144 118144 100% 0,03K 923 128 3692K digest_list_item_ref_cache
102400 102400 100% 0,03K 800 128 3200K digest_item_cache
2646 2646 100% 0,09K 63 42 252K digest_list_item_cache
The stats, obtained from the digests_count interface, introduced later,
are:
Parser digests: 0
File digests: 99100
Metadata digests: 0
Digest list digests: 1423
On this installation, this would be the worst case in which all files
are measured and/or appraised, which is currently not recommended
without enforcing an integrity policy protecting mutable files. Infoflow
LSM is a component to accomplish this task:
https://patchwork.kernel.org/project/linux-integrity/cover/20190818235745.1…
The first manageable goal of IMA with DIGLIM is to use an execution
policy, with measurement and/or appraisal of files executed or mapped in
memory as executable (in addition to kernel modules and firmware). In
this case, the digest list contains the digest only for those files. The
numbers above change as follows.
After converting the RPM headers to the compact digest list, the size of
the data being uploaded to the kernel is 208K.
The time to load the digest of binaries and shared libraries is 0.062s.
After loading the digest lists to the kernel, the slab usage due to
indexing is:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
7168 7168 100% 0,03K 56 128 224K digest_list_item_ref_cache
7168 7168 100% 0,03K 56 128 224K digest_item_cache
1134 1134 100% 0,09K 27 42 108K digest_list_item_cache
The stats, obtained from the digests_count interface, are:
Parser digests: 0
File digests: 5986
Metadata digests: 0
Digest list digests: 1104
Comparison with IMA
This section compares the performance between the current solution for
IMA measurement and appraisal, and IMA with DIGLIM.
Workload A (without DIGLIM):
1. cat file[0-5985] > /dev/null
Workload B (with DIGLIM):
1. echo $PWD/0-file_list-compact-file[0-1103] >
<securityfs>/integrity/diglim/digest_list_add
2. cat file[0-5985] > /dev/null
Workload A execution time without IMA policy:
real 0m0,155s
user 0m0,008s
sys 0m0,066s
Measurement
IMA policy:
measure fowner=2000 func=FILE_CHECK mask=MAY_READ use_diglim=allow pcr=11 ima_template=ima-sig
use_diglim is a policy keyword not yet supported by IMA.
Workload A execution time with IMA and 5986 files with signature
measured:
real 0m8,273s
user 0m0,008s
sys 0m2,537s
Workload B execution time with IMA, 1104 digest lists with signature
measured and uploaded to the kernel, and 5986 files with signature
accessed but not measured (due to the file digest being found in the
hash table):
real 0m1,837s
user 0m0,036s
sys 0m0,583s
Appraisal
IMA policy:
appraise fowner=2000 func=FILE_CHECK mask=MAY_READ use_diglim=allow
use_diglim is a policy keyword not yet supported by IMA.
Workload A execution time with IMA and 5986 files with file signature
appraised:
real 0m2,197s
user 0m0,011s
sys 0m2,022s
Workload B execution time with IMA, 1104 digest lists with signature
appraised and uploaded to the kernel, and with 5986 files with signature
not verified (due to the file digest being found in the hash table):
real 0m0,982s
user 0m0,020s
sys 0m0,865s
[1] LSS EU 2019 slides and video
[2] FutureTPM EU project, final review meeting demo slides and video
v2:
- fix documentation content and style issues (suggested by Mauro)
- fix basic definitions description and ensure that the _reserved field of
compact list headers is zero (suggested by Greg KH)
- document the static inline functions to access compact list data
(suggested by Mauro)
- rename htable global variable to diglim_htable (suggested by Mauro)
- add IMA API to retrieve integrity information about a file or buffer
- display the digest list in the original format (same endianness as when
it was uploaded)
- support digest lists with appended signature (for IMA appraisal)
- fix bugs in the tests
- allocate the digest list label in digest_list_add()
- rename digest_label interface to digest_list_label
- check input for digest_query and digest_list_label interfaces
- don't remove entries in digest_lists_loaded if the same digest list is
uploaded again to the kernel
- deny write access to the digest lists while IMA actions are retrieved
- add new test digest_list_add_del_test_file_upload_measured_chown
- remove unused COMPACT_KEY type
v1:
- remove 'ima: Add digest, algo, measured parameters to
ima_measure_critical_data()', replaced by:
https://lore.kernel.org/linux-integrity/20210705090922.3321178-1-roberto.sa…
- add 'Lifecycle' subsection to better clarify how digest lists are
generated and used (suggested by Greg KH)
- remove 'Possible Usages' subsection and add 'Benefits for IMA
Measurement' and 'Benefits for IMA Appraisal' subsubsections
- add 'Preliminary Performance Evaluation' subsection
- declare digest_offset and hdr_offset in the digest_list_item_ref
structure as u32 (sufficient for digest lists of 4G) to make room for a
list_head structure (digest_list_item_ref size: 32)
- implement digest list reference management with a linked list instead of
an array
- reorder structure members for better alignment (suggested by Mauro)
- rename digest_lookup() to __digest_lookup() (suggested by Mauro)
- introduce an object cache for each defined structure
- replace atomic_long_t with unsigned long in h_table structure definition
(suggested by Greg KH)
- remove GPL2 license text and file names (suggested by Greg KH)
- ensure that the _reserved field of compact_list_hdr is equal to zero
(suggested by Greg KH)
- dynamically allocate the buffer in digest_lists_show_htable_len() to
avoid frame size warning (reported by kernel test robot, dynamic
allocation suggested by Mauro)
- split documentation in multiple files and reference the source code
(suggested by Mauro)
- use #ifdef in include/linux/diglim.h
- improve generation of event name for IMA measurements
- add new patch to introduce the 'Remote Attestation' section in the
documentation
- fix assignment of actions variable in digest_list_read() and
digest_list_write()
- always release dentry reference when digest_list_get_secfs_files() is
called
- rewrite add/del and query interfaces to take advantage of m->private
- prevent deletion of a digest list only if there are actions done at
addition time that are not currently being performed
- fix doc warnings (replace Returns with Return:)
- perform queries of digest list digests in the existing tests
- add new tests: digest_list_add_del_test_file_upload_measured,
digest_list_check_measurement_list_test_file_upload and
digest_list_check_measurement_list_test_buffer_upload
- don't return a value from digest_del(), digest_list_ref_del, and
digest_list_del()
- improve Makefile for tests
Roberto Sassu (13):
diglim: Overview
diglim: Basic definitions
diglim: Objects
diglim: Methods
diglim: Parser
diglim: IMA info
diglim: Interfaces - digest_list_add, digest_list_del
diglim: Interfaces - digest_lists_loaded
diglim: Interfaces - digest_list_label
diglim: Interfaces - digest_query
diglim: Interfaces - digests_count
diglim: Remote Attestation
diglim: Tests
.../security/diglim/architecture.rst | 46 +
.../security/diglim/implementation.rst | 228 +++
Documentation/security/diglim/index.rst | 14 +
.../security/diglim/introduction.rst | 599 +++++++
.../security/diglim/remote_attestation.rst | 87 +
Documentation/security/diglim/tests.rst | 70 +
Documentation/security/index.rst | 1 +
MAINTAINERS | 20 +
include/linux/diglim.h | 28 +
include/linux/kernel_read_file.h | 1 +
include/uapi/linux/diglim.h | 51 +
security/integrity/Kconfig | 1 +
security/integrity/Makefile | 1 +
security/integrity/diglim/Kconfig | 11 +
security/integrity/diglim/Makefile | 8 +
security/integrity/diglim/diglim.h | 232 +++
security/integrity/diglim/fs.c | 865 ++++++++++
security/integrity/diglim/ima.c | 122 ++
security/integrity/diglim/methods.c | 513 ++++++
security/integrity/diglim/parser.c | 274 ++++
security/integrity/integrity.h | 4 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/diglim/Makefile | 19 +
tools/testing/selftests/diglim/common.c | 135 ++
tools/testing/selftests/diglim/common.h | 32 +
tools/testing/selftests/diglim/config | 3 +
tools/testing/selftests/diglim/selftest.c | 1442 +++++++++++++++++
27 files changed, 4808 insertions(+)
create mode 100644 Documentation/security/diglim/architecture.rst
create mode 100644 Documentation/security/diglim/implementation.rst
create mode 100644 Documentation/security/diglim/index.rst
create mode 100644 Documentation/security/diglim/introduction.rst
create mode 100644 Documentation/security/diglim/remote_attestation.rst
create mode 100644 Documentation/security/diglim/tests.rst
create mode 100644 include/linux/diglim.h
create mode 100644 include/uapi/linux/diglim.h
create mode 100644 security/integrity/diglim/Kconfig
create mode 100644 security/integrity/diglim/Makefile
create mode 100644 security/integrity/diglim/diglim.h
create mode 100644 security/integrity/diglim/fs.c
create mode 100644 security/integrity/diglim/ima.c
create mode 100644 security/integrity/diglim/methods.c
create mode 100644 security/integrity/diglim/parser.c
create mode 100644 tools/testing/selftests/diglim/Makefile
create mode 100644 tools/testing/selftests/diglim/common.c
create mode 100644 tools/testing/selftests/diglim/common.h
create mode 100644 tools/testing/selftests/diglim/config
create mode 100644 tools/testing/selftests/diglim/selftest.c
--
2.25.1
Changes from v1 -> v2:
- Substantially rewrote "fix feature support detection"; previously, it tried to
do some larger refactor wherein the global test_uffdio_* variables were
removed. This was controversial, so it now simply queries features in
set_test_type, and leaves the rest of the program structure largely the same.
- The "fix calculation of expected ioctls" patch is conceptually the same as v1,
but changed slightly to fit with the modified feature support detection in v2.
- Moved patch 3/3 to 1/3, since it is uncontroversial and could be merged on its
own. I don't want the other two to cause merge conflicts for it in future
versions.
- Picked up a R-B.
Axel Rasmussen (3):
userfaultfd/selftests: don't rely on GNU extensions for random numbers
userfaultfd/selftests: fix feature support detection
userfaultfd/selftests: fix calculation of expected ioctls
tools/testing/selftests/vm/userfaultfd.c | 157 +++++++++++------------
1 file changed, 73 insertions(+), 84 deletions(-)
--
2.33.0.800.g4c38ced690-goog
The frequency of the rss_stat trace event is known to be of the same
magnitude as that of the sched_switch event on Android devices. This can
cause flooding of the trace buffer with rss_stat traces leading to a
decreased trace buffer capacity and loss of data.
If it is not necessary to monitor very small changes in rss (as is the
case in Android) then the rss_stat tracepoint can be throttled to only
emit the event once there is a large enough change in the rss size.
The original patch that introduced the rss_stat tracepoint also proposed
a fixed throttling mechanism that only emits the rss_stat event
when the rss size crosses a 512KB boundary. It was concluded that more
generic support for this type of filtering/throttling was need, so that
it can be applied to any trace event. [1]
>From the discussion in [1], histogram triggers seemed the most likely
candidate to support this type of throttling. For instance to achieve the
same throttling as was proposed in [1]:
(1) Create a histogram variable to save the 512KB bucket of the rss size
(2) Use the onchange handler to generate a synthetic event when the
rss size bucket changes.
The only missing pieces to support such a hist trigger are:
(1) Support for setting a hist variable to a specific value -- to set
the bucket size / granularity.
(2) Support for division arithmetic operation -- to determine the
corresponding bucket for an rss size.
This series extends histogram trigger expressions to:
(1) Allow assigning numeric literals to hist variable (eg. x=1234)
and using literals directly in expressions (eg. x=size/1234)
(2) Support division and multiplication in hist expressions.
(eg. a=$x/$y*z); and
(3) Fixes expression parsing for non-associative operators: subtraction
and division. (eg. 8-4-2 should be 2 not 6)
The rss_stat event can then be throttled using histogram triggers as
below:
# Create a synthetic event to monitor instead of the high frequency
# rss_stat event
echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr;
int member; long size' >> tracing/synthetic_events
# Create a hist trigger that emits the synthetic rss_stat_throttled
# event only when the rss size crosses a 512KB boundary.
echo 'hist:keys=common_pid:bucket=size/0x80000:onchange($bucket)
.rss_stat_throttled(mm_id,curr,member,size)'
>> events/kmem/rss_stat/trigger
------ Test Results ------
Histograms can also be used to evaluate the effectiveness of this
throttling by noting the Total Hits on each trigger:
echo 'hist:keys=common_pid' >> events/sched/sched_switch/trigger
echo 'hist:keys=common_pid' >> events/kmem/rss_stat/trigger
echo 'hist:keys=common_pid'
>> events/synthetic/rss_stat_throttled/trigger
Allowing the above example (512KB granularity) run for 5 minutes on
an arm64 device with 5.10 kernel:
sched_switch : total hits = 147153
rss_stat : total hits = 38863
rss_stat_throttled: total hits = 2409
The synthetic rss_stat_throttled event is ~16x less frequent than the
rss_stat event when using a 512KB granularity.
The results are more pronounced when rss size is changing at a higher
rate in small increments. For instance the following results were obtained
by recording the hits on the above events for a run of Android's
lmkd_unit_test [2], which continually forks processes that map anonymous
memory until there is an oom kill:
sched_switch : total hits = 148832
rss_stat : total hits = 4754802
rss_stat_throttled: total hits = 96214
In this stress this, the synthetic rss_stat_throttled event is ~50x less
frequent than the rss_stat event when using a 512KB granularity.
[1] https://lore.kernel.org/lkml/20190903200905.198642-1-joel@joelfernandes.org/
[2] https://cs.android.com/android/platform/superproject/+/master:system/memory…
Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
Kalesh Singh (5):
tracing: Add support for creating hist trigger variables from literal
tracing: Add division and multiplication support for hist triggers
tracing: Fix operator precedence for hist triggers expression
tracing/selftests: Add tests for hist trigger expression parsing
tracing/histogram: Document expression arithmetic and constants
Documentation/trace/histogram.rst | 14 +
kernel/trace/trace_events_hist.c | 318 +++++++++++++++---
.../testing/selftests/ftrace/test.d/functions | 4 +-
.../trigger/trigger-hist-expressions.tc | 73 ++++
4 files changed, 357 insertions(+), 52 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-expressions.tc
base-commit: 3ca706c189db861b2ca2019a0901b94050ca49d8
--
2.33.0.309.g3052b89438-goog
From: "George G. Davis" <davis.george(a)siemens.com>
When executing transhuge-stress with an argument to specify the virtual
memory size for testing, the ram size is reported as 0, e.g.
transhuge-stress 384
thp-mmap: allocate 192 transhuge pages, using 384 MiB virtual memory and 0 MiB of ram
thp-mmap: 0.184 s/loop, 0.957 ms/page, 2090.265 MiB/s 192 succeed, 0 failed
This appears to be due to a thinko in commit 0085d61fe05e
("selftests/vm/transhuge-stress: stress test for memory compaction"),
where, at a guess, the intent was to base "xyz MiB of ram" on `ram`
size. Here are results after using `ram` size:
thp-mmap: allocate 192 transhuge pages, using 384 MiB virtual memory and 14 MiB of ram
Fixes: 0085d61fe05e ("selftests/vm/transhuge-stress: stress test for memory compaction")
Signed-off-by: George G. Davis <davis.george(a)siemens.com>
---
tools/testing/selftests/vm/transhuge-stress.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/transhuge-stress.c b/tools/testing/selftests/vm/transhuge-stress.c
index fd7f1b4a96f9..5e4c036f6ad3 100644
--- a/tools/testing/selftests/vm/transhuge-stress.c
+++ b/tools/testing/selftests/vm/transhuge-stress.c
@@ -79,7 +79,7 @@ int main(int argc, char **argv)
warnx("allocate %zd transhuge pages, using %zd MiB virtual memory"
" and %zd MiB of ram", len >> HPAGE_SHIFT, len >> 20,
- len >> (20 + HPAGE_SHIFT - PAGE_SHIFT - 1));
+ ram >> (20 + HPAGE_SHIFT - PAGE_SHIFT - 1));
pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
if (pagemap_fd < 0)
--
2.17.1
Hi David,
I am running into the following warning when try to build this test:
madv_populate.c:334:2: warning: #warning "missing MADV_POPULATE_READ or MADV_POPULATE_WRITE definition" [-Wcpp]
334 | #warning "missing MADV_POPULATE_READ or MADV_POPULATE_WRITE definition"
| ^~~~~~~
I see that the following handling is in place. However there is no
other information to explain why the check is necessary.
#if defined(MADV_POPULATE_READ) && defined(MADV_POPULATE_WRITE)
#else /* defined(MADV_POPULATE_READ) && defined(MADV_POPULATE_WRITE) */
#warning "missing MADV_POPULATE_READ or MADV_POPULATE_WRITE definition"
I do see these defined in:
include/uapi/asm-generic/mman-common.h:#define MADV_POPULATE_READ 22
include/uapi/asm-generic/mman-common.h:#define MADV_POPULATE_WRITE 23
Is this the case of missing include from madv_populate.c?
thanks,
-- Shuah
This series provides initial support for the ARMv9 Scalable Matrix
Extension (SME). SME takes the approach used for vectors in SVE and
extends this to provide architectural support for matrix operations. A
more detailed overview can be found in [1].
For the kernel SME can be thought of as a series of features which are
intended to be used together by applications but operate mostly
orthogonally:
- The ZA matrix register.
- Streaming mode, in which ZA can be accessed and a subset of SVE
features are available.
- A second vector length, used for streaming mode SVE and ZA and
controlled using a similar interface to that for SVE.
- TPIDR2, a new userspace controllable system register intended for use
by the C library for storing context related to the ZA ABI.
A substantial part of the series is dedicated to refactoring the
existing SVE support so that we don't need to duplicate code for
handling vector lengths and the SVE registers, this involves creating an
array of vector types and making the users take the vector type as a
parameter. I'm not 100% happy with this but wasn't able to come up with
anything better, duplicating code definitely felt like a bad idea so
this felt like the least bad thing. If this approach makes sense to
people it might make sense to split this off into a separate series
and/or merge it while the rest is pending review to try to make things a
little more digestable, the series is very large so it'd probably make
things easier to digest if some of the preparatory refactoring could be
merged before the rest is ready.
One feature of the architecture of particular note is that switching
to and from streaming mode may change the size of and invalidate the
contents of the SVE registers, and when in streaming mode the FFR is not
accessible. This complicates aspects of the ABI like signal handling
and ptrace.
This initial implementation is mainly intended to get the ABI in place,
there are several areas which will be worked on going forwards - some of
these will be blockers, others could be handled in followup serieses:
- KVM is not currently supported and we depend on !KVM, in hopefully
the next version I will add support for coexisting with KVM and then
in a subsequent series implement real support for KVM guests.
- It is likely some build configurations have issues, I've not fully
checked this yet. In general testing is still ongoing, I anticipate
finding and fixing some issues in the implementation.
- No support is currently provided for scheduler control of SME or SME
applications, given the size of the SME register state the context
switch overhead may be noticable so this may be needed especially for
real time applications. Similar concerns already exist for larger
SVE vector lengths but are amplified for SME, particularly as the
vector length increases.
- There has been no work on optimising the performance of anything the
kernel does.
It is not expected that any systems will be encountered that support SME
but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
The code attempts to handle any such systems that are encountered but
this hasn't been tested extensively.
Due to dependencies on kselftest changes already upstreamed this series
is based on for-next/kselftest in the arm64 tree.
[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-…
Mark Brown (38):
arm64/fp: Reindent fpsimd_save()
arm64/sve: Remove sve_load_from_fpsimd_state()
arm64/sve: Make access to FFR optional
arm64/sve: Rename find_supported_vector_length()
arm64/sve: Use accessor functions for vector lengths in thread_struct
arm64/sve: Put system wide vector length information into structs
arm64/sve: Explicitly load vector length when restoring SVE state
arm64/sve: Track vector lengths for tasks in an array
arm64/sve: Make sysctl interface for SVE reusable by SME
arm64/sve: Generalise vector length configuration prctl() for SME
selftests: arm64: Parameterise ptrace vector length information
arm64/sme: Provide ABI documentation for SME
arm64/sme: System register and exception syndrome definitions
arm64/sme: Define macros for manually encoding SME instructions
arm64/sme: Early CPU setup for SME
arm64/sme: Basic enumeration support
arm64/sme: Identify supported SME vector lengths at boot
arm64/sme: Implement sysctl to set the default vector length
arm64/sme: Implement vector length configuration prctl()s
arm64/sme: Implement support for TPIDR2
arm64/sme: Implement SVCR context switching
arm64/sme: Implement streaming SVE context switching
arm64/sme: Implement ZA context switching
arm64/sme: Implement traps and syscall handling for SME
arm64/sme: Implement streaming SVE signal handling
arm64/sme: Implement ZA signal handling
arm64/sme: Implement ptrace support for streaming mode SVE registers
arm64/sme: Add ptrace support for ZA
arm64/sme: Disable streaming mode and ZA when flushing CPU state
arm64/sme: Save and restore streaming mode over EFI runtime calls
arm64/sme: Provide Kconfig for SME
kselftest/arm64: Add tests for TPIDR2
kselftest/arm64: Extend vector configuration API tests to cover SME
kselftest/arm64: sme: Provide streaming mode SVE stress test
kselftest/arm64: Add stress test for SME ZA context switching
kselftest/arm64: signal: Add SME signal handling tests
selftests: arm64: Add streaming SVE to SVE ptrace tests
selftests: arm64: Add coverage for the ZA ptrace interface
Documentation/arm64/elf_hwcaps.rst | 29 +
Documentation/arm64/index.rst | 1 +
Documentation/arm64/sme.rst | 427 +++++++++
Documentation/arm64/sve.rst | 62 +-
arch/arm64/Kconfig | 11 +
arch/arm64/include/asm/cpu.h | 4 +
arch/arm64/include/asm/cpufeature.h | 18 +
arch/arm64/include/asm/el2_setup.h | 36 +
arch/arm64/include/asm/esr.h | 3 +-
arch/arm64/include/asm/exception.h | 1 +
arch/arm64/include/asm/fpsimd.h | 178 +++-
arch/arm64/include/asm/fpsimdmacros.h | 94 +-
arch/arm64/include/asm/hwcap.h | 7 +
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/processor.h | 67 +-
arch/arm64/include/asm/sysreg.h | 53 ++
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/include/uapi/asm/hwcap.h | 7 +
arch/arm64/include/uapi/asm/ptrace.h | 69 +-
arch/arm64/include/uapi/asm/sigcontext.h | 55 +-
arch/arm64/kernel/cpufeature.c | 96 +-
arch/arm64/kernel/cpuinfo.c | 12 +
arch/arm64/kernel/entry-common.c | 10 +
arch/arm64/kernel/entry-fpsimd.S | 63 +-
arch/arm64/kernel/fpsimd.c | 900 ++++++++++++++----
arch/arm64/kernel/process.c | 19 +-
arch/arm64/kernel/ptrace.c | 358 ++++++-
arch/arm64/kernel/signal.c | 189 +++-
arch/arm64/kernel/syscall.c | 49 +-
arch/arm64/kernel/traps.c | 1 +
arch/arm64/kvm/fpsimd.c | 3 +-
arch/arm64/kvm/hyp/fpsimd.S | 6 +-
arch/arm64/kvm/reset.c | 14 +-
arch/arm64/tools/cpucaps | 1 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 9 +
kernel/sys.c | 6 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/.gitignore | 1 +
tools/testing/selftests/arm64/abi/Makefile | 13 +
tools/testing/selftests/arm64/abi/tpidr2.c | 204 ++++
tools/testing/selftests/arm64/fp/.gitignore | 4 +
tools/testing/selftests/arm64/fp/Makefile | 12 +-
tools/testing/selftests/arm64/fp/rdvl-sme.c | 14 +
tools/testing/selftests/arm64/fp/rdvl.S | 16 +
tools/testing/selftests/arm64/fp/rdvl.h | 1 +
tools/testing/selftests/arm64/fp/ssve-stress | 59 ++
tools/testing/selftests/arm64/fp/sve-ptrace.c | 203 ++--
tools/testing/selftests/arm64/fp/sve-test.S | 30 +
tools/testing/selftests/arm64/fp/vec-syscfg.c | 10 +
tools/testing/selftests/arm64/fp/za-ptrace.c | 353 +++++++
tools/testing/selftests/arm64/fp/za-stress | 59 ++
tools/testing/selftests/arm64/fp/za-test.S | 545 +++++++++++
.../testing/selftests/arm64/signal/.gitignore | 2 +
.../selftests/arm64/signal/test_signals.h | 2 +
.../arm64/signal/test_signals_utils.c | 3 +
.../testcases/fake_sigreturn_sme_change_vl.c | 92 ++
.../selftests/arm64/signal/testcases/sme_vl.c | 70 ++
.../arm64/signal/testcases/ssve_regs.c | 129 +++
59 files changed, 4293 insertions(+), 396 deletions(-)
create mode 100644 Documentation/arm64/sme.rst
create mode 100644 tools/testing/selftests/arm64/abi/.gitignore
create mode 100644 tools/testing/selftests/arm64/abi/Makefile
create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
create mode 100644 tools/testing/selftests/arm64/fp/za-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
base-commit: 8694e5e6388695195a32bd5746635ca166a8df56
--
2.20.1
Stop tracing before searching the pattern which is expected to
not exist on the trace buffer. In some case, it will take too
long and may not come back eternally because while searching
the tracing data will be increased by the searching activity.
I found this with enabling kernel debug options, like kmemleak,
lockdep etc. and run it on qemu with 2 CPUs. It did not come
back in 20 minutes and finally I need to interrupt it to stop.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
---
.../ftrace/test.d/ftrace/func_profiler.tc | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_profiler.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_profiler.tc
index 1dbd766c0cd2..440f4d87aa4b 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func_profiler.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_profiler.tc
@@ -56,6 +56,9 @@ clear_trace
sleep 1
echo "make sure something other than scheduler is being traced"
+
+echo 0 > tracing_on
+
if ! grep -v -e '^#' -e 'schedule' trace > /dev/null; then
cat trace
fail "no other functions besides schedule was found"
Those are few patches I was working on lately, all somewhat related
to the two CVEs that I found recently.
First 7 patches fix various minor bugs that relate to these CVEs.
The rest of the patches implement various optional SVM features,
some of which the guest could enable anyway due to incorrect
checking of virt_ext field.
Last patch is somewhat an RFC, I would like to hear your opinion
on that.
I also implemented nested TSC scaling while at it.
As for other optional SVM features here is my summary of few features
I took a look at:
X86_FEATURE_DECODEASSISTS:
this feature should make it easier
for the L1 to emulate an instruction on MMIO access, by not
needing to read the guest memory but rather using the instruction
bytes that the CPU already fetched.
The challenge of implementing this is that we sometimes inject
#PF and #NPT syntenically and in those cases we must be sure
we set the correct instruction bytes.
Also this feature adds assists for MOV CR/DR, INTn, and INVLPG,
which aren't that interesting but must be supported as well to
expose this feature to the nested guest.
X86_FEATURE_VGIF
Might allow the L2 to run the L3 a bit faster, but due to crazy complex
logic we already have around int_ctl and vgif probably not worth it.
X86_FEATURE_VMCBCLEAN
Should just be enabled, because otherwise L1 doesn't even attempt
to set the clean bits. But we need to know if we can take an
advantage of these bits first.
X86_FEATURE_FLUSHBYASID
X86_FEATURE_AVIC
These two features would be very good to enable, but that
would require lots of work, and will be done eventually.
There are few more nested SVM features that I didn't yet had a
chance to take a look at.
Best regards,
Maxim Levitsky
Maxim Levitsky (14):
KVM: x86: nSVM: restore int_vector in svm_clear_vintr
KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
KVM: x86: nSVM: don't copy pause related settings
KVM: x86: nSVM: don't copy virt_ext from vmcb12
KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset
KVM: x86: SVM: add warning for CVE-2021-3656
KVM: x86: SVM: add module param to control LBR virtualization
KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running
KVM: x86: nSVM: implement nested LBR virtualization
KVM: x86: nSVM: implement nested VMLOAD/VMSAVE
KVM: x86: SVM: add module param to control TSC scaling
KVM: x86: nSVM: implement nested TSC scaling
KVM: x86: nSVM: support PAUSE filter threshold and count
arch/x86/kvm/svm/nested.c | 105 +++++++--
arch/x86/kvm/svm/svm.c | 218 +++++++++++++++---
arch/x86/kvm/svm/svm.h | 20 +-
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 1 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/svm_int_ctl_test.c | 128 ++++++++++
8 files changed, 427 insertions(+), 48 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c
--
2.26.3
Allow running each suite or each test case alone per kernel boot.
The motivation for this is to debug "test hermeticity" issues.
This new --run_isolated flag would be a good first step to try and
narrow down root causes.
Context: sometimes tests pass/fail depending on what ran before them.
Memory corruption errors in particular might only cause noticeable
issues later on. But you can also have the opposite, where "fixing" one
test causes another to start failing.
Usage:
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=suite
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=test
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=test example
The last one would provide output like
======== [PASSED] example ========
[PASSED] example_simple_test
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 0 skipped.
Starting KUnit Kernel (2/3)...
============================================================
======== [SKIPPED] example ========
[SKIPPED] example_skip_test # SKIP this test should be skipped
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 1 skipped.
Starting KUnit Kernel (3/3)...
============================================================
======== [SKIPPED] example ========
[SKIPPED] example_mark_skipped_test # SKIP this test should be skipped
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 1 skipped.
See the last patch's description for a bit more detail.
Meta:
The first patch is from another series with just a reworded commit
message, https://lore.kernel.org/linux-kselftest/20210805235145.2528054-2-dlatypov@g…
This patch series is based on Shuah's kunit branch:
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/?…
Changes:
v1 -> v2: rebase onto Shuah's kunit branch, fix missing code in patch 1.
v2 -> v3: fix mypy errors, drop test plan from output, fix pre-existing
bug where kunit was not actually tracking test execution time (new patch 3).
v3 -> v4: attempt to filter out non-KUnit dmesg output when getting list
of test names, using this regex: ^[^\s.]+\.[^\s.]+$
Daniel Latypov (4):
kunit: add 'kunit.action' param to allow listing out tests
kunit: tool: factor exec + parse steps into a function
kunit: tool: actually track how long it took to run tests
kunit: tool: support running each suite/test separately
lib/kunit/executor.c | 45 ++++++++-
tools/testing/kunit/kunit.py | 134 +++++++++++++++++--------
tools/testing/kunit/kunit_tool_test.py | 40 ++++++++
3 files changed, 173 insertions(+), 46 deletions(-)
base-commit: 3b29021ddd10cfb6b2565c623595bd3b02036f33
--
2.33.0.800.g4c38ced690-goog
[root@iaas-rpma gpio]# make
gcc gpio-mockup-cdev.c -o /home/lizhijian/linux/tools/testing/selftests/gpio/gpio-mockup-cdev
gpio-mockup-cdev.c: In function ‘request_line_v2’:
gpio-mockup-cdev.c:24:30: error: storage size of ‘req’ isn’t known
24 | struct gpio_v2_line_request req;
| ^~~
gpio-mockup-cdev.c:32:14: error: ‘GPIO_V2_LINE_FLAG_OUTPUT’ undeclared (first use in this function); did you mean ‘GPIOLINE_FLAG_IS_OUT’?
32 | if (flags & GPIO_V2_LINE_FLAG_OUTPUT) {
| ^~~~~~~~~~~~~~~~~~~~~~~~
gpio-mockup-cdev.c includes <linux/gpio.h> which could be provided by
kernel-headers package, and where it's expected to declare
GPIO_V2_LINE_FLAG_OUTPUT. However distros or developers will not always
install the same kernel-header as we are compiling.
So we can tell compiler to search headers from linux tree simply like others,
such as sched.
CC: Philip Li <philip.li(a)intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
---
V2: add more details about the fix
---
tools/testing/selftests/gpio/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/gpio/Makefile b/tools/testing/selftests/gpio/Makefile
index 39f2bbe8dd3d..42ea7d2aa844 100644
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -3,5 +3,6 @@
TEST_PROGS := gpio-mockup.sh
TEST_FILES := gpio-mockup-sysfs.sh
TEST_GEN_PROGS_EXTENDED := gpio-mockup-cdev
+CFLAGS += -I../../../../usr/include
include ../lib.mk
--
2.31.1
Consider this attempt to run KUnit in QEMU:
$ ./tools/testing/kunit/kunit.py run --arch=x86
Before you'd get this error message:
kunit_kernel.ConfigError: x86 is not a valid arch
After:
kunit_kernel.ConfigError: x86 is not a valid arch, options are ['alpha', 'arm', 'arm64', 'i386', 'powerpc', 'riscv', 's390', 'sparc', 'x86_64']
This should make it a bit easier for people to notice when they make
typos, etc. Currently, one would have to dive into the python code to
figure out what the valid set is.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit_kernel.py | 5 +++--
tools/testing/kunit/kunit_tool_test.py | 4 ++++
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 1870e75ff153..a6b3cee3f0d0 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -198,8 +198,9 @@ def get_source_tree_ops(arch: str, cross_compile: Optional[str]) -> LinuxSourceT
return LinuxSourceTreeOperationsUml(cross_compile=cross_compile)
elif os.path.isfile(config_path):
return get_source_tree_ops_from_qemu_config(config_path, cross_compile)[1]
- else:
- raise ConfigError(arch + ' is not a valid arch')
+
+ options = [f[:-3] for f in os.listdir(QEMU_CONFIGS_DIR) if f.endswith('.py')]
+ raise ConfigError(arch + ' is not a valid arch, options are ' + str(sorted(options)))
def get_source_tree_ops_from_qemu_config(config_path: str,
cross_compile: Optional[str]) -> Tuple[
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index cad37a98e599..2ae72f04cbe0 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -289,6 +289,10 @@ class LinuxSourceTreeTest(unittest.TestCase):
pass
kunit_kernel.LinuxSourceTree('', kunitconfig_path=dir)
+ def test_invalid_arch(self):
+ with self.assertRaisesRegex(kunit_kernel.ConfigError, 'not a valid arch, options are.*x86_64'):
+ kunit_kernel.LinuxSourceTree('', arch='invalid')
+
# TODO: add more test cases.
base-commit: 865a0a8025ee0b54d1cc74834c57197d184a441e
--
2.33.0.685.g46640cef36-goog
Drop some variables in unit tests that were unused and/or add assertions
based on them.
For ExitStack, it was imported, but the `es` variable wasn't used so it
didn't do anything, and we were leaking the file objects.
Refactor it to just use nested `with` statements to properly close them.
And drop the direct use of .close() on file objects in the kunit tool
unit test, as these can be leaked if test assertions fail.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit.py | 1 -
tools/testing/kunit/kunit_kernel.py | 12 ++++--------
tools/testing/kunit/kunit_tool_test.py | 18 ++++++++----------
3 files changed, 12 insertions(+), 19 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 66f67af97971..1b2b7f06bb8c 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -18,7 +18,6 @@ from collections import namedtuple
from enum import Enum, auto
from typing import Iterable
-import kunit_config
import kunit_json
import kunit_kernel
import kunit_parser
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 2c6f916ccbaf..1870e75ff153 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -14,10 +14,6 @@ import shutil
import signal
from typing import Iterator, Optional, Tuple
-from contextlib import ExitStack
-
-from collections import namedtuple
-
import kunit_config
import kunit_parser
import qemu_config
@@ -168,10 +164,10 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
process.wait()
kunit_parser.print_with_timestamp(
'Disabling broken configs to run KUnit tests...')
- with ExitStack() as es:
- config = open(get_kconfig_path(build_dir), 'a')
- disable = open(BROKEN_ALLCONFIG_PATH, 'r').read()
- config.write(disable)
+
+ with open(get_kconfig_path(build_dir), 'a') as config:
+ with open(BROKEN_ALLCONFIG_PATH, 'r') as disable:
+ config.write(disable.read())
kunit_parser.print_with_timestamp(
'Starting Kernel with all configs takes a few minutes...')
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 619c4554cbff..cad37a98e599 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -185,7 +185,7 @@ class KUnitParserTest(unittest.TestCase):
kunit_parser.extract_tap_lines(file.readlines()))
print_mock.assert_any_call(StrContains('could not parse test results!'))
print_mock.stop()
- file.close()
+ self.assertEqual(0, len(result.suites))
def test_crashed_test(self):
crashed_log = test_data_path('test_is_test_passed-crash.log')
@@ -197,24 +197,22 @@ class KUnitParserTest(unittest.TestCase):
def test_skipped_test(self):
skipped_log = test_data_path('test_skip_tests.log')
- file = open(skipped_log)
- result = kunit_parser.parse_run_tests(file.readlines())
+ with open(skipped_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
# A skipped test does not fail the whole suite.
self.assertEqual(
kunit_parser.TestStatus.SUCCESS,
result.status)
- file.close()
def test_skipped_all_tests(self):
skipped_log = test_data_path('test_skip_all_tests.log')
- file = open(skipped_log)
- result = kunit_parser.parse_run_tests(file.readlines())
+ with open(skipped_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
self.assertEqual(
kunit_parser.TestStatus.SKIPPED,
result.status)
- file.close()
def test_ignores_prefix_printk_time(self):
@@ -283,13 +281,13 @@ class LinuxSourceTreeTest(unittest.TestCase):
def test_valid_kunitconfig(self):
with tempfile.NamedTemporaryFile('wt') as kunitconfig:
- tree = kunit_kernel.LinuxSourceTree('', kunitconfig_path=kunitconfig.name)
+ kunit_kernel.LinuxSourceTree('', kunitconfig_path=kunitconfig.name)
def test_dir_kunitconfig(self):
with tempfile.TemporaryDirectory('') as dir:
- with open(os.path.join(dir, '.kunitconfig'), 'w') as f:
+ with open(os.path.join(dir, '.kunitconfig'), 'w'):
pass
- tree = kunit_kernel.LinuxSourceTree('', kunitconfig_path=dir)
+ kunit_kernel.LinuxSourceTree('', kunitconfig_path=dir)
# TODO: add more test cases.
base-commit: 3b29021ddd10cfb6b2565c623595bd3b02036f33
--
2.33.0.685.g46640cef36-goog
Good day,
My name is Luis Fernandez.I would like to discuss something
important that will benefit both of us. I will send you more
details upon your response
Regards
Luis Fernandez
Allow running each suite or each test case alone per kernel boot.
The motivation for this is to debug "test hermeticity" issues.
This new --run_isolated flag would be a good first step to try and
narrow down root causes.
Context: sometimes tests pass/fail depending on what ran before them.
Memory corruption errors in particular might only cause noticeable
issues later on. But you can also have the opposite, where "fixing" one
test causes another to start failing.
Usage:
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=suite
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=test
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=test example
The last one would provide output like
======== [PASSED] example ========
[PASSED] example_simple_test
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 0 skipped.
Starting KUnit Kernel (2/3)...
============================================================
======== [SKIPPED] example ========
[SKIPPED] example_skip_test # SKIP this test should be skipped
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 1 skipped.
Starting KUnit Kernel (3/3)...
============================================================
======== [SKIPPED] example ========
[SKIPPED] example_mark_skipped_test # SKIP this test should be skipped
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 1 skipped.
See the last patch's description for a bit more detail.
Meta:
The first patch is from another series with just a reworded commit
message, https://lore.kernel.org/linux-kselftest/20210805235145.2528054-2-dlatypov@g…
This patch series is based on Shuah's kunit branch:
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/?…
Changes:
v1 -> v2: rebase onto Shuah's kunit branch, fix missing code in patch 1.
v2 -> v3: fix mypy errors, drop test plan from output, fix pre-existing
bug where kunit was not actually tracking test execution time (new patch 3).
Daniel Latypov (4):
kunit: add 'kunit.action' param to allow listing out tests
kunit: tool: factor exec + parse steps into a function
kunit: tool: actually track how long it took to run tests
kunit: tool: support running each suite/test separately
lib/kunit/executor.c | 45 ++++++++-
tools/testing/kunit/kunit.py | 129 +++++++++++++++++--------
tools/testing/kunit/kunit_tool_test.py | 40 ++++++++
3 files changed, 169 insertions(+), 45 deletions(-)
base-commit: 3b29021ddd10cfb6b2565c623595bd3b02036f33
--
2.33.0.685.g46640cef36-goog
The structleak plugin causes the stack frame size to grow immensely when
used with KUnit; this is caused because KUnit allocates lots of
moderately sized structs on the stack as part of its assertion macro
implementation. For most tests with small to moderately sized tests
cases there are never enough KUnit assertions to be an issue at all;
even when a single test cases has many KUnit assertions, the compiler
should never put all these struct allocations on the stack at the same
time since the scope of the structs is so limited; however, the
structleak plugin does not seem to respect the compiler doing the right
thing and will still warn of excessive stack size in some cases.
These patches are not a permanent solution since new tests can be added
with huge test cases, but this serves as a stop gap to stop structleak
from being used on KUnit tests which will currently result in excessive
stack size.
Please see the discussion thread here[1] for more context.
Changes since last revision:
- Dropped mmc: sdhci-of-aspeed patch since it was not a pure test and I
could not reproduce the stack size warning anyway.
- Removed Wframe-larger-than=10240 warning from the bitfield kunit
test.
- All other patches are the same except with updated
reviewers/contributor commit footers.
[1] https://lore.kernel.org/linux-arm-kernel/CAFd5g44udqkDiYBWh+VeDVJ=ELXeoXwun…
Arnd Bergmann (1):
bitfield: build kunit tests without structleak plugin
Brendan Higgins (4):
gcc-plugins/structleak: add makefile var for disabling structleak
iio/test-format: build kunit tests without structleak plugin
device property: build kunit tests without structleak plugin
thunderbolt: build kunit tests without structleak plugin
drivers/base/test/Makefile | 2 +-
drivers/iio/test/Makefile | 1 +
drivers/thunderbolt/Makefile | 1 +
lib/Makefile | 2 +-
scripts/Makefile.gcc-plugins | 4 ++++
5 files changed, 8 insertions(+), 2 deletions(-)
base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
--
2.33.0.685.g46640cef36-goog
The structleak plugin causes the stack frame size to grow immensely when
used with KUnit; this is caused because KUnit allocates lots of
moderately sized structs on the stack as part of its assertion macro
implementation. For most tests with small to moderately sized tests
cases there are never enough KUnit assertions to be an issue at all;
even when a single test cases has many KUnit assertions, the compiler
should never put all these struct allocations on the stack at the same
time since the scope of the structs is so limited; however, the
structleak plugin does not seem to respect the compiler doing the right
thing and will still warn of excessive stack size in some cases.
These patches are not a permanent solution since new tests can be added
with huge test cases, but this serves as a stop gap to stop structleak
from being used on KUnit tests which will currently result in excessive
stack size.
Of the following patches, I think the thunderbolt patch may be
unnecessary since Linus already fixed that test. Additionally, I was not
able to reproduce the error on the sdhci-of-aspeed test. Nevertheless, I
included these tests cases for completeness. Please see my discussion
with Arnd for more context[1].
NOTE: Arnd did the legwork for most of these patches, but did not
actually share code for some of them, so I left his Signed-off-by off of
those patches as I don't want to misrepresent him. Arnd, please sign off
on those patches at your soonest convenience.
[1] https://lore.kernel.org/linux-arm-kernel/CAFd5g44udqkDiYBWh+VeDVJ=ELXeoXwun…
Arnd Bergmann (1):
bitfield: build kunit tests without structleak plugin
Brendan Higgins (5):
gcc-plugins/structleak: add makefile var for disabling structleak
iio/test-format: build kunit tests without structleak plugin
device property: build kunit tests without structleak plugin
thunderbolt: build kunit tests without structleak plugin
mmc: sdhci-of-aspeed: build kunit tests without structleak plugin
drivers/base/test/Makefile | 2 +-
drivers/iio/test/Makefile | 1 +
drivers/mmc/host/Makefile | 1 +
drivers/thunderbolt/Makefile | 1 +
lib/Makefile | 2 +-
scripts/Makefile.gcc-plugins | 4 ++++
6 files changed, 9 insertions(+), 2 deletions(-)
base-commit: 316346243be6df12799c0b64b788e06bad97c30b
--
2.33.0.464.g1972c5931b-goog
Allow running each suite or each test case alone per kernel boot.
The motivation for this is to debug "test hermeticity" issues.
This new --run_isolated flag would be a good first step to try and
narrow down root causes.
Context: sometimes tests pass/fail depending on what ran before them.
Memory corruption errors in particular might only cause noticeable
issues later on. But you can also have the opposite, where "fixing" one
test causes another to start failing.
Usage:
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=suite
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=test
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit --run_isolated=test example
The last one would provide output like
======== [PASSED] example ========
[PASSED] example_simple_test
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 0 skipped.
Starting KUnit Kernel (2/3)...
============================================================
======== [SKIPPED] example ========
[SKIPPED] example_skip_test # SKIP this test should be skipped
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 1 skipped.
Starting KUnit Kernel (3/3)...
============================================================
======== [SKIPPED] example ========
[SKIPPED] example_mark_skipped_test # SKIP this test should be skipped
============================================================
Testing complete. 1 tests run. 0 failed. 0 crashed. 1 skipped.
See the last patch's description for a bit more detail.
Meta:
The first patch is from another series with just a reworded commit
message, https://lore.kernel.org/linux-kselftest/20210805235145.2528054-2-dlatypov@g…
This patch series is based on Shuah's kunit branch:
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/?…
Changes:
v1 -> v2: rebase onto Shuah's kunit branch, fix missing code in patch 1.
Daniel Latypov (3):
kunit: add 'kunit.action' param to allow listing out tests
kunit: tool: factor exec + parse steps into a function
kunit: tool: support running each suite/test separately
lib/kunit/executor.c | 45 ++++++++-
tools/testing/kunit/kunit.py | 127 +++++++++++++++++--------
tools/testing/kunit/kunit_tool_test.py | 40 ++++++++
3 files changed, 167 insertions(+), 45 deletions(-)
base-commit: 3b29021ddd10cfb6b2565c623595bd3b02036f33
--
2.33.0.685.g46640cef36-goog
The nr_cpus = CPU_COUNT(&possible_mask) is the number of available CPUs in
possible_mask. As a result, the "cpu = i % nr_cpus" may always return CPU
that is not available in possible_mask.
Suppose the server has 8 CPUs. The below Failure is encountered immediately
if the task is bound to CPU 5 and 6.
==== Test Assertion Failure ====
rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
pid=10127 tid=10127 errno=4 - Interrupted system call
1 0x00000000004018e5: main at rseq_test.c:227
2 0x00007fcc8fc66bf6: ?? ??:0
3 0x0000000000401959: _start at ??:?
Only performed 4 KVM_RUNs, task stalled too much?
Signed-off-by: Dongli Zhang <dongli.zhang(a)oracle.com>
---
tools/testing/selftests/kvm/rseq_test.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index c5e0dd664a7b..41df5173970c 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -10,6 +10,7 @@
#include <signal.h>
#include <syscall.h>
#include <sys/ioctl.h>
+#include <sys/sysinfo.h>
#include <asm/barrier.h>
#include <linux/atomic.h>
#include <linux/rseq.h>
@@ -43,6 +44,18 @@ static bool done;
static atomic_t seq_cnt;
+static int get_max_cpu_idx(void)
+{
+ int nproc = get_nprocs_conf();
+ int i, max = -ENOENT;
+
+ for (i = 0; i < nproc; i++)
+ if (CPU_ISSET(i, &possible_mask))
+ max = i;
+
+ return max;
+}
+
static void guest_code(void)
{
for (;;)
@@ -61,10 +74,13 @@ static void *migration_worker(void *ign)
{
cpu_set_t allowed_mask;
int r, i, nr_cpus, cpu;
+ int max_cpu_idx;
CPU_ZERO(&allowed_mask);
- nr_cpus = CPU_COUNT(&possible_mask);
+ max_cpu_idx = get_max_cpu_idx();
+ TEST_ASSERT(max_cpu_idx >= 0, "Invalid possible_mask");
+ nr_cpus = max_cpu_idx + 1;
for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
cpu = i % nr_cpus;
--
2.17.1
This series fixes up a few issues introduced into vec-syscfg during
refactoring in the review process, then adds a new test which ensures
that the behaviour when we attempt to set a vector length which is not
supported by the current system matches what is documented in the SVE
ABI documentation.
v2:
- Fix handling of missing VLs when checking that vector length setting
works as expected.
Mark Brown (4):
selftests: arm64: Fix printf() format mismatch in vec-syscfg
selftests: arm64: Remove bogus error check on writing to files
selftests: arm64: Fix and enable test for setting current VL in
vec-syscfg
selftests: arm64: Verify that all possible vector lengths are handled
tools/testing/selftests/arm64/fp/vec-syscfg.c | 89 ++++++++++++++++---
1 file changed, 76 insertions(+), 13 deletions(-)
base-commit: 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f
--
2.20.1
This series fixes up a few issues introduced into vec-syscfg during
refactoring in the review process, then adds a new test which ensures
that the behaviour when we attempt to set a vector length which is not
supported by the current system matches what is documented in the SVE
ABI documentation.
v3:
- Rebased onto v5.14-rc3.
- Check to see if we discovered the system vector lengths before trying
to set all possible vector lengths since we need that information to
validate the results.
v2:
- Fix handling of missing VLs when checking that vector length setting
works as expected.
Mark Brown (4):
selftests: arm64: Fix printf() format mismatch in vec-syscfg
selftests: arm64: Remove bogus error check on writing to files
selftests: arm64: Fix and enable test for setting current VL in
vec-syscfg
selftests: arm64: Verify that all possible vector lengths are handled
tools/testing/selftests/arm64/fp/vec-syscfg.c | 95 ++++++++++++++++---
1 file changed, 82 insertions(+), 13 deletions(-)
base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
--
2.20.1
This series overhauls the selftests we have for the SVE ptrace interface
to make them much more comprehensive than they are currently, making the
coverage of the data read and written more complete. The new coverage
for setting data on all vector lengths showed the issue with using the
wrong buffer size with ptrace reported and fixed by:
https://lore.kernel.org/linux-arm-kernel/20210909165356.10675-1-broonie@ker…
(arm64/sve: Use correct size when reinitialising SVE state).
Mark Brown (8):
selftests: arm64: Use a define for the number of SVE ptrace tests to
be run
selftests: arm64: Don't log child creation as a test in SVE ptrace
test
selftests: arm64: Remove extraneous register setting code
selftests: arm64: Document what the SVE ptrace test is doing
selftests: arm64: Clarify output when verifying SVE register set
selftests: arm64: Verify interoperation of SVE and FPSIMD register
sets
selftests: arm64: More comprehensively test the SVE ptrace interface
selftests: arm64: Move FPSIMD in SVE ptrace test into a function
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/TODO | 9 +-
.../selftests/arm64/fp/sve-ptrace-asm.S | 33 --
tools/testing/selftests/arm64/fp/sve-ptrace.c | 460 ++++++++++++------
4 files changed, 321 insertions(+), 183 deletions(-)
delete mode 100644 tools/testing/selftests/arm64/fp/sve-ptrace-asm.S
base-commit: 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f
--
2.20.1
RFC: https://lkml.org/lkml/2021/6/4/791
PATCH v1: https://lkml.org/lkml/2021/6/16/805
PATCH v2: https://lkml.org/lkml/2021/7/6/138
PATCH v3: https://lkml.org/lkml/2021/7/12/2799
PATCH v4: https://lkml.org/lkml/2021/7/16/532
PATCH v5: https://lkml.org/lkml/2021/7/19/247
PATCH v6: https://lkml.org/lkml/2021/7/20/36
PATCH v7: https://lkml.org/lkml/2021/7/23/26
Changelog v7-->v8
1. Rebased and tested against 5.15
2. Added a selftest to check if the energy and frequency attribues
exist and their files populated
Also, have implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.
The POC for the new interface has been sent to the powerpc-utils mailing
list for early review: https://groups.google.com/g/powerpc-utils-devel/c/r4i7JnlyQ8s
Sample output from the powerpc-utils tool is as follows:
# ppc64_cpu --frequency
Power and Performance Mode: XXXX
Idle Power Saver Status : XXXX
Processor Folding Status : XXXX --> Printed if Idle power save status is supported
Platform reported frequencies --> Frequencies reported from the platform's H_CALL i.e PAPR interface
min : NNNN GHz
max : NNNN GHz
static : NNNN GHz
Tool Computed frequencies
min : NNNN GHz (cpu XX)
max : NNNN GHz (cpu XX)
avg : NNNN GHz
Pratik R. Sampat (2):
powerpc/pseries: Interface to represent PAPR firmware attributes
selftest/powerpc: Add PAPR sysfs attributes sniff test
.../sysfs-firmware-papr-energy-scale-info | 26 ++
arch/powerpc/include/asm/hvcall.h | 24 +-
arch/powerpc/kvm/trace_hv.h | 1 +
arch/powerpc/platforms/pseries/Makefile | 3 +-
.../pseries/papr_platform_attributes.c | 312 ++++++++++++++++++
tools/testing/selftests/powerpc/Makefile | 1 +
.../powerpc/papr_attributes/.gitignore | 2 +
.../powerpc/papr_attributes/Makefile | 7 +
.../powerpc/papr_attributes/attr_test.c | 107 ++++++
9 files changed, 481 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/.gitignore
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/Makefile
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/attr_test.c
--
2.31.1
The phc.sh script in the ptp directory is still using exit 0 when
the test has been skipped due to some unmet requirements.
Use kselftest framework skip code instead so it can help us to
distinguish the return status.
Criterion to filter out what should be fixed in ptp directory:
grep -r "exit 0" -B1 | grep -i skip
This change might cause some false-positives if people are running
these test scripts directly and only checking their return codes,
which will change from 0 to 4. However I think the impact should be
small as most of our scripts here are already using this skip code.
And there will be no such issue if running them with the kselftest
framework.
Note that there are some SKIP messages exit with 1, I leave those
unchanged.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/ptp/phc.sh | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/ptp/phc.sh b/tools/testing/selftests/ptp/phc.sh
index ac6e5a6..0820544 100755
--- a/tools/testing/selftests/ptp/phc.sh
+++ b/tools/testing/selftests/ptp/phc.sh
@@ -1,6 +1,9 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+readonly KSFT_SKIP=4
+
ALL_TESTS="
settime
adjtime
@@ -13,12 +16,12 @@ DEV=$1
if [[ "$(id -u)" -ne 0 ]]; then
echo "SKIP: need root privileges"
- exit 0
+ exit $KSFT_SKIP
fi
if [[ "$DEV" == "" ]]; then
echo "SKIP: PTP device not provided"
- exit 0
+ exit $KSFT_SKIP
fi
require_command()
--
2.7.4
There are several test cases in the bpf directory are still using
exit 0 when they need to be skipped. Use kselftest framework skip
code instead so it can help us to distinguish the return status.
Criterion to filter out what should be fixed in bpf directory:
grep -r "exit 0" -B1 | grep -i skip
This change might cause some false-positives if people are running
these test scripts directly and only checking their return codes,
which will change from 0 to 4. However I think the impact should be
small as most of our scripts here are already using this skip code.
And there will be no such issue if running them with the kselftest
framework.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/bpf/test_bpftool_build.sh | 5 ++++-
tools/testing/selftests/bpf/test_xdp_meta.sh | 5 ++++-
tools/testing/selftests/bpf/test_xdp_vlan.sh | 7 +++++--
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_bpftool_build.sh b/tools/testing/selftests/bpf/test_bpftool_build.sh
index ac349a5..b6fab1e 100755
--- a/tools/testing/selftests/bpf/test_bpftool_build.sh
+++ b/tools/testing/selftests/bpf/test_bpftool_build.sh
@@ -1,6 +1,9 @@
#!/bin/bash
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
case $1 in
-h|--help)
echo -e "$0 [-j <n>]"
@@ -22,7 +25,7 @@ KDIR_ROOT_DIR=$(realpath $PWD/$SCRIPT_REL_DIR/../../../../)
cd $KDIR_ROOT_DIR
if [ ! -e tools/bpf/bpftool/Makefile ]; then
echo -e "skip: bpftool files not found!\n"
- exit 0
+ exit $ksft_skip
fi
ERROR=0
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.sh b/tools/testing/selftests/bpf/test_xdp_meta.sh
index 637fcf4..fd3f218 100755
--- a/tools/testing/selftests/bpf/test_xdp_meta.sh
+++ b/tools/testing/selftests/bpf/test_xdp_meta.sh
@@ -1,5 +1,8 @@
#!/bin/sh
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
cleanup()
{
if [ "$?" = "0" ]; then
@@ -17,7 +20,7 @@ cleanup()
ip link set dev lo xdp off 2>/dev/null > /dev/null
if [ $? -ne 0 ];then
echo "selftests: [SKIP] Could not run test without the ip xdp support"
- exit 0
+ exit $ksft_skip
fi
set -e
diff --git a/tools/testing/selftests/bpf/test_xdp_vlan.sh b/tools/testing/selftests/bpf/test_xdp_vlan.sh
index bb8b0da..1aa7404 100755
--- a/tools/testing/selftests/bpf/test_xdp_vlan.sh
+++ b/tools/testing/selftests/bpf/test_xdp_vlan.sh
@@ -2,6 +2,9 @@
# SPDX-License-Identifier: GPL-2.0
# Author: Jesper Dangaard Brouer <hawk(a)kernel.org>
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
# Allow wrapper scripts to name test
if [ -z "$TESTNAME" ]; then
TESTNAME=xdp_vlan
@@ -94,7 +97,7 @@ while true; do
-h | --help )
usage;
echo "selftests: $TESTNAME [SKIP] usage help info requested"
- exit 0
+ exit $ksft_skip
;;
* )
shift
@@ -117,7 +120,7 @@ fi
ip link set dev lo xdpgeneric off 2>/dev/null > /dev/null
if [ $? -ne 0 ]; then
echo "selftests: $TESTNAME [SKIP] need ip xdp support"
- exit 0
+ exit $ksft_skip
fi
# Interactive mode likely require us to cleanup netns
--
2.7.4
There are several test cases in the bpf directory are still using
exit 0 when they need to be skipped. Use kselftest framework skip
code instead so it can help us to distinguish the return status.
Criterion to filter out what should be fixed in bpf directory:
grep -r "exit 0" -B1 | grep -i skip
This change might cause some false-positives if people are running
these test scripts directly and only checking their return codes,
which will change from 0 to 4. However I think the impact should be
small as most of our scripts here are already using this skip code.
And there will be no such issue if running them with the kselftest
framework.
v1 -> v2:
- Ignore bpf/test_bpftool_build.sh as similar changes has been made.
- Make KSFT_SKIP readonly as suggested by Jakub Sitnicki.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/bpf/test_xdp_meta.sh | 5 ++++-
tools/testing/selftests/bpf/test_xdp_vlan.sh | 7 +++++--
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.sh b/tools/testing/selftests/bpf/test_xdp_meta.sh
index 637fcf4..d10cefd 100755
--- a/tools/testing/selftests/bpf/test_xdp_meta.sh
+++ b/tools/testing/selftests/bpf/test_xdp_meta.sh
@@ -1,5 +1,8 @@
#!/bin/sh
+# Kselftest framework requirement - SKIP code is 4.
+readonly KSFT_SKIP=4
+
cleanup()
{
if [ "$?" = "0" ]; then
@@ -17,7 +20,7 @@ cleanup()
ip link set dev lo xdp off 2>/dev/null > /dev/null
if [ $? -ne 0 ];then
echo "selftests: [SKIP] Could not run test without the ip xdp support"
- exit 0
+ exit $KSFT_SKIP
fi
set -e
diff --git a/tools/testing/selftests/bpf/test_xdp_vlan.sh b/tools/testing/selftests/bpf/test_xdp_vlan.sh
index bb8b0da..0cbc760 100755
--- a/tools/testing/selftests/bpf/test_xdp_vlan.sh
+++ b/tools/testing/selftests/bpf/test_xdp_vlan.sh
@@ -2,6 +2,9 @@
# SPDX-License-Identifier: GPL-2.0
# Author: Jesper Dangaard Brouer <hawk(a)kernel.org>
+# Kselftest framework requirement - SKIP code is 4.
+readonly KSFT_SKIP=4
+
# Allow wrapper scripts to name test
if [ -z "$TESTNAME" ]; then
TESTNAME=xdp_vlan
@@ -94,7 +97,7 @@ while true; do
-h | --help )
usage;
echo "selftests: $TESTNAME [SKIP] usage help info requested"
- exit 0
+ exit $KSFT_SKIP
;;
* )
shift
@@ -117,7 +120,7 @@ fi
ip link set dev lo xdpgeneric off 2>/dev/null > /dev/null
if [ $? -ne 0 ]; then
echo "selftests: $TESTNAME [SKIP] need ip xdp support"
- exit 0
+ exit $KSFT_SKIP
fi
# Interactive mode likely require us to cleanup netns
--
2.7.4
Problem:
What does this do?
$ kunit.py run --json
Well, it runs all the tests and prints test results out as JSON.
And next is
$ kunit.py run my-test-suite --json
This runs just `my-test-suite` and prints results out as JSON.
But what about?
$ kunit.py run --json my-test-suite
This runs all the tests and stores the json results in a "my-test-suite"
file.
Why:
--json, and now --raw_output are actually string flags. They just have a
default value. --json in particular takes the name of an output file.
It was intended that you'd do
$ kunit.py run --json=my_output_file my-test-suite
if you ever wanted to specify the value.
Workaround:
It doesn't seem like there's a way to make
https://docs.python.org/3/library/argparse.html only accept arg values
after a '='.
I believe that `--json` should "just work" regardless of where it is.
So this patch automatically rewrites a bare `--json` to `--json=stdout`.
That makes the examples above work the same way.
Add a regression test that can catch this for --raw_output.
Fixes: 6a499c9c42d0 ("kunit: tool: make --raw_output support only showing kunit output")
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
Tested-by: David Gow <davidgow(a)google.com>
---
v1 -> v2: fix mypy error by converting mapped argv to a list.
---
tools/testing/kunit/kunit.py | 24 ++++++++++++++++++++++--
tools/testing/kunit/kunit_tool_test.py | 8 ++++++++
2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 5a931456e718..ac35c61f65f5 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -16,7 +16,7 @@ assert sys.version_info >= (3, 7), "Python version is too old"
from collections import namedtuple
from enum import Enum, auto
-from typing import Iterable
+from typing import Iterable, Sequence
import kunit_config
import kunit_json
@@ -186,6 +186,26 @@ def run_tests(linux: kunit_kernel.LinuxSourceTree,
exec_result.elapsed_time))
return parse_result
+# Problem:
+# $ kunit.py run --json
+# works as one would expect and prints the parsed test results as JSON.
+# $ kunit.py run --json suite_name
+# would *not* pass suite_name as the filter_glob and print as json.
+# argparse will consider it to be another way of writing
+# $ kunit.py run --json=suite_name
+# i.e. it would run all tests, and dump the json to a `suite_name` file.
+# So we hackily automatically rewrite --json => --json=stdout
+pseudo_bool_flag_defaults = {
+ '--json': 'stdout',
+ '--raw_output': 'kunit',
+}
+def massage_argv(argv: Sequence[str]) -> Sequence[str]:
+ def massage_arg(arg: str) -> str:
+ if arg not in pseudo_bool_flag_defaults:
+ return arg
+ return f'{arg}={pseudo_bool_flag_defaults[arg]}'
+ return list(map(massage_arg, argv))
+
def add_common_opts(parser) -> None:
parser.add_argument('--build_dir',
help='As in the make command, it specifies the build '
@@ -303,7 +323,7 @@ def main(argv, linux=None):
help='Specifies the file to read results from.',
type=str, nargs='?', metavar='input_file')
- cli_args = parser.parse_args(argv)
+ cli_args = parser.parse_args(massage_argv(argv))
if get_kernel_root_path():
os.chdir(get_kernel_root_path())
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 619c4554cbff..1edcc8373b4e 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -408,6 +408,14 @@ class KUnitMainTest(unittest.TestCase):
self.assertNotEqual(call, mock.call(StrContains('Testing complete.')))
self.assertNotEqual(call, mock.call(StrContains(' 0 tests run')))
+ def test_run_raw_output_does_not_take_positional_args(self):
+ # --raw_output is a string flag, but we don't want it to consume
+ # any positional arguments, only ones after an '='
+ self.linux_source_mock.run_kernel = mock.Mock(return_value=[])
+ kunit.main(['run', '--raw_output', 'filter_glob'], self.linux_source_mock)
+ self.linux_source_mock.run_kernel.assert_called_once_with(
+ args=None, build_dir='.kunit', filter_glob='filter_glob', timeout=300)
+
def test_exec_timeout(self):
timeout = 3453
kunit.main(['exec', '--timeout', str(timeout)], self.linux_source_mock)
base-commit: 4c17ca27923c16fd73bbb9ad033c7d749c3bcfcc
--
2.33.0.464.g1972c5931b-goog
From: Li Zhijian <lizhijian(a)cn.fujitsu.com>
[ Upstream commit 8914a7a247e065438a0ec86a58c1c359223d2c9e ]
LKP/0Day reported some building errors about kvm, and errors message
are not always same:
- lib/x86_64/processor.c:1083:31: error: ‘KVM_CAP_NESTED_STATE’ undeclared
(first use in this function); did you mean ‘KVM_CAP_PIT_STATE2’?
- lib/test_util.c:189:30: error: ‘MAP_HUGE_16KB’ undeclared (first use
in this function); did you mean ‘MAP_HUGE_16GB’?
Although kvm relies on the khdr, they still be built in parallel when -j
is specified. In this case, it will cause compiling errors.
Here we mark target khdr as NOTPARALLEL to make it be always built
first.
CC: Philip Li <philip.li(a)intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5d40653a921..9700281bee4c 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -26,6 +26,7 @@ include $(top_srcdir)/scripts/subarch.include
ARCH ?= $(SUBARCH)
.PHONY: khdr
+.NOTPARALLEL:
khdr:
make ARCH=$(ARCH) -C $(top_srcdir) headers_install
--
2.33.0
From: Li Zhijian <lizhijian(a)cn.fujitsu.com>
[ Upstream commit 8914a7a247e065438a0ec86a58c1c359223d2c9e ]
LKP/0Day reported some building errors about kvm, and errors message
are not always same:
- lib/x86_64/processor.c:1083:31: error: ‘KVM_CAP_NESTED_STATE’ undeclared
(first use in this function); did you mean ‘KVM_CAP_PIT_STATE2’?
- lib/test_util.c:189:30: error: ‘MAP_HUGE_16KB’ undeclared (first use
in this function); did you mean ‘MAP_HUGE_16GB’?
Although kvm relies on the khdr, they still be built in parallel when -j
is specified. In this case, it will cause compiling errors.
Here we mark target khdr as NOTPARALLEL to make it be always built
first.
CC: Philip Li <philip.li(a)intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 67386aa3f31d..8794ce382bf5 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -48,6 +48,7 @@ ARCH ?= $(SUBARCH)
# When local build is done, headers are installed in the default
# INSTALL_HDR_PATH usr/include.
.PHONY: khdr
+.NOTPARALLEL:
khdr:
ifndef KSFT_KHDR_INSTALL_DONE
ifeq (1,$(DEFAULT_INSTALL_HDR_PATH))
--
2.33.0
From: Li Zhijian <lizhijian(a)cn.fujitsu.com>
[ Upstream commit 8914a7a247e065438a0ec86a58c1c359223d2c9e ]
LKP/0Day reported some building errors about kvm, and errors message
are not always same:
- lib/x86_64/processor.c:1083:31: error: ‘KVM_CAP_NESTED_STATE’ undeclared
(first use in this function); did you mean ‘KVM_CAP_PIT_STATE2’?
- lib/test_util.c:189:30: error: ‘MAP_HUGE_16KB’ undeclared (first use
in this function); did you mean ‘MAP_HUGE_16GB’?
Although kvm relies on the khdr, they still be built in parallel when -j
is specified. In this case, it will cause compiling errors.
Here we mark target khdr as NOTPARALLEL to make it be always built
first.
CC: Philip Li <philip.li(a)intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 0af84ad48aa7..b7217b5251f5 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -48,6 +48,7 @@ ARCH ?= $(SUBARCH)
# When local build is done, headers are installed in the default
# INSTALL_HDR_PATH usr/include.
.PHONY: khdr
+.NOTPARALLEL:
khdr:
ifndef KSFT_KHDR_INSTALL_DONE
ifeq (1,$(DEFAULT_INSTALL_HDR_PATH))
--
2.33.0
From: Li Zhijian <lizhijian(a)cn.fujitsu.com>
[ Upstream commit 8914a7a247e065438a0ec86a58c1c359223d2c9e ]
LKP/0Day reported some building errors about kvm, and errors message
are not always same:
- lib/x86_64/processor.c:1083:31: error: ‘KVM_CAP_NESTED_STATE’ undeclared
(first use in this function); did you mean ‘KVM_CAP_PIT_STATE2’?
- lib/test_util.c:189:30: error: ‘MAP_HUGE_16KB’ undeclared (first use
in this function); did you mean ‘MAP_HUGE_16GB’?
Although kvm relies on the khdr, they still be built in parallel when -j
is specified. In this case, it will cause compiling errors.
Here we mark target khdr as NOTPARALLEL to make it be always built
first.
CC: Philip Li <philip.li(a)intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index fa2ac0e56b43..fe7ee2b0f29c 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -48,6 +48,7 @@ ARCH ?= $(SUBARCH)
# When local build is done, headers are installed in the default
# INSTALL_HDR_PATH usr/include.
.PHONY: khdr
+.NOTPARALLEL:
khdr:
ifndef KSFT_KHDR_INSTALL_DONE
ifeq (1,$(DEFAULT_INSTALL_HDR_PATH))
--
2.33.0
On Fri, Sep 17, 2021 at 10:04:18PM -0700, Luis Chamberlain wrote:
> In this v7 I've decided it is best to merge all the effort together into
> one patch set because communication was being lost when I split the
> patches up. This was not helping in any way to either fix the zram
> issues or come to consensus on a generic solution. The patches are also
> merged now because they are all related now.
Building up all the testing framewoork is really great. I have no opinions
about the license related stuff but all other changes generally look good to
me.
Thanks.
--
tejun
Hi Linus,
Please pull the following Kselftest fixes update for Linux 5.15-rc3.
This Kselftest fixes update for Linux 5.15-rc3 consists of:
- fix to Kselftest common framework header install to run before
other targets for it work correctly in parallel build case.
- fixes to kvm test to not ignore fscanf() returns which could
result in inconsistent test behavior and failures.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f:
Linux 5.15-rc1 (2021-09-12 16:28:37 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-fixes-5.15-rc3
for you to fetch changes up to f5013d412a43662b63f3d5f3a804d63213acd471:
selftests: kvm: fix get_run_delay() ignoring fscanf() return warn (2021-09-16 12:57:32 -0600)
----------------------------------------------------------------
linux-kselftest-fixes-5.15-rc3
This Kselftest fixes update for Linux 5.15-rc3 consists of:
- fix to Kselftest common framework header install to run before
other targets for it work correctly in parallel build case.
- fixes to kvm test to not ignore fscanf() returns which could
result in inconsistent test behavior and failures.
----------------------------------------------------------------
Li Zhijian (1):
selftests: be sure to make khdr before other targets
Shuah Khan (4):
selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn
selftests:kvm: fix get_trans_hugepagesz() ignoring fscanf() return warn
selftests: kvm: move get_run_delay() into lib/test_util
selftests: kvm: fix get_run_delay() ignoring fscanf() return warn
tools/testing/selftests/kvm/include/test_util.h | 3 +++
tools/testing/selftests/kvm/lib/test_util.c | 22 +++++++++++++++++++++-
tools/testing/selftests/kvm/steal_time.c | 16 ----------------
.../selftests/kvm/x86_64/mmio_warning_test.c | 3 ++-
.../testing/selftests/kvm/x86_64/xen_shinfo_test.c | 15 ---------------
tools/testing/selftests/lib.mk | 1 +
6 files changed, 27 insertions(+), 33 deletions(-)
----------------------------------------------------------------
This is similar to TCP MD5 in functionality but it's sufficiently
different that wire formats are incompatible. Compared to TCP-MD5 more
algorithms are supported and multiple keys can be used on the same
connection but there is still no negotiation mechanism.
Expected use-case is protecting long-duration BGP/LDP connections
between routers using pre-shared keys. The goal of this series is to
allow routers using the linux TCP stack to interoperate with vendors
such as Cisco and Juniper.
Both algorithms described in RFC5926 are implemented but the code is not
very easily extensible beyond that. In particular there are several code
paths making stack allocations based on RFC5926 maximum, those would
have to be increased.
This version is incorporates previous feedback and expands the handling
of timewait sockets and RST packets. Here are some known flaws and
limits:
* Interaction with TCP-MD5 is not tested in all corners
* Interaction with FASTOPEN not tested but unlikely to work because
sequence number assumptions for syn/ack.
* Sequence Number Extension not implemented so connections will flap
every ~4G of traffic.
* Not clear if crypto_shash_setkey might sleep. If some implementation
do that then maybe they could be excluded through alloc flags.
* Traffic key is not cached (reducing performance)
* User is responsible for ensuring keys do not overlap
I labeled this as [PATCH]] because the issues above are not critical.
Test suite was added to tools/selftests/tcp_authopt. Tests are written
in python using pytest and scapy and check the API in some detail and
validate packet captures. Python code is already used in linux and in
kselftests but virtualenvs not very much. This test suite uses `tox` to
create a private virtualenv and hide dependencies. There is no clear
guidance for how to add python-based kselftests so I made it up.
Limited testing support is also included in nettest and fcnal-test.sh.
Coverage is extremely limited, I did not expand it because the tests run
too slowly.
Changes for frr: https://github.com/FRRouting/frr/pull/9442
That PR was made early for ABI feedback, it has many issues.
Changes for yabgp: https://github.com/cdleonard/yabgp/commits/tcp_authopt
This can be use for easy interoperability testing with cisco/juniper/etc.
Changes since RFCv3:
* Implement TCP_AUTHOPT handling for timewait and reset replies. Write
tests to execute these paths by injecting packets with scapy
* Handle combining md5 and authopt: if both are configured use authopt.
* Fix locking issues around send_key, introduced in on of the later
patches.
* Handle IPv4-mapped-IPv6 addresses: it used to be that an ipv4 SYN sent
to an ipv6 socket with TCP-AO triggered WARN
* Implement un-namespaced sysctl disabled this feature by default
* Allocate new key before removing any old one in setsockopt (Dmitry)
* Remove tcp_authopt_key_info.local_id because it's no longer used (Dmitry)
* Propagate errors from TCP_AUTHOPT getsockopt (Dmitry)
* Fix no-longer-correct TCP_AUTHOPT_KEY_DEL docs (Dmitry)
* Simplify crypto allocation (Eric)
* Use kzmalloc instead of __GFP_ZERO (Eric)
* Add static_key_false tcp_authopt_needed (Eric)
* Clear authopt_info copied from oldsk in __tcp_authopt_openreq (Eric)
* Replace memcmp in ipv4 and ipv6 addr comparisons (Eric)
* Export symbols for CONFIG_IPV6=m (kernel test robot)
* Mark more functions static (kernel test robot)
* Fix build with CONFIG_PROVE_RCU_LIST=y (kernel test robot)
Link: https://lore.kernel.org/netdev/cover.1629840814.git.cdleonard@gmail.com/
Changes since RFCv2:
* Removed local_id from ABI and match on send_id/recv_id/addr
* Add all relevant out-of-tree tests to tools/testing/selftests
* Return an error instead of ignoring unknown flags, hopefully this makes
it easier to extend.
* Check sk_family before __tcp_authopt_info_get_or_create in tcp_set_authopt_key
* Use sock_owned_by_me instead of WARN_ON(!lockdep_sock_is_held(sk))
* Fix some intermediate build failures reported by kbuild robot
* Improve documentation
Link: https://lore.kernel.org/netdev/cover.1628544649.git.cdleonard@gmail.com/
Changes since RFC:
* Split into per-topic commits for ease of review. The intermediate
commits compile with a few "unused function" warnings and don't do
anything useful by themselves.
* Add ABI documention including kernel-doc on uapi
* Fix lockdep warnings from crypto by creating pools with one shash for
each cpu
* Accept short options to setsockopt by padding with zeros; this
approach allows increasing the size of the structs in the future.
* Support for aes-128-cmac-96
* Support for binding addresses to keys in a way similar to old tcp_md5
* Add support for retrieving received keyid/rnextkeyid and controling
the keyid/rnextkeyid being sent.
Link: https://lore.kernel.org/netdev/01383a8751e97ef826ef2adf93bfde3a08195a43.162…
Leonard Crestez (19):
tcp: authopt: Initial support and key management
docs: Add user documentation for tcp_authopt
selftests: Initial tcp_authopt test module
selftests: tcp_authopt: Initial sockopt manipulation
tcp: authopt: Add crypto initialization
tcp: authopt: Compute packet signatures
tcp: authopt: Hook into tcp core
tcp: authopt: Disable via sysctl by default
selftests: tcp_authopt: Test key address binding
tcp: ipv6: Add AO signing for tcp_v6_send_response
tcp: authopt: Add support for signing skb-less replies
tcp: ipv4: Add AO signing for skb-less replies
selftests: tcp_authopt: Add scapy-based packet signing code
selftests: tcp_authopt: Add packet-level tests
selftests: Initial tcp_authopt support for nettest
selftests: Initial tcp_authopt support for fcnal-test
selftests: Add -t tcp_authopt option for fcnal-test.sh
tcp: authopt: Add key selection controls
selftests: tcp_authopt: Add tests for rollover
Documentation/networking/index.rst | 1 +
Documentation/networking/ip-sysctl.rst | 6 +
Documentation/networking/tcp_authopt.rst | 69 +
include/linux/tcp.h | 9 +
include/net/tcp.h | 1 +
include/net/tcp_authopt.h | 200 +++
include/uapi/linux/snmp.h | 1 +
include/uapi/linux/tcp.h | 110 ++
net/ipv4/Kconfig | 14 +
net/ipv4/Makefile | 1 +
net/ipv4/proc.c | 1 +
net/ipv4/sysctl_net_ipv4.c | 10 +
net/ipv4/tcp.c | 30 +
net/ipv4/tcp_authopt.c | 1450 +++++++++++++++++
net/ipv4/tcp_input.c | 17 +
net/ipv4/tcp_ipv4.c | 101 +-
net/ipv4/tcp_minisocks.c | 12 +
net/ipv4/tcp_output.c | 80 +-
net/ipv6/tcp_ipv6.c | 56 +-
tools/testing/selftests/net/fcnal-test.sh | 34 +
tools/testing/selftests/net/nettest.c | 34 +-
tools/testing/selftests/tcp_authopt/Makefile | 5 +
.../testing/selftests/tcp_authopt/README.rst | 15 +
tools/testing/selftests/tcp_authopt/config | 6 +
.../selftests/tcp_authopt/requirements.txt | 40 +
tools/testing/selftests/tcp_authopt/run.sh | 15 +
tools/testing/selftests/tcp_authopt/setup.cfg | 17 +
tools/testing/selftests/tcp_authopt/setup.py | 5 +
.../tcp_authopt/tcp_authopt_test/__init__.py | 0
.../tcp_authopt/tcp_authopt_test/conftest.py | 41 +
.../full_tcp_sniff_session.py | 81 +
.../tcp_authopt_test/linux_tcp_authopt.py | 248 +++
.../tcp_authopt_test/linux_tcp_md5sig.py | 95 ++
.../tcp_authopt_test/netns_fixture.py | 83 +
.../tcp_authopt_test/scapy_conntrack.py | 150 ++
.../tcp_authopt_test/scapy_tcp_authopt.py | 211 +++
.../tcp_authopt_test/scapy_utils.py | 176 ++
.../tcp_authopt/tcp_authopt_test/server.py | 95 ++
.../tcp_authopt/tcp_authopt_test/sockaddr.py | 112 ++
.../tcp_connection_fixture.py | 269 +++
.../tcp_authopt/tcp_authopt_test/test_bind.py | 145 ++
.../tcp_authopt_test/test_rollover.py | 180 ++
.../tcp_authopt_test/test_sockopt.py | 185 +++
.../tcp_authopt_test/test_vectors.py | 359 ++++
.../tcp_authopt_test/test_verify_capture.py | 555 +++++++
.../tcp_authopt/tcp_authopt_test/utils.py | 102 ++
.../tcp_authopt/tcp_authopt_test/validator.py | 127 ++
47 files changed, 5544 insertions(+), 10 deletions(-)
create mode 100644 Documentation/networking/tcp_authopt.rst
create mode 100644 include/net/tcp_authopt.h
create mode 100644 net/ipv4/tcp_authopt.c
create mode 100644 tools/testing/selftests/tcp_authopt/Makefile
create mode 100644 tools/testing/selftests/tcp_authopt/README.rst
create mode 100644 tools/testing/selftests/tcp_authopt/config
create mode 100644 tools/testing/selftests/tcp_authopt/requirements.txt
create mode 100755 tools/testing/selftests/tcp_authopt/run.sh
create mode 100644 tools/testing/selftests/tcp_authopt/setup.cfg
create mode 100644 tools/testing/selftests/tcp_authopt/setup.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/__init__.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/conftest.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/full_tcp_sniff_session.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_md5sig.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/netns_fixture.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_conntrack.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_tcp_authopt.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_utils.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/server.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/sockaddr.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/tcp_connection_fixture.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_bind.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_rollover.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sockopt.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vectors.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_verify_capture.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/utils.py
create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/validator.py
base-commit: 07b855628c226511542d0911cba1b180541fbb84
--
2.25.1
Subject: Introduction: I am a Linux and open source software enthusiast
Greetings from Singapore,
My name is Mr. Turritopsis Dohrnii Teo En Ming, 43 years old as of 25
September 2021. My country is Singapore. Presently I am an IT
Consultant with a System Integrator (SI)/computer firm in Singapore. I
am also a Linux and open source software and information technology
enthusiast.
You can read my autobiography on my redundant blogs. The title of my
autobiography is:
"Autobiography of Singaporean Targeted Individual Mr. Turritopsis
Dohrnii Teo En Ming (Very First Draft, Lots More to Add in Future)"
Links to my redundant blogs (Blogger and Wordpress) can be found in my
email signature below. These are my main blogs.
I have three other redundant blogs, namely:
https://teo-en-ming.tumblr.com/https://teo-en-ming.medium.com/https://teo-en-ming.livejournal.com/
Future/subsequent versions of my autobiography will be published on my
redundant blogs.
My Blog Books (in PDF format) are also available for download on my
redundant blogs.
I have also published many guides, howtos, tutorials, and information
technology articles on my redundant blogs.
Thank you very much.
-----BEGIN EMAIL SIGNATURE-----
The Gospel for all Targeted Individuals (TIs):
[The New York Times] Microwave Weapons Are Prime Suspect in Ills of
U.S. Embassy Workers
Link:
https://www.nytimes.com/2018/09/01/science/sonic-attack-cuba-microwave.html
********************************************************************************************
Singaporean Targeted Individual Mr. Turritopsis Dohrnii Teo En Ming's
Academic Qualifications as at 14 Feb 2019 and refugee seeking attempts
at the United Nations Refugee Agency Bangkok (21 Mar 2017), in Taiwan
(5 Aug 2019) and Australia (25 Dec 2019 to 9 Jan 2020):
[1] https://tdtemcerts.wordpress.com/
[2] https://tdtemcerts.blogspot.sg/
[3] https://www.scribd.com/user/270125049/Teo-En-Ming
-----END EMAIL SIGNATURE-----
Currently, the test decides whether or not to test certain features
(e.g., writeprotect support) essentially by examining command-line
arguments. For example, if we're testing anonymous memory, then we
should test writeprotect support as well (since it generally is
supported for anonymous).
This is broken, however. Take writeprotect support as an example: sure
it's supported for anon, but it also requires that we have
CONFIG_HAVE_ARCH_USERFAULTFD_WP. I.e., it is not supported at all on
aarch64. So, running the test on such an arch fails: it tries to test
writeprotect for anon, but since it isn't *actually* supported, it
fails.
So, instead of checking command-line arguments to the test, check the
features the way the UFFD API intends: when we open a new userfaultfd,
pass in the feature(s) this test case would like to try to exercise. The
kernel reports back a subset of those features which are actually
supported: check these returned flags to see if the features are
*actually* supported.
(For a couple of cases, where *registration* would fail [with -EINVAL]
even though UFFDIO_API reports the feature as supported, we have to
check test_type as well as the feature flag.)
In some cases, we check immediately after opening the userfaultfd, and
if the features are missing, we skip the entire test. In some other
cases, we can proceed with "most" of the test, only skipping a few
pieces.
This lets us remove the global test_uffdio_wp and test_uffdio_minor
variables entirely.
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
tools/testing/selftests/vm/userfaultfd.c | 94 +++++++++++-------------
1 file changed, 43 insertions(+), 51 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 10ab56c2484a..2366caf90435 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -79,10 +79,6 @@ static int test_type;
#define ALARM_INTERVAL_SECS 10
static volatile bool test_uffdio_copy_eexist = true;
static volatile bool test_uffdio_zeropage_eexist = true;
-/* Whether to test uffd write-protection */
-static bool test_uffdio_wp = false;
-/* Whether to test uffd minor faults */
-static bool test_uffdio_minor = false;
static bool map_shared;
static int shm_fd;
@@ -90,6 +86,7 @@ static int huge_fd;
static char *huge_fd_off0;
static unsigned long long *count_verify;
static int uffd = -1;
+static uint64_t uffd_features;
static int uffd_flags, finished, *pipefd;
static char *area_src, *area_src_alias, *area_dst, *area_dst_alias;
static char *zeropage;
@@ -345,7 +342,7 @@ static struct uffd_test_ops hugetlb_uffd_test_ops = {
static struct uffd_test_ops *uffd_test_ops;
-static void userfaultfd_open(uint64_t *features)
+static void userfaultfd_open(uint64_t features)
{
struct uffdio_api uffdio_api;
@@ -355,14 +352,20 @@ static void userfaultfd_open(uint64_t *features)
uffd_flags = fcntl(uffd, F_GETFD, NULL);
uffdio_api.api = UFFD_API;
- uffdio_api.features = *features;
+ uffdio_api.features = features;
if (ioctl(uffd, UFFDIO_API, &uffdio_api))
err("UFFDIO_API failed.\nPlease make sure to "
"run with either root or ptrace capability.");
if (uffdio_api.api != UFFD_API)
err("UFFDIO_API error: %" PRIu64, (uint64_t)uffdio_api.api);
- *features = uffdio_api.features;
+ uffd_features = uffdio_api.features;
+}
+
+static inline bool uffd_wp_supported(void)
+{
+ return test_type == TEST_ANON &&
+ (uffd_features & UFFD_FEATURE_PAGEFAULT_FLAG_WP);
}
static inline void munmap_area(void **area)
@@ -397,6 +400,7 @@ static void uffd_test_ctx_clear(void)
err("close uffd");
uffd = -1;
}
+ uffd_features = 0;
huge_fd_off0 = NULL;
munmap_area((void **)&area_src);
@@ -405,7 +409,7 @@ static void uffd_test_ctx_clear(void)
munmap_area((void **)&area_dst_alias);
}
-static void uffd_test_ctx_init_ext(uint64_t *features)
+static void uffd_test_ctx_init(uint64_t features)
{
unsigned long nr, cpu;
@@ -445,11 +449,6 @@ static void uffd_test_ctx_init_ext(uint64_t *features)
err("pipe");
}
-static inline void uffd_test_ctx_init(uint64_t features)
-{
- uffd_test_ctx_init_ext(&features);
-}
-
static int my_bcmp(char *str1, char *str2, size_t n)
{
unsigned long i;
@@ -587,7 +586,7 @@ static int __copy_page(int ufd, unsigned long offset, bool retry)
uffdio_copy.dst = (unsigned long) area_dst + offset;
uffdio_copy.src = (unsigned long) area_src + offset;
uffdio_copy.len = page_size;
- if (test_uffdio_wp)
+ if (uffd_wp_supported())
uffdio_copy.mode = UFFDIO_COPY_MODE_WP;
else
uffdio_copy.mode = 0;
@@ -778,7 +777,7 @@ static void *background_thread(void *arg)
* at least the first half of the pages mapped already which
* can be write-protected for testing
*/
- if (test_uffdio_wp)
+ if (uffd_wp_supported())
wp_range(uffd, (unsigned long)area_dst + start_nr * page_size,
nr_pages_per_cpu * page_size, true);
@@ -1062,12 +1061,12 @@ static int userfaultfd_zeropage_test(void)
printf("testing UFFDIO_ZEROPAGE: ");
fflush(stdout);
- uffd_test_ctx_init(0);
+ uffd_test_ctx_init(UFFD_FEATURE_PAGEFAULT_FLAG_WP);
uffdio_register.range.start = (unsigned long) area_dst;
uffdio_register.range.len = nr_pages * page_size;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
- if (test_uffdio_wp)
+ if (uffd_wp_supported())
uffdio_register.mode |= UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register))
err("register failure");
@@ -1089,7 +1088,7 @@ static int userfaultfd_events_test(void)
struct uffdio_register uffdio_register;
unsigned long expected_ioctls;
pthread_t uffd_mon;
- int err, features;
+ int err;
pid_t pid;
char c;
struct uffd_stats stats = { 0 };
@@ -1097,16 +1096,15 @@ static int userfaultfd_events_test(void)
printf("testing events (fork, remap, remove): ");
fflush(stdout);
- features = UFFD_FEATURE_EVENT_FORK | UFFD_FEATURE_EVENT_REMAP |
- UFFD_FEATURE_EVENT_REMOVE;
- uffd_test_ctx_init(features);
+ uffd_test_ctx_init(UFFD_FEATURE_EVENT_FORK | UFFD_FEATURE_EVENT_REMAP |
+ UFFD_FEATURE_EVENT_REMOVE | UFFD_FEATURE_PAGEFAULT_FLAG_WP);
fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
uffdio_register.range.start = (unsigned long) area_dst;
uffdio_register.range.len = nr_pages * page_size;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
- if (test_uffdio_wp)
+ if (uffd_wp_supported())
uffdio_register.mode |= UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register))
err("register failure");
@@ -1144,7 +1142,7 @@ static int userfaultfd_sig_test(void)
unsigned long expected_ioctls;
unsigned long userfaults;
pthread_t uffd_mon;
- int err, features;
+ int err;
pid_t pid;
char c;
struct uffd_stats stats = { 0 };
@@ -1152,15 +1150,15 @@ static int userfaultfd_sig_test(void)
printf("testing signal delivery: ");
fflush(stdout);
- features = UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_SIGBUS;
- uffd_test_ctx_init(features);
+ uffd_test_ctx_init(UFFD_FEATURE_EVENT_FORK | UFFD_FEATURE_SIGBUS |
+ UFFD_FEATURE_PAGEFAULT_FLAG_WP);
fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
uffdio_register.range.start = (unsigned long) area_dst;
uffdio_register.range.len = nr_pages * page_size;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
- if (test_uffdio_wp)
+ if (uffd_wp_supported())
uffdio_register.mode |= UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register))
err("register failure");
@@ -1209,25 +1207,23 @@ static int userfaultfd_minor_test(void)
void *expected_page;
char c;
struct uffd_stats stats = { 0 };
- uint64_t req_features, features_out;
-
- if (!test_uffdio_minor)
- return 0;
+ uint64_t features;
printf("testing minor faults: ");
fflush(stdout);
- if (test_type == TEST_HUGETLB)
- req_features = UFFD_FEATURE_MINOR_HUGETLBFS;
+ if (test_type == TEST_HUGETLB && map_shared)
+ features = UFFD_FEATURE_MINOR_HUGETLBFS;
else if (test_type == TEST_SHMEM)
- req_features = UFFD_FEATURE_MINOR_SHMEM;
- else
- return 1;
+ features = UFFD_FEATURE_MINOR_SHMEM;
+ else {
+ printf("skipping test due to unsupported memory type\n");
+ return 0;
+ }
- features_out = req_features;
- uffd_test_ctx_init_ext(&features_out);
+ uffd_test_ctx_init(features);
/* If kernel reports required features aren't supported, skip test. */
- if ((features_out & req_features) != req_features) {
+ if ((uffd_features & features) != features) {
printf("skipping test due to lack of feature support\n");
fflush(stdout);
return 0;
@@ -1349,10 +1345,6 @@ static void userfaultfd_pagemap_test(unsigned int test_pgsize)
int pagemap_fd;
uint64_t value;
- /* Pagemap tests uffd-wp only */
- if (!test_uffdio_wp)
- return;
-
/* Not enough memory to test this page size */
if (test_pgsize > nr_pages * page_size)
return;
@@ -1361,7 +1353,12 @@ static void userfaultfd_pagemap_test(unsigned int test_pgsize)
/* Flush so it doesn't flush twice in parent/child later */
fflush(stdout);
- uffd_test_ctx_init(0);
+ uffd_test_ctx_init(UFFD_FEATURE_PAGEFAULT_FLAG_WP);
+ /* Pagemap tests uffd-wp only */
+ if (!uffd_wp_supported()) {
+ printf("skipping test due to lack of feature support\n");
+ return;
+ }
if (test_pgsize > page_size) {
/* This is a thp test */
@@ -1426,7 +1423,7 @@ static int userfaultfd_stress(void)
struct uffdio_register uffdio_register;
struct uffd_stats uffd_stats[nr_cpus];
- uffd_test_ctx_init(0);
+ uffd_test_ctx_init(UFFD_FEATURE_PAGEFAULT_FLAG_WP);
if (posix_memalign(&area, page_size, page_size))
err("out of memory");
@@ -1464,7 +1461,7 @@ static int userfaultfd_stress(void)
uffdio_register.range.start = (unsigned long) area_dst;
uffdio_register.range.len = nr_pages * page_size;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
- if (test_uffdio_wp)
+ if (uffd_wp_supported())
uffdio_register.mode |= UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register))
err("register failure");
@@ -1513,7 +1510,7 @@ static int userfaultfd_stress(void)
return 1;
/* Clear all the write protections if there is any */
- if (test_uffdio_wp)
+ if (uffd_wp_supported())
wp_range(uffd, (unsigned long)area_dst,
nr_pages * page_size, false);
@@ -1595,8 +1592,6 @@ static void set_test_type(const char *type)
if (!strcmp(type, "anon")) {
test_type = TEST_ANON;
uffd_test_ops = &anon_uffd_test_ops;
- /* Only enable write-protect test for anonymous test */
- test_uffdio_wp = true;
} else if (!strcmp(type, "hugetlb")) {
test_type = TEST_HUGETLB;
uffd_test_ops = &hugetlb_uffd_test_ops;
@@ -1604,13 +1599,10 @@ static void set_test_type(const char *type)
map_shared = true;
test_type = TEST_HUGETLB;
uffd_test_ops = &hugetlb_uffd_test_ops;
- /* Minor faults require shared hugetlb; only enable here. */
- test_uffdio_minor = true;
} else if (!strcmp(type, "shmem")) {
map_shared = true;
test_type = TEST_SHMEM;
uffd_test_ops = &shmem_uffd_test_ops;
- test_uffdio_minor = true;
} else {
err("Unknown test type: %s", type);
}
--
2.33.0.464.g1972c5931b-goog
From: Colin Ian King <colin.king(a)canonical.com>
There is a spelling mistake in an error message. Fix it.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/kvm/lib/sparsebit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/sparsebit.c b/tools/testing/selftests/kvm/lib/sparsebit.c
index a0d0c83d83de..50e0cf41a7dd 100644
--- a/tools/testing/selftests/kvm/lib/sparsebit.c
+++ b/tools/testing/selftests/kvm/lib/sparsebit.c
@@ -1866,7 +1866,7 @@ void sparsebit_validate_internal(struct sparsebit *s)
* of total bits set.
*/
if (s->num_set != total_bits_set) {
- fprintf(stderr, "Number of bits set missmatch,\n"
+ fprintf(stderr, "Number of bits set mismatch,\n"
" s->num_set: 0x%lx total_bits_set: 0x%lx",
s->num_set, total_bits_set);
--
2.32.0
The kvm_vm_free() statement here is currently dead code, since the loop
in front of it can only be left with the "goto done" that jumps right
after the kvm_vm_free(). Fix it by swapping the locations of the "done"
label and the kvm_vm_free().
Signed-off-by: Thomas Huth <thuth(a)redhat.com>
---
tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c | 3 +--
tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
index f40fd097cb35..6f6fd189dda3 100644
--- a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
+++ b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
@@ -109,8 +109,7 @@ int main(int argc, char *argv[])
}
}
- kvm_vm_free(vm);
-
done:
+ kvm_vm_free(vm);
return 0;
}
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
index 7e33a350b053..e683d0ac3e45 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
@@ -161,7 +161,7 @@ int main(int argc, char *argv[])
}
}
- kvm_vm_free(vm);
done:
+ kvm_vm_free(vm);
return 0;
}
--
2.27.0
Patch 1 fixes a KVM+rseq bug where KVM's handling of TIF_NOTIFY_RESUME,
e.g. for task migration, clears the flag without informing rseq and leads
to stale data in userspace's rseq struct.
Patch 2 is a cleanup to try and make future bugs less likely. It's also
a baby step towards moving and renaming tracehook_notify_resume() since
it has nothing to do with tracing. It kills me to not do the move/rename
as part of this series, but having a dedicated series/discussion seems
more appropriate given the sheer number of architectures that call
tracehook_notify_resume() and the lack of an obvious home for the code.
Patch 3 is a fix/cleanup to stop overriding x86's unistd_{32,64}.h when
the include path (intentionally) omits tools' uapi headers. KVM's
selftests do exactly that so that they can pick up the uapi headers from
the installed kernel headers, and still use various tools/ headers that
mirror kernel code, e.g. linux/types.h. This allows the new test in
patch 4 to reference __NR_rseq without having to manually define it.
Patch 4 is a regression test for the KVM+rseq bug.
Patch 5 is a cleanup made possible by patch 3.
Sean Christopherson (5):
KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM
guest
entry: rseq: Call rseq_handle_notify_resume() in
tracehook_notify_resume()
tools: Move x86 syscall number fallbacks to .../uapi/
KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration
bugs
KVM: selftests: Remove __NR_userfaultfd syscall fallback
arch/arm/kernel/signal.c | 1 -
arch/arm64/kernel/signal.c | 1 -
arch/csky/kernel/signal.c | 4 +-
arch/mips/kernel/signal.c | 4 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/s390/kernel/signal.c | 1 -
include/linux/tracehook.h | 2 +
kernel/entry/common.c | 4 +-
kernel/rseq.c | 4 +-
.../x86/include/{ => uapi}/asm/unistd_32.h | 0
.../x86/include/{ => uapi}/asm/unistd_64.h | 3 -
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
tools/testing/selftests/kvm/rseq_test.c | 131 ++++++++++++++++++
14 files changed, 143 insertions(+), 20 deletions(-)
rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (83%)
create mode 100644 tools/testing/selftests/kvm/rseq_test.c
--
2.33.0.rc1.237.g0d66db33f3-goog
Synchronous Ethernet networks use a physical layer clock to syntonize
the frequency across different network elements.
Basic SyncE node defined in the ITU-T G.8264 consist of an Ethernet
Equipment Clock (EEC) and have the ability to recover synchronization
from the synchronization inputs - either traffic interfaces or external
frequency sources.
The EEC can synchronize its frequency (syntonize) to any of those sources.
It is also able to select synchronization source through priority tables
and synchronization status messaging. It also provides neccessary
filtering and holdover capabilities
This patch series introduces basic interface for reading the Ethernet
Equipment Clock (EEC) state on a SyncE capable device. This state gives
information about the source of the syntonization signal (ether my port,
or any external one) and the state of EEC. This interface is required\
to implement Synchronization Status Messaging on upper layers.
Next steps:
- add interface to enable source clocks and get information about them
- properly return the EEC_SRC_PORT flag depending on the port recovered
clock being enabled and locked
v2:
- removed whitespace changes
- fix issues reported by test robot
v3:
- Changed naming from SyncE to EEC
- Clarify cover letter and commit message for patch 1
v4:
- Removed sync_source and pin_idx info
- Changed one structure to attributes
- Added EEC_SRC_PORT flag to indicate that the EEC is synchronized
to the recovered clock of a port that returns the state
Maciej Machnikowski (2):
rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
ice: add support for reading SyncE DPLL state
drivers/net/ethernet/intel/ice/ice.h | 5 ++
.../net/ethernet/intel/ice/ice_adminq_cmd.h | 34 +++++++++
drivers/net/ethernet/intel/ice/ice_common.c | 62 ++++++++++++++++
drivers/net/ethernet/intel/ice/ice_common.h | 4 ++
drivers/net/ethernet/intel/ice/ice_devids.h | 3 +
drivers/net/ethernet/intel/ice/ice_main.c | 29 ++++++++
drivers/net/ethernet/intel/ice/ice_ptp.c | 35 +++++++++
drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 44 ++++++++++++
drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 22 ++++++
include/linux/netdevice.h | 6 ++
include/uapi/linux/if_link.h | 31 ++++++++
include/uapi/linux/rtnetlink.h | 3 +
net/core/rtnetlink.c | 71 +++++++++++++++++++
security/selinux/nlmsgtab.c | 3 +-
14 files changed, 351 insertions(+), 1 deletion(-)
--
2.26.3
During initialization of a signal testcase, features declared as required
are properly checked against the running system but no action is then taken
to effectively skip such a testcase.
Fix core signals test logic to abort initialization and report such a
testcase as skipped to the KSelfTest framework.
Fixes: f96bf4340316 ("kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils")
Signed-off-by: Cristian Marussi <cristian.marussi(a)arm.com>
---
As a consequence KSelfTest TAP results will now report this when a signal-SVE
testcase run on a system missing SVE:
\# selftests: arm64: fake_sigreturn_sve_change_vl
\# Registered handlers for all signals.
\# Detected MINSTKSIGSZ:4720
\# Required Features: [ SVE ] NOT supported
\# ==>> completed. SKIP.
\# # FAKE_SIGRETURN_SVE_CHANGE :: Attempt to change SVE VL
\# ok 7 selftests: arm64: fake_sigreturn_sve_change_vl # SKIP
---
tools/testing/selftests/arm64/signal/test_signals_utils.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 6836510a522f..22722abc9dfa 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -266,16 +266,19 @@ int test_init(struct tdescr *td)
td->feats_supported |= FEAT_SSBS;
if (getauxval(AT_HWCAP) & HWCAP_SVE)
td->feats_supported |= FEAT_SVE;
- if (feats_ok(td))
+ if (feats_ok(td)) {
fprintf(stderr,
"Required Features: [%s] supported\n",
feats_to_string(td->feats_required &
td->feats_supported));
- else
+ } else {
fprintf(stderr,
"Required Features: [%s] NOT supported\n",
feats_to_string(td->feats_required &
~td->feats_supported));
+ td->result = KSFT_SKIP;
+ return 0;
+ }
}
/* Perform test specific additional initialization */
--
2.17.1
Commit 6a499c9c42d0 ("kunit: tool: make --raw_output support only
showing kunit output") made --raw_output a string-typed argument.
Passing --raw_output=kunit would make it only show KUnit-related output
and not everything.
However, converting it to a string-typed argument had side effects.
These calls used to work:
$ kunit.py run --raw_output
$ kunit.py run --raw_output suite_filter
$ kunit.py run suite_filter --raw_output
But now the second is actually parsed as
$ kunit.py run --raw_output=suite_filter
So the order you add in --raw_output now matters and command lines that
used to work might not anymore.
Change --raw_output back to a boolean flag, but change its behavior to
match that of the former --raw_output=kunit.
The assumption is that this is what most people wanted to see anyways.
To get the old behavior, users can simply do:
$ kunit.py run >/dev/null; cat .kunit/test.log
They don't have any easy way of getting the --raw_output=kunit behavior.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
Meta: this is an alternative to
https://lore.kernel.org/linux-kselftest/20210903161405.1861312-1-dlatypov@g…
I'd slightly prefer that approach, but if we're fine with giving up the
old --raw_output semantics entirely, this would be cleaner.
I'd also assume that most people would prefer the new semantics, but I'm
not sure of that.
---
Documentation/dev-tools/kunit/kunit-tool.rst | 7 -------
tools/testing/kunit/kunit.py | 12 +++---------
tools/testing/kunit/kunit_tool_test.py | 13 ++++++-------
3 files changed, 9 insertions(+), 23 deletions(-)
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index ae52e0f489f9..03404746f1f6 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -114,13 +114,6 @@ results in TAP format, you can pass the ``--raw_output`` argument.
./tools/testing/kunit/kunit.py run --raw_output
-The raw output from test runs may contain other, non-KUnit kernel log
-lines. You can see just KUnit output with ``--raw_output=kunit``:
-
-.. code-block:: bash
-
- ./tools/testing/kunit/kunit.py run --raw_output=kunit
-
If you have KUnit results in their raw TAP format, you can parse them and print
the human-readable summary with the ``parse`` command for kunit_tool. This
accepts a filename for an argument, or will read from standard input.
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 5a931456e718..3626a56472b5 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -115,13 +115,7 @@ def parse_tests(request: KunitParseRequest) -> KunitResult:
'Tests not Parsed.')
if request.raw_output:
- output: Iterable[str] = request.input_data
- if request.raw_output == 'all':
- pass
- elif request.raw_output == 'kunit':
- output = kunit_parser.extract_tap_lines(output)
- else:
- print(f'Unknown --raw_output option "{request.raw_output}"', file=sys.stderr)
+ output = kunit_parser.extract_tap_lines(request.input_data)
for line in output:
print(line.rstrip())
@@ -256,8 +250,8 @@ def add_exec_opts(parser) -> None:
def add_parse_opts(parser) -> None:
parser.add_argument('--raw_output', help='If set don\'t format output from kernel. '
- 'If set to --raw_output=kunit, filters to just KUnit output.',
- type=str, nargs='?', const='all', default=None)
+ 'It will only show output from KUnit.',
+ action='store_true')
parser.add_argument('--json',
nargs='?',
help='Stores test results in a JSON, and either '
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 619c4554cbff..55ed3dac31ee 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -399,14 +399,13 @@ class KUnitMainTest(unittest.TestCase):
self.assertNotEqual(call, mock.call(StrContains('Testing complete.')))
self.assertNotEqual(call, mock.call(StrContains(' 0 tests run')))
- def test_run_raw_output_kunit(self):
+ def test_run_raw_output_does_not_take_positional_args(self):
+ # --raw_output might eventually support an argument, but we don't want it
+ # to consume any positional arguments, only ones after an '='.
self.linux_source_mock.run_kernel = mock.Mock(return_value=[])
- kunit.main(['run', '--raw_output=kunit'], self.linux_source_mock)
- self.assertEqual(self.linux_source_mock.build_reconfig.call_count, 1)
- self.assertEqual(self.linux_source_mock.run_kernel.call_count, 1)
- for call in self.print_mock.call_args_list:
- self.assertNotEqual(call, mock.call(StrContains('Testing complete.')))
- self.assertNotEqual(call, mock.call(StrContains(' 0 tests run')))
+ kunit.main(['run', '--raw_output', 'filter_glob'], self.linux_source_mock)
+ self.linux_source_mock.run_kernel.assert_called_once_with(
+ args=None, build_dir='.kunit', filter_glob='filter_glob', timeout=300)
def test_exec_timeout(self):
timeout = 3453
base-commit: 316346243be6df12799c0b64b788e06bad97c30b
--
2.33.0.464.g1972c5931b-goog
Problem:
What does this do?
$ kunit.py run --json
Well, it runs all the tests and prints test results out as JSON.
And next is
$ kunit.py run my-test-suite --json
This runs just `my-test-suite` and prints results out as JSON.
But what about?
$ kunit.py run --json my-test-suite
This runs all the tests and stores the json results in a "my-test-suite"
file.
Why:
--json, and now --raw_output are actually string flags. They just have a
default value. --json in particular takes the name of an output file.
It was intended that you'd do
$ kunit.py run --json=my_output_file my-test-suite
if you ever wanted to specify the value.
Workaround:
It doesn't seem like there's a way to make
https://docs.python.org/3/library/argparse.html only accept arg values
after a '='.
I believe that `--json` should "just work" regardless of where it is.
So this patch automatically rewrites a bare `--json` to `--json=stdout`.
That makes the examples above work the same way.
Add a regression test that can catch this for --raw_output.
Fixes: 6a499c9c42d0 ("kunit: tool: make --raw_output support only showing kunit output")
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit.py | 24 ++++++++++++++++++++++--
tools/testing/kunit/kunit_tool_test.py | 8 ++++++++
2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 5a931456e718..95d62020e4f2 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -16,7 +16,7 @@ assert sys.version_info >= (3, 7), "Python version is too old"
from collections import namedtuple
from enum import Enum, auto
-from typing import Iterable
+from typing import Iterable, Sequence
import kunit_config
import kunit_json
@@ -186,6 +186,26 @@ def run_tests(linux: kunit_kernel.LinuxSourceTree,
exec_result.elapsed_time))
return parse_result
+# Problem:
+# $ kunit.py run --json
+# works as one would expect and prints the parsed test results as JSON.
+# $ kunit.py run --json suite_name
+# would *not* pass suite_name as the filter_glob and print as json.
+# argparse will consider it to be another way of writing
+# $ kunit.py run --json=suite_name
+# i.e. it would run all tests, and dump the json to a `suite_name` file.
+# So we hackily automatically rewrite --json => --json=stdout
+pseudo_bool_flag_defaults = {
+ '--json': 'stdout',
+ '--raw_output': 'kunit',
+}
+def massage_argv(argv: Sequence[str]) -> Sequence[str]:
+ def massage_arg(arg: str) -> str:
+ if arg not in pseudo_bool_flag_defaults:
+ return arg
+ return f'{arg}={pseudo_bool_flag_defaults[arg]}'
+ return map(massage_arg, argv)
+
def add_common_opts(parser) -> None:
parser.add_argument('--build_dir',
help='As in the make command, it specifies the build '
@@ -303,7 +323,7 @@ def main(argv, linux=None):
help='Specifies the file to read results from.',
type=str, nargs='?', metavar='input_file')
- cli_args = parser.parse_args(argv)
+ cli_args = parser.parse_args(massage_argv(argv))
if get_kernel_root_path():
os.chdir(get_kernel_root_path())
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 619c4554cbff..1edcc8373b4e 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -408,6 +408,14 @@ class KUnitMainTest(unittest.TestCase):
self.assertNotEqual(call, mock.call(StrContains('Testing complete.')))
self.assertNotEqual(call, mock.call(StrContains(' 0 tests run')))
+ def test_run_raw_output_does_not_take_positional_args(self):
+ # --raw_output is a string flag, but we don't want it to consume
+ # any positional arguments, only ones after an '='
+ self.linux_source_mock.run_kernel = mock.Mock(return_value=[])
+ kunit.main(['run', '--raw_output', 'filter_glob'], self.linux_source_mock)
+ self.linux_source_mock.run_kernel.assert_called_once_with(
+ args=None, build_dir='.kunit', filter_glob='filter_glob', timeout=300)
+
def test_exec_timeout(self):
timeout = 3453
kunit.main(['exec', '--timeout', str(timeout)], self.linux_source_mock)
base-commit: a9c9a6f741cdaa2fa9ba24a790db8d07295761e3
--
2.33.0.153.gba50c8fa24-goog
This test assumes that the declared kunit_suite object is the exact one
which is being executed, which KUnit will not guarantee [1].
Specifically, `suite->log` is not initialized until a suite object is
executed. So if KUnit makes a copy of the suite and runs that instead,
this test dereferences an invalid pointer and (hopefully) segfaults.
N.B. since we no longer assume this, we can no longer verify that
`suite->log` is *not* allocated during normal execution.
An alternative to this patch that would allow us to test that would
require exposing an API for the current test to get its current suite.
Exposing that for one internal kunit test seems like overkill, and
grants users more footguns (e.g. reusing a test case in multiple suites
and changing behavior based on the suite name, dynamically modifying the
setup/cleanup funcs, storing/reading stuff out of the suite->log, etc.).
[1] In a subsequent patch, KUnit will allow running subsets of test
cases within a suite by making a copy of the suite w/ the filtered test
list. But there are other reasons KUnit might execute a copy, e.g. if it
ever wants to support parallel execution of different suites, recovering
from errors and restarting suites
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
---
lib/kunit/kunit-test.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index d69efcbed624..555601d17f79 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -415,12 +415,15 @@ static struct kunit_suite kunit_log_test_suite = {
static void kunit_log_test(struct kunit *test)
{
- struct kunit_suite *suite = &kunit_log_test_suite;
+ struct kunit_suite suite;
+
+ suite.log = kunit_kzalloc(test, KUNIT_LOG_SIZE, GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, suite.log);
kunit_log(KERN_INFO, test, "put this in log.");
kunit_log(KERN_INFO, test, "this too.");
- kunit_log(KERN_INFO, suite, "add to suite log.");
- kunit_log(KERN_INFO, suite, "along with this.");
+ kunit_log(KERN_INFO, &suite, "add to suite log.");
+ kunit_log(KERN_INFO, &suite, "along with this.");
#ifdef CONFIG_KUNIT_DEBUGFS
KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
@@ -428,12 +431,11 @@ static void kunit_log_test(struct kunit *test)
KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
strstr(test->log, "this too."));
KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
- strstr(suite->log, "add to suite log."));
+ strstr(suite.log, "add to suite log."));
KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
- strstr(suite->log, "along with this."));
+ strstr(suite.log, "along with this."));
#else
KUNIT_EXPECT_PTR_EQ(test, test->log, (char *)NULL);
- KUNIT_EXPECT_PTR_EQ(test, suite->log, (char *)NULL);
#endif
}
base-commit: 316346243be6df12799c0b64b788e06bad97c30b
--
2.33.0.309.g3052b89438-goog
On Tue, Sep 21, 2021 at 1:24 AM David Laight <David.Laight(a)aculab.com> wrote:
>
> From: Luis Chamberlain
> > Sent: 17 September 2021 20:47
> >
> > When sysfs attributes use a lock also used on module removal we can
> > race to deadlock. This happens when for instance a sysfs file on
> > a driver is used, then at the same time we have module removal call
> > trigger. The module removal call code holds a lock, and then the sysfs
> > file entry waits for the same lock. While holding the lock the module
> > removal tries to remove the sysfs entries, but these cannot be removed
> > yet as one is waiting for a lock. This won't complete as the lock is
> > already held. Likewise module removal cannot complete, and so we deadlock.
>
> Isn't the real problem the race between a sysfs file action and the
> removal of the sysfs node?
Nope, that is taken care of by kernfs.
> This isn't really related to module unload - except that may
> well remove some sysfs nodes.
Nope, the issue is a deadlock that can happen due to a shared lock on
module removal and a driver sysfs operation.
> This is the same problem as removing any other kind of driver callback.
> There are three basic solutions:
> 1) Use a global lock - not usually useful.
> 2) Have the remove call sleep until any callbacks are complete.
> 3) Have the remove just request removal and have a final
> callback (from a different context).
Kernfs already does a sort of combination of 1) and 2) but 1) is using
atomic reference counts.
> If the remove can sleep (as in 2) then there is a requirement
> on the driver code to not hold any locks across the 'remove'
> that can be acquired during the callbacks.
And this is the part that kernfs has no control over since the removal
and sysfs operation are implementation specific.
> Now, for sysfs, you probably only want to sleep the remove code
> while a read/write is in progress - not just because the node
> is open.
> That probably requires marking an open node 'invalid' and
> deferring delete to close.
This is already done by kernfs.
> None of this requires a reference count on the module.
You are missing the point to the other aspect of the try_module_get(),
it lets you also check if module exit has been entered. By using
try_module_get() you let the module exit trump proceeding with an
operation, therefore also preventing any potential use of a shared
lock on module exit and the driver specific sysfs operation.
Luis
Makefile uses TEST_PROGS instead of TEST_GEN_PROGS to define
executables. TEST_PROGS is for shell scripts that need to be
installed and run by the common lib.mk framework. The common
framework doesn't touch TEST_PROGS when it does build and clean.
As a result "make kselftest-clean" and "make clean" fail to remove
executables. Run and install work because the common framework runs
and installs TEST_PROGS. Build works because the Makefile defines
"all" rule which is unnecessary if TEST_GEN_PROGS is used.
Use TEST_GEN_PROGS so the common framework can handle build/run/
install/clean properly.
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/net/af_unix/Makefile | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile
index cfc7f4f97fd1..df341648f818 100644
--- a/tools/testing/selftests/net/af_unix/Makefile
+++ b/tools/testing/selftests/net/af_unix/Makefile
@@ -1,5 +1,2 @@
-##TEST_GEN_FILES := test_unix_oob
-TEST_PROGS := test_unix_oob
+TEST_GEN_PROGS := test_unix_oob
include ../../lib.mk
-
-all: $(TEST_PROGS)
--
2.30.2
The ATTRIBUTE_GROUPS is typically used to avoid boiler plate
code which is used in many drivers. Embracing ATTRIBUTE_GROUPS was
long due on the zram driver, however a recent fix for sysfs allows
users of ATTRIBUTE_GROUPS to also associate a module to the group
attribute.
In zram's case this also means it allows us to fix a race which triggers
a deadlock on the zram driver. This deadlock happens when a sysfs attribute
use a lock also used on module removal. This happens when for instance a
sysfs file on a driver is used, then at the same time we have module
removal call trigger. The module removal call code holds a lock, and then
the sysfs file entry waits for the same lock. While holding the lock the
module removal tries to remove the sysfs entries, but these cannot be
removed yet as one is waiting for a lock. This won't complete as the lock
is already held. Likewise module removal cannot complete, and so we
deadlock.
Sysfs fixes this when the group attributes have a module associated to
it, sysfs will *try* to get a refcount to the module when a shared
lock is used, prior to mucking with a sysfs attribute. If this fails we
just give up right away.
This deadlock was first reported with the zram driver, a sketch of how
this can happen follows:
CPU A CPU B
whatever_store()
module_unload
mutex_lock(foo)
mutex_lock(foo)
del_gendisk(zram->disk);
device_del()
device_remove_groups()
In this situation whatever_store() is waiting for the mutex foo to
become unlocked, but that won't happen until module removal is complete.
But module removal won't complete until the sysfs file being poked
completes which is waiting for a lock already held.
This issue can be reproduced easily on the zram driver as follows:
Loop 1 on one terminal:
while true;
do modprobe zram;
modprobe -r zram;
done
Loop 2 on a second terminal:
while true; do
echo 1024 > /sys/block/zram0/disksize;
echo 1 > /sys/block/zram0/reset;
done
Without this patch we end up in a deadlock, and the following
stack trace is produced which hints to us what the issue was:
INFO: task bash:888 blocked for more than 120 seconds.
Tainted: G E 5.12.0-rc1-next-20210304+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:bash state:D stack: 0 pid: 888 ppid: 887 flags:<etc>
Call Trace:
__schedule+0x2e4/0x900
schedule+0x46/0xb0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.constprop.0+0x2c3/0x490
? _kstrtoull+0x35/0xd0
reset_store+0x6c/0x160 [zram]
kernfs_fop_write_iter+0x124/0x1b0
new_sync_write+0x11c/0x1b0
vfs_write+0x1c2/0x260
ksys_write+0x5f/0xe0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f34f2c3df33
RSP: 002b:00007ffe751df6e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f34f2c3df33
RDX: 0000000000000002 RSI: 0000561ccb06ec10 RDI: 0000000000000001
RBP: 0000561ccb06ec10 R08: 000000000000000a R09: 0000000000000001
R10: 0000561ccb157590 R11: 0000000000000246 R12: 0000000000000002
R13: 00007f34f2d0e6a0 R14: 0000000000000002 R15: 00007f34f2d0e8a0
INFO: task modprobe:1104 can't die for more than 120 seconds.
task:modprobe state:D stack: 0 pid: 1104 ppid: 916 flags:<etc>
Call Trace:
__schedule+0x2e4/0x900
schedule+0x46/0xb0
__kernfs_remove.part.0+0x228/0x2b0
? finish_wait+0x80/0x80
kernfs_remove_by_name_ns+0x50/0x90
remove_files+0x2b/0x60
sysfs_remove_group+0x38/0x80
sysfs_remove_groups+0x29/0x40
device_remove_attrs+0x4a/0x80
device_del+0x183/0x3e0
? mutex_lock+0xe/0x30
del_gendisk+0x27a/0x2d0
zram_remove+0x8a/0xb0 [zram]
? hot_remove_store+0xf0/0xf0 [zram]
zram_remove_cb+0xd/0x10 [zram]
idr_for_each+0x5e/0xd0
destroy_devices+0x39/0x6f [zram]
__do_sys_delete_module+0x190/0x2a0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f32adf727d7
RSP: 002b:00007ffc08bb38a8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000055eea23cbb10 RCX: 00007f32adf727d7
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055eea23cbb78
RBP: 000055eea23cbb10 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f32adfe5ac0 R11: 0000000000000206 R12: 000055eea23cbb78
R13: 0000000000000000 R14: 0000000000000000 R15: 000055eea23cbc20
[0] https://lkml.kernel.org/r/20210401235925.GR4332@42.do-not-panic.com
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
drivers/block/zram/zram_drv.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b26abcb955cc..60a55ae8cd91 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1902,14 +1902,7 @@ static struct attribute *zram_disk_attrs[] = {
NULL,
};
-static const struct attribute_group zram_disk_attr_group = {
- .attrs = zram_disk_attrs,
-};
-
-static const struct attribute_group *zram_disk_attr_groups[] = {
- &zram_disk_attr_group,
- NULL,
-};
+ATTRIBUTE_GROUPS(zram_disk);
/*
* Allocate and initialize new zram device. the function returns
@@ -1981,7 +1974,7 @@ static int zram_add(void)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
- device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
+ device_add_disk(NULL, zram->disk, zram_disk_groups);
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
--
2.30.2
Provide a simple state machine to fix races with driver exit where we
remove the CPU multistate callbacks and re-initialization / creation of
new per CPU instances which should be managed by these callbacks.
The zram driver makes use of cpu hotplug multistate support, whereby it
associates a struct zcomp per CPU. Each struct zcomp represents a
compression algorithm in charge of managing compression streams per
CPU. Although a compiled zram driver only supports a fixed set of
compression algorithms, each zram device gets a struct zcomp allocated
per CPU. The "multi" in CPU hotplug multstate refers to these per
cpu struct zcomp instances. Each of these will have the CPU hotplug
callback called for it on CPU plug / unplug. The kernel's CPU hotplug
multistate keeps a linked list of these different structures so that
it will iterate over them on CPU transitions.
By default at driver initialization we will create just one zram device
(num_devices=1) and a zcomp structure then set for the now default
lzo-rle comrpession algorithm. At driver removal we first remove each
zram device, and so we destroy the associated struct zcomp per CPU. But
since we expose sysfs attributes to create new devices or reset /
initialize existing zram devices, we can easily end up re-initializing
a struct zcomp for a zram device before the exit routine of the module
removes the cpu hotplug callback. When this happens the kernel's CPU
hotplug will detect that at least one instance (struct zcomp for us)
exists. This can happen in the following situation:
CPU 1 CPU 2
disksize_store(...);
class_unregister(...);
idr_for_each(...);
zram_debugfs_destroy();
idr_destroy(...);
unregister_blkdev(...);
cpuhp_remove_multi_state(...);
The warning comes up on cpuhp_remove_multi_state() when it sees that the
state for CPUHP_ZCOMP_PREPARE does not have an empty instance linked list.
In this case, that a struct zcom still exists, the driver allowed its
creation per CPU even though we could have just freed them per CPU
though a call on another CPU, and we are then later trying to remove the
hotplug callback.
Fix all this by providing a zram initialization boolean
protected the shared in the driver zram_index_mutex, which we
can use to annotate when sysfs attributes are safe to use or
not -- once the driver is properly initialized. When the driver
is going down we also are sure to not let userspace muck with
attributes which may affect each per cpu struct zcomp.
This also fixes a series of possible memory leaks. The
crashes and memory leaks can easily be caused by issuing
the zram02.sh script from the LTP project [0] in a loop
in two separate windows:
cd testcases/kernel/device-drivers/zram
while true; do PATH=$PATH:$PWD:$PWD/../../../lib/ ./zram02.sh; done
You end up with a splat as follows:
kernel: zram: Removed device: zram0
kernel: zram: Added device: zram0
kernel: zram0: detected capacity change from 0 to 209715200
kernel: Adding 104857596k swap on /dev/zram0. <etc>
kernel: zram0: detected capacitky change from 209715200 to 0
kernel: zram0: detected capacity change from 0 to 209715200
kernel: ------------[ cut here ]------------
kernel: Error: Removing state 63 which has instances left.
kernel: WARNING: CPU: 7 PID: 70457 at \
kernel/cpu.c:2069 __cpuhp_remove_state_cpuslocked+0xf9/0x100
kernel: Modules linked in: zram(E-) zsmalloc(E) <etc>
kernel: CPU: 7 PID: 70457 Comm: rmmod Tainted: G \
E 5.12.0-rc1-next-20210304 #3
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), \
BIOS 1.14.0-2 04/01/2014
kernel: RIP: 0010:__cpuhp_remove_state_cpuslocked+0xf9/0x100
kernel: Code: <etc>
kernel: RSP: 0018:ffffa800c139be98 EFLAGS: 00010282
kernel: RAX: 0000000000000000 RBX: ffffffff9083db58 RCX: ffff9609f7dd86d8
kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9609f7dd86d0
kernel: RBP: 0000000000000000i R08: 0000000000000000 R09: ffffa800c139bcb8
kernel: R10: ffffa800c139bcb0 R11: ffffffff908bea40 R12: 000000000000003f
kernel: R13: 00000000000009d8 R14: 0000000000000000 R15: 0000000000000000
kernel: FS: 00007f1b075a7540(0000) GS:ffff9609f7dc0000(0000) knlGS:<etc>
kernel: CS: 0010 DS: 0000 ES 0000 CR0: 0000000080050033
kernel: CR2: 00007f1b07610490 CR3: 00000001bd04e000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel: __cpuhp_remove_state+0x2e/0x80
kernel: __do_sys_delete_module+0x190/0x2a0
kernel: do_syscall_64+0x33/0x80
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
The "Error: Removing state 63 which has instances left" refers
to the zram per CPU struct zcomp instances left.
[0] https://github.com/linux-test-project/ltp.git
Acked-by: Minchan Kim <minchan(a)kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
drivers/block/zram/zram_drv.c | 63 ++++++++++++++++++++++++++++++-----
1 file changed, 55 insertions(+), 8 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f61910c65f0f..b26abcb955cc 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -44,6 +44,8 @@ static DEFINE_MUTEX(zram_index_mutex);
static int zram_major;
static const char *default_compressor = CONFIG_ZRAM_DEF_COMP;
+static bool zram_up;
+
/* Module params (documentation at end) */
static unsigned int num_devices = 1;
/*
@@ -1704,6 +1706,7 @@ static void zram_reset_device(struct zram *zram)
comp = zram->comp;
disksize = zram->disksize;
zram->disksize = 0;
+ zram->comp = NULL;
set_capacity_and_notify(zram->disk, 0);
part_stat_set_all(zram->disk->part0, 0);
@@ -1724,9 +1727,18 @@ static ssize_t disksize_store(struct device *dev,
struct zram *zram = dev_to_zram(dev);
int err;
+ mutex_lock(&zram_index_mutex);
+
+ if (!zram_up) {
+ err = -ENODEV;
+ goto out;
+ }
+
disksize = memparse(buf, NULL);
- if (!disksize)
- return -EINVAL;
+ if (!disksize) {
+ err = -EINVAL;
+ goto out;
+ }
down_write(&zram->init_lock);
if (init_done(zram)) {
@@ -1754,12 +1766,16 @@ static ssize_t disksize_store(struct device *dev,
set_capacity_and_notify(zram->disk, zram->disksize >> SECTOR_SHIFT);
up_write(&zram->init_lock);
+ mutex_unlock(&zram_index_mutex);
+
return len;
out_free_meta:
zram_meta_free(zram, disksize);
out_unlock:
up_write(&zram->init_lock);
+out:
+ mutex_unlock(&zram_index_mutex);
return err;
}
@@ -1775,8 +1791,17 @@ static ssize_t reset_store(struct device *dev,
if (ret)
return ret;
- if (!do_reset)
- return -EINVAL;
+ mutex_lock(&zram_index_mutex);
+
+ if (!zram_up) {
+ len = -ENODEV;
+ goto out;
+ }
+
+ if (!do_reset) {
+ len = -EINVAL;
+ goto out;
+ }
zram = dev_to_zram(dev);
bdev = zram->disk->part0;
@@ -1785,7 +1810,8 @@ static ssize_t reset_store(struct device *dev,
/* Do not reset an active device or claimed device */
if (bdev->bd_openers || zram->claim) {
mutex_unlock(&bdev->bd_disk->open_mutex);
- return -EBUSY;
+ len = -EBUSY;
+ goto out;
}
/* From now on, anyone can't open /dev/zram[0-9] */
@@ -1800,6 +1826,8 @@ static ssize_t reset_store(struct device *dev,
zram->claim = false;
mutex_unlock(&bdev->bd_disk->open_mutex);
+out:
+ mutex_unlock(&zram_index_mutex);
return len;
}
@@ -2010,6 +2038,10 @@ static ssize_t hot_add_show(struct class *class,
int ret;
mutex_lock(&zram_index_mutex);
+ if (!zram_up) {
+ mutex_unlock(&zram_index_mutex);
+ return -ENODEV;
+ }
ret = zram_add();
mutex_unlock(&zram_index_mutex);
@@ -2037,6 +2069,11 @@ static ssize_t hot_remove_store(struct class *class,
mutex_lock(&zram_index_mutex);
+ if (!zram_up) {
+ ret = -ENODEV;
+ goto out;
+ }
+
zram = idr_find(&zram_index_idr, dev_id);
if (zram) {
ret = zram_remove(zram);
@@ -2046,6 +2083,7 @@ static ssize_t hot_remove_store(struct class *class,
ret = -ENODEV;
}
+out:
mutex_unlock(&zram_index_mutex);
return ret ? ret : count;
}
@@ -2072,12 +2110,15 @@ static int zram_remove_cb(int id, void *ptr, void *data)
static void destroy_devices(void)
{
+ mutex_lock(&zram_index_mutex);
+ zram_up = false;
class_unregister(&zram_control_class);
idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
zram_debugfs_destroy();
idr_destroy(&zram_index_idr);
unregister_blkdev(zram_major, "zram");
cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
+ mutex_unlock(&zram_index_mutex);
}
static int __init zram_init(void)
@@ -2105,15 +2146,21 @@ static int __init zram_init(void)
return -EBUSY;
}
+ mutex_lock(&zram_index_mutex);
+
while (num_devices != 0) {
- mutex_lock(&zram_index_mutex);
ret = zram_add();
- mutex_unlock(&zram_index_mutex);
- if (ret < 0)
+ if (ret < 0) {
+ mutex_unlock(&zram_index_mutex);
goto out_error;
+ }
num_devices--;
}
+ zram_up = true;
+
+ mutex_unlock(&zram_index_mutex);
+
return 0;
out_error:
--
2.30.2
Now that sysfs has the deadlock race fixed with module removal,
enable the deadlock tests module removal tests. They were left
disabled by default as otherwise you would deadlock your system
./tools/testing/selftests/sysfs/sysfs.sh -t 0027
Running test: sysfs_test_0027 - run #0
Test for possible rmmod deadlock while writing x ... ok
./tools/testing/selftests/sysfs/sysfs.sh -t 0028
Running test: sysfs_test_0028 - run #0
Test for possible rmmod deadlock using rtnl_lock while writing x ... ok
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
tools/testing/selftests/sysfs/sysfs.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh
index f928635d0e35..4047ac48e764 100755
--- a/tools/testing/selftests/sysfs/sysfs.sh
+++ b/tools/testing/selftests/sysfs/sysfs.sh
@@ -60,8 +60,8 @@ ALL_TESTS="$ALL_TESTS 0023:1:1:test_dev_y:block"
ALL_TESTS="$ALL_TESTS 0024:1:1:test_dev_x:block"
ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block"
ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block"
-ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test
-ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock
+ALL_TESTS="$ALL_TESTS 0027:1:1:test_dev_x:block" # deadlock test
+ALL_TESTS="$ALL_TESTS 0028:1:1:test_dev_x:block" # deadlock test with rntl_lock
ALL_TESTS="$ALL_TESTS 0029:1:1:test_dev_x:block" # kernfs race removal of store
ALL_TESTS="$ALL_TESTS 0030:1:1:test_dev_x:block" # kernfs race removal before mutex
ALL_TESTS="$ALL_TESTS 0031:1:1:test_dev_x:block" # kernfs race removal after mutex
--
2.30.2
If one ends up expanding on this line checkpatch will complain that the
combination S_IRWXU|S_IRUGO|S_IXUGO should just be replaced with the
octal 0755. Do that.
This makes no functional changes.
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
fs/sysfs/dir.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 59dffd5ca517..b6b6796e1616 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -56,8 +56,7 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns)
kobject_get_ownership(kobj, &uid, &gid);
- kn = kernfs_create_dir_ns(parent, kobject_name(kobj),
- S_IRWXU | S_IRUGO | S_IXUGO, uid, gid,
+ kn = kernfs_create_dir_ns(parent, kobject_name(kobj), 0755, uid, gid,
kobj, ns);
if (IS_ERR(kn)) {
if (PTR_ERR(kn) == -EEXIST)
--
2.30.2
If one ends up extending this line checkpatch will complain about the
use of S_IRWXUGO suggesting it is not preferred and that 0777
should be used instead. Take the tip from checkpatch and do that
change before we do our subsequent changes.
This makes no functional changes.
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
fs/kernfs/symlink.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index c8f8e41b8411..19a6c71c6ff5 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -36,8 +36,7 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
gid = target->iattr->ia_gid;
}
- kn = kernfs_new_node(parent, name, S_IFLNK|S_IRWXUGO, uid, gid,
- KERNFS_LINK);
+ kn = kernfs_new_node(parent, name, S_IFLNK|0777, uid, gid, KERNFS_LINK);
if (!kn)
return ERR_PTR(-ENOMEM);
--
2.30.2
There is quite a bit of tribal knowledge around proper use of
try_module_get() and that it must be used only in a context which
can ensure the module won't be gone during the operation. Document
this little bit of tribal knowledge.
I'm extending this tribal knowledge with new developments which it
seems some folks do not yet believe to be true: we can be sure a
module will exist during the lifetime of a sysfs file operation.
For proof, refer to test_sysfs test #32:
./tools/testing/selftests/sysfs/sysfs.sh -t 0032
Without this being true, the write would fail or worse,
a crash would happen, in this test. It does not.
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
include/linux/module.h | 34 ++++++++++++++++++++++++++++++++--
1 file changed, 32 insertions(+), 2 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index c9f1200b2312..22eacd5e1e85 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -609,10 +609,40 @@ void symbol_put_addr(void *addr);
to handle the error case (which only happens with rmmod --wait). */
extern void __module_get(struct module *module);
-/* This is the Right Way to get a module: if it fails, it's being removed,
- * so pretend it's not there. */
+/**
+ * try_module_get() - yields to module removal and bumps refcnt otherwise
+ * @module: the module we should check for
+ *
+ * This can be used to try to bump the reference count of a module, so to
+ * prevent module removal. The reference count of a module is not allowed
+ * to be incremented if the module is already being removed.
+ *
+ * Care must be taken to ensure the module cannot be removed during the call to
+ * try_module_get(). This can be done by having another entity other than the
+ * module itself increment the module reference count, or through some other
+ * means which guarantees the module could not be removed during an operation.
+ * An example of this later case is using try_module_get() in a sysfs file
+ * which the module created. The sysfs store / read file operations are
+ * gauranteed to exist through the use of kernfs's active reference (see
+ * kernfs_active()). If a sysfs file operation is being run, the module which
+ * created it must still exist as the module is in charge of removing the same
+ * sysfs file being read. Also, a sysfs / kernfs file removal cannot happen
+ * unless the same file is not active.
+ *
+ * One of the real values to try_module_get() is the module_is_live() check
+ * which ensures this the caller of try_module_get() can yield to userspace
+ * module removal requests and fail whatever it was about to process.
+ */
extern bool try_module_get(struct module *module);
+/**
+ * module_put() - release a reference count to a module
+ * @module: the module we should release a reference count for
+ *
+ * If you successfully bump a reference count to a module with try_module_get(),
+ * when you are finished you must call module_put() to release that reference
+ * count.
+ */
extern void module_put(struct module *module);
#else /*!CONFIG_MODULE_UNLOAD*/
--
2.30.2
This extends test_sysfs with support for using the failure injection
wait completion and knobs to force a few race conditions which
demonstrates that kernfs active reference protection is sufficient
for kobject / device protection at higher layers.
This adds 4 new tests which tries to remove the device attribute
store operation in 4 different situations:
1) at the start of kernfs_kernfs_fop_write_iter()
2) before the of->mutex is held in kernfs_kernfs_fop_write_iter()
3) after the of->mutex is held in kernfs_kernfs_fop_write_iter()
4) after the kernfs node active reference is taken
A write fails in call cases except the last one, test number #32. There
is a good explanation for this: *once* kernfs_get_active() gets called
we have a guarantee that the kernfs entry cannot be removed. If
kernfs_get_active() succeeds that entry cannot be removed and so
anything trying to remove that entry will have to wait. It is perhaps
not obvious but since a sysfs write will trigger eventually a
kernfs_get_active() call, and *only* if this succeeds will the sysfs
op be called, this and the fact that you cannot remove the kernfs
entry while the kenfs entry is active implies that a module that
created the respective sysfs / kernfs entry *cannot* possibly be
removed during a sysfs operation. And test number 32 provides us with
proof of this. If it were not true test #32 should crash.
No null dereferences are reproduced, even though this has been observed
in some complex testing cases [0]. If this issue really exists we should
have enough tools on the sysfs_test toolbox now to try to reproduce
this easily without having to poke around other drivers. It very likley
was the case that the issue reported [0] was possibly a side issue after
the first bug which was zram specific. This is why it is important to
isolate the issue and try to reproduce it in a generic form using the
test_sysfs driver.
[0] https://lkml.kernel.org/r/20210623215007.862787-1-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
lib/Kconfig.debug | 3 +
lib/test_sysfs.c | 31 +++++
tools/testing/selftests/sysfs/config | 3 +
tools/testing/selftests/sysfs/sysfs.sh | 175 +++++++++++++++++++++++++
4 files changed, 212 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a29b7d398c4e..176b822654e5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2358,6 +2358,9 @@ config TEST_SYSFS
depends on SYSFS
depends on NET
depends on BLOCK
+ select FAULT_INJECTION
+ select FAULT_INJECTION_DEBUG_FS
+ select FAIL_KERNFS_KNOBS
help
This builds the "test_sysfs" module. This driver enables to test the
sysfs file system safely without affecting production knobs which
diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c
index 273fc3f39740..391e0af2864a 100644
--- a/lib/test_sysfs.c
+++ b/lib/test_sysfs.c
@@ -38,6 +38,11 @@
#include <linux/rtnetlink.h>
#include <linux/genhd.h>
#include <linux/blkdev.h>
+#include <linux/kernfs.h>
+
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+MODULE_IMPORT_NS(KERNFS_DEBUG_PRIVATE);
+#endif
static bool enable_lock;
module_param(enable_lock, bool_enable_only, 0644);
@@ -82,6 +87,13 @@ static bool enable_verbose_rmmod;
module_param(enable_verbose_rmmod, bool_enable_only, 0644);
MODULE_PARM_DESC(enable_verbose_rmmod, "enable verbose print messages on rmmod");
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+static bool enable_completion_on_rmmod;
+module_param(enable_completion_on_rmmod, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_completion_on_rmmod,
+ "enable sending a kernfs completion on rmmod");
+#endif
+
static int sysfs_test_major;
/**
@@ -289,6 +301,12 @@ static ssize_t config_show(struct device *dev,
"enable_verbose_writes:\t%s\n",
enable_verbose_writes ? "true" : "false");
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+ len += snprintf(buf+len, PAGE_SIZE - len,
+ "enable_completion_on_rmmod:\t%s\n",
+ enable_completion_on_rmmod ? "true" : "false");
+#endif
+
test_dev_config_unlock(test_dev);
return len;
@@ -926,10 +944,23 @@ static int __init test_sysfs_init(void)
}
module_init(test_sysfs_init);
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+/* The goal is to race our device removal with a pending kernfs -> store call */
+static void test_sysfs_kernfs_send_completion_rmmod(void)
+{
+ if (!enable_completion_on_rmmod)
+ return;
+ complete(&kernfs_debug_wait_completion);
+}
+#else
+static inline void test_sysfs_kernfs_send_completion_rmmod(void) {}
+#endif
+
static void __exit test_sysfs_exit(void)
{
if (enable_debugfs)
debugfs_remove(debugfs_dir);
+ test_sysfs_kernfs_send_completion_rmmod();
if (delay_rmmod_ms)
msleep(delay_rmmod_ms);
unregister_test_dev_sysfs(first_test_dev);
diff --git a/tools/testing/selftests/sysfs/config b/tools/testing/selftests/sysfs/config
index 9196f452ecd5..2876a229f95b 100644
--- a/tools/testing/selftests/sysfs/config
+++ b/tools/testing/selftests/sysfs/config
@@ -1,2 +1,5 @@
CONFIG_SYSFS=m
CONFIG_TEST_SYSFS=m
+CONFIG_FAULT_INJECTION=y
+CONFIG_FAULT_INJECTION_DEBUG_FS=y
+CONFIG_FAIL_KERNFS_KNOBS=y
diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh
index b3f4c2236c7f..f928635d0e35 100755
--- a/tools/testing/selftests/sysfs/sysfs.sh
+++ b/tools/testing/selftests/sysfs/sysfs.sh
@@ -62,6 +62,10 @@ ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block"
ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block"
ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test
ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock
+ALL_TESTS="$ALL_TESTS 0029:1:1:test_dev_x:block" # kernfs race removal of store
+ALL_TESTS="$ALL_TESTS 0030:1:1:test_dev_x:block" # kernfs race removal before mutex
+ALL_TESTS="$ALL_TESTS 0031:1:1:test_dev_x:block" # kernfs race removal after mutex
+ALL_TESTS="$ALL_TESTS 0032:1:1:test_dev_x:block" # kernfs race removal after active
allow_user_defaults()
{
@@ -92,6 +96,9 @@ allow_user_defaults()
if [ -z $SYSFS_DEBUGFS_DIR ]; then
SYSFS_DEBUGFS_DIR="/sys/kernel/debug/test_sysfs"
fi
+ if [ -z $KERNFS_DEBUGFS_DIR ]; then
+ KERNFS_DEBUGFS_DIR="/sys/kernel/debug/kernfs"
+ fi
if [ -z $PAGE_SIZE ]; then
PAGE_SIZE=$(getconf PAGESIZE)
fi
@@ -167,6 +174,14 @@ modprobe_reset_enable_rtnl_lock_on_rmmod()
unset FIRST_MODPROBE_ARGS
}
+modprobe_reset_enable_completion()
+{
+ FIRST_MODPROBE_ARGS="enable_completion_on_rmmod=1 enable_verbose_writes=1"
+ FIRST_MODPROBE_ARGS="$FIRST_MODPROBE_ARGS enable_verbose_rmmod=1 delay_rmmod_ms=0"
+ modprobe_reset
+ unset FIRST_MODPROBE_ARGS
+}
+
load_req_mod()
{
modprobe_reset
@@ -197,6 +212,63 @@ debugfs_reset_first_test_dev_ignore_errors()
echo -n "1" >"$SYSFS_DEBUGFS_DIR"/reset_first_test_dev
}
+debugfs_kernfs_kernfs_fop_write_iter_exists()
+{
+ KNOB_DIR="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter"
+ if [[ ! -d $KNOB_DIR ]]; then
+ echo "kernfs debugfs does not exist $KNOB_DIR"
+ return 0;
+ fi
+ KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter"
+ if [[ ! -d $KNOB_DEBUGFS ]]; then
+ echo -n "kernfs debugfs for coniguring fail_kernfs_fop_write_iter "
+ echo "does not exist $KNOB_DIR"
+ return 0;
+ fi
+ return 1
+}
+
+debugfs_kernfs_kernfs_fop_write_iter_set_fail_once()
+{
+ KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter"
+ echo 1 > $KNOB_DEBUGFS/interval
+ echo 100 > $KNOB_DEBUGFS/probability
+ echo 0 > $KNOB_DEBUGFS/space
+ # Disable verbose messages on the kernel ring buffer which may
+ # confuse developers with a kernel panic.
+ echo 0 > $KNOB_DEBUGFS/verbose
+
+ # Fail only once
+ echo 1 > $KNOB_DEBUGFS/times
+}
+
+debugfs_kernfs_kernfs_fop_write_iter_set_fail_never()
+{
+ KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter"
+ echo 0 > $KNOB_DEBUGFS/times
+}
+
+debugfs_kernfs_set_wait_ms()
+{
+ SLEEP_AFTER_WAIT_MS="${KERNFS_DEBUGFS_DIR}/sleep_after_wait_ms"
+ echo $1 > $SLEEP_AFTER_WAIT_MS
+}
+
+debugfs_kernfs_disable_wait_kernfs_fop_write_iter()
+{
+ ENABLE_WAIT_KNOB="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter/wait_"
+ for KNOB in ${ENABLE_WAIT_KNOB}*; do
+ echo 0 > $KNOB
+ done
+}
+
+debugfs_kernfs_enable_wait_kernfs_fop_write_iter()
+{
+ ENABLE_WAIT_KNOB="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter/wait_$1"
+ echo -n "1" > $ENABLE_WAIT_KNOB
+ return $?
+}
+
set_orig()
{
if [[ ! -z $TARGET ]] && [[ ! -z $ORIG ]]; then
@@ -972,6 +1044,105 @@ sysfs_test_0028()
fi
}
+sysfs_race_kernfs_kernfs_fop_write_iter()
+{
+ TARGET="${DIR}/$(get_test_target $1)"
+ WAIT_AT=$2
+ EXPECT_WRITE_RETURNS=$3
+ MSDELAY=$4
+
+ modprobe_reset_enable_completion
+ ORIG=$(cat "${TARGET}")
+ TEST_STR=$(( $ORIG + 1 ))
+
+ echo -n "Test racing removal of sysfs store op with kernfs $WAIT_AT ... "
+
+ if debugfs_kernfs_kernfs_fop_write_iter_exists; then
+ echo -n "skipping test as CONFIG_FAIL_KERNFS_KNOBS "
+ echo " or CONFIG_FAULT_INJECTION_DEBUG_FS is disabled"
+ return $ksft_skip
+ fi
+
+ # Allow for failing the kernfs_kernfs_fop_write_iter call once,
+ # we'll provide exact context shortly afterwards.
+ debugfs_kernfs_kernfs_fop_write_iter_set_fail_once
+
+ # First disable all waits
+ debugfs_kernfs_disable_wait_kernfs_fop_write_iter
+
+ # Enable a wait_for_completion(&kernfs_debug_wait_completion) at the
+ # specified location inside the kernfs_fop_write_iter() routine
+ debugfs_kernfs_enable_wait_kernfs_fop_write_iter $WAIT_AT
+
+ # Configure kernfs so that after its wait_for_completion() it
+ # will msleep() this amount of time and schedule(). We figure this
+ # will be sufficient time to allow for our module removal to complete.
+ debugfs_kernfs_set_wait_ms $MSDELAY
+
+ # Now we trigger a kernfs write op, which will run kernfs_fop_write_iter,
+ # but will wait until our driver sends a respective completion
+ set_test_ignore_errors &
+ write_pid=$!
+
+ # At this point kernfs_fop_write_iter() hasn't run our op, its
+ # waiting for our completion at the specified time $WAIT_AT.
+ # We now remove our module which will send a
+ # complete(&kernfs_debug_wait_completion) right before we deregister
+ # our device and the sysfs device attributes are removed.
+ #
+ # After the completion is sent, the test_sysfs driver races with
+ # kernfs to do the device deregistration with the kernfs msleep
+ # and schedule(). This should mean we've forced trying to remove the
+ # module prior to allowing kernfs to run our store operation. If the
+ # race did happen we'll panic with a null dereference on the store op.
+ #
+ # If no race happens we should see no write operation triggered.
+ modprobe -r $TEST_DRIVER > /dev/null 2>&1
+
+ debugfs_kernfs_kernfs_fop_write_iter_set_fail_never
+
+ wait $write_pid
+ if [[ $? -eq $EXPECT_WRITE_RETURNS ]]; then
+ echo "ok"
+ else
+ echo "FAIL" >&2
+ fi
+}
+
+sysfs_test_0029()
+{
+ for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+ echo "Using delay-after-completion: $delay"
+ sysfs_race_kernfs_kernfs_fop_write_iter 0029 at_start 1 $delay
+ done
+}
+
+sysfs_test_0030()
+{
+ for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+ echo "Using delay-after-completion: $delay"
+ sysfs_race_kernfs_kernfs_fop_write_iter 0030 before_mutex 1 $delay
+ done
+}
+
+sysfs_test_0031()
+{
+ for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+ echo "Using delay-after-completion: $delay"
+ sysfs_race_kernfs_kernfs_fop_write_iter 0031 after_mutex 1 $delay
+ done
+}
+
+# A write only succeeds *iff* a module removal happens *after* the
+# kernfs active reference is obtained with kernfs_get_active().
+sysfs_test_0032()
+{
+ for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+ echo "Using delay-after-completion: $delay"
+ sysfs_race_kernfs_kernfs_fop_write_iter 0032 after_active 0 $delay
+ done
+}
+
test_gen_desc()
{
echo -n "$1 x $(get_test_count $1)"
@@ -1013,6 +1184,10 @@ list_tests()
echo "$(test_gen_desc 0026) - block test writing y larger delay and resetting device"
echo "$(test_gen_desc 0027) - test rmmod deadlock while writing x ... "
echo "$(test_gen_desc 0028) - test rmmod deadlock using rtnl_lock while writing x ..."
+ echo "$(test_gen_desc 0029) - racing removal of store op with kernfs at start"
+ echo "$(test_gen_desc 0030) - racing removal of store op with kernfs before mutex"
+ echo "$(test_gen_desc 0031) - racing removal of store op with kernfs after mutex"
+ echo "$(test_gen_desc 0032) - racing removal of store op with kernfs after active"
}
usage()
--
2.30.2
This adds initial failure injection support to kernfs. We start
off with debug knobs which when enabled allow test drivers, such as
test_sysfs, to then make use of these to try to force certain
difficult races to take place with a high degree of certainty.
This only adds runtime code *iff* the new bool CONFIG_FAIL_KERNFS_KNOBS is
enabled in your kernel. If you don't have this enabled this provides
no new functional. When CONFIG_FAIL_KERNFS_KNOBS is disabled the new
routine kernfs_debug_should_wait() ends up being transformed to if
(false), and so the compiler should optimize these out as dead code
producing no new effective binary changes.
We start off with enabling failure injections in kernfs by allowing us to
alter the way kernfs_fop_write_iter() behaves. We allow for the routine
kernfs_fop_write_iter() to wait for a certain condition in the kernel to
occur, after which it will sleep a predefined amount of time. This lets
kernfs users to time exactly when it want kernfs_fop_write_iter() to
complete, allowing for developing race conditions and test for correctness
in kernfs.
You'd boot with this enabled on your kernel command line:
fail_kernfs_fop_write_iter=1,100,0,1
The values are <interval,probability,size,times>, we don't care for
size, so for now we ignore it. The above ensures a failure will trigger
only once.
*How* we allow for this routine to change behaviour is left to knobs we
expose under debugfs:
# ls -1 /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/
wait_after_active
wait_after_mutex
wait_at_start
wait_before_mutex
A debugfs entry also exists to allow us to sleep a configurabler amount
of time after the completion:
/sys/kernel/debug/kernfs/sleep_after_wait_ms
These two sets of knobs allow us to construct races and demonstrate
how the kernfs active reference should suffice to project against
races.
Enabling CONFIG_FAULT_INJECTION_DEBUG_FS enables us to configure the
differnt fault injection parametres for the new fail_kernfs_fop_write_iter
fault injection at run time:
ls -1 /sys/kernel/debug/kernfs/fail_kernfs_fop_write_iter/
interval
probability
space
task-filter
times
verbose
verbose_ratelimit_burst
verbose_ratelimit_interval_ms
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
.../fault-injection/fault-injection.rst | 22 +++++
MAINTAINERS | 2 +-
fs/kernfs/Makefile | 1 +
fs/kernfs/failure-injection.c | 91 +++++++++++++++++++
fs/kernfs/file.c | 13 +++
fs/kernfs/kernfs-internal.h | 72 +++++++++++++++
include/linux/kernfs.h | 5 +
lib/Kconfig.debug | 10 ++
8 files changed, 215 insertions(+), 1 deletion(-)
create mode 100644 fs/kernfs/failure-injection.c
diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst
index 4a25c5eb6f07..d4d34b082f47 100644
--- a/Documentation/fault-injection/fault-injection.rst
+++ b/Documentation/fault-injection/fault-injection.rst
@@ -28,6 +28,28 @@ Available fault injection capabilities
injects kernel RPC client and server failures.
+- fail_kernfs_fop_write_iter
+
+ Allows for failures to be enabled inside kernfs_fop_write_iter(). Enabling
+ this does not immediately enable any errors to occur. You must configure
+ how you want this routine to fail or change behaviour by using the debugfs
+ knobs for it:
+
+ # ls -1 /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/
+ wait_after_active
+ wait_after_mutex
+ wait_at_start
+ wait_before_mutex
+
+ You can also configure how long to sleep after a wait under
+
+ /sys/kernel/debug/kernfs/sleep_after_wait_ms
+
+ If you enable CONFIG_FAULT_INJECTION_DEBUG_FS the fail_add_disk failure
+ injection parameters are placed under:
+
+ /sys/kernel/debug/kernfs/fail_kernfs_fop_write_iter/
+
- fail_make_request
injects disk IO errors on devices permitted by setting
diff --git a/MAINTAINERS b/MAINTAINERS
index 28a34384f541..acdbf91058d5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10341,7 +10341,7 @@ M: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
M: Tejun Heo <tj(a)kernel.org>
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
-F: fs/kernfs/
+F: fs/kernfs/*
F: include/linux/kernfs.h
KEXEC
diff --git a/fs/kernfs/Makefile b/fs/kernfs/Makefile
index 4ca54ff54c98..bc5b32ca39f9 100644
--- a/fs/kernfs/Makefile
+++ b/fs/kernfs/Makefile
@@ -4,3 +4,4 @@
#
obj-y := mount.o inode.o dir.o file.o symlink.o
+obj-$(CONFIG_FAIL_KERNFS_KNOBS) += failure-injection.o
diff --git a/fs/kernfs/failure-injection.c b/fs/kernfs/failure-injection.c
new file mode 100644
index 000000000000..4130d202c13b
--- /dev/null
+++ b/fs/kernfs/failure-injection.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/fault-inject.h>
+#include <linux/delay.h>
+
+#include "kernfs-internal.h"
+
+static DECLARE_FAULT_ATTR(fail_kernfs_fop_write_iter);
+struct kernfs_config_fail kernfs_config_fail;
+
+#define kernfs_config_fail(when) \
+ kernfs_config_fail.kernfs_fop_write_iter_fail.wait_ ## when
+
+#define kernfs_config_fail(when) \
+ kernfs_config_fail.kernfs_fop_write_iter_fail.wait_ ## when
+
+static int __init setup_fail_kernfs_fop_write_iter(char *str)
+{
+ return setup_fault_attr(&fail_kernfs_fop_write_iter, str);
+}
+
+__setup("fail_kernfs_fop_write_iter=", setup_fail_kernfs_fop_write_iter);
+
+struct dentry *kernfs_debugfs_root;
+struct dentry *config_fail_kernfs_fop_write_iter;
+
+static int __init kernfs_init_failure_injection(void)
+{
+ kernfs_config_fail.sleep_after_wait_ms = 100;
+ kernfs_debugfs_root = debugfs_create_dir("kernfs", NULL);
+
+ fault_create_debugfs_attr("fail_kernfs_fop_write_iter",
+ kernfs_debugfs_root, &fail_kernfs_fop_write_iter);
+
+ config_fail_kernfs_fop_write_iter =
+ debugfs_create_dir("config_fail_kernfs_fop_write_iter",
+ kernfs_debugfs_root);
+
+ debugfs_create_u32("sleep_after_wait_ms", 0600,
+ kernfs_debugfs_root,
+ &kernfs_config_fail.sleep_after_wait_ms);
+
+ debugfs_create_bool("wait_at_start", 0600,
+ config_fail_kernfs_fop_write_iter,
+ &kernfs_config_fail(at_start));
+ debugfs_create_bool("wait_before_mutex", 0600,
+ config_fail_kernfs_fop_write_iter,
+ &kernfs_config_fail(before_mutex));
+ debugfs_create_bool("wait_after_mutex", 0600,
+ config_fail_kernfs_fop_write_iter,
+ &kernfs_config_fail(after_mutex));
+ debugfs_create_bool("wait_after_active", 0600,
+ config_fail_kernfs_fop_write_iter,
+ &kernfs_config_fail(after_active));
+ return 0;
+}
+late_initcall(kernfs_init_failure_injection);
+
+int __kernfs_debug_should_wait_kernfs_fop_write_iter(bool evaluate)
+{
+ if (!evaluate)
+ return 0;
+
+ return should_fail(&fail_kernfs_fop_write_iter, 0);
+}
+
+DECLARE_COMPLETION(kernfs_debug_wait_completion);
+EXPORT_SYMBOL_NS_GPL(kernfs_debug_wait_completion, KERNFS_DEBUG_PRIVATE);
+
+void kernfs_debug_wait(void)
+{
+ unsigned long timeout;
+
+ timeout = wait_for_completion_timeout(&kernfs_debug_wait_completion,
+ msecs_to_jiffies(3000));
+ if (!timeout)
+ pr_info("%s waiting for kernfs_debug_wait_completion timed out\n",
+ __func__);
+ else
+ pr_info("%s received completion with time left on timeout %u ms\n",
+ __func__, jiffies_to_msecs(timeout));
+
+ /**
+ * The goal is wait for an event, and *then* once we have
+ * reached it, the other side will try to do something which
+ * it thinks will break. So we must give it some time to do
+ * that. The amount of time is configurable.
+ */
+ msleep(kernfs_config_fail.sleep_after_wait_ms);
+ pr_info("%s ended\n", __func__);
+}
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 60e2a86c535e..4479c6580333 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -259,6 +259,9 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
const struct kernfs_ops *ops;
char *buf;
+ if (kernfs_debug_should_wait(kernfs_fop_write_iter, at_start))
+ kernfs_debug_wait();
+
if (of->atomic_write_len) {
if (len > of->atomic_write_len)
return -E2BIG;
@@ -280,17 +283,27 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
}
buf[len] = '\0'; /* guarantee string termination */
+ if (kernfs_debug_should_wait(kernfs_fop_write_iter, before_mutex))
+ kernfs_debug_wait();
+
/*
* @of->mutex nests outside active ref and is used both to ensure that
* the ops aren't called concurrently for the same open file.
*/
mutex_lock(&of->mutex);
+
+ if (kernfs_debug_should_wait(kernfs_fop_write_iter, after_mutex))
+ kernfs_debug_wait();
+
if (!kernfs_get_active(of->kn)) {
mutex_unlock(&of->mutex);
len = -ENODEV;
goto out_free;
}
+ if (kernfs_debug_should_wait(kernfs_fop_write_iter, after_active))
+ kernfs_debug_wait();
+
ops = kernfs_ops(of->kn);
if (ops->write)
len = ops->write(of, buf, len, iocb->ki_pos);
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index f9cc912c31e1..9e3abf597e2d 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -18,6 +18,7 @@
#include <linux/kernfs.h>
#include <linux/fs_context.h>
+#include <linux/stringify.h>
struct kernfs_iattrs {
kuid_t ia_uid;
@@ -147,4 +148,75 @@ void kernfs_drain_open_files(struct kernfs_node *kn);
*/
extern const struct inode_operations kernfs_symlink_iops;
+/*
+ * failure-injection.c
+ */
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+
+/**
+ * struct kernfs_fop_write_iter_fail - how kernfs_fop_write_iter_fail fails
+ *
+ * This lets you configure what part of kernfs_fop_write_iter() should behave
+ * in a specific way to allow userspace to capture possible failures in
+ * kernfs. The wait knobs are allowed to let you design capture possible
+ * race conditions which would otherwise be difficult to reproduce. A
+ * secondary driver would tell kernfs's wait completion when it is done.
+ *
+ * The point to the wait completion failure injection tests are to confirm
+ * that the kernfs active refcount suffice to ensure other objects in other
+ * layers are also gauranteed to exist, even they are opaque to kernfs. This
+ * includes kobjects, devices, and other objects built on top of this, like
+ * the block layer when using sysfs block device attributes.
+ *
+ * @wait_at_start: waits for completion from a third party at the start of
+ * the routine.
+ * @wait_before_mutex: waits for completion from a third party before we
+ * are allowed to continue before the of->mutex is held.
+ * @wait_after_mutex: waits for completion from a third party after we
+ * have held the of->mutex.
+ * @wait_after_active: waits for completion from a thid party after we
+ * have refcounted the struct kernfs_node.
+ */
+struct kernfs_fop_write_iter_fail {
+ bool wait_at_start;
+ bool wait_before_mutex;
+ bool wait_after_mutex;
+ bool wait_after_active;
+};
+
+/**
+ * struct kernfs_config_fail - kernfs configuration for failure injection
+ *
+ * You can kernfs failure injection on boot, and in particular we currently
+ * only support failures for kernfs_fop_write_iter(). However, we don't
+ * want to always enable errors on this call when failure injection is enabled
+ * as this routine is used by many parts of the kernel for proper functionality.
+ * The compromise we make is we let userspace start enabling which parts it
+ * wants to fail after boot, if and only if failure injection has been enabled.
+ *
+ * @kernfs_fop_write_iter_fail: configuration for how we want to allow
+ * for failure injection on kernfs_fop_write_iter()
+ * @sleep_after_wait_ms: how many ms to wait after completion is received.
+ */
+struct kernfs_config_fail {
+ struct kernfs_fop_write_iter_fail kernfs_fop_write_iter_fail;
+ u32 sleep_after_wait_ms;
+};
+
+extern struct kernfs_config_fail kernfs_config_fail;
+
+#define __kernfs_config_wait_var(func, when) \
+ (kernfs_config_fail. func ## _fail.wait_ ## when)
+#define __kernfs_debug_should_wait_func_name(func) __kernfs_debug_should_wait_## func
+
+#define kernfs_debug_should_wait(func, when) \
+ __kernfs_debug_should_wait_func_name(func)(__kernfs_config_wait_var(func, when))
+int __kernfs_debug_should_wait_kernfs_fop_write_iter(bool evaluate);
+void kernfs_debug_wait(void);
+#else
+static inline void kernfs_init_failure_injection(void) {}
+#define kernfs_debug_should_wait(func, when) (false)
+static inline void kernfs_debug_wait(void) {}
+#endif
+
#endif /* __KERNFS_INTERNAL_H */
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 3ccce6f24548..cd968ee2b503 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -411,6 +411,11 @@ void kernfs_init(void);
struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
u64 id);
+
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+extern struct completion kernfs_debug_wait_completion;
+#endif
+
#else /* CONFIG_KERNFS */
static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ae19bf1a21b8..a29b7d398c4e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1902,6 +1902,16 @@ config FAULT_INJECTION_USERCOPY
Provides fault-injection capability to inject failures
in usercopy functions (copy_from_user(), get_user(), ...).
+config FAIL_KERNFS_KNOBS
+ bool "Fault-injection support in kernfs"
+ depends on FAULT_INJECTION
+ help
+ Provide fault-injection capability for kernfs. This only enables
+ the error injection functionality. To use it you must configure which
+ which path you want to trigger on error on using debugfs under
+ /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/. By
+ default all of these are disabled.
+
config FAIL_MAKE_REQUEST
bool "Fault-injection capability for disk IO"
depends on FAULT_INJECTION && BLOCK
--
2.30.2
Two selftests drivers exist under the copyleft-next license.
These drivers were added prior to SPDX practice taking full swing
in the kernel. Now that we have an SPDX tag for copylef-next-0.3.1
documented, embrace it and remove the boiler plate.
Cc: Goldwyn Rodrigues <rgoldwyn(a)suse.com>
Cc: Kuno Woudt <kuno(a)frob.nl>
Cc: Richard Fontana <fontana(a)sharpeleven.org>
Cc: copyleft-next(a)lists.fedorahosted.org
Cc: Ciaran Farrell <Ciaran.Farrell(a)suse.com>
Cc: Christopher De Nicolo <Christopher.DeNicolo(a)suse.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Thorsten Leemhuis <linux(a)leemhuis.info>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
lib/test_kmod.c | 12 +-----------
lib/test_sysctl.c | 12 +-----------
tools/testing/selftests/kmod/kmod.sh | 13 +------------
tools/testing/selftests/sysctl/sysctl.sh | 12 +-----------
4 files changed, 4 insertions(+), 45 deletions(-)
diff --git a/lib/test_kmod.c b/lib/test_kmod.c
index ce1589391413..d62afd89dc63 100644
--- a/lib/test_kmod.c
+++ b/lib/test_kmod.c
@@ -1,18 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1
/*
* kmod stress test driver
*
* Copyright (C) 2017 Luis R. Rodriguez <mcgrof(a)kernel.org>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or at your option any
- * later version; or, when distributed separately from the Linux kernel or
- * when incorporated into other software packages, subject to the following
- * license:
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of copyleft-next (version 0.3.1 or later) as published
- * at http://copyleft-next.org/.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 3750323973f4..9e5bd10a930a 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -1,18 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1
/*
* proc sysctl test driver
*
* Copyright (C) 2017 Luis R. Rodriguez <mcgrof(a)kernel.org>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or at your option any
- * later version; or, when distributed separately from the Linux kernel or
- * when incorporated into other software packages, subject to the following
- * license:
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of copyleft-next (version 0.3.1 or later) as published
- * at http://copyleft-next.org/.
*/
/*
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..7189715d7960 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -1,18 +1,7 @@
#!/bin/bash
-#
+# SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1
# Copyright (C) 2017 Luis R. Rodriguez <mcgrof(a)kernel.org>
#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by the Free
-# Software Foundation; either version 2 of the License, or at your option any
-# later version; or, when distributed separately from the Linux kernel or
-# when incorporated into other software packages, subject to the following
-# license:
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms of copyleft-next (version 0.3.1 or later) as published
-# at http://copyleft-next.org/.
-
# This is a stress test script for kmod, the kernel module loader. It uses
# test_kmod which exposes a series of knobs for the API for us so we can
# tweak each test in userspace rather than in kernelspace.
diff --git a/tools/testing/selftests/sysctl/sysctl.sh b/tools/testing/selftests/sysctl/sysctl.sh
index 19515dcb7d04..2046c603a4d4 100755
--- a/tools/testing/selftests/sysctl/sysctl.sh
+++ b/tools/testing/selftests/sysctl/sysctl.sh
@@ -1,16 +1,6 @@
#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1
# Copyright (C) 2017 Luis R. Rodriguez <mcgrof(a)kernel.org>
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by the Free
-# Software Foundation; either version 2 of the License, or at your option any
-# later version; or, when distributed separately from the Linux kernel or
-# when incorporated into other software packages, subject to the following
-# license:
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms of copyleft-next (version 0.3.1 or later) as published
-# at http://copyleft-next.org/.
# This performs a series tests against the proc sysctl interface.
--
2.30.2
Add the full text of the copyleft-next-0.3.1 license to the kernel
tree as well as the required tags for reference and tooling.
The license text was copied directly from the copyleft-next project's
git tree [0].
Discussion of using copyleft-next-0.3.1 on Linux started since June,
2016 [1]. In the end Linus' preference was to have drivers use
MODULE_LICENSE("GPL") to make it clear that the GPL applies when it
comes to Linux [2]. Additionally, even though copyleft-next-0.3.1 has
been found to be to be GPLv2 compatible by three attorneys at SUSE and
Redhat [3], to err on the side of caution we simply recommend to
always use the "OR" language for this license [4].
Even though it has been a goal of the project to be GPL-v2 compatible
to be certain in 2016 I asked for a clarification about what makes
copyleft-next GPLv2 compatible and also asked for a summary of
benefits. This prompted some small minor changes to make compatibility
even further clear and as of copyleft 0.3.1 compatibility should
be crystal clear [5].
The summary of why copyleft-next 0.3.1 is compatible with GPLv2
is explained as follows:
Like GPLv2, copyleft-next requires distribution of derivative works
("Derived Works" in copyleft-next 0.3.x) to be under the same license.
Ordinarily this would make the two licenses incompatible. However,
copyleft-next 0.3.1 says: "If the Derived Work includes material
licensed under the GPL, You may instead license the Derived Work under
the GPL." "GPL" is defined to include GPLv2.
In practice this means copyleft-next code in Linux may be licensed
under the GPL2, however there are additional obvious gains for
bringing contributions from Linux outbound where copyleft-next is
preferred. A summary of benefits why projects outside of Linux might
prefer to use copyleft-next >= 0.3.1 over GPLv2:
o It is much shorter and simpler
o It has an explicit patent license grant, unlike GPLv2
o Its notice preservation conditions are clearer
o More free software/open source licenses are compatible
with it (via section 4)
o The source code requirement triggered by binary distribution
is much simpler in a procedural sense
o Recipients potentially have a contract claim against distributors
who are noncompliant with the source code requirement
o There is a built-in inbound=outbound policy for upstream
contributions (cf. Apache License 2.0 section 5)
o There are disincentives to engage in the controversial practice
of copyleft/ proprietary dual-licensing
o In 15 years copyleft expires, which can be advantageous
for legacy code
o There are explicit disincentives to bringing patent infringement
claims accusing the licensed work of infringement (see 10b)
o There is a cure period for licensees who are not compliant
with the license (there is no cure opportunity in GPLv2)
o copyleft-next has a 'built-in or-later' provision
The first driver submission to Linux under this dual strategy was
lib/test_sysctl.c through commit 9308f2f9e7f05 ("test_sysctl: add
dedicated proc sysctl test driver") merged in July 2017. Shortly after
that I also added test_kmod through commit d9c6a72d6fa29 ("kmod: add
test driver to stress test the module loader") in the same month. These
two drivers went in just a few months before the SPDX license practice
kicked in. In 2018 Kuno Woudt went through the process to get SPDX
identifiers for copyleft-next [6] [7]. Although there are SPDX tags
for copyleft-next-0.3.0, we only document use in Linux starting from
copyleft-next-0.3.1 which makes GPLv2 compatibility crystal clear.
This patch will let us update the two Linux selftest drivers in
subsequent patches with their respective SPDX license identifiers and
let us remove repetitive license boiler plate.
[0] https://github.com/copyleft-next/copyleft-next/blob/master/Releases/copylef…
[1] https://lore.kernel.org/lkml/1465929311-13509-1-git-send-email-mcgrof@kerne…
[2] https://lore.kernel.org/lkml/CA+55aFyhxcvD+q7tp+-yrSFDKfR0mOHgyEAe=f_94aKLs…
[3] https://lore.kernel.org/lkml/20170516232702.GL17314@wotan.suse.de/
[4] https://lkml.kernel.org/r/1495234558.7848.122.camel@linux.intel.com
[5] https://lists.fedorahosted.org/archives/list/copyleft-next@lists.fedorahost…
[6] https://spdx.org/licenses/copyleft-next-0.3.0.html
[7] https://spdx.org/licenses/copyleft-next-0.3.1.html
Cc: Goldwyn Rodrigues <rgoldwyn(a)suse.com>
Cc: Kuno Woudt <kuno(a)frob.nl>
Cc: Richard Fontana <fontana(a)sharpeleven.org>
Cc: copyleft-next(a)lists.fedorahosted.org
Cc: Ciaran Farrell <Ciaran.Farrell(a)suse.com>
Cc: Christopher De Nicolo <Christopher.DeNicolo(a)suse.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Thorsten Leemhuis <linux(a)leemhuis.info>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
LICENSES/dual/copyleft-next-0.3.1 | 237 ++++++++++++++++++++++++++++++
1 file changed, 237 insertions(+)
create mode 100644 LICENSES/dual/copyleft-next-0.3.1
diff --git a/LICENSES/dual/copyleft-next-0.3.1 b/LICENSES/dual/copyleft-next-0.3.1
new file mode 100644
index 000000000000..086bcb74b478
--- /dev/null
+++ b/LICENSES/dual/copyleft-next-0.3.1
@@ -0,0 +1,237 @@
+Valid-License-Identifier: copyleft-next-0.3.1
+SPDX-URL: https://spdx.org/licenses/copyleft-next-0.3.1
+Usage-Guide:
+ This license can be used in code, it has been found to be GPLv2 compatible
+ by attorneys at Redhat and SUSE, however to air on the side of caution,
+ it's best to only use it together with a GPL2 compatible license using "OR".
+ To use the copyleft-next-0.3.1 license put the following SPDX tag/value
+ pair into a comment according to the placement guidelines in the
+ licensing rules documentation:
+ SPDX-License-Identifier: GPL-2.0 OR copyleft-next-0.3.1
+ SPDX-License-Identifier: GPL-2.0-only OR copyleft-next 0.3.1
+ SPDX-License-Identifier: GPL-2.0+ OR copyleft-next-0.3.1
+ SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1
+License-Text:
+
+=======================================================================
+
+ copyleft-next 0.3.1 ("this License")
+ Release date: 2016-04-29
+
+1. License Grants; No Trademark License
+
+ Subject to the terms of this License, I grant You:
+
+ a) A non-exclusive, worldwide, perpetual, royalty-free, irrevocable
+ copyright license, to reproduce, Distribute, prepare derivative works
+ of, publicly perform and publicly display My Work.
+
+ b) A non-exclusive, worldwide, perpetual, royalty-free, irrevocable
+ patent license under Licensed Patents to make, have made, use, sell,
+ offer for sale, and import Covered Works.
+
+ This License does not grant any rights in My name, trademarks, service
+ marks, or logos.
+
+2. Distribution: General Conditions
+
+ You may Distribute Covered Works, provided that You (i) inform
+ recipients how they can obtain a copy of this License; (ii) satisfy the
+ applicable conditions of sections 3 through 6; and (iii) preserve all
+ Legal Notices contained in My Work (to the extent they remain
+ pertinent). "Legal Notices" means copyright notices, license notices,
+ license texts, and author attributions, but does not include logos,
+ other graphical images, trademarks or trademark legends.
+
+3. Conditions for Distributing Derived Works; Outbound GPL Compatibility
+
+ If You Distribute a Derived Work, You must license the entire Derived
+ Work as a whole under this License, with prominent notice of such
+ licensing. This condition may not be avoided through such means as
+ separate Distribution of portions of the Derived Work.
+
+ If the Derived Work includes material licensed under the GPL, You may
+ instead license the Derived Work under the GPL.
+
+4. Condition Against Further Restrictions; Inbound License Compatibility
+
+ When Distributing a Covered Work, You may not impose further
+ restrictions on the exercise of rights in the Covered Work granted under
+ this License. This condition is not excused merely because such
+ restrictions result from Your compliance with conditions or obligations
+ extrinsic to this License (such as a court order or an agreement with a
+ third party).
+
+ However, You may Distribute a Covered Work incorporating material
+ governed by a license that is both OSI-Approved and FSF-Free as of the
+ release date of this License, provided that compliance with such
+ other license would not conflict with any conditions stated in other
+ sections of this License.
+
+5. Conditions for Distributing Object Code
+
+ You may Distribute an Object Code form of a Covered Work, provided that
+ you accompany the Object Code with a URL through which the Corresponding
+ Source is made available, at no charge, by some standard or customary
+ means of providing network access to source code.
+
+ If you Distribute the Object Code in a physical product or tangible
+ storage medium ("Product"), the Corresponding Source must be available
+ through such URL for two years from the date of Your most recent
+ Distribution of the Object Code in the Product. However, if the Product
+ itself contains or is accompanied by the Corresponding Source (made
+ available in a customarily accessible manner), You need not also comply
+ with the first paragraph of this section.
+
+ Each direct and indirect recipient of the Covered Work from You is an
+ intended third-party beneficiary of this License solely as to this
+ section 5, with the right to enforce its terms.
+
+6. Symmetrical Licensing Condition for Upstream Contributions
+
+ If You Distribute a work to Me specifically for inclusion in or
+ modification of a Covered Work (a "Patch"), and no explicit licensing
+ terms apply to the Patch, You license the Patch under this License, to
+ the extent of Your copyright in the Patch. This condition does not
+ negate the other conditions of this License, if applicable to the Patch.
+
+7. Nullification of Copyleft/Proprietary Dual Licensing
+
+ If I offer to license, for a fee, a Covered Work under terms other than
+ a license that is OSI-Approved or FSF-Free as of the release date of this
+ License or a numbered version of copyleft-next released by the
+ Copyleft-Next Project, then the license I grant You under section 1 is no
+ longer subject to the conditions in sections 3 through 5.
+
+8. Copyleft Sunset
+
+ The conditions in sections 3 through 5 no longer apply once fifteen
+ years have elapsed from the date of My first Distribution of My Work
+ under this License.
+
+9. Pass-Through
+
+ When You Distribute a Covered Work, the recipient automatically receives
+ a license to My Work from Me, subject to the terms of this License.
+
+10. Termination
+
+ Your license grants under section 1 are automatically terminated if You
+
+ a) fail to comply with the conditions of this License, unless You cure
+ such noncompliance within thirty days after becoming aware of it, or
+
+ b) initiate a patent infringement litigation claim (excluding
+ declaratory judgment actions, counterclaims, and cross-claims)
+ alleging that any part of My Work directly or indirectly infringes
+ any patent.
+
+ Termination of Your license grants extends to all copies of Covered
+ Works You subsequently obtain. Termination does not terminate the
+ rights of those who have received copies or rights from You subject to
+ this License.
+
+ To the extent permission to make copies of a Covered Work is necessary
+ merely for running it, such permission is not terminable.
+
+11. Later License Versions
+
+ The Copyleft-Next Project may release new versions of copyleft-next,
+ designated by a distinguishing version number ("Later Versions").
+ Unless I explicitly remove the option of Distributing Covered Works
+ under Later Versions, You may Distribute Covered Works under any Later
+ Version.
+
+** 12. No Warranty **
+** **
+** My Work is provided "as-is", without warranty. You bear the risk **
+** of using it. To the extent permitted by applicable law, each **
+** Distributor of My Work excludes the implied warranties of title, **
+** merchantability, fitness for a particular purpose and **
+** non-infringement. **
+
+** 13. Limitation of Liability **
+** **
+** To the extent permitted by applicable law, in no event will any **
+** Distributor of My Work be liable to You for any damages **
+** whatsoever, whether direct, indirect, special, incidental, or **
+** consequential damages, whether arising under contract, tort **
+** (including negligence), or otherwise, even where the Distributor **
+** knew or should have known about the possibility of such damages. **
+
+14. Severability
+
+ The invalidity or unenforceability of any provision of this License
+ does not affect the validity or enforceability of the remainder of
+ this License. Such provision is to be reformed to the minimum extent
+ necessary to make it valid and enforceable.
+
+15. Definitions
+
+ "Copyleft-Next Project" means the project that maintains the source
+ code repository at <https://github.com/copyleft-next/copyleft-next.git/>
+ as of the release date of this License.
+
+ "Corresponding Source" of a Covered Work in Object Code form means (i)
+ the Source Code form of the Covered Work; (ii) all scripts,
+ instructions and similar information that are reasonably necessary for
+ a skilled developer to generate such Object Code from the Source Code
+ provided under (i); and (iii) a list clearly identifying all Separate
+ Works (other than those provided in compliance with (ii)) that were
+ specifically used in building and (if applicable) installing the
+ Covered Work (for example, a specified proprietary compiler including
+ its version number). Corresponding Source must be machine-readable.
+
+ "Covered Work" means My Work or a Derived Work.
+
+ "Derived Work" means a work of authorship that copies from, modifies,
+ adapts, is based on, is a derivative work of, transforms, translates or
+ contains all or part of My Work, such that copyright permission is
+ required. The following are not Derived Works: (i) Mere Aggregation;
+ (ii) a mere reproduction of My Work; and (iii) if My Work fails to
+ explicitly state an expectation otherwise, a work that merely makes
+ reference to My Work.
+
+ "Distribute" means to distribute, transfer or make a copy available to
+ someone else, such that copyright permission is required.
+
+ "Distributor" means Me and anyone else who Distributes a Covered Work.
+
+ "FSF-Free" means classified as 'free' by the Free Software Foundation.
+
+ "GPL" means a version of the GNU General Public License or the GNU
+ Affero General Public License.
+
+ "I"/"Me"/"My" refers to the individual or legal entity that places My
+ Work under this License. "You"/"Your" refers to the individual or legal
+ entity exercising rights in My Work under this License. A legal entity
+ includes each entity that controls, is controlled by, or is under
+ common control with such legal entity. "Control" means (a) the power to
+ direct the actions of such legal entity, whether by contract or
+ otherwise, or (b) ownership of more than fifty percent of the
+ outstanding shares or beneficial ownership of such legal entity.
+
+ "Licensed Patents" means all patent claims licensable royalty-free by
+ Me, now or in the future, that are necessarily infringed by making,
+ using, or selling My Work, and excludes claims that would be infringed
+ only as a consequence of further modification of My Work.
+
+ "Mere Aggregation" means an aggregation of a Covered Work with a
+ Separate Work.
+
+ "My Work" means the particular work of authorship I license to You
+ under this License.
+
+ "Object Code" means any form of a work that is not Source Code.
+
+ "OSI-Approved" means approved as 'Open Source' by the Open Source
+ Initiative.
+
+ "Separate Work" means a work that is separate from and independent of a
+ particular Covered Work and is not by its nature an extension or
+ enhancement of the Covered Work, and/or a runtime library, standard
+ library or similar component that is used to generate an Object Code
+ form of a Covered Work.
+
+ "Source Code" means the preferred form of a work for making
+ modifications to it.
--
2.30.2
udmabuf has the following implicit declaration warns:
udmabuf.c:30:10: warning: implicit declaration of function 'open';
udmabuf.c:42:8: warning: implicit declaration of function 'fcntl'
These are caused due to not including fcntl.h and including just
linux/fcntl.h. Fix it to include fcntl.h which will bring in the
linux/fcntl.h. In addition, define __EXPORTED_HEADERS__ to bring in
F_ADD_SEALS and F_SEAL_SHRINK defines and fix the following error
that show up when just fcntl.h is included.
udmabuf.c:45:21: error: 'F_ADD_SEALS' undeclared
45 | ret = fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);
| ^~~~~~~~~~~
udmabuf.c:45:34: error: 'F_SEAL_SHRINK' undeclared
45 | ret = fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);
| ^~~~~~~~~~~~~
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/drivers/dma-buf/udmabuf.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
index 4de902ea14d8..de1c4e6de0b2 100644
--- a/tools/testing/selftests/drivers/dma-buf/udmabuf.c
+++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
@@ -1,10 +1,13 @@
// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#define __EXPORTED_HEADERS__
+
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
-#include <linux/fcntl.h>
+#include <fcntl.h>
#include <malloc.h>
#include <sys/ioctl.h>
--
2.30.2
This series fixes up a few issues introduced into vec-syscfg during
refactoring in the review process, then adds a new test which ensures
that the behaviour when we attempt to set a vector length which is not
supported by the current system matches what is documented in the SVE
ABI documentation.
Mark Brown (4):
selftests: arm64: Fix printf() format mismatch in vec-syscfg
selftests: arm64: Remove bogus error check on writing to files
selftests: arm64: Fix and enable test for setting current VL in
vec-syscfg
selftests: arm64: Verify that all possible vector lengths are handled
tools/testing/selftests/arm64/fp/vec-syscfg.c | 94 ++++++++++++++++---
1 file changed, 81 insertions(+), 13 deletions(-)
base-commit: 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f
--
2.20.1
Hi Everybody,
This series consists out of outstanding SGX selftests changes, rebased
and gathered in a single series that is more easily merged for testing
and development, and a few more changes added to expand the existing tests.
The outstanding SGX selftest changes included in this series that have already
been submitted separately are:
* An almost two year old patch fixing a benign linker warning that is still
present today:
https://lore.kernel.org/linux-sgx/20191017030340.18301-2-sean.j.christopher…
The original patch is added intact and not all email addresses
within are valid.
* Latest (v4) of Jarkko Sakkinen's series to add an oversubscription test:
https://lore.kernel.org/linux-sgx/20210809093127.76264-1-jarkko@kernel.org/
* Latest (v2) of Jarkko Sakkinen's patch that provides provide per-op
parameter structs for the test enclave:
https://lore.kernel.org/linux-sgx/20210812224645.90280-1-jarkko@kernel.org/
The reason why most of these patches are outstanding is that they depend
on a kernel change that is still under discussion. Decision to wait in:
https://lore.kernel.org/linux-sgx/f8674dac5579a8a424de1565f7ffa2b5bf2f8e36.…
The original patch for this kernel dependency continues to be included in
this series as a placeholder until the ongoing discussions are concluded.
The new changes introduced in this series builds on Jarkko's outstanding
SGX selftest changes and adds new tests for page permissions, exception
handling, and thread entry.
Reinette
Jarkko Sakkinen (9):
x86/sgx: Add /sys/kernel/debug/x86/sgx_total_mem
selftests/sgx: Assign source for each segment
selftests/sgx: Make data measurement for an enclave segment optional
selftests/sgx: Create a heap for the test enclave
selftests/sgx: Dump segments and /proc/self/maps only on failure
selftests/sgx: Encpsulate the test enclave creation
selftests/sgx: Move setup_test_encl() to each TEST_F()
selftests/sgx: Add a new kselftest: unclobbered_vdso_oversubscribed
selftests/sgx: Provide per-op parameter structs for the test enclave
Reinette Chatre (4):
selftests/sgx: Rename test properties in preparation for more enclave
tests
selftests/sgx: Add page permission and exception test
selftests/sgx: Enable multiple thread support
selftests/sgx: Add test for multiple TCS entry
Sean Christopherson (1):
selftests/x86/sgx: Fix a benign linker warning
Documentation/x86/sgx.rst | 6 +
arch/x86/kernel/cpu/sgx/main.c | 10 +-
tools/testing/selftests/sgx/Makefile | 2 +-
tools/testing/selftests/sgx/defines.h | 33 +-
tools/testing/selftests/sgx/load.c | 40 +-
tools/testing/selftests/sgx/main.c | 341 +++++++++++++++---
tools/testing/selftests/sgx/main.h | 7 +-
tools/testing/selftests/sgx/sigstruct.c | 12 +-
tools/testing/selftests/sgx/test_encl.c | 60 ++-
.../selftests/sgx/test_encl_bootstrap.S | 21 +-
10 files changed, 445 insertions(+), 87 deletions(-)
--
2.25.1
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 210f9df02611cbe641ced3239122b270fd907d86 upstream.
The selftest for ftrace checks some features by checking if the README has
text that states the feature is supported by that kernel. Unfortunately,
this check gives false positives because it many not be checked if there's
spaces in the string to check. This is due to the compare between the
required variable with the ":README" string stripped, because neither has
quotes around them.
Link: https://lkml.kernel.org/r/20210820204742.087177341@goodmis.org
Cc: "Tzvetomir Stoyanov" <tz.stoyanov(a)gmail.com>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 1b8eec510ba64 ("selftests/ftrace: Support ":README" suffix for requires")
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/ftrace/test.d/functions | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -115,7 +115,7 @@ check_requires() { # Check required file
echo "Required tracer $t is not configured."
exit_unsupported
fi
- elif [ $r != $i ]; then
+ elif [ "$r" != "$i" ]; then
if ! grep -Fq "$r" README ; then
echo "Required feature pattern \"$r\" is not in README."
exit_unsupported
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 210f9df02611cbe641ced3239122b270fd907d86 upstream.
The selftest for ftrace checks some features by checking if the README has
text that states the feature is supported by that kernel. Unfortunately,
this check gives false positives because it many not be checked if there's
spaces in the string to check. This is due to the compare between the
required variable with the ":README" string stripped, because neither has
quotes around them.
Link: https://lkml.kernel.org/r/20210820204742.087177341@goodmis.org
Cc: "Tzvetomir Stoyanov" <tz.stoyanov(a)gmail.com>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 1b8eec510ba64 ("selftests/ftrace: Support ":README" suffix for requires")
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/ftrace/test.d/functions | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -115,7 +115,7 @@ check_requires() { # Check required file
echo "Required tracer $t is not configured."
exit_unsupported
fi
- elif [ $r != $i ]; then
+ elif [ "$r" != "$i" ]; then
if ! grep -Fq "$r" README ; then
echo "Required feature pattern \"$r\" is not in README."
exit_unsupported
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 210f9df02611cbe641ced3239122b270fd907d86 upstream.
The selftest for ftrace checks some features by checking if the README has
text that states the feature is supported by that kernel. Unfortunately,
this check gives false positives because it many not be checked if there's
spaces in the string to check. This is due to the compare between the
required variable with the ":README" string stripped, because neither has
quotes around them.
Link: https://lkml.kernel.org/r/20210820204742.087177341@goodmis.org
Cc: "Tzvetomir Stoyanov" <tz.stoyanov(a)gmail.com>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 1b8eec510ba64 ("selftests/ftrace: Support ":README" suffix for requires")
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/ftrace/test.d/functions | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -115,7 +115,7 @@ check_requires() { # Check required file
echo "Required tracer $t is not configured."
exit_unsupported
fi
- elif [ $r != $i ]; then
+ elif [ "$r" != "$i" ]; then
if ! grep -Fq "$r" README ; then
echo "Required feature pattern \"$r\" is not in README."
exit_unsupported