Implemented small fix so that the script changes work directories to the
root of the linux kernel source tree from which kunit.py is run. This
enables the user to run kunit from any working directory. Originally
considered using os.path.join but this is more error prone as we would
have to find all file path usages and modify them accordingly. Using
os.chdir ensures that the entire script is run within /linux.
Signed-off-by: Heidi Fahim <heidifahim(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
---
tools/testing/kunit/kunit.py | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 3f552e847a14..060d960a7029 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -26,6 +26,8 @@ KunitResult = namedtuple('KunitResult', ['status','result'])
KunitRequest = namedtuple('KunitRequest', ['raw_output','timeout', 'jobs',
'build_dir', 'defconfig', 'json'])
+KernelDirectoryPath = sys.argv[0].split('tools/testing/kunit/')[0]
+
class KunitStatus(Enum):
SUCCESS = auto()
CONFIG_FAILURE = auto()
@@ -37,6 +39,13 @@ def create_default_kunitconfig():
shutil.copyfile('arch/um/configs/kunit_defconfig',
kunit_kernel.kunitconfig_path)
+def get_kernel_root_path():
+ parts = sys.argv[0] if not __file__ else __file__
+ parts = os.path.realpath(parts).split('tools/testing/kunit')
+ if len(parts) != 2:
+ sys.exit(1)
+ return parts[0]
+
def run_tests(linux: kunit_kernel.LinuxSourceTree,
request: KunitRequest) -> KunitResult:
config_start = time.time()
@@ -130,6 +139,9 @@ def main(argv, linux=None):
cli_args = parser.parse_args(argv)
if cli_args.subcommand == 'run':
+ if get_kernel_root_path():
+ os.chdir(get_kernel_root_path())
+
if cli_args.build_dir:
if not os.path.exists(cli_args.build_dir):
os.mkdir(cli_args.build_dir)
--
2.25.0.265.gbab2e86ba0-goog
This patch series extend the "bpftool feature" subcommand with the
new positional arguments:
- "section", which allows to select a specific section of probes (i.e.
"system_config", "program_types", "map_types");
- "filter_in", which allows to select only probes which matches the
given regex pattern;
- "filter_out", which allows to filter out probes which do not match the
given regex pattern.
The main motivation behind those changes is ability the fact that some
probes (for example those related to "trace" or "write_user" helpers)
emit dmesg messages which might be confusing for people who are running
on production environments. For details see the Cilium issue[0].
[0] https://github.com/cilium/cilium/issues/10048
Michal Rostecki (6):
bpftool: Move out sections to separate functions
bpftool: Allow to select a specific section to probe
bpftool: Add arguments for filtering in and filtering out probes
bpftool: Update documentation of "bpftool feature" command
bpftool: Update bash completion for "bpftool feature" command
selftests/bpf: Add test for "bpftool feature" command
.../bpftool/Documentation/bpftool-feature.rst | 37 +-
tools/bpf/bpftool/bash-completion/bpftool | 32 +-
tools/bpf/bpftool/feature.c | 592 +++++++++++++-----
tools/testing/selftests/.gitignore | 5 +-
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/test_bpftool.py | 294 +++++++++
tools/testing/selftests/bpf/test_bpftool.sh | 5 +
7 files changed, 811 insertions(+), 157 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_bpftool.py
create mode 100755 tools/testing/selftests/bpf/test_bpftool.sh
--
2.25.0
When running the ftrace selftests, 2 failures and 6 unresolved
cases were observed. The failures can be avoided by setting
a sysctl prior to test execution (fixed in patch 1) and the
unresolved cases result from absence of testing modules which
are built based on CONFIG options being set and program
availability (fixed in patch 2).
These seem more like "unsupported" than "unresolved" errors,
since for the ftrace tests "unresolved" cases cause the test
(and thus kselftest) to report failure. With these changes
in place, the unresolved cases become unsupported and the
test failures disappear, resulting in the ftracetest program
exiting with "ok" status.
Alan Maguire (2):
ftrace/selftests: workaround cgroup RT scheduling issues
ftrace/selftest: absence of modules/programs should trigger
unsupported errors
tools/testing/selftests/ftrace/ftracetest | 23 ++++++++++++++++++++++
.../ftrace/test.d/direct/ftrace-direct.tc | 2 +-
.../ftrace/test.d/direct/kprobe-direct.tc | 2 +-
.../selftests/ftrace/test.d/event/trace_printk.tc | 2 +-
.../ftrace/test.d/ftrace/func_mod_trace.tc | 2 +-
.../ftrace/test.d/kprobe/kprobe_module.tc | 2 +-
.../selftests/ftrace/test.d/selftest/bashisms.tc | 2 +-
7 files changed, 29 insertions(+), 6 deletions(-)
--
1.8.3.1
When kunit tests are run on native (i.e. non-UML) environments, the results
of test execution are often intermixed with dmesg output. This patch
series attempts to solve this by providing a debugfs representation
of the results of the last test run, available as
/sys/kernel/debug/kunit/<testsuite>/results
Changes since v2:
- updated kunit_status2str() to kunit_status_to_string() and made it
static inline in include/kunit/test.h (Brendan)
- added log string to struct kunit_suite and kunit_case, with log
pointer in struct kunit pointing at the case log. This allows us
to collect kunit_[err|info|warning]() messages at the same time
as we printk() them. This solves for the most part the sharing
of log messages between test execution and debugfs since we
just print the suite log (which contains the test suite preamble)
and the individual test logs. The only exception is the suite-level
status, which we cannot store in the suite log as it would mean
we'd print the suite and its status prior to the suite's results.
(Brendan, patch 1)
- dropped debugfs-based kunit run patch for now so as not to cause
problems with tests currently under development (Brendan)
- fixed doc issues with code block (Brendan, patch 3)
Changes since v1:
- trimmed unneeded include files in lib/kunit/debugfs.c (Greg)
- renamed global debugfs functions to be prefixed with kunit_ (Greg)
- removed error checking for debugfs operations (Greg)
Alan Maguire (2):
kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
kunit: update documentation to describe debugfs representation
Documentation/dev-tools/kunit/usage.rst | 12 +++
include/kunit/test.h | 52 ++++++++++--
lib/kunit/Makefile | 3 +-
lib/kunit/debugfs.c | 105 ++++++++++++++++++++++++
lib/kunit/debugfs.h | 16 ++++
lib/kunit/kunit-test.c | 4 +-
lib/kunit/test.c | 136 ++++++++++++++++++++++++--------
7 files changed, 286 insertions(+), 42 deletions(-)
create mode 100644 lib/kunit/debugfs.c
create mode 100644 lib/kunit/debugfs.h
--
1.8.3.1
Implemented small fix so that the script changes work directories to the
linux directory where kunit.py is run. This enables the user to run
kunit from any working directory. Originally considered using
os.path.join but this is more error prone as we would have to find all
file path usages and modify them accordingly. Using os.chdir ensures
that the entire script is run within /linux.
Signed-off-by: Heidi Fahim <heidifahim(a)google.com>
---
tools/testing/kunit/kunit.py | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index e59eb9e7f923..3cc7be7b28a0 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -35,6 +35,13 @@ def create_default_kunitconfig():
shutil.copyfile('arch/um/configs/kunit_defconfig',
kunit_kernel.kunitconfig_path)
+def get_kernel_root_path():
+ parts = sys.argv[0] if not __file__ else __file__
+ parts = os.path.realpath(parts).split('tools/testing/kunit')
+ if len(parts) != 2:
+ sys.exit(1)
+ return parts[0]
+
def run_tests(linux: kunit_kernel.LinuxSourceTree,
request: KunitRequest) -> KunitResult:
config_start = time.time()
@@ -114,6 +121,9 @@ def main(argv, linux=None):
cli_args = parser.parse_args(argv)
if cli_args.subcommand == 'run':
+ if get_kernel_root_path():
+ os.chdir(get_kernel_root_path())
+
if cli_args.build_dir:
if not os.path.exists(cli_args.build_dir):
os.mkdir(cli_args.build_dir)
--
2.25.0.341.g760bfbb309-goog
There's four patches here, but only one of them actually does anything. The
first patch fixes a BPF selftests build failure on my machine and has already
been sent to the list separately. The next three are just staged such that
there are some patches that avoid changing any functionality pulled out from
the whole point of those refactorings, with two cleanups and then the idea.
Maybe this is an odd thing to say in a cover letter, but I'm not actually sure
this patch set is a good idea. The issue of extra moves after calls came up as
I was reviewing some unrelated performance optimizations to the RISC-V BPF JIT.
I figured I'd take a whack at performing the optimization in the context of the
arm64 port just to get a breath of fresh air, and I'm not convinced I like the
results.
That said, I think I would accept something like this for the RISC-V port
because we're already doing a multi-pass optimization for shrinking function
addresses so it's not as much extra complexity over there. If we do that we
should probably start puling some of this code into the shared BPF compiler,
but we're also opening the doors to more complicated BPF JIT optimizations.
Given that the BPF JIT appears to have been designed explicitly to be
simple/fast as opposed to perform complex optimization, I'm not sure this is a
sane way to move forward.
I figured I'd send the patch set out as more of a question than anything else.
Specifically:
* How should I go about measuring the performance of these sort of
optimizations? I'd like to balance the time it takes to run the JIT with the
time spent executing the program, but I don't have any feel for what real BPF
programs look like or have any benchmark suite to run. Is there something
out there this should be benchmarked against? (I'd also like to know that to
run those benchmarks on the RISC-V port.)
* Is this the sort of thing that makes sense in a BPF JIT? I guess I've just
realized I turned "review this patch" into a way bigger rabbit hole than I
really want to go down...
I worked on top of 5.4 for these, but trivially different versions of the
patches applied on Linus' master a few days ago when I tried. LMK if those
aren't sane places to start from over here, I'm new to both arm64 and BPF so I
might be a bit lost.
[PATCH 1/4] selftests/bpf: Elide a check for LLVM versions that can't
[PATCH 2/4] arm64: bpf: Convert bpf2a64 to a function
[PATCH 3/4] arm64: bpf: Split the read and write halves of dst
[PATCH 4/4] arm64: bpf: Elide some moves to a0 after calls
When kunit tests are run on native (i.e. non-UML) environments, the results
of test execution are often intermixed with dmesg output. This patch
series attempts to solve this by providing a debugfs representation
of the results of the last test run, available as
/sys/kernel/debug/kunit/<testsuite>/results
Changes since v3:
- added CONFIG_KUNIT_DEBUGFS to support conditional compilation of debugfs
representation, including string logging (Frank, patch 1)
- removed unneeded NULL check for test_case in kunit_suite_for_each_test_case()
(Frank, patch 1)
- added kunit log test to verify logging multiple strings works
(Frank, patch 2)
- rephrased description of results file (Frank, patch 3)
Changes since v2:
- updated kunit_status2str() to kunit_status_to_string() and made it
static inline in include/kunit/test.h (Brendan)
- added log string to struct kunit_suite and kunit_case, with log
pointer in struct kunit pointing at the case log. This allows us
to collect kunit_[err|info|warning]() messages at the same time
as we printk() them. This solves for the most part the sharing
of log messages between test execution and debugfs since we
just print the suite log (which contains the test suite preamble)
and the individual test logs. The only exception is the suite-level
status, which we cannot store in the suite log as it would mean
we'd print the suite and its status prior to the suite's results.
(Brendan, patch 1)
- dropped debugfs-based kunit run patch for now so as not to cause
problems with tests currently under development (Brendan)
- fixed doc issues with code block (Brendan, patch 3)
Changes since v1:
- trimmed unneeded include files in lib/kunit/debugfs.c (Greg)
- renamed global debugfs functions to be prefixed with kunit_ (Greg)
- removed error checking for debugfs operations (Greg)
Alan Maguire (3):
kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
kunit: add log test
kunit: update documentation to describe debugfs representation
Documentation/dev-tools/kunit/usage.rst | 13 ++++
include/kunit/test.h | 54 +++++++++++---
lib/kunit/Kconfig | 8 +++
lib/kunit/Makefile | 4 ++
lib/kunit/debugfs.c | 116 ++++++++++++++++++++++++++++++
lib/kunit/debugfs.h | 30 ++++++++
lib/kunit/kunit-test.c | 31 +++++++-
lib/kunit/test.c | 122 ++++++++++++++++++++++++--------
8 files changed, 336 insertions(+), 42 deletions(-)
create mode 100644 lib/kunit/debugfs.c
create mode 100644 lib/kunit/debugfs.h
--
1.8.3.1
Remove some of the outmoded "Why KUnit" rationale, and move some
UML-specific information to the kunit_tool page. Also update the Getting
Started guide to mention running tests without the kunit_tool wrapper.
Signed-off-by: David Gow <davidgow(a)google.com>
---
Thanks: I've added a note to the "Why KUnit" section which I think
better reflects both the UML is optional, and that it can be used both
with and without kunit_tool.
Changelog:
v3:
- Added a note that KUnit can be used with UML, both with and without
kunit_tool to replace the section moved to kunit_tool.
v2:
https://lore.kernel.org/linux-kselftest/f99a3d4d-ad65-5fd1-3407-db33f378b1f…
- Reinstated the "Why Kunit?" section, minus the comparison with other
testing frameworks (covered in the FAQ), and the description of UML.
- Moved the description of UML into to kunit_tool page.
- Tidied up the wording around how KUnit is built and run to make it
work
without the UML description.
v1:
https://lore.kernel.org/linux-kselftest/9c703dea-a9e1-94e2-c12d-3cb0a09e75a…
- Initial patch
Documentation/dev-tools/kunit/index.rst | 41 ++++++----
Documentation/dev-tools/kunit/kunit-tool.rst | 7 ++
Documentation/dev-tools/kunit/start.rst | 80 ++++++++++++++++----
3 files changed, 100 insertions(+), 28 deletions(-)
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index d16a4d2c3a41..9e77d0c90ff4 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -17,14 +17,23 @@ What is KUnit?
==============
KUnit is a lightweight unit testing and mocking framework for the Linux kernel.
-These tests are able to be run locally on a developer's workstation without a VM
-or special hardware.
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
cases, grouping related test cases into test suites, providing common
infrastructure for running tests, and much more.
+KUnit consists of a kernel component, which provides a set of macros for easily
+writing unit tests. Tests written against KUnit will run on kernel boot if
+built-in, or when loaded if built as a module. These tests write out results to
+the kernel log in `TAP <https://testanything.org/>`_ format.
+
+To make running these tests (and reading the results) easier, KUnit offsers
+:doc:`kunit_tool <kunit-tool>`, which builds a `User Mode Linux
+<http://user-mode-linux.sourceforge.net>`_ kernel, runs it, and parses the test
+results. This provides a quick way of running KUnit tests during development,
+without requiring a virtual machine or separate hardware.
+
Get started now: :doc:`start`
Why KUnit?
@@ -36,22 +45,21 @@ allow all possible code paths to be tested in the code under test; this is only
possible if the code under test is very small and does not have any external
dependencies outside of the test's control like hardware.
-Outside of KUnit, there are no testing frameworks currently
-available for the kernel that do not require installing the kernel on a test
-machine or in a VM and all require tests to be written in userspace running on
-the kernel; this is true for Autotest, and kselftest, disqualifying
-any of them from being considered unit testing frameworks.
-
-KUnit addresses the problem of being able to run tests without needing a virtual
-machine or actual hardware with User Mode Linux. User Mode Linux is a Linux
-architecture, like ARM or x86; however, unlike other architectures it compiles
-to a standalone program that can be run like any other program directly inside
-of a host operating system; to be clear, it does not require any virtualization
-support; it is just a regular program.
+KUnit provides a common framework for unit tests within the kernel.
-Alternatively, kunit and kunit tests can be built as modules and tests will
+KUnit tests can be run on most kernel configurations, and most tests are
+architecture independent. All built-in KUnit tests run on kernel startup.
+Alternatively, KUnit and KUnit tests can be built as modules and tests will
run when the test module is loaded.
+.. note::
+
+ KUnit can also run tests without needing a virtual machine or actual
+ hardware under User Mode Linux. User Mode Linux is a Linux architecture,
+ like ARM or x86, which compiles the kernel as a Linux executable. KUnit
+ can be used with UML either by building with ``ARCH=um`` (like any other
+ architecture), or by using :doc:`kunit_tool <kunit-tool>`.
+
KUnit is fast. Excluding build time, from invocation to completion KUnit can run
several dozen tests in only 10 to 20 seconds; this might not sound like a big
deal to some people, but having such fast and easy to run tests fundamentally
@@ -75,9 +83,12 @@ someone sends you some code. Why trust that someone ran all their tests
correctly on every change when you can just run them yourself in less time than
it takes to read their test log?
+
How do I use it?
================
* :doc:`start` - for new users of KUnit
* :doc:`usage` - for a more detailed explanation of KUnit features
* :doc:`api/index` - for the list of KUnit APIs used for testing
+* :doc:`kunit-tool` - for more information on the kunit_tool helper script
+* :doc:`faq` - for answers to some common questions about KUnit
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index 50d46394e97e..949af2da81e5 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -12,6 +12,13 @@ the Linux kernel as UML (`User Mode Linux
<http://user-mode-linux.sourceforge.net/>`_), running KUnit tests, parsing
the test results and displaying them in a user friendly manner.
+kunit_tool addresses the problem of being able to run tests without needing a
+virtual machine or actual hardware with User Mode Linux. User Mode Linux is a
+Linux architecture, like ARM or x86; however, unlike other architectures it
+compiles the kernel as a standalone Linux executable that can be run like any
+other program directly inside of a host operating system. To be clear, it does
+not require any virtualization support: it is just a regular program.
+
What is a kunitconfig?
======================
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 4e1d24db6b13..e1c5ce80ce12 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -9,11 +9,10 @@ Installing dependencies
KUnit has the same dependencies as the Linux kernel. As long as you can build
the kernel, you can run KUnit.
-KUnit Wrapper
-=============
-Included with KUnit is a simple Python wrapper that helps format the output to
-easily use and read KUnit output. It handles building and running the kernel, as
-well as formatting the output.
+Running tests with the KUnit Wrapper
+====================================
+Included with KUnit is a simple Python wrapper which runs tests under User Mode
+Linux, and formats the test results.
The wrapper can be run with:
@@ -21,22 +20,42 @@ The wrapper can be run with:
./tools/testing/kunit/kunit.py run --defconfig
-For more information on this wrapper (also called kunit_tool) checkout the
+For more information on this wrapper (also called kunit_tool) check out the
:doc:`kunit-tool` page.
Creating a .kunitconfig
-=======================
-The Python script is a thin wrapper around Kbuild. As such, it needs to be
-configured with a ``.kunitconfig`` file. This file essentially contains the
-regular Kernel config, with the specific test targets as well.
-
+-----------------------
+If you want to run a specific set of tests (rather than those listed in the
+KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
+This file essentially contains the regular Kernel config, with the specific
+test targets as well. The ``.kunitconfig`` should also contain any other config
+options required by the tests.
+
+A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
.. code-block:: bash
cd $PATH_TO_LINUX_REPO
cp arch/um/configs/kunit_defconfig .kunitconfig
-Verifying KUnit Works
----------------------
+You can then add any other Kconfig options you wish, e.g.:
+.. code-block:: none
+
+ CONFIG_LIST_KUNIT_TEST=y
+
+:doc:`kunit_tool <kunit-tool>` will ensure that all config options set in
+``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
+It'll warn you if you haven't included the dependencies of the options you're
+using.
+
+.. note::
+ Note that removing something from the ``.kunitconfig`` will not trigger a
+ rebuild of the ``.config`` file: the configuration is only updated if the
+ ``.kunitconfig`` is not a subset of ``.config``. This means that you can use
+ other tools (such as make menuconfig) to adjust other config options.
+
+
+Running the tests
+-----------------
To make sure that everything is set up correctly, simply invoke the Python
wrapper from your kernel repo:
@@ -62,6 +81,41 @@ followed by a list of tests that are run. All of them should be passing.
Because it is building a lot of sources for the first time, the
``Building KUnit kernel`` step may take a while.
+Running tests without the KUnit Wrapper
+=======================================
+
+If you'd rather not use the KUnit Wrapper (if, for example, you need to
+integrate with other systems, or use an architecture other than UML), KUnit can
+be included in any kernel, and the results read out and parsed manually.
+
+.. note::
+ KUnit is not designed for use in a production system, and it's possible that
+ tests may reduce the stability or security of the system.
+
+
+
+Configuring the kernel
+----------------------
+
+In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
+Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
+menuconfig). From there, you can enable any KUnit tests you want: they usually
+have config options ending in ``_KUNIT_TEST``.
+
+KUnit and KUnit tests can be compiled as modules: in this case the tests in a
+module will be run when the module is loaded.
+
+Running the tests
+-----------------
+
+Build and run your kernel as usual. Test output will be written to the kernel
+log in `TAP <https://testanything.org/>`_ format.
+
+.. note::
+ It's possible that there will be other lines and/or data interspersed in the
+ TAP output.
+
+
Writing your first test
=======================
--
2.25.0.265.gbab2e86ba0-goog
There is an important use-case which is not possible with the
"rseq" (Restartable Sequences) system call, which was left as
future work.
That use-case is to modify user-space per-cpu data structures
belonging to specific CPUs which may be brought offline and
online again by CPU hotplug. This can be used by memory
allocators to migrate free memory pools when CPUs are brought
offline, or by ring buffer consumers to target specific per-CPU
buffers, even when CPUs are brought offline.
A few rather complex prior attempts were made to solve this.
Those were based on in-kernel interpreters (cpu_opv, do_on_cpu).
That complexity was generally frowned upon, even by their author.
This patch fulfills this use-case in a refreshingly simple way:
it introduces a "pin_on_cpu" system call, which allows user-space
threads to pin themselves on a specific CPU (which needs to be
present in the thread's allowed cpu mask), and then clear this
pinned state.
"But this can already be done with sched_setaffinity", some
would rightfully reply. However, there is a significant twist
in the way pin_on_cpu deals with CPU hotplug compared to the
allowed cpu mask.
When all CPUs part of the thread's allowed cpu mask are offlined,
this mask is effectively reset to include all CPUs. This behavior
is completely incompatible with modifying per-cpu data structures:
the updates then become racy between concurrent CPUs trying to
update the given per-cpu data.
Conversely, all threads pinned on a given CPU with pin_on_cpu are
guaranteed to be scheduled on the same runqueue when that CPU is
offline. If that CPU is brought back online, the CPU hotplug
scheduler hooks are responsible for migrating back the tasks to
their pinned CPU.
For instance, this allows implementing this userspace library API
for incrementing a per-cpu counter for a specific cpu number
received as parameter:
static inline __attribute__((always_inline))
int percpu_addv(intptr_t *v, intptr_t count, int cpu)
{
int ret;
ret = rseq_addv(v, count, cpu);
check:
if (rseq_unlikely(ret)) {
pin_on_cpu_set(cpu);
ret = rseq_addv(v, count, percpu_current_cpu());
pin_on_cpu_clear();
goto check;
}
return 0;
}
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Watson <davejwatson(a)fb.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: "H . Peter Anvin" <hpa(a)zytor.com>
Cc: Chris Lameter <cl(a)linux.com>
Cc: Russell King <linux(a)arm.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Maurer <bmaurer(a)fb.com>
Cc: linux-api(a)vger.kernel.org
Cc: Andy Lutomirski <luto(a)amacapital.net>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/exec.c | 1 +
include/linux/sched.h | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/sched.h | 6 +
init/init_task.c | 1 +
kernel/sched/core.c | 321 +++++++++++++++++++++++--
kernel/sched/deadline.c | 54 +++--
kernel/sched/fair.c | 19 ++
kernel/sched/rt.c | 15 +-
kernel/sched/sched.h | 28 +++
kernel/sys_ni.c | 1 +
14 files changed, 413 insertions(+), 42 deletions(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 15908eb9b17e..0b1081a9b872 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -440,3 +440,4 @@
433 i386 fspick sys_fspick __ia32_sys_fspick
434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
435 i386 clone3 sys_clone3 __ia32_sys_clone3
+436 i386 pin_on_cpu sys_pin_on_cpu __ia32_sys_pin_on_cpu
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c29976eca4a8..90f9b3cab88d 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -357,6 +357,7 @@
433 common fspick __x64_sys_fspick
434 common pidfd_open __x64_sys_pidfd_open
435 common clone3 __x64_sys_clone3/ptregs
+436 common pin_on_cpu __x64_sys_pin_on_cpu
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/exec.c b/fs/exec.c
index c27231234764..6d882dbdd1e3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1827,6 +1827,7 @@ static int __do_execve_file(int fd, struct filename *filename,
current->fs->in_exec = 0;
current->in_execve = 0;
rseq_execve(current);
+ current->pinned_cpu = -1;
acct_update_integrals(current);
task_numa_free(current, false);
free_bprm(bprm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7f0bb6fff27c..ac0cac7b8d1d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -651,6 +651,7 @@ struct task_struct {
/* Current CPU: */
unsigned int cpu;
#endif
+ int pinned_cpu;
unsigned int wakee_flips;
unsigned long wakee_flip_decay_ts;
struct task_struct *last_wakee;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index be0d0cf788ba..46fee5af99e3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags)
asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
siginfo_t __user *info,
unsigned int flags);
+asmlinkage long sys_pin_on_cpu(int cmd, int flags, int cpu);
/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 1fc8faa6e973..43b0c956cc3c 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -851,8 +851,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
__SYSCALL(__NR_clone3, sys_clone3)
#endif
+#define __NR_pin_on_cpu 436
+__SYSCALL(__NR_pin_on_cpu, sys_pin_on_cpu)
+
#undef __NR_syscalls
-#define __NR_syscalls 436
+#define __NR_syscalls 437
/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 25b4fa00bad1..590cdc613698 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -114,4 +114,10 @@ struct clone_args {
SCHED_FLAG_KEEP_ALL | \
SCHED_FLAG_UTIL_CLAMP)
+enum pin_on_cpu_cmd {
+ PIN_ON_CPU_CMD_QUERY = 0,
+ PIN_ON_CPU_CMD_SET = (1 << 0),
+ PIN_ON_CPU_CMD_CLEAR = (1 << 1),
+};
+
#endif /* _UAPI_LINUX_SCHED_H */
diff --git a/init/init_task.c b/init/init_task.c
index 9e5cbe5eab7b..9aabce589cc7 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -88,6 +88,7 @@ struct task_struct init_task
.tasks = LIST_HEAD_INIT(init_task.tasks),
#ifdef CONFIG_SMP
.pushable_tasks = PLIST_NODE_INIT(init_task.pushable_tasks, MAX_PRIO),
+ .pinned_cpu = -1,
#endif
#ifdef CONFIG_CGROUP_SCHED
.sched_task_group = &root_task_group,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8dacda4b0362..6ca904d6e0ef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -52,6 +52,8 @@ const_debug unsigned int sysctl_sched_features =
#undef SCHED_FEAT
#endif
+#define PIN_ON_CPU_CMD_BITMASK (PIN_ON_CPU_CMD_SET | PIN_ON_CPU_CMD_CLEAR)
+
/*
* Number of tasks to iterate in a single balance run.
* Limited because this is done with IRQs disabled.
@@ -1457,8 +1459,13 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
*/
static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
{
- if (!cpumask_test_cpu(cpu, p->cpus_ptr))
- return false;
+ if (is_pinned_task(p)) {
+ if (!allowed_pinned_cpu(p, cpu))
+ return false;
+ } else {
+ if (!cpumask_test_cpu(cpu, p->cpus_ptr))
+ return false;
+ }
if (is_per_cpu_kthread(p))
return cpu_online(cpu);
@@ -1662,6 +1669,12 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
goto out;
}
+ /* Prevent removing the currently pinned CPU from the allowed cpu mask. */
+ if (is_pinned_task(p) && !cpumask_test_cpu(p->pinned_cpu, new_mask)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
do_set_cpus_allowed(p, new_mask);
if (p->flags & PF_KTHREAD) {
@@ -1674,6 +1687,10 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
p->nr_cpus_allowed != 1);
}
+ /* Task pinned to a CPU overrides allowed cpu mask. */
+ if (is_pinned_task(p))
+ goto out;
+
/* Can the task run on the task's current CPU? If so, we're done */
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
@@ -1813,11 +1830,20 @@ static int migrate_swap_stop(void *data)
if (task_cpu(arg->src_task) != arg->src_cpu)
goto unlock;
- if (!cpumask_test_cpu(arg->dst_cpu, arg->src_task->cpus_ptr))
- goto unlock;
-
- if (!cpumask_test_cpu(arg->src_cpu, arg->dst_task->cpus_ptr))
- goto unlock;
+ if (is_pinned_task(arg->src_task)) {
+ if (!allowed_pinned_cpu(arg->src_task, arg->dst_cpu))
+ goto unlock;
+ } else {
+ if (!cpumask_test_cpu(arg->dst_cpu, arg->src_task->cpus_ptr))
+ goto unlock;
+ }
+ if (is_pinned_task(arg->dst_task)) {
+ if (!allowed_pinned_cpu(arg->dst_task, arg->src_cpu))
+ goto unlock;
+ } else {
+ if (!cpumask_test_cpu(arg->src_cpu, arg->dst_task->cpus_ptr))
+ goto unlock;
+ }
__migrate_swap_task(arg->src_task, arg->dst_cpu);
__migrate_swap_task(arg->dst_task, arg->src_cpu);
@@ -1858,11 +1884,21 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p,
if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
goto out;
- if (!cpumask_test_cpu(arg.dst_cpu, arg.src_task->cpus_ptr))
- goto out;
+ if (is_pinned_task(arg.src_task)) {
+ if (!allowed_pinned_cpu(arg.src_task, arg.dst_cpu))
+ goto out;
+ } else {
+ if (!cpumask_test_cpu(arg.dst_cpu, arg.src_task->cpus_ptr))
+ goto out;
+ }
- if (!cpumask_test_cpu(arg.src_cpu, arg.dst_task->cpus_ptr))
- goto out;
+ if (is_pinned_task(arg.dst_task)) {
+ if (!allowed_pinned_cpu(arg.dst_task, arg.src_cpu))
+ goto out;
+ } else {
+ if (!cpumask_test_cpu(arg.src_cpu, arg.dst_task->cpus_ptr))
+ goto out;
+ }
trace_sched_swap_numa(cur, arg.src_cpu, p, arg.dst_cpu);
ret = stop_two_cpus(arg.dst_cpu, arg.src_cpu, migrate_swap_stop, &arg);
@@ -2034,6 +2070,18 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
enum { cpuset, possible, fail } state = cpuset;
int dest_cpu;
+ /*
+ * If the task is pinned to a CPU which is online, pick that pinned CPU
+ * number.
+ * If the task is pinned to a CPU which is offline, pick a CPU which is
+ * guaranteed to be the same for all tasks pinned to that offlined CPU.
+ */
+ if (is_pinned_task(p)) {
+ if (cpu_online(p->pinned_cpu))
+ return p->pinned_cpu;
+ else
+ return pinned_cpu_offline_offload(p);
+ }
/*
* If the node that the CPU is on has been offlined, cpu_to_node()
* will return -1. There is no CPU on the node, and we should
@@ -2104,10 +2152,15 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
{
lockdep_assert_held(&p->pi_lock);
- if (p->nr_cpus_allowed > 1)
- cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);
- else
- cpu = cpumask_any(p->cpus_ptr);
+ if (is_pinned_task(p))
+ cpu = p->pinned_cpu;
+ else {
+ if (p->nr_cpus_allowed > 1)
+ cpu = p->sched_class->select_task_rq(p, cpu, sd_flags,
+ wake_flags);
+ else
+ cpu = cpumask_any(p->cpus_ptr);
+ }
/*
* In order not to call set_task_cpu() on a blocking task we need
@@ -6130,8 +6183,13 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
if (curr_cpu == target_cpu)
return 0;
- if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
- return -EINVAL;
+ if (is_pinned_task(p)) {
+ if (!allowed_pinned_cpu(p, target_cpu))
+ return -EINVAL;
+ } else {
+ if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
+ return -EINVAL;
+ }
/* TODO: This is not properly updating schedstats */
@@ -6300,6 +6358,7 @@ static void migrate_tasks(struct rq *dead_rq, struct rq_flags *rf)
rq->stop = stop;
}
+
#endif /* CONFIG_HOTPLUG_CPU */
void set_rq_online(struct rq *rq)
@@ -6380,11 +6439,100 @@ static int cpuset_cpu_inactive(unsigned int cpu)
return 0;
}
+static bool skip_pinned_task(int pinned_cpu, int cpu,
+ bool first_online)
+{
+ if (pinned_cpu < 0)
+ return true;
+ if (first_online) {
+ if (cpu_online(pinned_cpu) && pinned_cpu != cpu)
+ return true;
+ } else {
+ if (pinned_cpu != cpu)
+ return true;
+ }
+ return false;
+}
+
+static void sched_cpu_migrate_pinned_tasks(unsigned int cpu)
+{
+ struct rq *rq = cpu_rq(cpu);
+ struct task_struct *p, *t;
+ bool first_online = false;
+
+ if (cpu == cpumask_first(cpu_online_mask))
+ first_online = true;
+
+ /*
+ * This state transition (online && !active) when going online
+ * only allow bound kthreads to be scheduled.
+ * At this point, the CPU is completely online and running,
+ * but no userspace tasks are scheduled yet.
+ */
+ read_lock(&tasklist_lock);
+ for_each_process_thread(p, t) {
+ struct rq *target_rq;
+ struct rq_flags rf;
+ int pinned_cpu;
+
+ /*
+ * Migrate t to cpu if pinned to this cpu.
+ *
+ * Migrate t to cpu if its pinned cpu is offline
+ * and cpu becomes the new first online cpu.
+ *
+ * Transition of t->pinned_cpu to cpu can only
+ * happen if the thread is scheduled on cpu, which
+ * is impossible at this point because the cpu is
+ * not active.
+ *
+ * Transition of t->pinned_cpu from cpu to -1 or some
+ * other cpu number may happen concurrently. Therefore,
+ * skip early (without rq lock), and check again with
+ * the rq lock held to eliminate concurrent transitions
+ * from cpu to -1 or some other cpu number.
+ */
+ pinned_cpu = READ_ONCE(t->pinned_cpu);
+ if (skip_pinned_task(pinned_cpu, cpu, first_online))
+ continue;
+ if (pinned_cpu == cpu)
+ printk("pin_on_cpu migrate to owner: online cpu %d\n",
+ cpu);
+ if (first_online && !cpu_online(pinned_cpu))
+ printk("pin_on_cpu migrate to new offload cpu %d\n",
+ cpu);
+ target_rq = task_rq_lock(t, &rf);
+ pinned_cpu = t->pinned_cpu;
+ if (skip_pinned_task(pinned_cpu, cpu, first_online))
+ goto unlock;
+ WARN_ON_ONCE(target_rq == rq);
+ update_rq_clock(target_rq);
+ if (task_running(target_rq, t) || t->state == TASK_WAKING) {
+ struct migration_arg arg = { t, cpu };
+ /* Need help from migration thread: drop lock and wait. */
+ task_rq_unlock(target_rq, t, &rf);
+ stop_one_cpu(cpu_of(target_rq), migration_cpu_stop, &arg);
+ continue;
+ } else if (task_on_rq_queued(t)) {
+ /*
+ * OK, since we're going to drop the lock immediately
+ * afterwards anyway.
+ */
+ rq = move_queued_task(target_rq, &rf, t, cpu);
+ }
+ unlock:
+ task_rq_unlock(target_rq, t, &rf);
+ }
+ read_unlock(&tasklist_lock);
+}
+
int sched_cpu_activate(unsigned int cpu)
{
struct rq *rq = cpu_rq(cpu);
struct rq_flags rf;
+ sched_cpu_migrate_pinned_tasks(cpu);
+
#ifdef CONFIG_SCHED_SMT
/*
* When going up, increment the number of cores with SMT present.
@@ -7899,6 +8047,145 @@ struct cgroup_subsys cpu_cgrp_subsys = {
#endif /* CONFIG_CGROUP_SCHED */
+static void do_set_pinned_cpu(struct task_struct *p, int cpu)
+{
+ struct rq *rq = task_rq(p);
+ bool queued, running;
+
+ lockdep_assert_held(&p->pi_lock);
+
+ queued = task_on_rq_queued(p);
+ running = task_current(rq, p);
+
+ if (queued) {
+ /*
+ * Because __kthread_bind() calls this on blocked tasks without
+ * holding rq->lock.
+ */
+ lockdep_assert_held(&rq->lock);
+ dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
+ }
+ if (running)
+ put_prev_task(rq, p);
+
+ WRITE_ONCE(p->pinned_cpu, cpu);
+
+ if (queued)
+ enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
+ if (running)
+ set_next_task(rq, p);
+}
+
+static int __do_pin_on_cpu(int cpu)
+{
+ struct task_struct *p = current;
+ struct rq_flags rf;
+ struct rq *rq;
+ int ret = 0, dest_cpu;
+ struct migration_arg arg = { p };
+
+ cpus_read_lock();
+ rq = task_rq_lock(p, &rf);
+ update_rq_clock(rq);
+ if (cpu >= 0 && !cpumask_test_cpu(cpu, current->cpus_ptr)) {
+ ret = -EINVAL;
+ goto out;
+ }
+#ifdef CONFIG_SMP
+ do_set_pinned_cpu(p, cpu);
+ if (cpu >= 0) {
+ if (cpu_online(cpu))
+ dest_cpu = cpu;
+ else
+ dest_cpu = pinned_cpu_offline_offload(p);
+ if (task_cpu(p) == dest_cpu) {
+ /*
+ * If the task already runs on the pinned cpu, we're
+ * done.
+ */
+ goto out;
+ }
+ } else {
+ /*
+ * When clearing the pinned cpu, we may need to migrate the
+ * current task if it is currently sitting on a runqueue which
+ * does not belong to the allowed mask.
+ */
+ dest_cpu = cpumask_any(p->cpus_ptr);
+ }
+ arg.dest_cpu = dest_cpu;
+ /* Need help from migration thread: drop lock and wait. */
+ task_rq_unlock(rq, p, &rf);
+ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+
+ /* Preempt disable prevents hotplug on current cpu. */
+ preempt_disable();
+ WARN_ON_ONCE(cpu >= 0 && cpu_online(cpu) &&
+ smp_processor_id() != cpu);
+ preempt_enable();
+ cpus_read_unlock();
+ return 0;
+#endif
+out:
+ task_rq_unlock(rq, p, &rf);
+ cpus_read_unlock();
+ return ret;
+}
+
+static int pin_on_cpu_set(int cpu)
+{
+ if (cpu < 0 || !cpu_possible(cpu)) {
+ return -EINVAL;
+ }
+ return __do_pin_on_cpu(cpu);
+}
+
+static int pin_on_cpu_clear(void)
+{
+ return __do_pin_on_cpu(-1);
+}
+
+/*
+ * sys_pin_on_cpu - pin current task to a specific cpu.
+ * @cmd: command to issue (enum pin_on_cpu_cmd)
+ * @flags: system call flags
+ * @cpu: cpu where the task should run.
+ *
+ * Returns -EINVAL if cmd is unknown.
+ * Returns -EINVAL if flags are unknown.
+ * Returns -EINVAL if the CPU is not part of the possible CPUs.
+ * Returns -EINVAL if the CPU is not part of the allowed cpu mask
+ * for the current task.
+ *
+ * PIN_ON_CPU_CMD_QUERY: Return the mask of supported commands.
+ * PIN_ON_CPU_CMD_SET: Pin the current task to a specific CPU.
+ * PIN_ON_CPU_CMD_CLEAR: Clear cpu pinning for current task.
+ *
+ * If the pinned CPU is online, the current task will run on that CPU.
+ *
+ * If the pinned CPU is offline, the scheduler guarantees that
+ * all tasks pinned to that CPU number are moved to the same
+ * runqueue.
+ *
+ * Removing the pinned CPU from the task's allowed cpu mask is
+ * forbidden.
+ */
+SYSCALL_DEFINE3(pin_on_cpu, int, cmd, int, flags, int, cpu)
+{
+ if (unlikely(flags))
+ return -EINVAL;
+ switch (cmd) {
+ case PIN_ON_CPU_CMD_QUERY:
+ return PIN_ON_CPU_CMD_BITMASK;
+ case PIN_ON_CPU_CMD_SET:
+ return pin_on_cpu_set(cpu);
+ case PIN_ON_CPU_CMD_CLEAR:
+ return pin_on_cpu_clear();
+ default:
+ return -EINVAL;
+ }
+}
+
void dump_cpu_task(int cpu)
{
pr_info("Task dump for CPU %d:\n", cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a8a08030a8f7..8a1581e8509e 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -535,24 +535,31 @@ static struct rq *dl_task_offline_migration(struct rq *rq, struct task_struct *p
if (!later_rq) {
int cpu;
- /*
- * If we cannot preempt any rq, fall back to pick any
- * online CPU:
- */
- cpu = cpumask_any_and(cpu_active_mask, p->cpus_ptr);
- if (cpu >= nr_cpu_ids) {
- /*
- * Failed to find any suitable CPU.
- * The task will never come back!
- */
- BUG_ON(dl_bandwidth_enabled());
-
+ if (is_pinned_task(p)) {
+ if (cpu_online(p->pinned_cpu))
+ cpu = p->pinned_cpu;
+ else
+ cpu = pinned_cpu_offline_offload(p);
+ } else {
/*
- * If admission control is disabled we
- * try a little harder to let the task
- * run.
+ * If we cannot preempt any rq, fall back to pick any
+ * online CPU:
*/
- cpu = cpumask_any(cpu_active_mask);
+ cpu = cpumask_any_and(cpu_active_mask, p->cpus_ptr);
+ if (cpu >= nr_cpu_ids) {
+ /*
+ * Failed to find any suitable CPU.
+ * The task will never come back!
+ */
+ BUG_ON(dl_bandwidth_enabled());
+
+ /*
+ * If admission control is disabled we
+ * try a little harder to let the task
+ * run.
+ */
+ cpu = cpumask_any(cpu_active_mask);
+ }
}
later_rq = cpu_rq(cpu);
double_lock_balance(rq, later_rq);
@@ -1836,9 +1843,15 @@ static void task_fork_dl(struct task_struct *p)
static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
{
- if (!task_running(rq, p) &&
- cpumask_test_cpu(cpu, p->cpus_ptr))
- return 1;
+ if (!task_running(rq, p)) {
+ if (is_pinned_task(p)) {
+ if (allowed_pinned_cpu(p, cpu))
+ return 1;
+ } else {
+ if (cpumask_test_cpu(cpu, p->cpus_ptr))
+ return 1;
+ }
+ }
return 0;
}
@@ -1987,7 +2000,8 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
/* Retry if something changed. */
if (double_lock_balance(rq, later_rq)) {
if (unlikely(task_rq(task) != rq ||
- !cpumask_test_cpu(later_rq->cpu, task->cpus_ptr) ||
+ (is_pinned_task(task) && !allowed_pinned_cpu(task, later_rq->cpu)) ||
+ (!is_pinned_task(task) && !cpumask_test_cpu(later_rq->cpu, task->cpus_ptr)) ||
task_running(rq, task) ||
!dl_task(task) ||
!task_on_rq_queued(task))) {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69a81a5709ff..e96ae1ce9829 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7223,6 +7223,25 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
lockdep_assert_held(&env->src_rq->lock);
+ if (is_pinned_task(p)) {
+ if (task_running(env->src_rq, p)) {
+ schedstat_inc(p->se.statistics.nr_failed_migrations_running);
+ return 0;
+ }
+ if (cpu_online(p->pinned_cpu)) {
+ if (env->dst_cpu == p->pinned_cpu)
+ return 1;
+ else
+ return 0;
+ } else {
+ if (env->dst_cpu ==
+ pinned_cpu_offline_offload(p))
+ return 1;
+ else
+ return 0;
+ }
+ }
+
/*
* We do not migrate tasks that are:
* 1) throttled_lb_pair, or
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 9b8adc01be3d..2774311e5750 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1600,9 +1600,15 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
{
- if (!task_running(rq, p) &&
- cpumask_test_cpu(cpu, p->cpus_ptr))
- return 1;
+ if (!task_running(rq, p)) {
+ if (is_pinned_task(p)) {
+ if (allowed_pinned_cpu(p, cpu))
+ return 1;
+ } else {
+ if (cpumask_test_cpu(cpu, p->cpus_ptr))
+ return 1;
+ }
+ }
return 0;
}
@@ -1738,7 +1744,8 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
* Also make sure that it wasn't scheduled on its rq.
*/
if (unlikely(task_rq(task) != rq ||
- !cpumask_test_cpu(lowest_rq->cpu, task->cpus_ptr) ||
+ (is_pinned_task(task) && !allowed_pinned_cpu(task, lowest_rq->cpu)) ||
+ (!is_pinned_task(task) && !cpumask_test_cpu(lowest_rq->cpu, task->cpus_ptr)) ||
task_running(rq, task) ||
!rt_task(task) ||
!task_on_rq_queued(task))) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 49ed949f850c..922bc618cc87 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -187,6 +187,34 @@ static inline int task_has_dl_policy(struct task_struct *p)
return dl_policy(p->policy);
}
+/*
+ * All tasks which require to be pinned on offlined CPUs are sent
+ * to runqueue of the first online CPU.
+ */
+static inline bool is_pinned_task(struct task_struct *p)
+{
+ return p->pinned_cpu >= 0;
+}
+
+static inline int pinned_cpu_offline_offload(struct task_struct *p)
+{
+ return cpumask_first(cpu_online_mask);
+}
+
+static inline bool allowed_pinned_cpu(struct task_struct *p, int cpu)
+{
+ if (!cpu_possible(cpu))
+ return false;
+ if (cpu_online(p->pinned_cpu)) {
+ if (p->pinned_cpu == cpu)
+ return true;
+ } else {
+ if (cpu == pinned_cpu_offline_offload(p))
+ return true;
+ }
+ return false;
+}
+
#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
/*
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 34b76895b81e..7e5192cd8d9d 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -449,3 +449,4 @@ COND_SYSCALL(setuid16);
/* restartable sequence */
COND_SYSCALL(rseq);
+COND_SYSCALL(pin_on_cpu);
--
2.17.1
From: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
[ Upstream commit 6b64a650f0b2ae3940698f401732988699eecf7a ]
It was observed[1] on arm64 that __builtin_strlen led to an infinite
loop in the get_size selftest. This is because __builtin_strlen (and
other builtins) may sometimes result in a call to the C library
function. The C library implementation of strlen uses an IFUNC
resolver to load the most efficient strlen implementation for the
underlying machine and hence has a PLT indirection even for static
binaries. Because this binary avoids the C library startup routines,
the PLT initialization never happens and hence the program gets stuck
in an infinite loop.
On x86_64 the __builtin_strlen just happens to expand inline and avoid
the call but that is not always guaranteed.
Further, while testing on x86_64 (Fedora 31), it was observed that the
test also failed with a segfault inside write() because the generated
code for the write function in glibc seems to access TLS before the
syscall (probably due to the cancellation point check) and fails
because TLS is not initialised.
To mitigate these problems, this patch reduces the interface with the
C library to just the syscall function. The syscall function still
sets errno on failure, which is undesirable but for now it only
affects cases where syscalls fail.
[1] https://bugs.linaro.org/show_bug.cgi?id=5479
Signed-off-by: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
Reported-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Reviewed-by: Tim Bird <tim.bird(a)sony.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/size/get_size.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
index d4b59ab979a09..f55943b6d1e2a 100644
--- a/tools/testing/selftests/size/get_size.c
+++ b/tools/testing/selftests/size/get_size.c
@@ -12,23 +12,35 @@
* own execution. It also attempts to have as few dependencies
* on kernel features as possible.
*
- * It should be statically linked, with startup libs avoided.
- * It uses no library calls, and only the following 3 syscalls:
+ * It should be statically linked, with startup libs avoided. It uses
+ * no library calls except the syscall() function for the following 3
+ * syscalls:
* sysinfo(), write(), and _exit()
*
* For output, it avoids printf (which in some C libraries
* has large external dependencies) by implementing it's own
* number output and print routines, and using __builtin_strlen()
+ *
+ * The test may crash if any of the above syscalls fails because in some
+ * libc implementations (e.g. the GNU C Library) errno is saved in
+ * thread-local storage, which does not get initialized due to avoiding
+ * startup libs.
*/
#include <sys/sysinfo.h>
#include <unistd.h>
+#include <sys/syscall.h>
#define STDOUT_FILENO 1
static int print(const char *s)
{
- return write(STDOUT_FILENO, s, __builtin_strlen(s));
+ size_t len = 0;
+
+ while (s[len] != '\0')
+ len++;
+
+ return syscall(SYS_write, STDOUT_FILENO, s, len);
}
static inline char *num_to_str(unsigned long num, char *buf, int len)
@@ -80,12 +92,12 @@ void _start(void)
print("TAP version 13\n");
print("# Testing system size.\n");
- ccode = sysinfo(&info);
+ ccode = syscall(SYS_sysinfo, &info);
if (ccode < 0) {
print("not ok 1");
print(test_name);
print(" ---\n reason: \"could not get sysinfo\"\n ...\n");
- _exit(ccode);
+ syscall(SYS_exit, ccode);
}
print("ok 1");
print(test_name);
@@ -101,5 +113,5 @@ void _start(void)
print(" ...\n");
print("1..1\n");
- _exit(0);
+ syscall(SYS_exit, 0);
}
--
2.20.1
From: Lorenz Bauer <lmb(a)cloudflare.com>
[ Upstream commit 51bad0f05616c43d6d34b0a19bcc9bdab8e8fb39 ]
Currently, there is a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Acked-by: John Fastabend <john.fastabend(a)gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-5-lmb@cloudflare.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/test_select_reuseport.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_select_reuseport.c b/tools/testing/selftests/bpf/test_select_reuseport.c
index 75646d9b34aaa..cdbbdab2725fc 100644
--- a/tools/testing/selftests/bpf/test_select_reuseport.c
+++ b/tools/testing/selftests/bpf/test_select_reuseport.c
@@ -30,7 +30,7 @@
#define REUSEPORT_ARRAY_SIZE 32
static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
-static enum result expected_results[NR_RESULTS];
+static __u32 expected_results[NR_RESULTS];
static int sk_fds[REUSEPORT_ARRAY_SIZE];
static int reuseport_array, outer_map;
static int select_by_skb_data_prog;
@@ -610,7 +610,19 @@ static void setup_per_test(int type, unsigned short family, bool inany)
static void cleanup_per_test(void)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1
From: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
[ Upstream commit 6b64a650f0b2ae3940698f401732988699eecf7a ]
It was observed[1] on arm64 that __builtin_strlen led to an infinite
loop in the get_size selftest. This is because __builtin_strlen (and
other builtins) may sometimes result in a call to the C library
function. The C library implementation of strlen uses an IFUNC
resolver to load the most efficient strlen implementation for the
underlying machine and hence has a PLT indirection even for static
binaries. Because this binary avoids the C library startup routines,
the PLT initialization never happens and hence the program gets stuck
in an infinite loop.
On x86_64 the __builtin_strlen just happens to expand inline and avoid
the call but that is not always guaranteed.
Further, while testing on x86_64 (Fedora 31), it was observed that the
test also failed with a segfault inside write() because the generated
code for the write function in glibc seems to access TLS before the
syscall (probably due to the cancellation point check) and fails
because TLS is not initialised.
To mitigate these problems, this patch reduces the interface with the
C library to just the syscall function. The syscall function still
sets errno on failure, which is undesirable but for now it only
affects cases where syscalls fail.
[1] https://bugs.linaro.org/show_bug.cgi?id=5479
Signed-off-by: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
Reported-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Reviewed-by: Tim Bird <tim.bird(a)sony.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/size/get_size.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
index d4b59ab979a09..f55943b6d1e2a 100644
--- a/tools/testing/selftests/size/get_size.c
+++ b/tools/testing/selftests/size/get_size.c
@@ -12,23 +12,35 @@
* own execution. It also attempts to have as few dependencies
* on kernel features as possible.
*
- * It should be statically linked, with startup libs avoided.
- * It uses no library calls, and only the following 3 syscalls:
+ * It should be statically linked, with startup libs avoided. It uses
+ * no library calls except the syscall() function for the following 3
+ * syscalls:
* sysinfo(), write(), and _exit()
*
* For output, it avoids printf (which in some C libraries
* has large external dependencies) by implementing it's own
* number output and print routines, and using __builtin_strlen()
+ *
+ * The test may crash if any of the above syscalls fails because in some
+ * libc implementations (e.g. the GNU C Library) errno is saved in
+ * thread-local storage, which does not get initialized due to avoiding
+ * startup libs.
*/
#include <sys/sysinfo.h>
#include <unistd.h>
+#include <sys/syscall.h>
#define STDOUT_FILENO 1
static int print(const char *s)
{
- return write(STDOUT_FILENO, s, __builtin_strlen(s));
+ size_t len = 0;
+
+ while (s[len] != '\0')
+ len++;
+
+ return syscall(SYS_write, STDOUT_FILENO, s, len);
}
static inline char *num_to_str(unsigned long num, char *buf, int len)
@@ -80,12 +92,12 @@ void _start(void)
print("TAP version 13\n");
print("# Testing system size.\n");
- ccode = sysinfo(&info);
+ ccode = syscall(SYS_sysinfo, &info);
if (ccode < 0) {
print("not ok 1");
print(test_name);
print(" ---\n reason: \"could not get sysinfo\"\n ...\n");
- _exit(ccode);
+ syscall(SYS_exit, ccode);
}
print("ok 1");
print(test_name);
@@ -101,5 +113,5 @@ void _start(void)
print(" ...\n");
print("1..1\n");
- _exit(0);
+ syscall(SYS_exit, 0);
}
--
2.20.1
From: Oliver O'Halloran <oohall(a)gmail.com>
[ Upstream commit 414f50434aa2463202a5b35e844f4125dd1a7101 ]
Some newer cards supported by aacraid can take up to 40s to recover
after an EEH event. This causes spurious failures in the basic EEH
self-test since the current maximim timeout is only 30s.
Fix the immediate issue by bumping the timeout to a default of 60s,
and allow the wait time to be specified via an environmental variable
(EEH_MAX_WAIT).
Reported-by: Steve Best <sbest(a)redhat.com>
Suggested-by: Douglas Miller <dougmill(a)us.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20200122031125.25991-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/eeh/eeh-functions.sh | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
index 26112ab5cdf42..f52ed92b53e74 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
@@ -53,9 +53,13 @@ eeh_one_dev() {
# is a no-op.
echo $dev >/sys/kernel/debug/powerpc/eeh_dev_check
- # Enforce a 30s timeout for recovery. Even the IPR, which is infamously
- # slow to reset, should recover within 30s.
- max_wait=30
+ # Default to a 60s timeout when waiting for a device to recover. This
+ # is an arbitrary default which can be overridden by setting the
+ # EEH_MAX_WAIT environmental variable when required.
+
+ # The current record holder for longest recovery time is:
+ # "Adaptec Series 8 12G SAS/PCIe 3" at 39 seconds
+ max_wait=${EEH_MAX_WAIT:=60}
for i in `seq 0 ${max_wait}` ; do
if pe_ok $dev ; then
--
2.20.1
From: Lorenz Bauer <lmb(a)cloudflare.com>
[ Upstream commit 51bad0f05616c43d6d34b0a19bcc9bdab8e8fb39 ]
Currently, there is a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Acked-by: John Fastabend <john.fastabend(a)gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-5-lmb@cloudflare.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/test_select_reuseport.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_select_reuseport.c b/tools/testing/selftests/bpf/test_select_reuseport.c
index 7566c13eb51a7..079d0f5a29091 100644
--- a/tools/testing/selftests/bpf/test_select_reuseport.c
+++ b/tools/testing/selftests/bpf/test_select_reuseport.c
@@ -30,7 +30,7 @@
#define REUSEPORT_ARRAY_SIZE 32
static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
-static enum result expected_results[NR_RESULTS];
+static __u32 expected_results[NR_RESULTS];
static int sk_fds[REUSEPORT_ARRAY_SIZE];
static int reuseport_array, outer_map;
static int select_by_skb_data_prog;
@@ -662,7 +662,19 @@ static void setup_per_test(int type, unsigned short family, bool inany)
static void cleanup_per_test(void)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
[ Upstream commit ac87813d4372f4c005264acbe3b7f00c1dee37c4 ]
Commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second
timeout per test") adds support for a new per-test-directory "settings"
file. But this only works for tests not in a sub-subdirectories, e.g.
- tools/testing/selftests/rtc (rtc) is OK,
- tools/testing/selftests/net/mptcp (net/mptcp) is not.
We have to increase the timeout for net/mptcp tests which are not
upstreamed yet but this fix is valid for other tests if they need to add
a "settings" file, see the full list with:
tools/testing/selftests/*/*/**/Makefile
Note that this patch changes the text header message printed at the end
of the execution but this text is modified only for the tests that are
in sub-subdirectories, e.g.
ok 1 selftests: net/mptcp: mptcp_connect.sh
Before we had:
ok 1 selftests: mptcp: mptcp_connect.sh
But showing the full target name is probably better, just in case a
subsubdir has the same name as another one in another subdirectory.
Fixes: 852c8cbf34d3 (selftests/kselftest/runner.sh: Add 45 second timeout per test)
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kselftest/runner.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index a8d20cbb711cf..e84d901f85672 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -91,7 +91,7 @@ run_one()
run_many()
{
echo "TAP version 13"
- DIR=$(basename "$PWD")
+ DIR="${PWD#${BASE_DIR}/}"
test_num=0
total=$(echo "$@" | wc -w)
echo "1..$total"
--
2.20.1
From: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
[ Upstream commit 6b64a650f0b2ae3940698f401732988699eecf7a ]
It was observed[1] on arm64 that __builtin_strlen led to an infinite
loop in the get_size selftest. This is because __builtin_strlen (and
other builtins) may sometimes result in a call to the C library
function. The C library implementation of strlen uses an IFUNC
resolver to load the most efficient strlen implementation for the
underlying machine and hence has a PLT indirection even for static
binaries. Because this binary avoids the C library startup routines,
the PLT initialization never happens and hence the program gets stuck
in an infinite loop.
On x86_64 the __builtin_strlen just happens to expand inline and avoid
the call but that is not always guaranteed.
Further, while testing on x86_64 (Fedora 31), it was observed that the
test also failed with a segfault inside write() because the generated
code for the write function in glibc seems to access TLS before the
syscall (probably due to the cancellation point check) and fails
because TLS is not initialised.
To mitigate these problems, this patch reduces the interface with the
C library to just the syscall function. The syscall function still
sets errno on failure, which is undesirable but for now it only
affects cases where syscalls fail.
[1] https://bugs.linaro.org/show_bug.cgi?id=5479
Signed-off-by: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
Reported-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Reviewed-by: Tim Bird <tim.bird(a)sony.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/size/get_size.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
index 2ad45b9443550..2980b1a63366b 100644
--- a/tools/testing/selftests/size/get_size.c
+++ b/tools/testing/selftests/size/get_size.c
@@ -11,23 +11,35 @@
* own execution. It also attempts to have as few dependencies
* on kernel features as possible.
*
- * It should be statically linked, with startup libs avoided.
- * It uses no library calls, and only the following 3 syscalls:
+ * It should be statically linked, with startup libs avoided. It uses
+ * no library calls except the syscall() function for the following 3
+ * syscalls:
* sysinfo(), write(), and _exit()
*
* For output, it avoids printf (which in some C libraries
* has large external dependencies) by implementing it's own
* number output and print routines, and using __builtin_strlen()
+ *
+ * The test may crash if any of the above syscalls fails because in some
+ * libc implementations (e.g. the GNU C Library) errno is saved in
+ * thread-local storage, which does not get initialized due to avoiding
+ * startup libs.
*/
#include <sys/sysinfo.h>
#include <unistd.h>
+#include <sys/syscall.h>
#define STDOUT_FILENO 1
static int print(const char *s)
{
- return write(STDOUT_FILENO, s, __builtin_strlen(s));
+ size_t len = 0;
+
+ while (s[len] != '\0')
+ len++;
+
+ return syscall(SYS_write, STDOUT_FILENO, s, len);
}
static inline char *num_to_str(unsigned long num, char *buf, int len)
@@ -79,12 +91,12 @@ void _start(void)
print("TAP version 13\n");
print("# Testing system size.\n");
- ccode = sysinfo(&info);
+ ccode = syscall(SYS_sysinfo, &info);
if (ccode < 0) {
print("not ok 1");
print(test_name);
print(" ---\n reason: \"could not get sysinfo\"\n ...\n");
- _exit(ccode);
+ syscall(SYS_exit, ccode);
}
print("ok 1");
print(test_name);
@@ -100,5 +112,5 @@ void _start(void)
print(" ...\n");
print("1..1\n");
- _exit(0);
+ syscall(SYS_exit, 0);
}
--
2.20.1
From: Oliver O'Halloran <oohall(a)gmail.com>
[ Upstream commit 414f50434aa2463202a5b35e844f4125dd1a7101 ]
Some newer cards supported by aacraid can take up to 40s to recover
after an EEH event. This causes spurious failures in the basic EEH
self-test since the current maximim timeout is only 30s.
Fix the immediate issue by bumping the timeout to a default of 60s,
and allow the wait time to be specified via an environmental variable
(EEH_MAX_WAIT).
Reported-by: Steve Best <sbest(a)redhat.com>
Suggested-by: Douglas Miller <dougmill(a)us.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20200122031125.25991-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/eeh/eeh-functions.sh | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
index 26112ab5cdf42..f52ed92b53e74 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
@@ -53,9 +53,13 @@ eeh_one_dev() {
# is a no-op.
echo $dev >/sys/kernel/debug/powerpc/eeh_dev_check
- # Enforce a 30s timeout for recovery. Even the IPR, which is infamously
- # slow to reset, should recover within 30s.
- max_wait=30
+ # Default to a 60s timeout when waiting for a device to recover. This
+ # is an arbitrary default which can be overridden by setting the
+ # EEH_MAX_WAIT environmental variable when required.
+
+ # The current record holder for longest recovery time is:
+ # "Adaptec Series 8 12G SAS/PCIe 3" at 39 seconds
+ max_wait=${EEH_MAX_WAIT:=60}
for i in `seq 0 ${max_wait}` ; do
if pe_ok $dev ; then
--
2.20.1
From: Lorenz Bauer <lmb(a)cloudflare.com>
[ Upstream commit 51bad0f05616c43d6d34b0a19bcc9bdab8e8fb39 ]
Currently, there is a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Acked-by: John Fastabend <john.fastabend(a)gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-5-lmb@cloudflare.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/test_select_reuseport.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_select_reuseport.c b/tools/testing/selftests/bpf/test_select_reuseport.c
index 7566c13eb51a7..079d0f5a29091 100644
--- a/tools/testing/selftests/bpf/test_select_reuseport.c
+++ b/tools/testing/selftests/bpf/test_select_reuseport.c
@@ -30,7 +30,7 @@
#define REUSEPORT_ARRAY_SIZE 32
static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
-static enum result expected_results[NR_RESULTS];
+static __u32 expected_results[NR_RESULTS];
static int sk_fds[REUSEPORT_ARRAY_SIZE];
static int reuseport_array, outer_map;
static int select_by_skb_data_prog;
@@ -662,7 +662,19 @@ static void setup_per_test(int type, unsigned short family, bool inany)
static void cleanup_per_test(void)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1
From: Alan Maguire <alan.maguire(a)oracle.com>
[ Upstream commit 1c024d45151b51c8f8d4749e65958b0bcf3e7c52 ]
In discussion of how to handle timeouts, it was noted that if
sysctl_hung_task_timeout_seconds is exceeded for a kunit test,
the test task will be killed and an oops generated. This should
suffice as a means of debugging such timeout issues for now.
Hence remove use of sysctl_hung_task_timeout_secs, which has the
added benefit of avoiding the need to export that symbol from
the core kernel.
Co-developed-by: Knut Omang <knut.omang(a)oracle.com>
Signed-off-by: Knut Omang <knut.omang(a)oracle.com>
Signed-off-by: Alan Maguire <alan.maguire(a)oracle.com>
Reviewed-by: Stephen Boyd <sboyd(a)kernel.org>
Acked-by: Brendan Higgins <brendanhiggins(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kunit/try-catch.c | 22 ++++------------------
1 file changed, 4 insertions(+), 18 deletions(-)
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index 55686839eb619..6b9c5242017f6 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -12,7 +12,6 @@
#include <linux/completion.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
-#include <linux/sched/sysctl.h>
void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch)
{
@@ -31,8 +30,6 @@ static int kunit_generic_run_threadfn_adapter(void *data)
static unsigned long kunit_test_timeout(void)
{
- unsigned long timeout_msecs;
-
/*
* TODO(brendanhiggins(a)google.com): We should probably have some type of
* variable timeout here. The only question is what that timeout value
@@ -49,22 +46,11 @@ static unsigned long kunit_test_timeout(void)
*
* For more background on this topic, see:
* https://mike-bland.com/2011/11/01/small-medium-large.html
+ *
+ * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
+ * the task will be killed and an oops generated.
*/
- if (sysctl_hung_task_timeout_secs) {
- /*
- * If sysctl_hung_task is active, just set the timeout to some
- * value less than that.
- *
- * In regards to the above TODO, if we decide on variable
- * timeouts, this logic will likely need to change.
- */
- timeout_msecs = (sysctl_hung_task_timeout_secs - 1) *
- MSEC_PER_SEC;
- } else {
- timeout_msecs = 300 * MSEC_PER_SEC; /* 5 min */
- }
-
- return timeout_msecs;
+ return 300 * MSEC_PER_SEC; /* 5 min */
}
void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
--
2.20.1
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
[ Upstream commit ac87813d4372f4c005264acbe3b7f00c1dee37c4 ]
Commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second
timeout per test") adds support for a new per-test-directory "settings"
file. But this only works for tests not in a sub-subdirectories, e.g.
- tools/testing/selftests/rtc (rtc) is OK,
- tools/testing/selftests/net/mptcp (net/mptcp) is not.
We have to increase the timeout for net/mptcp tests which are not
upstreamed yet but this fix is valid for other tests if they need to add
a "settings" file, see the full list with:
tools/testing/selftests/*/*/**/Makefile
Note that this patch changes the text header message printed at the end
of the execution but this text is modified only for the tests that are
in sub-subdirectories, e.g.
ok 1 selftests: net/mptcp: mptcp_connect.sh
Before we had:
ok 1 selftests: mptcp: mptcp_connect.sh
But showing the full target name is probably better, just in case a
subsubdir has the same name as another one in another subdirectory.
Fixes: 852c8cbf34d3 (selftests/kselftest/runner.sh: Add 45 second timeout per test)
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kselftest/runner.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index a8d20cbb711cf..e84d901f85672 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -91,7 +91,7 @@ run_one()
run_many()
{
echo "TAP version 13"
- DIR=$(basename "$PWD")
+ DIR="${PWD#${BASE_DIR}/}"
test_num=0
total=$(echo "$@" | wc -w)
echo "1..$total"
--
2.20.1
From: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
[ Upstream commit 6b64a650f0b2ae3940698f401732988699eecf7a ]
It was observed[1] on arm64 that __builtin_strlen led to an infinite
loop in the get_size selftest. This is because __builtin_strlen (and
other builtins) may sometimes result in a call to the C library
function. The C library implementation of strlen uses an IFUNC
resolver to load the most efficient strlen implementation for the
underlying machine and hence has a PLT indirection even for static
binaries. Because this binary avoids the C library startup routines,
the PLT initialization never happens and hence the program gets stuck
in an infinite loop.
On x86_64 the __builtin_strlen just happens to expand inline and avoid
the call but that is not always guaranteed.
Further, while testing on x86_64 (Fedora 31), it was observed that the
test also failed with a segfault inside write() because the generated
code for the write function in glibc seems to access TLS before the
syscall (probably due to the cancellation point check) and fails
because TLS is not initialised.
To mitigate these problems, this patch reduces the interface with the
C library to just the syscall function. The syscall function still
sets errno on failure, which is undesirable but for now it only
affects cases where syscalls fail.
[1] https://bugs.linaro.org/show_bug.cgi?id=5479
Signed-off-by: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
Reported-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Reviewed-by: Tim Bird <tim.bird(a)sony.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/size/get_size.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
index 2ad45b9443550..2980b1a63366b 100644
--- a/tools/testing/selftests/size/get_size.c
+++ b/tools/testing/selftests/size/get_size.c
@@ -11,23 +11,35 @@
* own execution. It also attempts to have as few dependencies
* on kernel features as possible.
*
- * It should be statically linked, with startup libs avoided.
- * It uses no library calls, and only the following 3 syscalls:
+ * It should be statically linked, with startup libs avoided. It uses
+ * no library calls except the syscall() function for the following 3
+ * syscalls:
* sysinfo(), write(), and _exit()
*
* For output, it avoids printf (which in some C libraries
* has large external dependencies) by implementing it's own
* number output and print routines, and using __builtin_strlen()
+ *
+ * The test may crash if any of the above syscalls fails because in some
+ * libc implementations (e.g. the GNU C Library) errno is saved in
+ * thread-local storage, which does not get initialized due to avoiding
+ * startup libs.
*/
#include <sys/sysinfo.h>
#include <unistd.h>
+#include <sys/syscall.h>
#define STDOUT_FILENO 1
static int print(const char *s)
{
- return write(STDOUT_FILENO, s, __builtin_strlen(s));
+ size_t len = 0;
+
+ while (s[len] != '\0')
+ len++;
+
+ return syscall(SYS_write, STDOUT_FILENO, s, len);
}
static inline char *num_to_str(unsigned long num, char *buf, int len)
@@ -79,12 +91,12 @@ void _start(void)
print("TAP version 13\n");
print("# Testing system size.\n");
- ccode = sysinfo(&info);
+ ccode = syscall(SYS_sysinfo, &info);
if (ccode < 0) {
print("not ok 1");
print(test_name);
print(" ---\n reason: \"could not get sysinfo\"\n ...\n");
- _exit(ccode);
+ syscall(SYS_exit, ccode);
}
print("ok 1");
print(test_name);
@@ -100,5 +112,5 @@ void _start(void)
print(" ...\n");
print("1..1\n");
- _exit(0);
+ syscall(SYS_exit, 0);
}
--
2.20.1
While building selftests, the following errors were observed:
> tools/testing/selftests/timens'
> gcc -Wall -Werror -pthread -lrt -ldl timens.c -o tools/testing/selftests/timens/timens
> /usr/bin/ld: /tmp/ccGy5CST.o: in function `check_config_posix_timers':
> timens.c:(.text+0x65a): undefined reference to `timer_create'
> collect2: error: ld returned 1 exit status
Quoting commit 870f193d48c2 ("selftests: net: use LDLIBS instead of
LDFLAGS"):
The default Makefile rule looks like:
$(CC) $(CFLAGS) $(LDFLAGS) $@ $^ $(LDLIBS)
When linking is done by gcc itself, no issue, but when it needs to be passed
to proper ld, only LDLIBS follows and then ld cannot know what libs to link
with.
More detail:
https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html
LDFLAGS
Extra flags to give to compilers when they are supposed to invoke the linker,
‘ld’, such as -L. Libraries (-lfoo) should be added to the LDLIBS variable
instead.
LDLIBS
Library flags or names given to compilers when they are supposed to invoke the
linker, ‘ld’. LOADLIBES is a deprecated (but still supported) alternative to
LDLIBS. Non-library linker flags, such as -L, should go in the LDFLAGS
variable.
While at here, correct other selftests, not only timens ones.
Reported-by: Shuah Khan <skhan(a)kernel.org>
Signed-off-by: Dmitry Safonov <dima(a)arista.com>
---
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/net/Makefile | 4 ++--
tools/testing/selftests/rtc/Makefile | 2 +-
tools/testing/selftests/timens/Makefile | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index 30996306cabc..23207829ec75 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
INCLUDES := -I../include -I../../
CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES)
-LDFLAGS := $(LDFLAGS) -pthread -lrt
+LDLIBS := -lpthread -lrt
HEADERS := \
../include/futextest.h \
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index b5694196430a..287ae916ec0b 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -27,5 +27,5 @@ KSFT_KHDR_INSTALL := 1
include ../lib.mk
$(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
-$(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
-$(OUTPUT)/tcp_inq: LDFLAGS += -lpthread
+$(OUTPUT)/tcp_mmap: LDLIBS += -lpthread
+$(OUTPUT)/tcp_inq: LDLIBS += -lpthread
diff --git a/tools/testing/selftests/rtc/Makefile b/tools/testing/selftests/rtc/Makefile
index de9c8566672a..2d93d65723c9 100644
--- a/tools/testing/selftests/rtc/Makefile
+++ b/tools/testing/selftests/rtc/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
CFLAGS += -O3 -Wl,-no-as-needed -Wall
-LDFLAGS += -lrt -lpthread -lm
+LDLIBS += -lrt -lpthread -lm
TEST_GEN_PROGS = rtctest
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index e9fb30bd8aeb..b4fd9a934654 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -2,6 +2,6 @@ TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec
TEST_GEN_PROGS_EXTENDED := gettime_perf
CFLAGS := -Wall -Werror -pthread
-LDFLAGS := -lrt -ldl
+LDLIBS := -lrt -ldl
include ../lib.mk
--
2.25.0
It appears that newer glibcs check that openat(O_CREAT) was provided a
fourth argument (rather than passing garbage), resulting in the
following build error:
> In file included from /usr/include/fcntl.h:301,
> from helpers.c:9:
> In function 'openat',
> inlined from 'touchat' at helpers.c:49:11:
> /usr/include/x86_64-linux-gnu/bits/fcntl2.h:126:4: error: call to
> '__openat_missing_mode' declared with attribute error: openat with O_CREAT
> or O_TMPFILE in third argument needs 4 arguments
> 126 | __openat_missing_mode ();
> | ^~~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
tools/testing/selftests/openat2/helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/openat2/helpers.c b/tools/testing/selftests/openat2/helpers.c
index e9a6557ab16f..5074681ffdc9 100644
--- a/tools/testing/selftests/openat2/helpers.c
+++ b/tools/testing/selftests/openat2/helpers.c
@@ -46,7 +46,7 @@ int sys_renameat2(int olddirfd, const char *oldpath,
int touchat(int dfd, const char *path)
{
- int fd = openat(dfd, path, O_CREAT);
+ int fd = openat(dfd, path, O_CREAT, 0700);
if (fd >= 0)
close(fd);
return fd;
--
2.25.0
The following tests fail to build on x86_64
openat2:
tools/testing/selftests/openat2'
gcc -Wall -O2 -g -fsanitize=address -fsanitize=undefined
openat2_test.c helpers.c -o tools/testing/selftests/openat2/openat2_test
In file included from /usr/include/fcntl.h:301,
from helpers.c:9:
In function ‘openat’,
inlined from ‘touchat’ at helpers.c:49:11:
/usr/include/x86_64-linux-gnu/bits/fcntl2.h:126:4: error: call to
‘__openat_missing_mode’ declared with attribute error: openat with
O_CREAT or O_TMPFILE in third argument needs 4 arguments
126 | __openat_missing_mode ();
| ^~~~~~~~~~~~~~~~~~~~~~~~
timerns:
tools/testing/selftests/timens'
gcc -Wall -Werror -pthread -lrt -ldl timens.c -o
tools/testing/selftests/timens/timens
/usr/bin/ld: /tmp/ccGy5CST.o: in function `check_config_posix_timers':
timens.c:(.text+0x65a): undefined reference to `timer_create'
collect2: error: ld returned 1 exit status
thanks,
-- Shuah
Remove some of the outmoded "Why KUnit" rationale, and move some
UML-specific information to the kunit_tool page. Also update the Getting
Started guide to mention running tests without the kunit_tool wrapper.
Signed-off-by: David Gow <davidgow(a)google.com>
---
Thanks for your comments. I've reinstated the "Why KUnit" section with
some minor changes.
Changes since v1[1]:
- Reinstated the "Why Kunit?" section, minus the comparison with other
testing frameworks (covered in the FAQ), and the description of UML.
- Moved the description of UML into to kunit_tool page.
- Tidied up the wording around how KUnit is built and run to make it work
without the UML description.
[1]:
https://lore.kernel.org/linux-kselftest/9c703dea-a9e1-94e2-c12d-3cb0a09e75a…
Documentation/dev-tools/kunit/index.rst | 33 ++++----
Documentation/dev-tools/kunit/kunit-tool.rst | 7 ++
Documentation/dev-tools/kunit/start.rst | 80 ++++++++++++++++----
3 files changed, 92 insertions(+), 28 deletions(-)
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index d16a4d2c3a41..ca6cd6dd6ab7 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -17,14 +17,23 @@ What is KUnit?
==============
KUnit is a lightweight unit testing and mocking framework for the Linux kernel.
-These tests are able to be run locally on a developer's workstation without a VM
-or special hardware.
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
cases, grouping related test cases into test suites, providing common
infrastructure for running tests, and much more.
+KUnit consists of a kernel component, which provides a set of macros for easily
+writing unit tests. Tests written against KUnit will run on kernel boot if
+built-in, or when loaded if built as a module. These tests write out results to
+the kernel log in `TAP <https://testanything.org/>`_ format.
+
+To make running these tests (and reading the results) easier, KUnit offsers
+:doc:`kunit_tool <kunit-tool>`, which builds a `User Mode Linux
+<http://user-mode-linux.sourceforge.net>`_ kernel, runs it, and parses the test
+results. This provides a quick way of running KUnit tests during development,
+without requiring a virtual machine or separate hardware.
+
Get started now: :doc:`start`
Why KUnit?
@@ -36,20 +45,11 @@ allow all possible code paths to be tested in the code under test; this is only
possible if the code under test is very small and does not have any external
dependencies outside of the test's control like hardware.
-Outside of KUnit, there are no testing frameworks currently
-available for the kernel that do not require installing the kernel on a test
-machine or in a VM and all require tests to be written in userspace running on
-the kernel; this is true for Autotest, and kselftest, disqualifying
-any of them from being considered unit testing frameworks.
+KUnit provides a common framework for unit tests within the kernel.
-KUnit addresses the problem of being able to run tests without needing a virtual
-machine or actual hardware with User Mode Linux. User Mode Linux is a Linux
-architecture, like ARM or x86; however, unlike other architectures it compiles
-to a standalone program that can be run like any other program directly inside
-of a host operating system; to be clear, it does not require any virtualization
-support; it is just a regular program.
-
-Alternatively, kunit and kunit tests can be built as modules and tests will
+KUnit tests can be run on most kernel configurations, and most tests are
+architecture independent. All built-in KUnit tests run on kernel startup.
+Alternatively, KUnit and KUnit tests can be built as modules and tests will
run when the test module is loaded.
KUnit is fast. Excluding build time, from invocation to completion KUnit can run
@@ -75,9 +75,12 @@ someone sends you some code. Why trust that someone ran all their tests
correctly on every change when you can just run them yourself in less time than
it takes to read their test log?
+
How do I use it?
================
* :doc:`start` - for new users of KUnit
* :doc:`usage` - for a more detailed explanation of KUnit features
* :doc:`api/index` - for the list of KUnit APIs used for testing
+* :doc:`kunit-tool` - for more information on the kunit_tool helper script
+* :doc:`faq` - for answers to some common questions about KUnit
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index 50d46394e97e..949af2da81e5 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -12,6 +12,13 @@ the Linux kernel as UML (`User Mode Linux
<http://user-mode-linux.sourceforge.net/>`_), running KUnit tests, parsing
the test results and displaying them in a user friendly manner.
+kunit_tool addresses the problem of being able to run tests without needing a
+virtual machine or actual hardware with User Mode Linux. User Mode Linux is a
+Linux architecture, like ARM or x86; however, unlike other architectures it
+compiles the kernel as a standalone Linux executable that can be run like any
+other program directly inside of a host operating system. To be clear, it does
+not require any virtualization support: it is just a regular program.
+
What is a kunitconfig?
======================
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 4e1d24db6b13..e1c5ce80ce12 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -9,11 +9,10 @@ Installing dependencies
KUnit has the same dependencies as the Linux kernel. As long as you can build
the kernel, you can run KUnit.
-KUnit Wrapper
-=============
-Included with KUnit is a simple Python wrapper that helps format the output to
-easily use and read KUnit output. It handles building and running the kernel, as
-well as formatting the output.
+Running tests with the KUnit Wrapper
+====================================
+Included with KUnit is a simple Python wrapper which runs tests under User Mode
+Linux, and formats the test results.
The wrapper can be run with:
@@ -21,22 +20,42 @@ The wrapper can be run with:
./tools/testing/kunit/kunit.py run --defconfig
-For more information on this wrapper (also called kunit_tool) checkout the
+For more information on this wrapper (also called kunit_tool) check out the
:doc:`kunit-tool` page.
Creating a .kunitconfig
-=======================
-The Python script is a thin wrapper around Kbuild. As such, it needs to be
-configured with a ``.kunitconfig`` file. This file essentially contains the
-regular Kernel config, with the specific test targets as well.
-
+-----------------------
+If you want to run a specific set of tests (rather than those listed in the
+KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
+This file essentially contains the regular Kernel config, with the specific
+test targets as well. The ``.kunitconfig`` should also contain any other config
+options required by the tests.
+
+A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
.. code-block:: bash
cd $PATH_TO_LINUX_REPO
cp arch/um/configs/kunit_defconfig .kunitconfig
-Verifying KUnit Works
----------------------
+You can then add any other Kconfig options you wish, e.g.:
+.. code-block:: none
+
+ CONFIG_LIST_KUNIT_TEST=y
+
+:doc:`kunit_tool <kunit-tool>` will ensure that all config options set in
+``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
+It'll warn you if you haven't included the dependencies of the options you're
+using.
+
+.. note::
+ Note that removing something from the ``.kunitconfig`` will not trigger a
+ rebuild of the ``.config`` file: the configuration is only updated if the
+ ``.kunitconfig`` is not a subset of ``.config``. This means that you can use
+ other tools (such as make menuconfig) to adjust other config options.
+
+
+Running the tests
+-----------------
To make sure that everything is set up correctly, simply invoke the Python
wrapper from your kernel repo:
@@ -62,6 +81,41 @@ followed by a list of tests that are run. All of them should be passing.
Because it is building a lot of sources for the first time, the
``Building KUnit kernel`` step may take a while.
+Running tests without the KUnit Wrapper
+=======================================
+
+If you'd rather not use the KUnit Wrapper (if, for example, you need to
+integrate with other systems, or use an architecture other than UML), KUnit can
+be included in any kernel, and the results read out and parsed manually.
+
+.. note::
+ KUnit is not designed for use in a production system, and it's possible that
+ tests may reduce the stability or security of the system.
+
+
+
+Configuring the kernel
+----------------------
+
+In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
+Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
+menuconfig). From there, you can enable any KUnit tests you want: they usually
+have config options ending in ``_KUNIT_TEST``.
+
+KUnit and KUnit tests can be compiled as modules: in this case the tests in a
+module will be run when the module is loaded.
+
+Running the tests
+-----------------
+
+Build and run your kernel as usual. Test output will be written to the kernel
+log in `TAP <https://testanything.org/>`_ format.
+
+.. note::
+ It's possible that there will be other lines and/or data interspersed in the
+ TAP output.
+
+
Writing your first test
=======================
--
2.25.0.265.gbab2e86ba0-goog
A cgroup containing only dying tasks will be seen as empty when a userspace
process reads its cgroup.procs or cgroup.tasks files. It should be safe to
delete such a cgroup as it is considered empty. However if one of the dying
tasks did not reach cgroup_exit then an attempt to delete the cgroup will
fail with EBUSY because cgroup_is_populated() will not consider it empty
until all tasks reach cgroup_exit. Such a condition can be triggered when
a task consumes large amounts of memory and spends enough time in exit_mm
to create delay between the moment it is flagged as PF_EXITING and the
moment it reaches cgroup_exit.
Fix this by detecting cgroups containing only dying tasks during cgroup
destruction and proceeding with it while postponing the final step of
releasing the last reference until the last task reaches cgroup_exit.
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: JeiFeng Lee <linger.lee(a)mediatek.com>
Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations")
---
include/linux/cgroup-defs.h | 3 ++
kernel/cgroup/cgroup.c | 65 +++++++++++++++++++++++++++++++++----
2 files changed, 61 insertions(+), 7 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 63097cb243cb..f9bcccbac8dd 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -71,6 +71,9 @@ enum {
/* Cgroup is frozen. */
CGRP_FROZEN,
+
+ /* Cgroup is dead. */
+ CGRP_DEAD,
};
/* cgroup_root->flags */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 735af8f15f95..a99ebddd37d9 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -795,10 +795,11 @@ static bool css_set_populated(struct css_set *cset)
* that the content of the interface file has changed. This can be used to
* detect when @cgrp and its descendants become populated or empty.
*/
-static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
+static bool cgroup_update_populated(struct cgroup *cgrp, bool populated)
{
struct cgroup *child = NULL;
int adj = populated ? 1 : -1;
+ bool state_change = false;
lockdep_assert_held(&css_set_lock);
@@ -817,6 +818,7 @@ static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
if (was_populated == cgroup_is_populated(cgrp))
break;
+ state_change = true;
cgroup1_check_for_release(cgrp);
TRACE_CGROUP_PATH(notify_populated, cgrp,
cgroup_is_populated(cgrp));
@@ -825,6 +827,21 @@ static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
child = cgrp;
cgrp = cgroup_parent(cgrp);
} while (cgrp);
+
+ return state_change;
+}
+
+static void cgroup_prune_dead(struct cgroup *cgrp)
+{
+ lockdep_assert_held(&css_set_lock);
+
+ do {
+ /* put the base reference if cgroup was already destroyed */
+ if (!cgroup_is_populated(cgrp) &&
+ test_bit(CGRP_DEAD, &cgrp->flags))
+ percpu_ref_kill(&cgrp->self.refcnt);
+ cgrp = cgroup_parent(cgrp);
+ } while (cgrp);
}
/**
@@ -838,11 +855,15 @@ static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
static void css_set_update_populated(struct css_set *cset, bool populated)
{
struct cgrp_cset_link *link;
+ bool state_change;
lockdep_assert_held(&css_set_lock);
- list_for_each_entry(link, &cset->cgrp_links, cgrp_link)
- cgroup_update_populated(link->cgrp, populated);
+ list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
+ state_change = cgroup_update_populated(link->cgrp, populated);
+ if (state_change && !populated)
+ cgroup_prune_dead(link->cgrp);
+ }
}
/*
@@ -5458,8 +5479,26 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
* Only migration can raise populated from zero and we're already
* holding cgroup_mutex.
*/
- if (cgroup_is_populated(cgrp))
- return -EBUSY;
+ if (cgroup_is_populated(cgrp)) {
+ struct css_task_iter it;
+ struct task_struct *task;
+
+ /*
+ * cgroup_is_populated does not account for exiting tasks
+ * that did not reach cgroup_exit yet. Check if all the tasks
+ * in this cgroup are exiting.
+ */
+ css_task_iter_start(&cgrp->self, 0, &it);
+ do {
+ task = css_task_iter_next(&it);
+ } while (task && (task->flags & PF_EXITING));
+ css_task_iter_end(&it);
+
+ if (task) {
+ /* cgroup is indeed populated */
+ return -EBUSY;
+ }
+ }
/*
* Make sure there's no live children. We can't test emptiness of
@@ -5510,8 +5549,20 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
cgroup_bpf_offline(cgrp);
- /* put the base reference */
- percpu_ref_kill(&cgrp->self.refcnt);
+ /*
+ * Take css_set_lock because of the possible race with
+ * cgroup_update_populated.
+ */
+ spin_lock_irq(&css_set_lock);
+ /* The last task might have died since we last checked */
+ if (cgroup_is_populated(cgrp)) {
+ /* mark cgroup for future destruction */
+ set_bit(CGRP_DEAD, &cgrp->flags);
+ } else {
+ /* put the base reference */
+ percpu_ref_kill(&cgrp->self.refcnt);
+ }
+ spin_unlock_irq(&css_set_lock);
return 0;
};
--
2.25.0.rc1.283.g88dfdc4193-goog
From: Colin Ian King <colin.king(a)canonical.com>
There are two spelling mistakes in error messages. Fix these.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/resctrl/fill_buf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 84d2a8b9657a..79c611c99a3d 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -181,7 +181,7 @@ fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
ret = fill_cache_write(start_ptr, end_ptr, resctrl_val);
if (ret) {
- printf("\n Errror in fill cache read/write...\n");
+ printf("\n Error in fill cache read/write...\n");
return -1;
}
@@ -205,7 +205,7 @@ int run_fill_buf(unsigned long span, int malloc_and_init_memory,
ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op,
resctrl_val);
if (ret) {
- printf("\n Errror in fill cache\n");
+ printf("\n Error in fill cache\n");
return -1;
}
--
2.25.0
Hi,
This is a refresh of my earlier attempt to fix READ_IMPLIES_EXEC. I think
it incorporates the feedback from v2, and I've now added a selftest. This
series is for x86, arm, and arm64; I'd like it to go via -tip, though,
just to keep this change together with the selftest. To that end, I'd like
to collect Acks from the respective architecture maintainers. (Note that
most other architectures don't suffer from this problem. e.g. powerpc's
behavior appears to already be correct. MIPS may need adjusting but the
history of CPU features and toolchain behavior is very unclear to me.)
Repeating the commit log from later in the series:
The READ_IMPLIES_EXEC work-around was designed for old toolchains that
lacked the ELF PT_GNU_STACK marking under the assumption that toolchains
that couldn't specify executable permission flags for the stack may not
know how to do it correctly for any memory region.
This logic is sensible for having ancient binaries coexist in a system
with possibly NX memory, but was implemented in a way that equated having
a PT_GNU_STACK marked executable as being as "broken" as lacking the
PT_GNU_STACK marking entirely. Things like unmarked assembly and stack
trampolines may cause PT_GNU_STACK to need an executable bit, but they
do not imply all mappings must be executable.
This confusion has led to situations where modern programs with explicitly
marked executable stack are forced into the READ_IMPLIES_EXEC state when
no such thing is needed. (And leads to unexpected failures when mmap()ing
regions of device driver memory that wish to disallow VM_EXEC[1].)
In looking for other reasons for the READ_IMPLIES_EXEC behavior, Jann
Horn noted that glibc thread stacks have always been marked RWX (until
2003 when they started tracking the PT_GNU_STACK flag instead[2]). And
musl doesn't support executable stacks at all[3]. As such, no breakage
for multithreaded applications is expected from this change.
[1] https://lkml.kernel.org/r/20190418055759.GA3155@mellanox.com
[2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=54ee14b3882
[3] https://lkml.kernel.org/r/20190423192534.GN23599@brightrain.aerifal.cx
-Kees
v3:
- split steps in to distinct patches
- include arm32 and arm64/compat
- add selftests to validate behavior
v2: https://lore.kernel.org/lkml/20190424203408.GA11386@beast/
v1: https://lore.kernel.org/lkml/20190423181210.GA2443@beast/
Kees Cook (7):
x86/elf: Add table to document READ_IMPLIES_EXEC
x86/elf: Split READ_IMPLIES_EXEC from executable GNU_STACK
x86/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
arm32/64, elf: Add tables to document READ_IMPLIES_EXEC
arm32/64, elf: Split READ_IMPLIES_EXEC from executable GNU_STACK
arm64, elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address
spaces
selftests/exec: Add READ_IMPLIES_EXEC tests
arch/arm/kernel/elf.c | 27 +++-
arch/arm64/include/asm/elf.h | 23 +++-
arch/x86/include/asm/elf.h | 22 +++-
fs/compat_binfmt_elf.c | 5 +
tools/testing/selftests/exec/Makefile | 42 +++++-
.../selftests/exec/read_implies_exec.c | 121 ++++++++++++++++++
.../selftests/exec/strip-gnu-stack-bits.c | 34 +++++
.../testing/selftests/exec/strip-gnu-stack.c | 69 ++++++++++
8 files changed, 336 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/exec/read_implies_exec.c
create mode 100644 tools/testing/selftests/exec/strip-gnu-stack-bits.c
create mode 100644 tools/testing/selftests/exec/strip-gnu-stack.c
--
2.20.1
Remove some of the outmoded "Why KUnit" rationale -- which focuses
significantly on UML -- and update the Getting Started guide to mention
running tests without the kunit_tool wrapper.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an attempt at resolving some of the issues with the KUnit
documentation pointed out here:
https://lore.kernel.org/linux-kselftest/CABVgOSkiLi0UNijH1xTSvmsJEE5+ocCZ7n…
There's definitely room for further work on the KUnit documentation
(e.g., adding more information around the environment tests run in), but
this hopefully is better than nothing as a starting point.
Documentation/dev-tools/kunit/index.rst | 60 ++++---------------
Documentation/dev-tools/kunit/start.rst | 80 +++++++++++++++++++++----
2 files changed, 78 insertions(+), 62 deletions(-)
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index d16a4d2c3a41..6064cd14dfad 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -17,63 +17,23 @@ What is KUnit?
==============
KUnit is a lightweight unit testing and mocking framework for the Linux kernel.
-These tests are able to be run locally on a developer's workstation without a VM
-or special hardware.
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
cases, grouping related test cases into test suites, providing common
infrastructure for running tests, and much more.
-Get started now: :doc:`start`
-
-Why KUnit?
-==========
-
-A unit test is supposed to test a single unit of code in isolation, hence the
-name. A unit test should be the finest granularity of testing and as such should
-allow all possible code paths to be tested in the code under test; this is only
-possible if the code under test is very small and does not have any external
-dependencies outside of the test's control like hardware.
-
-Outside of KUnit, there are no testing frameworks currently
-available for the kernel that do not require installing the kernel on a test
-machine or in a VM and all require tests to be written in userspace running on
-the kernel; this is true for Autotest, and kselftest, disqualifying
-any of them from being considered unit testing frameworks.
+KUnit consists of a kernel component, which provides a set of macros for easily
+writing unit tests. Tests written against KUnit will run on kernel boot if
+built-in, or when loaded if built as a module. These tests write out results to
+the kernel log in `TAP <https://testanything.org/>`_ format.
-KUnit addresses the problem of being able to run tests without needing a virtual
-machine or actual hardware with User Mode Linux. User Mode Linux is a Linux
-architecture, like ARM or x86; however, unlike other architectures it compiles
-to a standalone program that can be run like any other program directly inside
-of a host operating system; to be clear, it does not require any virtualization
-support; it is just a regular program.
+To make running these tests (and reading the results) easier, KUnit offsers
+:doc:`kunit_tool <kunit-tool>`, which builds a `User Mode Linux
+<http://user-mode-linux.sourceforge.net>`_ kernel, runs it, and parses the test
+results. This provides a quick way of running KUnit tests during development.
-Alternatively, kunit and kunit tests can be built as modules and tests will
-run when the test module is loaded.
-
-KUnit is fast. Excluding build time, from invocation to completion KUnit can run
-several dozen tests in only 10 to 20 seconds; this might not sound like a big
-deal to some people, but having such fast and easy to run tests fundamentally
-changes the way you go about testing and even writing code in the first place.
-Linus himself said in his `git talk at Google
-<https://gist.github.com/lorn/1272686/revisions#diff-53c65572127855f1b003db4…>`_:
-
- "... a lot of people seem to think that performance is about doing the
- same thing, just doing it faster, and that is not true. That is not what
- performance is all about. If you can do something really fast, really
- well, people will start using it differently."
-
-In this context Linus was talking about branching and merging,
-but this point also applies to testing. If your tests are slow, unreliable, are
-difficult to write, and require a special setup or special hardware to run,
-then you wait a lot longer to write tests, and you wait a lot longer to run
-tests; this means that tests are likely to break, unlikely to test a lot of
-things, and are unlikely to be rerun once they pass. If your tests are really
-fast, you run them all the time, every time you make a change, and every time
-someone sends you some code. Why trust that someone ran all their tests
-correctly on every change when you can just run them yourself in less time than
-it takes to read their test log?
+Get started now: :doc:`start`
How do I use it?
================
@@ -81,3 +41,5 @@ How do I use it?
* :doc:`start` - for new users of KUnit
* :doc:`usage` - for a more detailed explanation of KUnit features
* :doc:`api/index` - for the list of KUnit APIs used for testing
+* :doc:`kunit-tool` - for more information on the kunit_tool helper script
+* :doc:`faq` - for answers to some common questions about KUnit
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 4e1d24db6b13..e1c5ce80ce12 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -9,11 +9,10 @@ Installing dependencies
KUnit has the same dependencies as the Linux kernel. As long as you can build
the kernel, you can run KUnit.
-KUnit Wrapper
-=============
-Included with KUnit is a simple Python wrapper that helps format the output to
-easily use and read KUnit output. It handles building and running the kernel, as
-well as formatting the output.
+Running tests with the KUnit Wrapper
+====================================
+Included with KUnit is a simple Python wrapper which runs tests under User Mode
+Linux, and formats the test results.
The wrapper can be run with:
@@ -21,22 +20,42 @@ The wrapper can be run with:
./tools/testing/kunit/kunit.py run --defconfig
-For more information on this wrapper (also called kunit_tool) checkout the
+For more information on this wrapper (also called kunit_tool) check out the
:doc:`kunit-tool` page.
Creating a .kunitconfig
-=======================
-The Python script is a thin wrapper around Kbuild. As such, it needs to be
-configured with a ``.kunitconfig`` file. This file essentially contains the
-regular Kernel config, with the specific test targets as well.
-
+-----------------------
+If you want to run a specific set of tests (rather than those listed in the
+KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
+This file essentially contains the regular Kernel config, with the specific
+test targets as well. The ``.kunitconfig`` should also contain any other config
+options required by the tests.
+
+A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
.. code-block:: bash
cd $PATH_TO_LINUX_REPO
cp arch/um/configs/kunit_defconfig .kunitconfig
-Verifying KUnit Works
----------------------
+You can then add any other Kconfig options you wish, e.g.:
+.. code-block:: none
+
+ CONFIG_LIST_KUNIT_TEST=y
+
+:doc:`kunit_tool <kunit-tool>` will ensure that all config options set in
+``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
+It'll warn you if you haven't included the dependencies of the options you're
+using.
+
+.. note::
+ Note that removing something from the ``.kunitconfig`` will not trigger a
+ rebuild of the ``.config`` file: the configuration is only updated if the
+ ``.kunitconfig`` is not a subset of ``.config``. This means that you can use
+ other tools (such as make menuconfig) to adjust other config options.
+
+
+Running the tests
+-----------------
To make sure that everything is set up correctly, simply invoke the Python
wrapper from your kernel repo:
@@ -62,6 +81,41 @@ followed by a list of tests that are run. All of them should be passing.
Because it is building a lot of sources for the first time, the
``Building KUnit kernel`` step may take a while.
+Running tests without the KUnit Wrapper
+=======================================
+
+If you'd rather not use the KUnit Wrapper (if, for example, you need to
+integrate with other systems, or use an architecture other than UML), KUnit can
+be included in any kernel, and the results read out and parsed manually.
+
+.. note::
+ KUnit is not designed for use in a production system, and it's possible that
+ tests may reduce the stability or security of the system.
+
+
+
+Configuring the kernel
+----------------------
+
+In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
+Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
+menuconfig). From there, you can enable any KUnit tests you want: they usually
+have config options ending in ``_KUNIT_TEST``.
+
+KUnit and KUnit tests can be compiled as modules: in this case the tests in a
+module will be run when the module is loaded.
+
+Running the tests
+-----------------
+
+Build and run your kernel as usual. Test output will be written to the kernel
+log in `TAP <https://testanything.org/>`_ format.
+
+.. note::
+ It's possible that there will be other lines and/or data interspersed in the
+ TAP output.
+
+
Writing your first test
=======================
--
2.25.0.341.g760bfbb309-goog
From: Randy Dunlap <rdunlap(a)infradead.org>
Fix Documentation warning due to missing a blank line after a directive:
linux/Documentation/dev-tools/kunit/usage.rst:553: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 3 supplied.
.. code-block:: bash
modprobe example-test
Fixes: 6ae2bfd3df06 ("kunit: update documentation to describe module-based build")
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Alan Maguire <alan.maguire(a)oracle.com>
Cc: Knut Omang <knut.omang(a)oracle.com>
Cc: Brendan Higgins <brendanhiggins(a)google.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: kunit-dev(a)googlegroups.com
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
---
Documentation/dev-tools/kunit/usage.rst | 1 +
1 file changed, 1 insertion(+)
--- lnx-56-rc1.orig/Documentation/dev-tools/kunit/usage.rst
+++ lnx-56-rc1/Documentation/dev-tools/kunit/usage.rst
@@ -551,6 +551,7 @@ options to your ``.config``:
Once the kernel is built and installed, a simple
.. code-block:: bash
+
modprobe example-test
...will run the tests.
[Resend the v9 patch set to Shuah Khan and linux-kselftest mailing list.
No code and commit message change.]
With more and more resctrl features are being added by Intel, AMD
and ARM, a test tool is becoming more and more useful to validate
that both hardware and software functionalities work as expected.
We introduce resctrl selftest to cover resctrl features on X86, AMD
and ARM architectures. It first implements MBM (Memory Bandwidth
Monitoring), MBA (Memory Bandwidth Allocation), L3 CAT (Cache Allocation
Technology), and CQM (Cache QoS Monitoring) tests. We will enhance
the selftest tool to include more functionality tests in the future.
The tool has been tested on Intel RDT, AMD QoS and ARM MPAM and is
in tools/testing/selftests/resctrl in order to have generic test code
base for all architectures.
The selftest tool we are introducing here provides a convenient
tool which does automatic resctrl testing, is easily available in kernel
tree, and covers Intel RDT, AMD QoS and ARM MPAM.
There is an existing resctrl test suite 'intel_cmt_cat'. But its major
purpose is to test Intel RDT hardware via writing and reading MSR
registers. It does access resctrl file system; but the functionalities
are very limited. And it doesn't support automatic test and a lot of
manual verifications are involved.
Changelog:
v9:
- Per Boris suggestion, add Co-developed-by in each patch to make it
clear who contributed to the patch set.
v8:
Update code per comments from Andre Przywara from ARM:
- Change Makefile and remove inline assembly code to build and test the
tool on ARM
- Change the output to TAP format because the format is both readable by
human and other test tools.
- Detect resctrl feature from /proc/cpuinfo instead of dmesg to support
generic detection on all architectures.
- Fix a few coding issues.
v7:
- Fix a few warnings when compiling patches separately, pointed by Babu
v6:
- Fix a benchmark reading optimized out issue in newer GCC.
- Fix a few coding style issues.
- Re-arrange code among patches to make cleaner code. No change in patches
structure.
v5:
- Based the v4 patches submitted by Fenghua Yu and added changes to support
AMD.
- Changed the function name get_sock_num to get_resource_id. Intel uses
socket number for schemata and AMD uses l3 index id. To generalize,
changed the function name to get_resource_id.
- Added the code to detect vendor.
- Disabled the few tests for AMD where the test results are not clear.
Also AMD does not have IMC.
- Fixed few compile issues.
- Some cleanup to make each patch independent.
- Tested the patches on AMD system. Fenghua, Need your help to test on
Intel box. Please feel free to change and resubmit if something
broken.
v4:
- address comments from Balu and Randy
- Add CAT and CQM tests
v3:
- Change code based on comments from Babu Moger
- Remove some unnessary code and use pipe to communicate b/w processes
v2:
- Change code based on comments from Babu Moger
- Clean up other places.
Babu Moger (3):
selftests/resctrl: Add vendor detection mechanism
selftests/resctrl: Use cache index3 id for AMD schemata masks
selftests/resctrl: Disable MBA and MBM tests for AMD
Fenghua Yu (6):
selftests/resctrl: Add README for resctrl tests
selftests/resctrl: Add MBM test
selftests/resctrl: Add MBA test
selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest
selftests/resctrl: Add Cache Allocation Technology (CAT) selftest
selftests/resctrl: Add the test in MAINTAINERS
Sai Praneeth Prakhya (4):
selftests/resctrl: Add basic resctrl file system operations and data
selftests/resctrl: Read memory bandwidth from perf IMC counter and
from resctrl file system
selftests/resctrl: Add callback to start a benchmark
selftests/resctrl: Add built in benchmark
MAINTAINERS | 1 +
tools/testing/selftests/resctrl/Makefile | 17 +
tools/testing/selftests/resctrl/README | 53 ++
tools/testing/selftests/resctrl/cache.c | 272 +++++++
tools/testing/selftests/resctrl/cat_test.c | 250 ++++++
tools/testing/selftests/resctrl/cqm_test.c | 176 +++++
tools/testing/selftests/resctrl/fill_buf.c | 213 +++++
tools/testing/selftests/resctrl/mba_test.c | 171 ++++
tools/testing/selftests/resctrl/mbm_test.c | 145 ++++
tools/testing/selftests/resctrl/resctrl.h | 107 +++
.../testing/selftests/resctrl/resctrl_tests.c | 202 +++++
tools/testing/selftests/resctrl/resctrl_val.c | 744 ++++++++++++++++++
tools/testing/selftests/resctrl/resctrlfs.c | 722 +++++++++++++++++
13 files changed, 3073 insertions(+)
create mode 100644 tools/testing/selftests/resctrl/Makefile
create mode 100644 tools/testing/selftests/resctrl/README
create mode 100644 tools/testing/selftests/resctrl/cache.c
create mode 100644 tools/testing/selftests/resctrl/cat_test.c
create mode 100644 tools/testing/selftests/resctrl/cqm_test.c
create mode 100644 tools/testing/selftests/resctrl/fill_buf.c
create mode 100644 tools/testing/selftests/resctrl/mba_test.c
create mode 100644 tools/testing/selftests/resctrl/mbm_test.c
create mode 100644 tools/testing/selftests/resctrl/resctrl.h
create mode 100644 tools/testing/selftests/resctrl/resctrl_tests.c
create mode 100644 tools/testing/selftests/resctrl/resctrl_val.c
create mode 100644 tools/testing/selftests/resctrl/resctrlfs.c
--
2.19.1
Hi,
Here's another update after another round of reviews from Kirill and Jan.
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git track_user_pages_v5
============================================================
Changes since v4:
* Added documentation about the huge page behavior of the new
/proc/vmstat items.
* Added a missing mode_node_page_state() call to put_compound_head().
* Fixed a tracepoint call in page_ref_sub_return().
* Added a trailing underscore to a URL in pin_user_pages.rst, to fix
a broken generated link.
* Added ACKs and reviewed-by's from Jan Kara and Kirill Shutemov.
* Rebased onto today's linux.git, and
* I am experimenting here with "git format-patch --base=<commit>".
This generated the "base-commit:" tag you'll see at the end of this
cover letter. I was inspired to do so after trying out a new
get-lore-mbox.py tool (it's very nice), mentioned in a recent LWN
article (https://lwn.net/Articles/811528/ ). That tool relies on the
base-commit tag for some things.
============================================================
Changes since v3:
* Rebased onto latest linux.git
* Added ACKs and reviewed-by's from Kirill Shutemov and Jan Kara.
* /proc/vmstat:
* Renamed items, after realizing that I hate the previous names:
nr_foll_pin_requested --> nr_foll_pin_acquired
nr_foll_pin_returned --> nr_foll_pin_released
* Removed the CONFIG_DEBUG_VM guard, and collapsed away a wrapper
routine: now just calls mod_node_page_state() directly.
* Tweaked the WARN_ON_ONCE() statements in mm/hugetlb.c to be more
informative, and added comments above them as well.
* Fixed gup_benchmark: signed int --> unsigned long.
* One or two minor formatting changes.
============================================================
Changes since v2:
* Rebased onto linux.git, because the akpm tree for 5.6 has been merged.
* Split the tracking patch into even more patches, as requested.
* Merged Matthew Wilcox's dump_page() changes into mine, as part of the
first patch.
* Renamed: page_dma_pinned() --> page_maybe_dma_pinned(), in response to
Kirill Shutemov's review.
* Moved a WARN to the top of a routine, and fixed a typo in the commit
description of patch #7, also as suggested by Kirill.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 7 and 8), given that patch 7 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 7 and not patch 8, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on it's own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
FOLL_PIN support is now in the main linux tree. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Other interesting changes:
* dump_page(): added one, or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (12):
mm: dump_page(): better diagnostics for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm: introduce page_ref_sub_return()
mm/gup: pass gup flags to two more routines
mm/gup: require FOLL_GET for get_user_pages_fast()
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm: dump_page(): better diagnostics for huge pinned pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 86 ++--
include/linux/mm.h | 108 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 9 +
mm/debug.c | 61 ++-
mm/gup.c | 452 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 60 ++-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 752 insertions(+), 180 deletions(-)
base-commit: 90568ecf561540fa330511e21fcd823b0c3829c6
--
2.25.0
Hello,
This patchset implements a new futex operation, called FUTEX_WAIT_MULTIPLE,
which allows a thread to wait on several futexes at the same time, and be
awoken by any of them.
The use case lies in the Wine implementation of the Windows NT interface
WaitMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signal,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side, and quickly go to sleep on the
consumer side is essential for good performance of those running over Wine.
Since this API exposes a mechanism to wait on multiple objects, and
we might have multiple waiters for each of these events, a M->N
relationship, the current Linux interfaces fell short on performance
evaluation of large M,N scenarios. We experimented, for instance, with
eventfd, which has performance problems discussed below, but we also
experimented with userspace solutions, like making each consumer wait on
a condition variable guarding the entire list of objects, and then
waking up multiple variables on the producer side, but this is
prohibitively expensive since we either need to signal many condition
variables or share that condition variable among multiple waiters, and
then verify for the event being signaled in userspace, which means
dealing with often false positive wakes ups.
The natural interface to implement the behavior we want, also
considering that one of the waitable objects is a mutex itself, would be
the futex interface. Therefore, this patchset proposes a mechanism for
a thread to wait on multiple futexes at once, and wake up on the first
futex that was awaken.
In particular, using futexes in our Wine use case reduced the CPU
utilization by 4% for the game Beat Saber and by 1.5% for the game
Shadow of Tomb Raider, both running over Proton (a Wine based solution
for Windows emulation), when compared to the eventfd interface. This
implementation also doesn't rely of file descriptors, so it doesn't risk
overflowing the resource.
In time, we are also proposing modifications to glibc and libpthread to
make this feature available for Linux native multithreaded applications
using libpthread, which can benefit from the behavior of waiting on any
of a group of futexes.
Technically, the existing FUTEX_WAIT implementation can be easily
reworked by using futex_wait_multiple() with a count of one, and I
have a patch showing how it works. I'm not proposing it, since
futex is such a tricky code, that I'd be more comfortable to have
FUTEX_WAIT_MULTIPLE running upstream for a couple development cycles,
before considering modifying FUTEX_WAIT.
The patch series includes an extensive set of kselftests validating
the behavior of the interface. We also implemented support[1] on
Syzkaller and survived the fuzzy testing.
Finally, if you'd rather pull directly a branch with this set you can
find it here:
https://gitlab.collabora.com/tonyk/linux/commits/futex-dev
=== Performance of eventfd ===
Polling on several eventfd contexts with semaphore semantics would
provide us with the exact semantics we are looking for. However, as
shown below, in a scenario with sufficient producers and consumers, the
eventfd interface itself becomes a bottleneck, in particular because
each thread will compete to acquire a sequence of waitqueue locks for
each eventfd context in the poll list. In addition, in the uncontended
case, where the producer is ready for consumption, eventfd still
requires going into the kernel on the consumer side.
When a write or a read operation in an eventfd file succeeds, it will try
to wake up all threads that are waiting to perform some operation to
the file. The lock (ctx->wqh.lock) that hold the access to the file value
(ctx->count) is the same lock used to control access the waitqueue. When
all those those thread woke, they will compete to get this lock. Along
with that, the poll() also manipulates the waitqueue and need to hold
this same lock. This lock is specially hard to acquire when poll() calls
poll_freewait(), where it tries to free all waitqueues associated with
this poll. While doing that, it will compete with a lot of read and
write operations that have been waken.
In our use case, with a huge number of parallel reads, writes and polls,
this lock is a bottleneck and hurts the performance of applications. Our
implementation of futex, however, decrease the calls of spin lock by more
than 80% in some user applications.
Finally, eventfd operates on file descriptors, which is a limited
resource that has shown its limitation in our use cases. Despite the
Windows interface not waiting on more than 64 objects at once, we still
have multiple waiters at the same time, and we were easily able to
exhaust the FD limits on applications like games.
The RFC for this patch can be found here:
https://lkml.org/lkml/2019/7/30/1399
Thanks,
André
[1] https://github.com/andrealmeid/syzkaller/tree/futex-wait-multiple
Gabriel Krisman Bertazi (4):
futex: Implement mechanism to wait on any of several futexes
selftests: futex: Add FUTEX_WAIT_MULTIPLE timeout test
selftests: futex: Add FUTEX_WAIT_MULTIPLE wouldblock test
selftests: futex: Add FUTEX_WAIT_MULTIPLE wake up test
include/uapi/linux/futex.h | 20 +
kernel/futex.c | 356 +++++++++++++++++-
.../selftests/futex/functional/.gitignore | 1 +
.../selftests/futex/functional/Makefile | 3 +-
.../futex/functional/futex_wait_multiple.c | 173 +++++++++
.../futex/functional/futex_wait_timeout.c | 38 +-
.../futex/functional/futex_wait_wouldblock.c | 28 +-
.../testing/selftests/futex/functional/run.sh | 3 +
.../selftests/futex/include/futextest.h | 22 +
9 files changed, 635 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/futex/functional/futex_wait_multiple.c
--
2.25.0
Commit 5f70bde26a48 ("selftests: fix build behaviour on targets' failures")
added a logic to track failure of builds of individual targets. However, it
does exactly the opposite of what a distro kernel needs: we create a RPM
package with a selected set of selftests and we need the build to fail if
build of any of the targets fail.
Both use cases are valid. A distribution kernel is in control of what is
included in the kernel and what is being built; any error needs to be
flagged and acted upon. A CI system that tries to build as many tests as
possible on the best effort basis is not really interested in a failure here
and there.
Support both use cases by introducing a FORCE_TARGETS variable. It is
switched off by default to make life for CI systems easier, distributions
can easily switch it on while building their packages.
Reported-by: Yauheni Kaliuta <yauheni.kaliuta(a)redhat.com>
Signed-off-by: Jiri Benc <jbenc(a)redhat.com>
---
tools/testing/selftests/Makefile | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 5182d6078cbc..97fca70d2cd6 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -74,6 +74,12 @@ ifneq ($(SKIP_TARGETS),)
override TARGETS := $(TMP)
endif
+# User can set FORCE_TARGETS to 1 to require all targets to be successfully
+# built; make will fail if any of the targets cannot be built. If
+# FORCE_TARGETS is not set (the default), make will succeed if at least one
+# of the targets gets built.
+FORCE_TARGETS ?=
+
# Clear LDFLAGS and MAKEFLAGS if called from main
# Makefile to avoid test build failures when test
# Makefile doesn't have explicit build rules.
@@ -148,7 +154,8 @@ all: khdr
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
mkdir $$BUILD_TARGET -p; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \
+ $(if $(FORCE_TARGETS),|| exit); \
ret=$$((ret * $$?)); \
done; exit $$ret;
@@ -202,7 +209,8 @@ ifdef INSTALL_PATH
@ret=1; \
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install \
+ $(if $(FORCE_TARGETS),|| exit); \
ret=$$((ret * $$?)); \
done; exit $$ret;
--
2.18.1
With some shells, the command construed for install of bpf selftests becomes
too large due to long list of files:
make[1]: execvp: /bin/sh: Argument list too long
make[1]: *** [../lib.mk:73: install] Error 127
Currently, each of the file lists is replicated three times in the command:
in the shell 'if' condition, in the 'echo' and in the 'rsync'. Reduce that
by one instance by using make conditionals and separate the echo and rsync
into two shell commands. (One would be inclined to just remove the '@' at
the beginning of the rsync command and let 'make' echo it by itself;
unfortunately, it appears that the '@' in the front of mkdir silences output
also for the following commands.)
Also, separate handling of each of the lists to its own shell command.
The semantics of the makefile is unchanged before and after the patch. The
ability of individual test directories to override INSTALL_RULE is retained.
Reported-by: Yauheni Kaliuta <yauheni.kaliuta(a)redhat.com>
Tested-by: Yauheni Kaliuta <yauheni.kaliuta(a)redhat.com>
Signed-off-by: Jiri Benc <jbenc(a)redhat.com>
---
tools/testing/selftests/lib.mk | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 1c8a1963d03f..3ed0134a764d 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -83,17 +83,20 @@ else
$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_PROGS))
endif
+define INSTALL_SINGLE_RULE
+ $(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH))
+ $(if $(INSTALL_LIST),@echo rsync -a $(INSTALL_LIST) $(INSTALL_PATH)/)
+ $(if $(INSTALL_LIST),@rsync -a $(INSTALL_LIST) $(INSTALL_PATH)/)
+endef
+
define INSTALL_RULE
- @if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)" != "X" ]; then \
- mkdir -p ${INSTALL_PATH}; \
- echo "rsync -a $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(INSTALL_PATH)/"; \
- rsync -a $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(INSTALL_PATH)/; \
- fi
- @if [ "X$(TEST_GEN_PROGS)$(TEST_CUSTOM_PROGS)$(TEST_GEN_PROGS_EXTENDED)$(TEST_GEN_FILES)" != "X" ]; then \
- mkdir -p ${INSTALL_PATH}; \
- echo "rsync -a $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(INSTALL_PATH)/"; \
- rsync -a $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(INSTALL_PATH)/; \
- fi
+ $(eval INSTALL_LIST = $(TEST_PROGS)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_FILES)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_GEN_PROGS)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_CUSTOM_PROGS)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_GEN_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_GEN_FILES)) $(INSTALL_SINGLE_RULE)
endef
install: all
--
2.18.1
Hi,
Here's an update after the latest round of reviews from Kirill and Jan.
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git track_user_pages_v4
I'm particularly pleased that the /proc/vmstat items are now available
in all configurations, seeing as how I was looking (in vain) for that
information during a recent investigation of a remotely reported
problem.
============================================================
Changes since v3:
* Rebased onto latest linux.git
* Added ACKs and reviewed-by's from Kirill Shutemov and Jan Kara.
* /proc/vmstat:
* Renamed items, after realizing that I hate the previous names:
nr_foll_pin_requested --> nr_foll_pin_acquired
nr_foll_pin_returned --> nr_foll_pin_released
* Removed the CONFIG_DEBUG_VM guard, and collapsed away a wrapper
routine: now just calls mod_node_page_state() directly.
* Tweaked the WARN_ON_ONCE() statements in mm/hugetlb.c to be more
informative, and added comments above them as well.
* Fixed gup_benchmark: signed int --> unsigned long.
* One or two minor formatting changes.
============================================================
Changes since v2:
* Rebased onto linux.git, because the akpm tree for 5.6 has been merged.
* Split the tracking patch into even more patches, as requested.
* Merged Matthew Wilcox's dump_page() changes into mine, as part of the
first patch.
* Renamed: page_dma_pinned() --> page_maybe_dma_pinned(), in response to
Kirill Shutemov's review.
* Moved a WARN to the top of a routine, and fixed a typo in the commit
description of patch #7, also as suggested by Kirill.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 7 and 8), given that patch 7 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 7 and not patch 8, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on it's own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
FOLL_PIN support is now in the main linux tree. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Other interesting changes:
* dump_page(): added one, or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (12):
mm: dump_page(): better diagnostics for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm: introduce page_ref_sub_return()
mm/gup: pass gup flags to two more routines
mm/gup: require FOLL_GET for get_user_pages_fast()
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm: dump_page(): better diagnostics for huge pinned pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 59 ++-
include/linux/mm.h | 108 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 9 +
mm/debug.c | 61 ++-
mm/gup.c | 448 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 60 ++-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 721 insertions(+), 180 deletions(-)
--
2.25.0
Matthew,
I've merged in your dump_page() ideas, and also factored things out
into a new __dump_tail_page() routine, in order to save a few
indentation levels, mainly.
Kirill, thanks for your review comments. I've applied them, and I think
splitting this up as you recommended really makes it a lot better, and
easier to spot problems.
============================================================
Changes since v2:
* Rebased onto linux.git, because the akpm tree for 5.6 has been merged.
* Split the tracking patch into even more patches, as requested.
* Merged Matthew Wilcox's dump_page() changes into mine, as part of the
first patch.
* Renamed: page_dma_pinned() --> page_maybe_dma_pinned(), in response to
Kirill Shutemov's review.
* Moved a WARN to the top of a routine, and fixed a typo in the commit
description of patch #7, also as suggested by Kirill.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 4 and 5), given that patch 4 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 4 and not patch 5, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on it's own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
It is based on today's (Jan 28) linux-next (branch: akpm),
commit 280e9cb00b41 ("drivers/media/platform/sti/delta/delta-ipc.c: fix
read buffer overflow")
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git
track_user_pages_v2_linux-next_akpm_28Jan2020
FOLL_PIN support is (so far) in mmotm and linux-next. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Here's the last reviewed version of the tracking patch (v11):
https://lore.kernel.org/r/20191216222537.491123-1-jhubbard@nvidia.com
Jan Kara had provided a reviewed-by tag for that, but I've had to remove
it (again) here, due to having changed the patch "a little bit", in
order to add the feature described above.
Other interesting changes:
* dump_page(): added one, or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (12):
mm: dump_page(): better diagnostics for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm: introduce page_ref_sub_return()
mm/gup: pass gup flags to two more routines
mm/gup: require FOLL_GET for get_user_pages_fast()
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm: dump_page(): better diagnostics for huge pinned pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 53 +--
include/linux/mm.h | 108 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/debug.c | 60 ++-
mm/gup.c | 459 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 44 +-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 715 insertions(+), 175 deletions(-)
--
2.25.0
From: SeongJae Park <sjpark(a)amazon.de>
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to below flow.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
The acks in lines 6 and 8 are the acks. If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number. Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump results of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
Patchset Organization
---------------------
The first patch fix a trivial nit. The second one fix the problem by
adjusting the resend delay of the SYN in the case. Finally, the third
patch adds a user space test to reproduce this problem.
The patches are based on the v5.5. You can also clone the complete git
tree:
$ git clone git://github.com/sjp38/linux -b patches/finack_lat/v1
The web is also available:
https://github.com/sjp38/linux/tree/patches/finack_lat/v1
SeongJae Park (3):
net/ipv4/inet_timewait_sock: Fix inconsistent comments
tcp: Reduce SYN resend delay if a suspicous ACK is received
selftests: net: Add FIN_ACK processing order related latency spike
test
net/ipv4/inet_timewait_sock.c | 1 +
net/ipv4/tcp_input.c | 6 +-
tools/testing/selftests/net/.gitignore | 2 +
tools/testing/selftests/net/Makefile | 2 +
tools/testing/selftests/net/fin_ack_lat.sh | 42 ++++++++++
.../selftests/net/fin_ack_lat_accept.c | 49 +++++++++++
.../selftests/net/fin_ack_lat_connect.c | 81 +++++++++++++++++++
7 files changed, 182 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
create mode 100644 tools/testing/selftests/net/fin_ack_lat_accept.c
create mode 100644 tools/testing/selftests/net/fin_ack_lat_connect.c
--
2.17.1
From: SeongJae Park <sjpark(a)amazon.de>
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to below flow.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
In some cases such as LINGER option applied socket, the FIN and FIN/ACK will be
substituted to RST and RST/ACK, but there is no difference in the main logic.
The acks in lines 6 and 8 are the acks. If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number. Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump results of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
Patchset Organization
---------------------
The first patch fixes the problem by adjusting the first resend delay of
the SYN in the case. The second one adds a user space test to reproduce
this problem.
The patches are based on the v5.5. You can also clone the complete git
tree:
$ git clone git://github.com/sjp38/linux -b patches/finack_lat/v3
The web is also available:
https://github.com/sjp38/linux/tree/patches/finack_lat/v3
Patchset History
----------------
>From v2
(https://lore.kernel.org/linux-kselftest/20200201071859.4231-1-sj38.park@gma…)
- Use TCP_TIMEOUT_MIN as reduced delay (Neal Cardwall)
- Add Reviewed-by and Signed-off-by from Eric Dumazet
>From v1
(https://lore.kernel.org/linux-kselftest/20200131122421.23286-1-sjpark@amazo…)
- Drop the trivial comment fix patch (Eric Dumazet)
- Limit the delay adjustment to only the first SYN resend (Eric Dumazet)
- selftest: Avoid use of hard-coded port number (Eric Dumazet)
- Explain RST/ACK and FIN/ACK has no big difference (Neal Cardwell)
SeongJae Park (2):
tcp: Reduce SYN resend delay if a suspicous ACK is received
selftests: net: Add FIN_ACK processing order related latency spike
test
net/ipv4/tcp_input.c | 8 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 2 +
tools/testing/selftests/net/fin_ack_lat.c | 151 +++++++++++++++++++++
tools/testing/selftests/net/fin_ack_lat.sh | 35 +++++
5 files changed, 196 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/net/fin_ack_lat.c
create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
--
2.17.1
From: SeongJae Park <sjpark(a)amazon.de>
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to below flow.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
In some cases such as LINGER option applied socket, the FIN and FIN/ACK will be
substituted to RST and RST/ACK, but there is no difference in the main logic.
The acks in lines 6 and 8 are the acks. If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number. Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump results of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
Patchset Organization
---------------------
The first patch fixes the problem by adjusting the first resend delay of
the SYN in the case. The second one adds a user space test to reproduce
this problem.
The patches are based on the v5.5. You can also clone the complete git
tree:
$ git clone git://github.com/sjp38/linux -b patches/finack_lat/v2
The web is also available:
https://github.com/sjp38/linux/tree/patches/finack_lat/v2
Patchset History
----------------
>From v1
(https://lore.kernel.org/linux-kselftest/20200131122421.23286-1-sjpark@amazo…)
- Drop the trivial comment fix patch (Eric Dumazet)
- Limit the delay adjustment to only the first SYN resend (Eric Dumazet)
- selftest: Avoid use of hard-coded port number (Eric Dumazet)
- Explain RST/ACK and FIN/ACK has no big difference (Neal Cardwell)
SeongJae Park (2):
tcp: Reduce SYN resend delay if a suspicous ACK is received
selftests: net: Add FIN_ACK processing order related latency spike
test
net/ipv4/tcp_input.c | 8 +-
tools/testing/selftests/net/.gitignore | 2 +
tools/testing/selftests/net/Makefile | 2 +
tools/testing/selftests/net/fin_ack_lat.c | 151 +++++++++++++++++++++
tools/testing/selftests/net/fin_ack_lat.sh | 35 +++++
5 files changed, 197 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/net/fin_ack_lat.c
create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
--
2.17.1
Fix a missing newline in a code block that was causing a warning:
Documentation/dev-tools/kunit/usage.rst:553: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 3 supplied.
.. code-block:: bash
modprobe example-test
Signed-off-by: Brendan Higgins <brendanhiggins(a)google.com>
---
Documentation/dev-tools/kunit/usage.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 7cd56a1993b14..607758a66a99c 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -551,6 +551,7 @@ options to your ``.config``:
Once the kernel is built and installed, a simple
.. code-block:: bash
+
modprobe example-test
...will run the tests.
--
2.25.0.341.g760bfbb309-goog
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
While running the ftracetests, the pid filter test failed because the
instance "foo" existed, and it was using it to rerun the test under a
instance named foo. The collision caused the test to fail as the mkdir
failed as the name already existed.
As of commit b5b77be812de7 ("selftests: ftrace: Allow some tests to be run
in a tracing instance") all a selftest needs to do to be tested in an
instance is to set the "instance" flag. There's no reason a selftest needs
to create an instance to run its test in an instance directly.
Remove the open coded testing in an instance for the pid filter test and
have it set the "instance" flag instead.
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
.../selftests/ftrace/test.d/ftrace/func-filter-pid.tc | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
index 64cfcc75e3c1..f2ee1e889e13 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
@@ -1,6 +1,7 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: ftrace - function pid filters
+# flags: instance
# Make sure that function pid matching filter works.
# Also test it on an instance directory
@@ -96,13 +97,6 @@ do_test() {
}
do_test
-
-mkdir instances/foo
-cd instances/foo
-do_test
-cd ../../
-rmdir instances/foo
-
do_reset
exit 0
--
2.20.1
OK, as requested, I've split the tracking patch into 6 smaller patches,
and it should be *much* easier to understand and review now.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 4 and 5), given that patch 4 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 4 and not patch 5, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on it's own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
It is based on today's (Jan 28) linux-next (branch: akpm),
commit 280e9cb00b41 ("drivers/media/platform/sti/delta/delta-ipc.c: fix
read buffer overflow")
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git
track_user_pages_v2_linux-next_akpm_28Jan2020
FOLL_PIN support is (so far) in mmotm and linux-next. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Here's the last reviewed version of the tracking patch (v11):
https://lore.kernel.org/r/20191216222537.491123-1-jhubbard@nvidia.com
Jan Kara had provided a reviewed-by tag for that, but I've had to remove
it (again) here, due to having changed the patch "a little bit", in
order to add the feature described above.
Other interesting changes:
* dump_page(): added one, or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (8):
mm: dump_page: print head page's refcount, for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 47 +--
include/linux/mm.h | 109 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/debug.c | 22 +-
mm/gup.c | 460 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 44 +-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 681 insertions(+), 167 deletions(-)
--
2.25.0
When kunit tests are run on native (i.e. non-UML) environments, the results
of test execution are often intermixed with dmesg output. This patch
series attempts to solve this by providing a debugfs representation
of the results of the last test run, available as
/sys/kernel/debug/kunit/<testsuite>/results
In addition, we provide a way to re-run the tests and show results via
/sys/kernel/debug/kunit/<testsuite>/run
Changes since v1:
- trimmed unneeded include files in lib/kunit/debugfs.c (Greg)
- renamed global debugfs functions to be prefixed with kunit_ (Greg)
- removed error checking for debugfs operations (Greg)
Alan Maguire (3):
kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
kunit: add "run" debugfs file to run suites, display results
kunit: update documentation to describe debugfs representation
Documentation/dev-tools/kunit/usage.rst | 19 +++++
include/kunit/test.h | 21 +++--
lib/kunit/Makefile | 3 +-
lib/kunit/debugfs.c | 137 ++++++++++++++++++++++++++++++++
lib/kunit/debugfs.h | 16 ++++
lib/kunit/test.c | 85 +++++++++++++++-----
6 files changed, 254 insertions(+), 27 deletions(-)
create mode 100644 lib/kunit/debugfs.c
create mode 100644 lib/kunit/debugfs.h
--
1.8.3.1
## TL;DR
This patchset adds a centralized executor to dispatch tests rather than
relying on late_initcall to schedule each test suite separately along
with a couple of new features that depend on it.
## What am I trying to do?
Conceptually, I am trying to provide a mechanism by which test suites
can be grouped together so that they can be reasoned about collectively.
The last two of three patches in this series add features which depend
on this:
PATCH 5/7 Prints out a test plan right before KUnit tests are run[1];
this is valuable because it makes it possible for a test
harness to detect whether the number of tests run matches the
number of tests expected to be run, ensuring that no tests
silently failed.
PATCH 6/7 Add a new kernel command-line option which allows the user to
specify that the kernel poweroff, halt, or reboot after
completing all KUnit tests; this is very handy for running
KUnit tests on UML or a VM so that the UML/VM process exits
cleanly immediately after running all tests without needing a
special initramfs.
In addition, by dispatching tests from a single location, we can
guarantee that all KUnit tests run after late_init is complete, which
was a concern during the initial KUnit patchset review (this has not
been a problem in practice, but resolving with certainty is nevertheless
desirable).
Other use cases for this exist, but the above features should provide an
idea of the value that this could provide.
Alan Maguire (1):
kunit: test: create a single centralized executor for all tests
Brendan Higgins (5):
vmlinux.lds.h: add linker section for KUnit test suites
arch: um: add linker section for KUnit test suites
init: main: add KUnit to kernel init
kunit: test: add test plan to KUnit TAP format
Documentation: Add kunit_shutdown to kernel-parameters.txt
David Gow (1):
kunit: Add 'kunit_shutdown' option
.../admin-guide/kernel-parameters.txt | 7 ++
arch/um/include/asm/common.lds.S | 4 +
include/asm-generic/vmlinux.lds.h | 8 ++
include/kunit/test.h | 82 ++++++++++++-------
init/main.c | 4 +
lib/kunit/Makefile | 3 +-
lib/kunit/executor.c | 71 ++++++++++++++++
lib/kunit/test.c | 11 ---
tools/testing/kunit/kunit_kernel.py | 2 +-
tools/testing/kunit/kunit_parser.py | 76 ++++++++++++++---
.../test_is_test_passed-all_passed.log | 1 +
.../test_data/test_is_test_passed-crash.log | 1 +
.../test_data/test_is_test_passed-failure.log | 1 +
13 files changed, 217 insertions(+), 54 deletions(-)
create mode 100644 lib/kunit/executor.c
--
2.25.0.341.g760bfbb309-goog
Memory protection keys enables an application to protect its address
space from inadvertent access by its own code.
This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to arch neutral directory
and enhance their test coverage.
Tested on powerpc64 and x86_64 (Skylake-SP).
Link to development branch:
https://github.com/sandip4n/linux/tree/pkey-selftests
Changelog
---------
Link to previous version (v16):
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=153824
v17:
(1) Fixed issues with i386 builds when running on x86_64
based on feedback from Dave.
(2) Replaced patch 6 from previous version with patch 7.
This addresses u64 format specifier related concerns
that Michael had raised in v15.
v16:
(1) Rebased on top of latest master.
(2) Switched to u64 instead of using an arch-dependent
pkey_reg_t type for references to the pkey register
based on suggestions from Dave, Michal and Michael.
(3) Removed build time determination of page size based
on suggestion from Michael.
(4) Fixed comment before the definition of __page_o_noops()
from patch 13 ("selftests/vm/pkeys: Introduce powerpc
support").
v15:
(1) Rebased on top of latest master.
(2) Addressed review comments from Dave Hansen.
(3) Moved code for getting or setting pkey bits to new
helpers. These changes replace patch 7 of v14.
(4) Added a fix which ensures that the correct count of
reserved keys is used across different platforms.
(5) Added a fix which ensures that the correct page size
is used as powerpc supports both 4K and 64K pages.
v14:
(1) Incorporated another round of comments from Dave Hansen.
v13:
(1) Incorporated comments for Dave Hansen.
(2) Added one more test for correct pkey-0 behavior.
v12:
(1) Fixed the offset of pkey field in the siginfo structure for
x86_64 and powerpc. And tries to use the actual field
if the headers have it defined.
v11:
(1) Fixed a deadlock in the ptrace testcase.
v10 and prior:
(1) Moved the testcase to arch neutral directory.
(2) Split the changes into incremental patches.
Desnes A. Nunes do Rosario (1):
selftests/vm/pkeys: Fix number of reserved powerpc pkeys
Ram Pai (16):
selftests/x86/pkeys: Move selftests to arch-neutral directory
selftests/vm/pkeys: Rename all references to pkru to a generic name
selftests/vm/pkeys: Move generic definitions to header file
selftests/vm/pkeys: Fix pkey_disable_clear()
selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
selftests/vm/pkeys: Introduce generic pkey abstractions
selftests/vm/pkeys: Introduce powerpc support
selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: Improve checks to determine pkey support
selftests/vm/pkeys: Associate key on a mapped page and detect access
violation
selftests/vm/pkeys: Associate key on a mapped page and detect write
violation
selftests/vm/pkeys: Detect write violation on a mapped
access-denied-key page
selftests/vm/pkeys: Introduce a sub-page allocator
selftests/vm/pkeys: Test correct behaviour of pkey-0
selftests/vm/pkeys: Override access right definitions on powerpc
Sandipan Das (5):
selftests: vm: pkeys: Fix multilib builds for x86
selftests: vm: pkeys: Use sane types for pkey register
selftests: vm: pkeys: Add helpers for pkey bits
selftests: vm: pkeys: Use the correct huge page size
selftests: vm: pkeys: Use the correct page size on powerpc
Thiago Jung Bauermann (2):
selftests/vm/pkeys: Move some definitions to arch-specific header
selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 50 ++
tools/testing/selftests/vm/pkey-helpers.h | 225 ++++++
tools/testing/selftests/vm/pkey-powerpc.h | 136 ++++
tools/testing/selftests/vm/pkey-x86.h | 181 +++++
.../selftests/{x86 => vm}/protection_keys.c | 696 ++++++++++--------
tools/testing/selftests/x86/.gitignore | 1 -
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/pkey-helpers.h | 219 ------
9 files changed, 979 insertions(+), 532 deletions(-)
create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h
create mode 100644 tools/testing/selftests/vm/pkey-x86.h
rename tools/testing/selftests/{x86 => vm}/protection_keys.c (74%)
delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
--
2.17.1
Hi Linus,
Please pull the following Kselftest update for Linux 5.6-rc1
This Kselftest update for Linux 5.6-rc1 consists of several fixes to
framework and individual tests. In addition, it enables LKDTM tests
adding lkdtm target to kselftest Makefile.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit c79f46a282390e0f5b306007bf7b11a46d529538:
Linux 5.5-rc5 (2020-01-05 14:23:27 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.6-rc1
for you to fetch changes up to af4ddd607dff7aabd466a4a878e01b9f592a75ab:
selftests/ftrace: fix glob selftest (2020-01-28 13:36:48 -0700)
----------------------------------------------------------------
linux-kselftest-5.6-rc1
This Kselftest update for Linux 5.6-rc1 consists of several fixes to
framework and individual tests. In addition, it enables LKDTM tests
adding lkdtm target to kselftest Makefile.
----------------------------------------------------------------
Cristian Marussi (1):
selftests: fix build behaviour on targets' failures
Dan Carpenter (1):
selftests: Uninitialized variable in test_cgcore_proc_migration()
Kees Cook (1):
selftests/lkdtm: Add tests for LKDTM targets
Matthieu Baerts (1):
selftests: settings: tests can be in subsubdirs
Miroslav Benes (2):
selftests/livepatch: Replace set_dynamic_debug() with
setup_config() in README
selftests/livepatch: Remove unused local variable in
set_ftrace_enabled()
Siddhesh Poyarekar (1):
kselftest: Minimise dependency of get_size on C library interfaces
Sven Schnelle (1):
selftests/ftrace: fix glob selftest
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 19 +++--
tools/testing/selftests/cgroup/test_core.c | 2 +-
.../ftrace/test.d/ftrace/func-filter-glob.tc | 2 +-
tools/testing/selftests/kselftest/runner.sh | 2 +-
tools/testing/selftests/livepatch/README | 2 +-
tools/testing/selftests/livepatch/functions.sh | 1 -
tools/testing/selftests/lkdtm/Makefile | 12 +++
tools/testing/selftests/lkdtm/config | 1 +
tools/testing/selftests/lkdtm/run.sh | 92
++++++++++++++++++++++
tools/testing/selftests/lkdtm/tests.txt | 71 +++++++++++++++++
tools/testing/selftests/size/get_size.c | 24 ++++--
12 files changed, 211 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/lkdtm/Makefile
create mode 100644 tools/testing/selftests/lkdtm/config
create mode 100755 tools/testing/selftests/lkdtm/run.sh
create mode 100644 tools/testing/selftests/lkdtm/tests.txt
----------------------------------------------------------------
Leon Romanovsky:
If you get a chance, I'd love to have this short series (or even just
the first patch; the others are just selftests) run through your test
suite that was previously choking on my earlier v11 patchset. The huge
page pincount limitations are removed, so I'm expecting a perfect test
run this time!
Everyone:
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
It is based on today's (Jan 24) mmotm. There is a git repo and branch,
for convenience in reviewing:
git@github.com:johnhubbard/linux.git track_user_pages_v1_mmotm_24Jan2020
FOLL_PIN support is (so far) in mmotm and linux-next. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Here's the last reviewed version of the tracking patch (v11):
https://lore.kernel.org/r/20191216222537.491123-1-jhubbard@nvidia.com
Jan Kara had provided a reviewed-by tag for that, but I've had to remove
it (again) here, due to having changed the patch "a little bit", in
order to add the feature described above.
Other interesting changes:
* dump_page(): added one, or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages() https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (3):
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 48 ++-
include/linux/mm.h | 109 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/debug.c | 22 +-
mm/gup.c | 467 ++++++++++++++++-----
mm/gup_benchmark.c | 70 ++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 44 +-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 678 insertions(+), 177 deletions(-)
--
2.25.0
On Mon, Jan 27, 2020 at 6:23 AM Sven Schnelle <svens(a)linux.ibm.com> wrote:
>
> Hi Steve,
>
> On Wed, Jan 08, 2020 at 09:11:55AM -0500, Steven Rostedt wrote:
> >
> > Shuah,
> >
> > Want to take this through your tree?
> >
> > https://lore.kernel.org/r/20200108074043.21580-1-svens@linux.ibm.com
> >
> > Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
>
> As Shuah didn't reply, can you push that through your tree?
>
Hi Sven,
Did you run getmaintainers of this patch? You didn't send this to my
email address listed in the get maintainers file and also didn't cc
linux-kselftest.
I just happen to notice this now. Please resend with steve's
Reviewed-by tag to the recipients suggested by get_maintainers.pl
I will take this through ksleftest tree.
thanks,
-- Shuah
test.d/ftrace/func-filter-glob.tc is failing on s390 because it has
ARCH_INLINE_SPIN_LOCK and friends set to 'y'. So the usual
__raw_spin_lock symbol isn't in the ftrace function list. Change
'*aw*lock' to '*spin*lock' which would hopefully match some of the
locking functions on all platforms.
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
---
Changes in v4:
- rebase to latest master
Changes in v3:
change '*spin*lock' to '*pin*lock' to not match the beginning
Changes in v2:
use '*spin*lock' instead of '*ktime*ns'
.../testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
index 27a54a17da65..f4e92afab14b 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
@@ -30,7 +30,7 @@ ftrace_filter_check '*schedule*' '^.*schedule.*$'
ftrace_filter_check 'schedule*' '^schedule.*$'
# filter by *mid*end
-ftrace_filter_check '*aw*lock' '.*aw.*lock$'
+ftrace_filter_check '*pin*lock' '.*pin.*lock$'
# filter by start*mid*
ftrace_filter_check 'mutex*try*' '^mutex.*try.*'
--
2.17.1
Commit 852c8cbf34d3 (selftests/kselftest/runner.sh: Add 45 second
timeout per test) adds support for a new per-test-directory "settings"
file. But this only works for tests not in a sub-subdirectories, e.g.
- tools/testing/selftests/rtc (rtc) is OK,
- tools/testing/selftests/net/mptcp (net/mptcp) is not.
We have to increase the timeout for net/mptcp tests which are not
upstreamed yet but this fix is valid for other tests if they need to add
a "settings" file, see the full list with:
tools/testing/selftests/*/*/**/Makefile
Note that this patch changes the text header message printed at the end
of the execution but this text is modified only for the tests that are
in sub-subdirectories, e.g.
ok 1 selftests: net/mptcp: mptcp_connect.sh
Before we had:
ok 1 selftests: mptcp: mptcp_connect.sh
But showing the full target name is probably better, just in case a
subsubdir has the same name as another one in another subdirectory.
Fixes: 852c8cbf34d3 (selftests/kselftest/runner.sh: Add 45 second timeout per test)
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
tools/testing/selftests/kselftest/runner.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 84de7bc74f2c..0d7a89901ef7 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -90,7 +90,7 @@ run_one()
run_many()
{
echo "TAP version 13"
- DIR=$(basename "$PWD")
+ DIR="${PWD#${BASE_DIR}/}"
test_num=0
total=$(echo "$@" | wc -w)
echo "1..$total"
--
2.20.1
When handling page faults for many vCPUs during demand paging, KVM's MMU
lock becomes highly contended. This series creates a test with a naive
userfaultfd based demand paging implementation to demonstrate that
contention. This test serves both as a functional test of userfaultfd
and a microbenchmark of demand paging performance with a variable number
of vCPUs and memory per vCPU.
The test creates N userfaultfd threads, N vCPUs, and a region of memory
with M pages per vCPU. The N userfaultfd polling threads are each set up
to serve faults on a region of memory corresponding to one of the vCPUs.
Each of the vCPUs is then started, and touches each page of its disjoint
memory region, sequentially. In response to faults, the userfaultfd
threads copy a static buffer into the guest's memory. This creates a
worst case for MMU lock contention as we have removed most of the
contention between the userfaultfd threads and there is no time required
to fetch the contents of guest memory.
This test was run successfully on Intel Haswell, Broadwell, and
Cascadelake hosts with a variety of vCPU counts and memory sizes.
This test was adapted from the dirty_log_test.
The series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
(Thanks to Dmitry Vyukov <dvyukov(a)google.com> for setting up the Gerrit
instance)
v4 (Responding to feedback from Andrew Jones, Peter Xu, and Peter Shier):
- Tested this revision by running
demand_paging_test
at each commit in the series on an Intel Haswell machine. Ran
demand_paging_test -u -v 8 -b 8M -d 10
on the same machine at the last commit in the series.
- Readded partial aarch64 support, though aarch64 and s390 remain
untested
- Implemented pipefd polling to reduce UFFD thread exit latency
- Added variable unit input for memory size so users can pass command
line arguments of the form -b 24M instead of the raw number or bytes
- Moved a missing break from a patch later in the series to an earlier
one
- Moved to syncing per-vCPU global variables to guest and looking up
per-vcpu arguments based on a single CPU ID passed to each guest
vCPU. This allows for future patches to pass more than the supported
number of arguments for each arch to the vCPUs.
- Implemented vcpu_args_set for s390 and aarch64 [UNTESTED]
- Changed vm_create to always allocate memslot 0 at 4G instead of only
when the number of pages required is large.
- Changed vcpu_wss to vcpu_memory_size for clarity.
Ben Gardon (10):
KVM: selftests: Create a demand paging test
KVM: selftests: Add demand paging content to the demand paging test
KVM: selftests: Add configurable demand paging delay
KVM: selftests: Add memory size parameter to the demand paging test
KVM: selftests: Pass args to vCPU in global vCPU args struct
KVM: selftests: Add support for vcpu_args_set to aarch64 and s390x
KVM: selftests: Support multiple vCPUs in demand paging test
KVM: selftests: Time guest demand paging
KVM: selftests: Stop memslot creation in KVM internal memslot region
KVM: selftests: Move memslot 0 above KVM internal memslots
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 5 +-
.../selftests/kvm/demand_paging_test.c | 680 ++++++++++++++++++
.../testing/selftests/kvm/include/test_util.h | 2 +
.../selftests/kvm/lib/aarch64/processor.c | 33 +
tools/testing/selftests/kvm/lib/kvm_util.c | 27 +-
.../selftests/kvm/lib/s390x/processor.c | 35 +
tools/testing/selftests/kvm/lib/test_util.c | 61 ++
8 files changed, 839 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
create mode 100644 tools/testing/selftests/kvm/lib/test_util.c
--
2.25.0.341.g760bfbb309-goog
The current stable LLVM BPF backend fails to compile the BPF selftests
due to a compiler bug. The bug has been fixed in trunk, but that fix
hasn't landed in the binary packages I'm using yet (Fedora arm64).
Without this workaround the tests don't compile for me.
This patch triggers a preprocessor warning on LLVM versions that
definitely have the bug. The test may be conservative (ie, I'm not sure
if 9.1 will have the fix), but it should at least make the current set
of stable releases work together.
See https://reviews.llvm.org/D69438 for more information on the fix. I
obtained the workaround from
https://lore.kernel.org/linux-kselftest/aed8eda7-df20-069b-ea14-f0662898456…
Fixes: 20a9ad2e7136 ("selftests/bpf: add CO-RE relocs array tests")
Signed-off-by: Palmer Dabbelt <palmerdabbelt(a)google.com>
---
.../testing/selftests/bpf/progs/test_core_reloc_arrays.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c b/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
index 89951b684282..e5eafdab80a4 100644
--- a/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
@@ -43,15 +43,23 @@ int test_core_arrays(void *ctx)
/* in->a[2] */
if (CORE_READ(&out->a2, &in->a[2]))
return 1;
+#if defined(__clang__) && (__clang_major__ < 10) && (__clang_minor__ < 1)
+# warning "clang 9.0 SEGVs on multidimensional arrays, see https://reviews.llvm.org/D69438"
+#else
/* in->b[1][2][3] */
if (CORE_READ(&out->b123, &in->b[1][2][3]))
return 1;
+#endif
/* in->c[1].c */
if (CORE_READ(&out->c1c, &in->c[1].c))
return 1;
+#if defined(__clang__) && (__clang_major__ < 10) && (__clang_minor__ < 1)
+# warning "clang 9.0 SEGVs on multidimensional arrays, see https://reviews.llvm.org/D69438"
+#else
/* in->d[0][0].d */
if (CORE_READ(&out->d00d, &in->d[0][0].d))
return 1;
+#endif
return 0;
}
--
2.25.0.341.g760bfbb309-goog
A few of the lists used in the linked-list KUnit tests (the
for_each_entry{,_reverse} tests) are declared 'static', and so are
not-reinitialised if the test runs multiple times. This was not a
problem when KUnit tests were run once on startup, but when tests are
able to be run manually (e.g. from debugfs[1]), this is no longer the
case.
Making these lists no longer 'static' causes the lists to be
reinitialised, and the test passes each time it is run. While there may
be some value in testing that initialising static lists works, the
for_each_entry_* tests are unlikely to be the right place for it.
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/list-test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/list-test.c b/lib/list-test.c
index 76babb1df889..ee09505df16f 100644
--- a/lib/list-test.c
+++ b/lib/list-test.c
@@ -659,7 +659,7 @@ static void list_test_list_for_each_prev_safe(struct kunit *test)
static void list_test_list_for_each_entry(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
@@ -680,7 +680,7 @@ static void list_test_list_for_each_entry(struct kunit *test)
static void list_test_list_for_each_entry_reverse(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
--
2.25.0.341.g760bfbb309-goog
The reuseport tests currently suffer from a race condition: RST
packets count towards DROP_ERR_SKB_DATA, since they don't contain
a valid struct cmd. Tests will spuriously fail depending on whether
check_results is called before or after the RST is processed.
Exit the BPF program early if FIN is set.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
---
.../selftests/bpf/progs/test_select_reuseport_kern.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
index d69a1f2bbbfd..26e77dcc7e91 100644
--- a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
@@ -113,6 +113,12 @@ int _select_by_skb_data(struct sk_reuseport_md *reuse_md)
data_check.skb_ports[0] = th->source;
data_check.skb_ports[1] = th->dest;
+ if (th->fin)
+ /* The connection is being torn down at the end of a
+ * test. It can't contain a cmd, so return early.
+ */
+ return SK_PASS;
+
if ((th->doff << 2) + sizeof(*cmd) > data_check.len)
GOTO_DONE(DROP_ERR_SKB_DATA);
if (bpf_skb_load_bytes(reuse_md, th->doff << 2, &cmd_copy,
--
2.20.1
Currently, there is a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
---
.../selftests/bpf/prog_tests/select_reuseport.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index e7e56929751c..098bcae5f827 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -33,7 +33,7 @@
#define REUSEPORT_ARRAY_SIZE 32
static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
-static enum result expected_results[NR_RESULTS];
+static __u32 expected_results[NR_RESULTS];
static int sk_fds[REUSEPORT_ARRAY_SIZE];
static int reuseport_array = -1, outer_map = -1;
static int select_by_skb_data_prog;
@@ -697,7 +697,19 @@ static void setup_per_test(int type, sa_family_t family, bool inany,
static void cleanup_per_test(bool no_inner_map)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1
The reuseport tests currently suffer from a race condition: FIN
packets count towards DROP_ERR_SKB_DATA, since they don't contain
a valid struct cmd. Tests will spuriously fail depending on whether
check_results is called before or after the FIN is processed.
Exit the BPF program early if FIN is set.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
---
.../selftests/bpf/progs/test_select_reuseport_kern.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
index d69a1f2bbbfd..26e77dcc7e91 100644
--- a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
@@ -113,6 +113,12 @@ int _select_by_skb_data(struct sk_reuseport_md *reuse_md)
data_check.skb_ports[0] = th->source;
data_check.skb_ports[1] = th->dest;
+ if (th->fin)
+ /* The connection is being torn down at the end of a
+ * test. It can't contain a cmd, so return early.
+ */
+ return SK_PASS;
+
if ((th->doff << 2) + sizeof(*cmd) > data_check.len)
GOTO_DONE(DROP_ERR_SKB_DATA);
if (bpf_skb_load_bytes(reuse_md, th->doff << 2, &cmd_copy,
--
2.20.1
Currently, there is a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
---
.../selftests/bpf/prog_tests/select_reuseport.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index 09a536af139a..0bab8b1ca1c3 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -699,7 +699,19 @@ static void setup_per_test(int type, sa_family_t family, bool inany,
static void cleanup_per_test(bool no_inner_map)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1
When kunit tests are run on native (i.e. non-UML) environments, the results
of test execution are often intermixed with dmesg output. This patch
series attempts to solve this by providing a debugfs representation
of the results of the last test run, available as
/sys/kernel/debug/kunit/<testsuite>/results
In addition, we provide a way to re-run the tests and show results via
/sys/kernel/debug/kunit/<testsuite>/run
Alan Maguire (3):
kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
kunit: add "run" debugfs file to run suites, display results
kunit: update documentation to describe debugfs representation
Documentation/dev-tools/kunit/usage.rst | 20 +++++
include/kunit/test.h | 21 +++--
lib/kunit/Makefile | 3 +-
lib/kunit/debugfs.c | 145 ++++++++++++++++++++++++++++++++
lib/kunit/debugfs.h | 11 +++
lib/kunit/test.c | 88 ++++++++++++++-----
6 files changed, 260 insertions(+), 28 deletions(-)
create mode 100644 lib/kunit/debugfs.c
create mode 100644 lib/kunit/debugfs.h
--
1.8.3.1
These counters will track hugetlb reservations rather than hugetlb
memory faulted in. This patch only adds the counter, following patches
add the charging and uncharging of the counter.
This is patch 1 of an 8 patch series.
Problem:
Currently tasks attempting to allocate more hugetlb memory than is available get
a failure at mmap/shmget time. This is thanks to Hugetlbfs Reservations [1].
However, if a task attempts to allocate hugetlb memory only more than its
hugetlb_cgroup limit allows, the kernel will allow the mmap/shmget call,
but will SIGBUS the task when it attempts to fault the memory in.
We have developers interested in using hugetlb_cgroups, and they have expressed
dissatisfaction regarding this behavior. We'd like to improve this
behavior such that tasks violating the hugetlb_cgroup limits get an error on
mmap/shmget time, rather than getting SIGBUS'd when they try to fault
the excess memory in.
The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time.
Thus, enforcing the hugetlb_cgroup limit only happens at fault time, and
the offending task gets SIGBUS'd.
Proposed Solution:
A new page counter named hugetlb.xMB.reservation_[limit|usage]_in_bytes. This
counter has slightly different semantics than
hugetlb.xMB.[limit|usage]_in_bytes:
- While usage_in_bytes tracks all *faulted* hugetlb memory,
reservation_usage_in_bytes tracks all *reserved* hugetlb memory and
hugetlb memory faulted in without a prior reservation.
- If a task attempts to reserve more memory than limit_in_bytes allows,
the kernel will allow it to do so. But if a task attempts to reserve
more memory than reservation_limit_in_bytes, the kernel will fail this
reservation.
This proposal is implemented in this patch series, with tests to verify
functionality and show the usage. We also added cgroup-v2 support to
hugetlb_cgroup so that the new use cases can be extended to v2.
Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to
the existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
duplication with hugetlb_cgroup. Keeping hugetlb related page counters under
hugetlb_cgroup seemed cleaner as well.
2. Instead of adding a new counter, we considered adding a sysctl that modifies
the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do accounting at
reservation time rather than fault time. Adding a new page_counter seems
better as userspace could, if it wants, choose to enforce different cgroups
differently: one via limit_in_bytes, and another via
reservation_limit_in_bytes. This could be very useful if you're
transitioning how hugetlb memory is partitioned on your system one
cgroup at a time, for example. Also, someone may find usage for both
limit_in_bytes and reservation_limit_in_bytes concurrently, and this
approach gives them the option to do so.
Testing:
- Added tests passing.
- Used libhugetlbfs for regression testing.
[1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Acked-by: Hillf Danton <hdanton(a)sina.com>
---
include/linux/hugetlb.h | 4 +-
mm/hugetlb_cgroup.c | 116 +++++++++++++++++++++++++++++++++++-----
2 files changed, 106 insertions(+), 14 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1e897e4168ac1..dea6143aa0685 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -432,8 +432,8 @@ struct hstate {
unsigned int surplus_huge_pages_node[MAX_NUMNODES];
#ifdef CONFIG_CGROUP_HUGETLB
/* cgroup control files */
- struct cftype cgroup_files_dfl[5];
- struct cftype cgroup_files_legacy[5];
+ struct cftype cgroup_files_dfl[7];
+ struct cftype cgroup_files_legacy[9];
#endif
char name[HSTATE_NAME_LEN];
};
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index e434b05416c68..35415af9ed26f 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -36,6 +36,11 @@ struct hugetlb_cgroup {
*/
struct page_counter hugepage[HUGE_MAX_HSTATE];
+ /*
+ * the counter to account for hugepage reservations from hugetlb.
+ */
+ struct page_counter reserved_hugepage[HUGE_MAX_HSTATE];
+
atomic_long_t events[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS];
atomic_long_t events_local[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS];
@@ -55,6 +60,14 @@ struct hugetlb_cgroup {
static struct hugetlb_cgroup *root_h_cgroup __read_mostly;
+static inline struct page_counter *
+hugetlb_cgroup_get_counter(struct hugetlb_cgroup *h_cg, int idx, bool reserved)
+{
+ if (reserved)
+ return &h_cg->reserved_hugepage[idx];
+ return &h_cg->hugepage[idx];
+}
+
static inline
struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
{
@@ -295,28 +308,42 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
enum {
RES_USAGE,
+ RES_RESERVATION_USAGE,
RES_LIMIT,
+ RES_RESERVATION_LIMIT,
RES_MAX_USAGE,
+ RES_RESERVATION_MAX_USAGE,
RES_FAILCNT,
+ RES_RESERVATION_FAILCNT,
};
static u64 hugetlb_cgroup_read_u64(struct cgroup_subsys_state *css,
struct cftype *cft)
{
struct page_counter *counter;
+ struct page_counter *reserved_counter;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css);
counter = &h_cg->hugepage[MEMFILE_IDX(cft->private)];
+ reserved_counter = &h_cg->reserved_hugepage[MEMFILE_IDX(cft->private)];
switch (MEMFILE_ATTR(cft->private)) {
case RES_USAGE:
return (u64)page_counter_read(counter) * PAGE_SIZE;
+ case RES_RESERVATION_USAGE:
+ return (u64)page_counter_read(reserved_counter) * PAGE_SIZE;
case RES_LIMIT:
return (u64)counter->max * PAGE_SIZE;
+ case RES_RESERVATION_LIMIT:
+ return (u64)reserved_counter->max * PAGE_SIZE;
case RES_MAX_USAGE:
return (u64)counter->watermark * PAGE_SIZE;
+ case RES_RESERVATION_MAX_USAGE:
+ return (u64)reserved_counter->watermark * PAGE_SIZE;
case RES_FAILCNT:
return counter->failcnt;
+ case RES_RESERVATION_FAILCNT:
+ return reserved_counter->failcnt;
default:
BUG();
}
@@ -338,10 +365,16 @@ static int hugetlb_cgroup_read_u64_max(struct seq_file *seq, void *v)
1 << huge_page_order(&hstates[idx]));
switch (MEMFILE_ATTR(cft->private)) {
+ case RES_RESERVATION_USAGE:
+ counter = &h_cg->reserved_hugepage[idx];
+ /* Fall through. */
case RES_USAGE:
val = (u64)page_counter_read(counter);
seq_printf(seq, "%llu\n", val * PAGE_SIZE);
break;
+ case RES_RESERVATION_LIMIT:
+ counter = &h_cg->reserved_hugepage[idx];
+ /* Fall through. */
case RES_LIMIT:
val = (u64)counter->max;
if (val == limit)
@@ -365,6 +398,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
int ret, idx;
unsigned long nr_pages;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of));
+ bool reserved = false;
if (hugetlb_cgroup_is_root(h_cg)) /* Can't set limit on root */
return -EINVAL;
@@ -378,9 +412,14 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));
switch (MEMFILE_ATTR(of_cft(of)->private)) {
+ case RES_RESERVATION_LIMIT:
+ reserved = true;
+ /* Fall through. */
case RES_LIMIT:
mutex_lock(&hugetlb_limit_mutex);
- ret = page_counter_set_max(&h_cg->hugepage[idx], nr_pages);
+ ret = page_counter_set_max(hugetlb_cgroup_get_counter(h_cg, idx,
+ reserved),
+ nr_pages);
mutex_unlock(&hugetlb_limit_mutex);
break;
default:
@@ -406,18 +445,26 @@ static ssize_t hugetlb_cgroup_reset(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
int ret = 0;
- struct page_counter *counter;
+ struct page_counter *counter, *reserved_counter;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of));
counter = &h_cg->hugepage[MEMFILE_IDX(of_cft(of)->private)];
+ reserved_counter =
+ &h_cg->reserved_hugepage[MEMFILE_IDX(of_cft(of)->private)];
switch (MEMFILE_ATTR(of_cft(of)->private)) {
case RES_MAX_USAGE:
page_counter_reset_watermark(counter);
break;
+ case RES_RESERVATION_MAX_USAGE:
+ page_counter_reset_watermark(reserved_counter);
+ break;
case RES_FAILCNT:
counter->failcnt = 0;
break;
+ case RES_RESERVATION_FAILCNT:
+ reserved_counter->failcnt = 0;
+ break;
default:
ret = -EINVAL;
break;
@@ -472,7 +519,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx)
struct hstate *h = &hstates[idx];
/* format the size */
- mem_fmt(buf, 32, huge_page_size(h));
+ mem_fmt(buf, sizeof(buf), huge_page_size(h));
/* Add the limit file */
cft = &h->cgroup_files_dfl[0];
@@ -482,15 +529,30 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx)
cft->write = hugetlb_cgroup_write_dfl;
cft->flags = CFTYPE_NOT_ON_ROOT;
- /* Add the current usage file */
+ /* Add the reservation limit file */
cft = &h->cgroup_files_dfl[1];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_max", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_LIMIT);
+ cft->seq_show = hugetlb_cgroup_read_u64_max;
+ cft->write = hugetlb_cgroup_write_dfl;
+ cft->flags = CFTYPE_NOT_ON_ROOT;
+
+ /* Add the current usage file */
+ cft = &h->cgroup_files_dfl[2];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.current", buf);
cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
cft->seq_show = hugetlb_cgroup_read_u64_max;
cft->flags = CFTYPE_NOT_ON_ROOT;
+ /* Add the current reservation usage file */
+ cft = &h->cgroup_files_dfl[3];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_current", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_USAGE);
+ cft->seq_show = hugetlb_cgroup_read_u64_max;
+ cft->flags = CFTYPE_NOT_ON_ROOT;
+
/* Add the events file */
- cft = &h->cgroup_files_dfl[2];
+ cft = &h->cgroup_files_dfl[4];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.events", buf);
cft->private = MEMFILE_PRIVATE(idx, 0);
cft->seq_show = hugetlb_events_show;
@@ -498,7 +560,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx)
cft->flags = CFTYPE_NOT_ON_ROOT;
/* Add the events.local file */
- cft = &h->cgroup_files_dfl[3];
+ cft = &h->cgroup_files_dfl[5];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.events.local", buf);
cft->private = MEMFILE_PRIVATE(idx, 0);
cft->seq_show = hugetlb_events_local_show;
@@ -507,7 +569,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx)
cft->flags = CFTYPE_NOT_ON_ROOT;
/* NULL terminate the last cft */
- cft = &h->cgroup_files_dfl[4];
+ cft = &h->cgroup_files_dfl[6];
memset(cft, 0, sizeof(*cft));
WARN_ON(cgroup_add_dfl_cftypes(&hugetlb_cgrp_subsys,
@@ -530,28 +592,58 @@ static void __init __hugetlb_cgroup_file_legacy_init(int idx)
cft->read_u64 = hugetlb_cgroup_read_u64;
cft->write = hugetlb_cgroup_write_legacy;
- /* Add the usage file */
+ /* Add the reservation limit file */
cft = &h->cgroup_files_legacy[1];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_limit_in_bytes",
+ buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_LIMIT);
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+ cft->write = hugetlb_cgroup_write_legacy;
+
+ /* Add the usage file */
+ cft = &h->cgroup_files_legacy[2];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
cft->read_u64 = hugetlb_cgroup_read_u64;
+ /* Add the reservation usage file */
+ cft = &h->cgroup_files_legacy[3];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_usage_in_bytes",
+ buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_USAGE);
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
/* Add the MAX usage file */
- cft = &h->cgroup_files_legacy[2];
+ cft = &h->cgroup_files_legacy[4];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
cft->write = hugetlb_cgroup_reset;
cft->read_u64 = hugetlb_cgroup_read_u64;
+ /* Add the MAX reservation usage file */
+ cft = &h->cgroup_files_legacy[5];
+ snprintf(cft->name, MAX_CFTYPE_NAME,
+ "%s.reservation_max_usage_in_bytes", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_MAX_USAGE);
+ cft->write = hugetlb_cgroup_reset;
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
/* Add the failcntfile */
- cft = &h->cgroup_files_legacy[3];
+ cft = &h->cgroup_files_legacy[6];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT);
+ cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT);
+ cft->write = hugetlb_cgroup_reset;
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
+ /* Add the reservation failcntfile */
+ cft = &h->cgroup_files_legacy[7];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_failcnt", buf);
+ cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_FAILCNT);
cft->write = hugetlb_cgroup_reset;
cft->read_u64 = hugetlb_cgroup_read_u64;
/* NULL terminate the last cft */
- cft = &h->cgroup_files_legacy[4];
+ cft = &h->cgroup_files_legacy[8];
memset(cft, 0, sizeof(*cft));
WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys,
--
2.24.1.735.g03f4e72817-goog
(replying again as plain text for mailing lists)
----- On Jan 22, 2020, at 10:44 AM, Jan Ziak 0xe2.0x9a.0x9b(a)gmail.com wrote:
> Hello
> I would like to note that this does not help userspace to express dynamic
> scheduling relationships among processes/threads such as "do not run processes
> A and B on the same core" or "run processes A and B on cores sharing the same
> L2 cache".
Indeed, this is not what this system call is trying to solve. Does the name "pin_on_cpu" lead
to confusion here ?
I thought that cgroups was already the mechanism taking care of this kind of requirement.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Hello
I would like to note that this does not help userspace to express
dynamic scheduling relationships among processes/threads such as "do
not run processes A and B on the same core" or "run processes A and B
on cores sharing the same L2 cache".
Sincerely
Jan
Patch changelog:
v3:
* Merge changes into the original patches to make Al's life easier.
[Al Viro]
v2:
* Add include <linux/types.h> to openat2.h. [Florian Weimer]
* Move OPEN_HOW_SIZE_* constants out of UAPI. [Florian Weimer]
* Switch from __aligned_u64 to __u64 since it isn't necessary.
[David Laight]
v1: <https://lore.kernel.org/lkml/20191219105533.12508-1-cyphar@cyphar.com/>
While openat2(2) is still not yet in Linus's tree, we can take this
opportunity to iron out some small warts that weren't noticed earlier:
* A fix was suggested by Florian Weimer, to separate the openat2
definitions so glibc can use the header directly. I've put the
maintainership under VFS but let me know if you'd prefer it belong
ot the fcntl folks.
* Having heterogenous field sizes in an extensible struct results in
"padding hole" problems when adding new fields (in addition the
correct error to use for non-zero padding isn't entirely clear ).
The simplest solution is to just copy clone(3)'s model -- always use
u64s. It will waste a little more space in the struct, but it
removes a possible future headache.
This patch is intended to replace the corresponding patches in Al's
#work.openat2 tree (and *will not* apply on Linus' tree).
@Al: I will send some additional patches later, but they will require
proper design review since they're ABI-related features (namely,
adding a way to check what features a syscall supports as I
outlined in my talk here[1]).
[1]: https://youtu.be/ggD-eb3yPVs
Aleksa Sarai (2):
open: introduce openat2(2) syscall
selftests: add openat2(2) selftests
CREDITS | 4 +-
MAINTAINERS | 1 +
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/open.c | 147 +++--
include/linux/fcntl.h | 16 +-
include/linux/syscalls.h | 3 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/fcntl.h | 2 +-
include/uapi/linux/openat2.h | 39 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/openat2/.gitignore | 1 +
tools/testing/selftests/openat2/Makefile | 8 +
tools/testing/selftests/openat2/helpers.c | 109 ++++
tools/testing/selftests/openat2/helpers.h | 106 ++++
.../testing/selftests/openat2/openat2_test.c | 312 +++++++++++
.../selftests/openat2/rename_attack_test.c | 160 ++++++
.../testing/selftests/openat2/resolve_test.c | 523 ++++++++++++++++++
34 files changed, 1418 insertions(+), 39 deletions(-)
create mode 100644 include/uapi/linux/openat2.h
create mode 100644 tools/testing/selftests/openat2/.gitignore
create mode 100644 tools/testing/selftests/openat2/Makefile
create mode 100644 tools/testing/selftests/openat2/helpers.c
create mode 100644 tools/testing/selftests/openat2/helpers.h
create mode 100644 tools/testing/selftests/openat2/openat2_test.c
create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
create mode 100644 tools/testing/selftests/openat2/resolve_test.c
--
2.24.1
Memory protection keys enables an application to protect its address
space from inadvertent access by its own code.
This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to arch neutral directory
and enhance their test coverage.
Tested on powerpc64 and x86_64 (Skylake-SP).
Changelog
---------
Link to previous version (v15):
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=149238
v16:
(1) Rebased on top of latest master.
(2) Switched to u64 instead of using an arch-dependent
pkey_reg_t type for references to the pkey register
based on suggestions from Dave, Michal and Michael.
(3) Removed build time determination of page size based
on suggestion from Michael.
(4) Fixed comment before the definition of __page_o_noops()
from patch 13 ("selftests/vm/pkeys: Introduce powerpc
support").
v15:
(1) Rebased on top of latest master.
(2) Addressed review comments from Dave Hansen.
(3) Moved code for getting or setting pkey bits to new
helpers. These changes replace patch 7 of v14.
(4) Added a fix which ensures that the correct count of
reserved keys is used across different platforms.
(5) Added a fix which ensures that the correct page size
is used as powerpc supports both 4K and 64K pages.
v14:
(1) Incorporated another round of comments from Dave Hansen.
v13:
(1) Incorporated comments for Dave Hansen.
(2) Added one more test for correct pkey-0 behavior.
v12:
(1) Fixed the offset of pkey field in the siginfo structure for
x86_64 and powerpc. And tries to use the actual field
if the headers have it defined.
v11:
(1) Fixed a deadlock in the ptrace testcase.
v10 and prior:
(1) Moved the testcase to arch neutral directory.
(2) Split the changes into incremental patches.
Desnes A. Nunes do Rosario (1):
selftests/vm/pkeys: Fix number of reserved powerpc pkeys
Ram Pai (17):
selftests/x86/pkeys: Move selftests to arch-neutral directory
selftests/vm: Rename all references to pkru to a generic name
selftests/vm: Move generic definitions to header file
selftests/vm: Typecast references to pkey register
selftests/vm: Fix pkey_disable_clear()
selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
selftests/vm/pkeys: Introduce generic pkey abstractions
selftests/vm/pkeys: Introduce powerpc support
selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: Improve checks to determine pkey support
selftests/vm/pkeys: Associate key on a mapped page and detect access
violation
selftests/vm/pkeys: Associate key on a mapped page and detect write
violation
selftests/vm/pkeys: Detect write violation on a mapped
access-denied-key page
selftests/vm/pkeys: Introduce a sub-page allocator
selftests/vm/pkeys: Test correct behaviour of pkey-0
selftests/vm/pkeys: Override access right definitions on powerpc
Sandipan Das (3):
selftests: vm: pkeys: Add helpers for pkey bits
selftests: vm: pkeys: Use the correct huge page size
selftests: vm: pkeys: Use the correct page size on powerpc
Thiago Jung Bauermann (2):
selftests/vm: Move some definitions to arch-specific header
selftests/vm: Make gcc check arguments of sigsafe_printf()
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/pkey-helpers.h | 225 ++++++
tools/testing/selftests/vm/pkey-powerpc.h | 136 ++++
tools/testing/selftests/vm/pkey-x86.h | 181 +++++
.../selftests/{x86 => vm}/protection_keys.c | 693 ++++++++++--------
tools/testing/selftests/x86/.gitignore | 1 -
tools/testing/selftests/x86/pkey-helpers.h | 219 ------
8 files changed, 927 insertions(+), 530 deletions(-)
create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h
create mode 100644 tools/testing/selftests/vm/pkey-x86.h
rename tools/testing/selftests/{x86 => vm}/protection_keys.c (74%)
delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
--
2.17.1
It was observed[1] on arm64 that __builtin_strlen led to an infinite
loop in the get_size selftest. This is because __builtin_strlen (and
other builtins) may sometimes result in a call to the C library
function. The C library implementation of strlen uses an IFUNC
resolver to load the most efficient strlen implementation for the
underlying machine and hence has a PLT indirection even for static
binaries. Because this binary avoids the C library startup routines,
the PLT initialization never happens and hence the program gets stuck
in an infinite loop.
On x86_64 the __builtin_strlen just happens to expand inline and avoid
the call but that is not always guaranteed.
Further, while testing on x86_64 (Fedora 31), it was observed that the
test also failed with a segfault inside write() because the generated
code for the write function in glibc seems to access TLS before the
syscall (probably due to the cancellation point check) and fails
because TLS is not initialised.
To mitigate these problems, this patch reduces the interface with the
C library to just the syscall function. The syscall function still
sets errno on failure, which is undesirable but for now it only
affects cases where syscalls fail.
[1] https://bugs.linaro.org/show_bug.cgi?id=5479
Signed-off-by: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
Reported-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
---
tools/testing/selftests/size/get_size.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
index d4b59ab979a0..f55943b6d1e2 100644
--- a/tools/testing/selftests/size/get_size.c
+++ b/tools/testing/selftests/size/get_size.c
@@ -12,23 +12,35 @@
* own execution. It also attempts to have as few dependencies
* on kernel features as possible.
*
- * It should be statically linked, with startup libs avoided.
- * It uses no library calls, and only the following 3 syscalls:
+ * It should be statically linked, with startup libs avoided. It uses
+ * no library calls except the syscall() function for the following 3
+ * syscalls:
* sysinfo(), write(), and _exit()
*
* For output, it avoids printf (which in some C libraries
* has large external dependencies) by implementing it's own
* number output and print routines, and using __builtin_strlen()
+ *
+ * The test may crash if any of the above syscalls fails because in some
+ * libc implementations (e.g. the GNU C Library) errno is saved in
+ * thread-local storage, which does not get initialized due to avoiding
+ * startup libs.
*/
#include <sys/sysinfo.h>
#include <unistd.h>
+#include <sys/syscall.h>
#define STDOUT_FILENO 1
static int print(const char *s)
{
- return write(STDOUT_FILENO, s, __builtin_strlen(s));
+ size_t len = 0;
+
+ while (s[len] != '\0')
+ len++;
+
+ return syscall(SYS_write, STDOUT_FILENO, s, len);
}
static inline char *num_to_str(unsigned long num, char *buf, int len)
@@ -80,12 +92,12 @@ void _start(void)
print("TAP version 13\n");
print("# Testing system size.\n");
- ccode = sysinfo(&info);
+ ccode = syscall(SYS_sysinfo, &info);
if (ccode < 0) {
print("not ok 1");
print(test_name);
print(" ---\n reason: \"could not get sysinfo\"\n ...\n");
- _exit(ccode);
+ syscall(SYS_exit, ccode);
}
print("ok 1");
print(test_name);
@@ -101,5 +113,5 @@ void _start(void)
print(" ...\n");
print("1..1\n");
- _exit(0);
+ syscall(SYS_exit, 0);
}
--
2.24.1
Hi,
The "track FOLL_PIN pages" would have been the very next patch, but it is
not included here because I'm still debugging a bug report from Leon.
Let's get all of the prerequisite work (it's been reviewed) into the tree
so that future reviews are easier. It's clear that any fixes that are
required to the tracking patch, won't affect these patches here.
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also adds FOLL_PIN page support, up to
*but not including* actually tracking FOLL_PIN pages. It extends
the FOLL_PIN support to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a point of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
Changes since v11: Fixes resulting from Kirill Shutemov's review, plus
a fix for a kbuild robot-reported warning.
* Only include the first 22 patches: up to, but not including, the "track
FOLL_PIN pages" patch.
* Improved the efficiency of put_compound_head(), by avoiding get_page()
entirely, and instead doing the mass subtraction on one less than
refs, followed by a final put_page().
* Got rid of the forward declaration of __gup_longterm_locked(), by
moving get_user_pages_remote() further down in gup.c
* Got rid of a redundant page_is_devmap_managed() call, and simplified
put_devmap_managed_page() as part of that small cleanup.
* Changed put_devmap_managed_page() to do an early out if the page is
not devmap managed. This saves an indentation level.
* Applied the same type of change to __unpin_devmap_managed_user_page(),
which has the same checks.
* Changed release_pages() to handle the changed put_devmap_managed_page()
API.
* Removed EXPORT_SYMBOL(free_devmap_managed_page), as it is not required,
after the other refactoring.
* Fixed a kbuild robot sparse warning: added "static" to
try_pin_compound_head()'s declaration.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v11, which is here:
https://lore.kernel.org/r/20191216222537.491123-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (21):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
media/v4l2-core: set pages dirty upon releasing DMA buffers
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 232 +++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 10 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 95 +++-
mm/gup.c | 495 ++++++++++++--------
mm/gup_benchmark.c | 9 +-
mm/memremap.c | 75 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 27 +-
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 6 +-
25 files changed, 762 insertions(+), 380 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.1
Fenghua Yu <fenghua.yu(a)intel.com> writes:
>
> Hi, Boris, Thomas, Ingo, et al,
>
> Any comment on this patch set?
No objections from my side, but you forgot to CC the relevant
maintainer/mailinglist for tools/testing/selftests/. CC'ed now.
Thanks,
tglx
From: Hewenliang <hewenliang4(a)huawei.com>
[ Upstream commit d671fa6393d6788fc65555d4643b71cb3a361f36 ]
It is necessary to set fd to -1 when inotify_add_watch() fails in
cg_prepare_for_wait. Otherwise the fd which has been closed in
cg_prepare_for_wait may be misused in other functions such as
cg_enter_and_wait_for_frozen and cg_freeze_wait.
Fixes: 5313bfe425c8 ("selftests: cgroup: add freezer controller self-tests")
Signed-off-by: Hewenliang <hewenliang4(a)huawei.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/cgroup/test_freezer.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/cgroup/test_freezer.c b/tools/testing/selftests/cgroup/test_freezer.c
index 0fc1b6d4b0f9..62a27ab3c2f3 100644
--- a/tools/testing/selftests/cgroup/test_freezer.c
+++ b/tools/testing/selftests/cgroup/test_freezer.c
@@ -72,6 +72,7 @@ static int cg_prepare_for_wait(const char *cgroup)
if (ret == -1) {
debug("Error: inotify_add_watch() failed\n");
close(fd);
+ fd = -1;
}
return fd;
--
2.20.1
This series adds basic self tests for HMM and mmu interval notifiers so
that changes can be validated. It is based on linux-5.5.0-rc1 and is for
Jason's rdma/hmm tree since I believe he is planning some interval
notifier changes. Patch 2 was last posted as part of [1] but the other
patches in that series have been merged and this version of the HMM tests
is modified to address Jason's concern over using both process wide MMU
notifiers in combination with MMU interval notifiers. Therefore, patch 1
adds some core functionality to allow intervals to be updated from within
the invalidation() callback so that MMU_NOTIFY_UNMAP events can update
the range being tracked.
[1] https://lore.kernel.org/linux-mm/20191104222141.5173-1-rcampbell@nvidia.com
Ralph Campbell (2):
mm/mmu_notifier: make interval notifier updates safe
mm/hmm/test: add self tests for HMM
MAINTAINERS | 3 +
include/linux/mmu_notifier.h | 15 +
lib/Kconfig.debug | 11 +
lib/Makefile | 1 +
lib/test_hmm.c | 1367 ++++++++++++++++++++++++
mm/mmu_notifier.c | 196 +++-
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +
tools/testing/selftests/vm/config | 2 +
tools/testing/selftests/vm/hmm-tests.c | 1360 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 16 +
tools/testing/selftests/vm/test_hmm.sh | 97 ++
12 files changed, 3047 insertions(+), 25 deletions(-)
create mode 100644 lib/test_hmm.c
create mode 100644 tools/testing/selftests/vm/hmm-tests.c
create mode 100755 tools/testing/selftests/vm/test_hmm.sh
--
2.20.1
Memory protection keys enables an application to protect its address
space from inadvertent access by its own code.
This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to arch neutral directory
and enhance their test coverage.
Testing
-------
Verified for correctness on powerpc. Need help with x86 testing as I
do not have access to a Skylake server. Client platforms like Coffee
Lake do not have the required feature bits set in CPUID.
Changelog
---------
Link to previous version (v14):
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=55981&state=*
v15:
(1) Rebased on top of latest master.
(2) Addressed review comments from Dave Hansen.
(3) Moved code for getting or setting pkey bits to new
helpers. These changes replace patch 7 of v14.
(4) Added a fix which ensures that the correct count of
reserved keys is used across different platforms.
(5) Added a fix which ensures that the correct page size
is used as powerpc supports both 4K and 64K pages.
v14:
(1) Incorporated another round of comments from Dave Hansen.
v13:
(1) Incorporated comments for Dave Hansen.
(2) Added one more test for correct pkey-0 behavior.
v12:
(1) Fixed the offset of pkey field in the siginfo structure for
x86_64 and powerpc. And tries to use the actual field
if the headers have it defined.
v11:
(1) Fixed a deadlock in the ptrace testcase.
v10 and prior:
(1) Moved the testcase to arch neutral directory.
(2) Split the changes into incremental patches.
Desnes A. Nunes do Rosario (1):
selftests/vm/pkeys: Fix number of reserved powerpc pkeys
Ram Pai (17):
selftests/x86/pkeys: Move selftests to arch-neutral directory
selftests/vm/pkeys: Rename all references to pkru to a generic name
selftests/vm/pkeys: Move generic definitions to header file
selftests/vm/pkeys: Typecast the pkey register
selftests/vm/pkeys: Fix pkey_disable_clear()
selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
selftests/vm/pkeys: Introduce generic pkey abstractions
selftests/vm/pkeys: Introduce powerpc support
selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: Improve checks to determine pkey support
selftests/vm/pkeys: Associate key on a mapped page and detect access
violation
selftests/vm/pkeys: Associate key on a mapped page and detect write
violation
selftests/vm/pkeys: Detect write violation on a mapped
access-denied-key page
selftests/vm/pkeys: Introduce a sub-page allocator
selftests/vm/pkeys: Test correct behaviour of pkey-0
selftests/vm/pkeys: Override access right definitions on powerpc
Sandipan Das (3):
selftests: vm: pkeys: Add helpers for pkey bits
selftests: vm: pkeys: Use the correct huge page size
selftests: vm: pkeys: Use the correct page size on powerpc
Thiago Jung Bauermann (2):
selftests/vm/pkeys: Move some definitions to arch-specific header
selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 5 +
tools/testing/selftests/vm/pkey-helpers.h | 226 ++++++
tools/testing/selftests/vm/pkey-powerpc.h | 138 ++++
tools/testing/selftests/vm/pkey-x86.h | 183 +++++
.../selftests/{x86 => vm}/protection_keys.c | 688 ++++++++++--------
tools/testing/selftests/x86/.gitignore | 1 -
tools/testing/selftests/x86/pkey-helpers.h | 219 ------
8 files changed, 931 insertions(+), 530 deletions(-)
create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h
create mode 100644 tools/testing/selftests/vm/pkey-x86.h
rename tools/testing/selftests/{x86 => vm}/protection_keys.c (74%)
delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
--
2.17.1
currently the property entry kunit tests are built if CONFIG_KUNIT=y.
This will cause warnings when merged with the kunit tree that now
supports tristate CONFIG_KUNIT. While the tests appear to compile
as a module, we get a warning about missing module license.
It's better to have a per-test suite CONFIG variable so that
we can do selective building of kunit-based suites, and can
also avoid merge issues like this.
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Fixes: c032ace71c29 ("software node: add basic tests for property entries")
Signed-off-by: Alan Maguire <alan.maguire(a)oracle.com>
---
drivers/base/test/Kconfig | 3 +++
drivers/base/test/Makefile | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/base/test/Kconfig b/drivers/base/test/Kconfig
index 86e85da..d29ae95 100644
--- a/drivers/base/test/Kconfig
+++ b/drivers/base/test/Kconfig
@@ -8,3 +8,6 @@ config TEST_ASYNC_DRIVER_PROBE
The module name will be test_async_driver_probe.ko
If unsure say N.
+config KUNIT_DRIVER_PE_TEST
+ bool "KUnit Tests for property entry API"
+ depends on KUNIT
diff --git a/drivers/base/test/Makefile b/drivers/base/test/Makefile
index 2214310..3ca5636 100644
--- a/drivers/base/test/Makefile
+++ b/drivers/base/test/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_TEST_ASYNC_DRIVER_PROBE) += test_async_driver_probe.o
-obj-$(CONFIG_KUNIT) += property-entry-test.o
+obj-$(CONFIG_KUNIT_DRIVER_PE_TEST) += property-entry-test.o
--
1.8.3.1
On 1/12/20 11:14 PM, Stephen Rothwell wrote:
> Hi all,
>
> Changes since 20200110:
>
> The kunit-next tree lost its failures.
>
on i386:
WARNING: modpost: missing MODULE_LICENSE() in drivers/base/test/property-entry-test.o
see include/linux/module.h for more information
--
~Randy
Reported-by: Randy Dunlap <rdunlap(a)infradead.org>
This patch series is a follow up to "lib/vdso, x86/vdso: Fix fallout
from generic VDSO conversion" [1].
The main purpose is to complete the 32bit vDSOs conversion to use the
legacy 32bit syscalls as a fallback. With the conversion of all the
architectures present in -next complete, this patch series removes as
well the conditional choice in between 32 and 64 bit for 32bit vDSOs.
This series has been rebased on linux-next/master.
[1] https://lkml.org/lkml/2019/7/28/86
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Paul Burton <paul.burton(a)mips.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Vincenzo Frascino (8):
arm64: compat: vdso: Expose BUILD_VDSO32
lib: vdso: Build 32 bit specific functions in the right context
mips: compat: vdso: Use legacy syscalls as fallback
lib: vdso: Remove VDSO_HAS_32BIT_FALLBACK
lib: vdso: Remove checks on return value for 32 bit vDSO
arm64: compat: vdso: Remove unused VDSO_HAS_32BIT_FALLBACK
mips: vdso: Remove unused VDSO_HAS_32BIT_FALLBACK
x86: vdso: Remove unused VDSO_HAS_32BIT_FALLBACK
.../include/asm/vdso/compat_gettimeofday.h | 2 +-
arch/mips/include/asm/vdso/gettimeofday.h | 43 +++++++++++++++++++
arch/mips/vdso/config-n32-o32-env.c | 1 +
arch/x86/include/asm/vdso/gettimeofday.h | 2 -
lib/vdso/gettimeofday.c | 30 ++++++-------
5 files changed, 57 insertions(+), 21 deletions(-)
--
2.23.0
Two trivial cleanups after recent changes in selftests/livepatch. Based
on "next" branch of Shuah's kselftest tree.
Miroslav Benes (2):
selftests/livepatch: Replace set_dynamic_debug() with setup_config()
in README
selftests/livepatch: Remove unused local variable in
set_ftrace_enabled()
tools/testing/selftests/livepatch/README | 2 +-
tools/testing/selftests/livepatch/functions.sh | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)
--
2.24.1
kunit tests that do not support module build should depend
on KUNIT=y rather than just KUNIT in Kconfig, otherwise
they will trigger compilation errors for "make allmodconfig"
builds.
Fixes: 9fe124bf1b77 ("kunit: allow kunit to be loaded as a module")
Signed-off-by: Alan Maguire <alan.maguire(a)oracle.com>
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
---
drivers/base/Kconfig | 2 +-
security/apparmor/Kconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index c3b3b5c..5f0bc74 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -150,7 +150,7 @@ config DEBUG_TEST_DRIVER_REMOVE
config PM_QOS_KUNIT_TEST
bool "KUnit Test for PM QoS features"
- depends on KUNIT
+ depends on KUNIT=y
config HMEM_REPORTING
bool
diff --git a/security/apparmor/Kconfig b/security/apparmor/Kconfig
index d547930..0fe3368 100644
--- a/security/apparmor/Kconfig
+++ b/security/apparmor/Kconfig
@@ -71,7 +71,7 @@ config SECURITY_APPARMOR_DEBUG_MESSAGES
config SECURITY_APPARMOR_KUNIT_TEST
bool "Build KUnit tests for policy_unpack.c"
- depends on KUNIT && SECURITY_APPARMOR
+ depends on KUNIT=y && SECURITY_APPARMOR
help
This builds the AppArmor KUnit tests.
--
1.8.3.1
Greetings,
Please read the attached investment proposal and reply for more details.
Are you interested in loan?
Sincerely: Peter Wong
----------------------------------------------------
This email was sent by the shareware version of Postman Professional.
This adds a basic framework for running all the "safe" LKDTM tests. This
will allow easy introspection into any selftest logs to examine the
results of most LKDTM tests.
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
v3: replace open-coded "4" with $KSELFTEST_SKIP_TEXT (shuah), add DOUBLE_FAULT
v2: https://lore.kernel.org/lkml/201912301453.19D686EE6@keescook
v1: https://lore.kernel.org/lkml/201905091013.E228F0F0BE@keescook
---
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/lkdtm/Makefile | 12 ++++
tools/testing/selftests/lkdtm/config | 1 +
tools/testing/selftests/lkdtm/run.sh | 92 +++++++++++++++++++++++++
tools/testing/selftests/lkdtm/tests.txt | 71 +++++++++++++++++++
6 files changed, 178 insertions(+)
create mode 100644 tools/testing/selftests/lkdtm/Makefile
create mode 100644 tools/testing/selftests/lkdtm/config
create mode 100755 tools/testing/selftests/lkdtm/run.sh
create mode 100644 tools/testing/selftests/lkdtm/tests.txt
diff --git a/MAINTAINERS b/MAINTAINERS
index cc0a4a8ae06a..eacc00c6cfd5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9578,6 +9578,7 @@ LINUX KERNEL DUMP TEST MODULE (LKDTM)
M: Kees Cook <keescook(a)chromium.org>
S: Maintained
F: drivers/misc/lkdtm/*
+F: tools/testing/selftests/lkdtm/*
LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M: Alan Stern <stern(a)rowland.harvard.edu>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b001c602414b..f0b02a12ba39 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -26,6 +26,7 @@ TARGETS += kexec
TARGETS += kvm
TARGETS += lib
TARGETS += livepatch
+TARGETS += lkdtm
TARGETS += membarrier
TARGETS += memfd
TARGETS += memory-hotplug
diff --git a/tools/testing/selftests/lkdtm/Makefile b/tools/testing/selftests/lkdtm/Makefile
new file mode 100644
index 000000000000..1bcc9ee990eb
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for LKDTM regression tests
+
+include ../lib.mk
+
+# NOTE: $(OUTPUT) won't get default value if used before lib.mk
+TEST_FILES := tests.txt
+TEST_GEN_PROGS = $(patsubst %,$(OUTPUT)/%.sh,$(shell awk '{print $$1}' tests.txt | sed -e 's/\#//'))
+all: $(TEST_GEN_PROGS)
+
+$(OUTPUT)/%: run.sh tests.txt
+ install -m 0744 run.sh $@
diff --git a/tools/testing/selftests/lkdtm/config b/tools/testing/selftests/lkdtm/config
new file mode 100644
index 000000000000..d874990e442b
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/config
@@ -0,0 +1 @@
+CONFIG_LKDTM=y
diff --git a/tools/testing/selftests/lkdtm/run.sh b/tools/testing/selftests/lkdtm/run.sh
new file mode 100755
index 000000000000..dadf819148a4
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/run.sh
@@ -0,0 +1,92 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# This reads tests.txt for the list of LKDTM tests to invoke. Any marked
+# with a leading "#" are skipped. The rest of the line after the
+# test name is either the text to look for in dmesg for a "success",
+# or the rationale for why a test is marked to be skipped.
+#
+set -e
+TRIGGER=/sys/kernel/debug/provoke-crash/DIRECT
+KSELFTEST_SKIP_TEST=4
+
+# Verify we have LKDTM available in the kernel.
+if [ ! -r $TRIGGER ] ; then
+ /sbin/modprobe -q lkdtm || true
+ if [ ! -r $TRIGGER ] ; then
+ echo "Cannot find $TRIGGER (missing CONFIG_LKDTM?)"
+ else
+ echo "Cannot write $TRIGGER (need to run as root?)"
+ fi
+ # Skip this test
+ exit $KSELFTEST_SKIP_TEST
+fi
+
+# Figure out which test to run from our script name.
+test=$(basename $0 .sh)
+# Look up details about the test from master list of LKDTM tests.
+line=$(egrep '^#?'"$test"'\b' tests.txt)
+if [ -z "$line" ]; then
+ echo "Skipped: missing test '$test' in tests.txt"
+ exit $KSELFTEST_SKIP_TEST
+fi
+# Check that the test is known to LKDTM.
+if ! egrep -q '^'"$test"'$' "$TRIGGER" ; then
+ echo "Skipped: test '$test' missing in $TRIGGER!"
+ exit $KSELFTEST_SKIP_TEST
+fi
+
+# Extract notes/expected output from test list.
+test=$(echo "$line" | cut -d" " -f1)
+if echo "$line" | grep -q ' ' ; then
+ expect=$(echo "$line" | cut -d" " -f2-)
+else
+ expect=""
+fi
+
+# If the test is commented out, report a skip
+if echo "$test" | grep -q '^#' ; then
+ test=$(echo "$test" | cut -c2-)
+ if [ -z "$expect" ]; then
+ expect="crashes entire system"
+ fi
+ echo "Skipping $test: $expect"
+ exit $KSELFTEST_SKIP_TEST
+fi
+
+# If no expected output given, assume an Oops with back trace is success.
+if [ -z "$expect" ]; then
+ expect="call trace:"
+fi
+
+# Clear out dmesg for output reporting
+dmesg -c >/dev/null
+
+# Prepare log for report checking
+LOG=$(mktemp --tmpdir -t lkdtm-XXXXXX)
+cleanup() {
+ rm -f "$LOG"
+}
+trap cleanup EXIT
+
+# Most shells yell about signals and we're expecting the "cat" process
+# to usually be killed by the kernel. So we have to run it in a sub-shell
+# and silence errors.
+($SHELL -c 'cat <(echo '"$test"') >'"$TRIGGER" 2>/dev/null) || true
+
+# Record and dump the results
+dmesg -c >"$LOG"
+cat "$LOG"
+# Check for expected output
+if egrep -qi "$expect" "$LOG" ; then
+ echo "$test: saw '$expect': ok"
+ exit 0
+else
+ if egrep -qi XFAIL: "$LOG" ; then
+ echo "$test: saw 'XFAIL': [SKIP]"
+ exit $KSELFTEST_SKIP_TEST
+ else
+ echo "$test: missing '$expect': [FAIL]"
+ exit 1
+ fi
+fi
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
new file mode 100644
index 000000000000..92ca32143ae5
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -0,0 +1,71 @@
+#PANIC
+BUG kernel BUG at
+WARNING WARNING:
+WARNING_MESSAGE message trigger
+EXCEPTION
+#LOOP Hangs the system
+#EXHAUST_STACK Corrupts memory on failure
+#CORRUPT_STACK Crashes entire system on success
+#CORRUPT_STACK_STRONG Crashes entire system on success
+CORRUPT_LIST_ADD list_add corruption
+CORRUPT_LIST_DEL list_del corruption
+CORRUPT_USER_DS Invalid address limit on user-mode return
+STACK_GUARD_PAGE_LEADING
+STACK_GUARD_PAGE_TRAILING
+UNSET_SMEP CR4 bits went missing
+DOUBLE_FAULT
+UNALIGNED_LOAD_STORE_WRITE
+#OVERWRITE_ALLOCATION Corrupts memory on failure
+#WRITE_AFTER_FREE Corrupts memory on failure
+READ_AFTER_FREE
+#WRITE_BUDDY_AFTER_FREE Corrupts memory on failure
+READ_BUDDY_AFTER_FREE
+SLAB_FREE_DOUBLE
+SLAB_FREE_CROSS
+SLAB_FREE_PAGE
+#SOFTLOCKUP Hangs the system
+#HARDLOCKUP Hangs the system
+#SPINLOCKUP Hangs the system
+#HUNG_TASK Hangs the system
+EXEC_DATA
+EXEC_STACK
+EXEC_KMALLOC
+EXEC_VMALLOC
+EXEC_RODATA
+EXEC_USERSPACE
+EXEC_NULL
+ACCESS_USERSPACE
+ACCESS_NULL
+WRITE_RO
+WRITE_RO_AFTER_INIT
+WRITE_KERN
+REFCOUNT_INC_OVERFLOW
+REFCOUNT_ADD_OVERFLOW
+REFCOUNT_INC_NOT_ZERO_OVERFLOW
+REFCOUNT_ADD_NOT_ZERO_OVERFLOW
+REFCOUNT_DEC_ZERO
+REFCOUNT_DEC_NEGATIVE Negative detected: saturated
+REFCOUNT_DEC_AND_TEST_NEGATIVE Negative detected: saturated
+REFCOUNT_SUB_AND_TEST_NEGATIVE Negative detected: saturated
+REFCOUNT_INC_ZERO
+REFCOUNT_ADD_ZERO
+REFCOUNT_INC_SATURATED Saturation detected: still saturated
+REFCOUNT_DEC_SATURATED Saturation detected: still saturated
+REFCOUNT_ADD_SATURATED Saturation detected: still saturated
+REFCOUNT_INC_NOT_ZERO_SATURATED
+REFCOUNT_ADD_NOT_ZERO_SATURATED
+REFCOUNT_DEC_AND_TEST_SATURATED Saturation detected: still saturated
+REFCOUNT_SUB_AND_TEST_SATURATED Saturation detected: still saturated
+#REFCOUNT_TIMING timing only
+#ATOMIC_TIMING timing only
+USERCOPY_HEAP_SIZE_TO
+USERCOPY_HEAP_SIZE_FROM
+USERCOPY_HEAP_WHITELIST_TO
+USERCOPY_HEAP_WHITELIST_FROM
+USERCOPY_STACK_FRAME_TO
+USERCOPY_STACK_FRAME_FROM
+USERCOPY_STACK_BEYOND
+USERCOPY_KERNEL
+USERCOPY_KERNEL_DS
+STACKLEAK_ERASING OK: the rest of the thread stack is properly erased
+CFI_FORWARD_PROTO
--
2.20.1
--
Kees Cook
This adds a basic framework for running all the "safe" LKDTM tests. This
will allow easy introspection into any selftest logs to examine the
results of most LKDTM tests.
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
v2: refreshed for v5.5, added new tests since v1
v1: https://lore.kernel.org/lkml/201905091013.E228F0F0BE@keescook/
---
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/lkdtm/Makefile | 12 ++++
tools/testing/selftests/lkdtm/config | 1 +
tools/testing/selftests/lkdtm/run.sh | 91 +++++++++++++++++++++++++
tools/testing/selftests/lkdtm/tests.txt | 70 +++++++++++++++++++
6 files changed, 176 insertions(+)
create mode 100644 tools/testing/selftests/lkdtm/Makefile
create mode 100644 tools/testing/selftests/lkdtm/config
create mode 100755 tools/testing/selftests/lkdtm/run.sh
create mode 100644 tools/testing/selftests/lkdtm/tests.txt
diff --git a/MAINTAINERS b/MAINTAINERS
index cc0a4a8ae06a..eacc00c6cfd5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9578,6 +9578,7 @@ LINUX KERNEL DUMP TEST MODULE (LKDTM)
M: Kees Cook <keescook(a)chromium.org>
S: Maintained
F: drivers/misc/lkdtm/*
+F: tools/testing/selftests/lkdtm/*
LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M: Alan Stern <stern(a)rowland.harvard.edu>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b001c602414b..f0b02a12ba39 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -26,6 +26,7 @@ TARGETS += kexec
TARGETS += kvm
TARGETS += lib
TARGETS += livepatch
+TARGETS += lkdtm
TARGETS += membarrier
TARGETS += memfd
TARGETS += memory-hotplug
diff --git a/tools/testing/selftests/lkdtm/Makefile b/tools/testing/selftests/lkdtm/Makefile
new file mode 100644
index 000000000000..1bcc9ee990eb
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for LKDTM regression tests
+
+include ../lib.mk
+
+# NOTE: $(OUTPUT) won't get default value if used before lib.mk
+TEST_FILES := tests.txt
+TEST_GEN_PROGS = $(patsubst %,$(OUTPUT)/%.sh,$(shell awk '{print $$1}' tests.txt | sed -e 's/\#//'))
+all: $(TEST_GEN_PROGS)
+
+$(OUTPUT)/%: run.sh tests.txt
+ install -m 0744 run.sh $@
diff --git a/tools/testing/selftests/lkdtm/config b/tools/testing/selftests/lkdtm/config
new file mode 100644
index 000000000000..d874990e442b
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/config
@@ -0,0 +1 @@
+CONFIG_LKDTM=y
diff --git a/tools/testing/selftests/lkdtm/run.sh b/tools/testing/selftests/lkdtm/run.sh
new file mode 100755
index 000000000000..793ee0d5d5a3
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/run.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# This reads tests.txt for the list of LKDTM tests to invoke. Any marked
+# with a leading "#" are skipped. The rest of the line after the
+# test name is either the text to look for in dmesg for a "success",
+# or the rationale for why a test is marked to be skipped.
+#
+set -e
+TRIGGER=/sys/kernel/debug/provoke-crash/DIRECT
+
+# Verify we have LKDTM available in the kernel.
+if [ ! -r $TRIGGER ] ; then
+ /sbin/modprobe -q lkdtm || true
+ if [ ! -r $TRIGGER ] ; then
+ echo "Cannot find $TRIGGER (missing CONFIG_LKDTM?)"
+ else
+ echo "Cannot write $TRIGGER (need to run as root?)"
+ fi
+ # Skip this test
+ exit 4
+fi
+
+# Figure out which test to run from our script name.
+test=$(basename $0 .sh)
+# Look up details about the test from master list of LKDTM tests.
+line=$(egrep '^#?'"$test"'\b' tests.txt)
+if [ -z "$line" ]; then
+ echo "Skipped: missing test '$test' in tests.txt"
+ exit 4
+fi
+# Check that the test is known to LKDTM.
+if ! egrep -q '^'"$test"'$' "$TRIGGER" ; then
+ echo "Skipped: test '$test' missing in $TRIGGER!"
+ exit 4
+fi
+
+# Extract notes/expected output from test list.
+test=$(echo "$line" | cut -d" " -f1)
+if echo "$line" | grep -q ' ' ; then
+ expect=$(echo "$line" | cut -d" " -f2-)
+else
+ expect=""
+fi
+
+# If the test is commented out, report a skip
+if echo "$test" | grep -q '^#' ; then
+ test=$(echo "$test" | cut -c2-)
+ if [ -z "$expect" ]; then
+ expect="crashes entire system"
+ fi
+ echo "Skipping $test: $expect"
+ exit 4
+fi
+
+# If no expected output given, assume an Oops with back trace is success.
+if [ -z "$expect" ]; then
+ expect="call trace:"
+fi
+
+# Clear out dmesg for output reporting
+dmesg -c >/dev/null
+
+# Prepare log for report checking
+LOG=$(mktemp --tmpdir -t lkdtm-XXXXXX)
+cleanup() {
+ rm -f "$LOG"
+}
+trap cleanup EXIT
+
+# Most shells yell about signals and we're expecting the "cat" process
+# to usually be killed by the kernel. So we have to run it in a sub-shell
+# and silence errors.
+($SHELL -c 'cat <(echo '"$test"') >'"$TRIGGER" 2>/dev/null) || true
+
+# Record and dump the results
+dmesg -c >"$LOG"
+cat "$LOG"
+# Check for expected output
+if egrep -qi "$expect" "$LOG" ; then
+ echo "$test: saw '$expect': ok"
+ exit 0
+else
+ if egrep -qi XFAIL: "$LOG" ; then
+ echo "$test: saw 'XFAIL': [SKIP]"
+ exit 4
+ else
+ echo "$test: missing '$expect': [FAIL]"
+ exit 1
+ fi
+fi
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
new file mode 100644
index 000000000000..fc55f8ef8bee
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -0,0 +1,70 @@
+#PANIC
+BUG kernel BUG at
+WARNING WARNING:
+WARNING_MESSAGE message trigger
+EXCEPTION
+#LOOP Hangs the system
+#EXHAUST_STACK Corrupts memory on failure
+#CORRUPT_STACK Crashes entire system on success
+#CORRUPT_STACK_STRONG Crashes entire system on success
+CORRUPT_LIST_ADD list_add corruption
+CORRUPT_LIST_DEL list_del corruption
+CORRUPT_USER_DS Invalid address limit on user-mode return
+STACK_GUARD_PAGE_LEADING
+STACK_GUARD_PAGE_TRAILING
+UNSET_SMEP CR4 bits went missing
+UNALIGNED_LOAD_STORE_WRITE
+#OVERWRITE_ALLOCATION Corrupts memory on failure
+#WRITE_AFTER_FREE Corrupts memory on failure
+READ_AFTER_FREE
+#WRITE_BUDDY_AFTER_FREE Corrupts memory on failure
+READ_BUDDY_AFTER_FREE
+SLAB_FREE_DOUBLE
+SLAB_FREE_CROSS
+SLAB_FREE_PAGE
+#SOFTLOCKUP Hangs the system
+#HARDLOCKUP Hangs the system
+#SPINLOCKUP Hangs the system
+#HUNG_TASK Hangs the system
+EXEC_DATA
+EXEC_STACK
+EXEC_KMALLOC
+EXEC_VMALLOC
+EXEC_RODATA
+EXEC_USERSPACE
+EXEC_NULL
+ACCESS_USERSPACE
+ACCESS_NULL
+WRITE_RO
+WRITE_RO_AFTER_INIT
+WRITE_KERN
+REFCOUNT_INC_OVERFLOW
+REFCOUNT_ADD_OVERFLOW
+REFCOUNT_INC_NOT_ZERO_OVERFLOW
+REFCOUNT_ADD_NOT_ZERO_OVERFLOW
+REFCOUNT_DEC_ZERO
+REFCOUNT_DEC_NEGATIVE Negative detected: saturated
+REFCOUNT_DEC_AND_TEST_NEGATIVE Negative detected: saturated
+REFCOUNT_SUB_AND_TEST_NEGATIVE Negative detected: saturated
+REFCOUNT_INC_ZERO
+REFCOUNT_ADD_ZERO
+REFCOUNT_INC_SATURATED Saturation detected: still saturated
+REFCOUNT_DEC_SATURATED Saturation detected: still saturated
+REFCOUNT_ADD_SATURATED Saturation detected: still saturated
+REFCOUNT_INC_NOT_ZERO_SATURATED
+REFCOUNT_ADD_NOT_ZERO_SATURATED
+REFCOUNT_DEC_AND_TEST_SATURATED Saturation detected: still saturated
+REFCOUNT_SUB_AND_TEST_SATURATED Saturation detected: still saturated
+#REFCOUNT_TIMING timing only
+#ATOMIC_TIMING timing only
+USERCOPY_HEAP_SIZE_TO
+USERCOPY_HEAP_SIZE_FROM
+USERCOPY_HEAP_WHITELIST_TO
+USERCOPY_HEAP_WHITELIST_FROM
+USERCOPY_STACK_FRAME_TO
+USERCOPY_STACK_FRAME_FROM
+USERCOPY_STACK_BEYOND
+USERCOPY_KERNEL
+USERCOPY_KERNEL_DS
+STACKLEAK_ERASING OK: the rest of the thread stack is properly erased
+CFI_FORWARD_PROTO
--
2.20.1
--
Kees Cook
Hello,
I have a function returning 'unsigned long', and would like to write a kunit
test for the function, as below.
unsigned long foo(void)
{
return 42;
}
static void foo_test(struct kunit *test)
{
KUNIT_EXPECT_EQ(test, 42, foo());
}
However, this kunit gives me below warning for the above code:
/.../linux/include/linux/kernel.h:842:29: warning: comparison of distinct pointer types lacks a cast
(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
^
/.../linux/include/kunit/test.h:493:9: note: in expansion of macro ‘__typecheck’
((void)__typecheck(__left, __right)); \
^~~~~~~~~~~
/.../linux/include/kunit/test.h:517:2: note: in expansion of macro ‘KUNIT_BASE_BINARY_ASSERTION’
KUNIT_BASE_BINARY_ASSERTION(test, \
^~~~~~~~~~~~~~~~~~~~~~~~~~~
/.../linux/include/kunit/test.h:606:2: note: in expansion of macro ‘KUNIT_BASE_EQ_MSG_ASSERTION’
KUNIT_BASE_EQ_MSG_ASSERTION(test, \
^~~~~~~~~~~~~~~~~~~~~~~~~~~
/.../linux/include/kunit/test.h:616:2: note: in expansion of macro ‘KUNIT_BINARY_EQ_MSG_ASSERTION’
KUNIT_BINARY_EQ_MSG_ASSERTION(test, \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/.../linux/include/kunit/test.h:979:2: note: in expansion of macro ‘KUNIT_BINARY_EQ_ASSERTION’
KUNIT_BINARY_EQ_ASSERTION(test, KUNIT_EXPECTATION, left, right)
^~~~~~~~~~~~~~~~~~~~~~~~~
/.../linux/mm/foo-test.h:565:2: note: in expansion of macro ‘KUNIT_EXPECT_EQ’
KUNIT_EXPECT_EQ(test, 42, foo());
^~~~~~~~~~~~~~~
I could remove the warning by explicitly type casting the constant as below:
KUNIT_EXPECT_EQ(test, (unsigned long)42, foo());
However, now 'checkpatch.pl' complains about the type casting as below.
WARNING: Unnecessary typecast of c90 int constant
#565: FILE: mm/foo-test.h:565:
+ KUNIT_EXPECT_EQ(test, (unsigned long)42, foo());
Of course, there could be several work-arounds for these warnings, such as
using 'EXPECT_TRUE(test, 42 == foo())' or casting the function's return value.
Nonetheless, I'm not sure what is the right way. Could you please let me know
what is the recommended way for this case?
Thanks,
SeongJae Park
When handling page faults for many vCPUs during demand paging, KVM's MMU
lock becomes highly contended. This series creates a test with a naive
userfaultfd based demand paging implementation to demonstrate that
contention. This test serves both as a functional test of userfaultfd
and a microbenchmark of demand paging performance with a variable number
of vCPUs and memory per vCPU.
The test creates N userfaultfd threads, N vCPUs, and a region of memory
with M pages per vCPU. The N userfaultfd polling threads are each set up
to serve faults on a region of memory corresponding to one of the vCPUs.
Each of the vCPUs is then started, and touches each page of its disjoint
memory region, sequentially. In response to faults, the userfaultfd
threads copy a static buffer into the guest's memory. This creates a
worst case for MMU lock contention as we have removed most of the
contention between the userfaultfd threads and there is no time required
to fetch the contents of guest memory.
This test was run successfully on Intel Haswell, Broadwell, and
Cascadelake hosts with a variety of vCPU counts and memory sizes.
This test was adapted from the dirty_log_test.
The series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
(Thanks to Dmitry Vyukov <dvyukov(a)google.com> for setting up the Gerrit
instance)
Ben Gardon (9):
KVM: selftests: Create a demand paging test
KVM: selftests: Add demand paging content to the demand paging test
KVM: selftests: Add memory size parameter to the demand paging test
KVM: selftests: Pass args to vCPU instead of using globals
KVM: selftests: Support multiple vCPUs in demand paging test
KVM: selftests: Time guest demand paging
KVM: selftests: Add parameter to _vm_create for memslot 0 base paddr
KVM: selftests: Support large VMs in demand paging test
Add static flag
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 4 +-
.../selftests/kvm/demand_paging_test.c | 610 ++++++++++++++++++
tools/testing/selftests/kvm/dirty_log_test.c | 2 +-
.../testing/selftests/kvm/include/kvm_util.h | 3 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
6 files changed, 621 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
--
2.23.0.444.g18eeb5a265-goog
The current kunit execution model is to provide base kunit functionality
and tests built-in to the kernel. The aim of this series is to allow
building kunit itself and tests as modules. This in turn allows a
simple form of selective execution; load the module you wish to test.
In doing so, kunit itself (if also built as a module) will be loaded as
an implicit dependency.
Because this requires a core API modification - if a module delivers
multiple suites, they must be declared with the kunit_test_suites()
macro - we're proposing this patch set as a candidate to be applied to the
test tree before too many kunit consumers appear. We attempt to deal
with existing consumers in patch 3.
Changes since v6:
- reintroduce kunit_test_suite() definition to handle users in other trees
not yet converted to using kunit_test_suites() (kbuild error when
applying patches to ext4/dev tree)
- modify drivers/base/power/qos-test.c to use kunit_test_suites()
to register suite. We do not convert it to support module build now as
the suite uses a few unexported function; see patch 3 for details.
Changes since v5:
- fixed fs/ext4/Makefile to remove unneeded conditional compilation
(Iurii, patch 3)
- added Reviewed-by, Acked-by to patches 3, 4, 5 and 6
Changes since v4:
- fixed signoff chain to use Co-developed-by: prior to Knut's signoff
(Stephen, all patches)
- added Reviewed-by, Tested-by for patches 1, 2, 4 and 6
- updated comment describing try-catch-impl.h (Stephen, patch 2)
- fixed MODULE_LICENSEs to be GPL v2 (Stephen, patches 3, 5)
- added __init to kunit_init() (Stephen, patch 5)
Changes since v3:
- removed symbol lookup patch for separate submission later
- removed use of sysctl_hung_task_timeout_seconds (patch 4, as discussed
with Brendan and Stephen)
- disabled build of string-stream-test when CONFIG_KUNIT_TEST=m; this
is to avoid having to deal with symbol lookup issues
- changed string-stream-impl.h back to string-stream.h (Brendan)
- added module build support to new list, ext4 tests
Changes since v2:
- moved string-stream.h header to lib/kunit/string-stream-impl.h (Brendan)
(patch 1)
- split out non-exported interfaces in try-catch-impl.h (Brendan)
(patch 2)
- added kunit_find_symbol() and KUNIT_INIT_SYMBOL to lookup non-exported
symbols (patches 3, 4)
- removed #ifdef MODULE around module licenses (Randy, Brendan, Andy)
(patch 4)
- replaced kunit_test_suite() with kunit_test_suites() rather than
supporting both (Brendan) (patch 4)
- lookup sysctl_hung_task_timeout_secs as kunit may be built as a module
and the symbol may not be available (patch 5)
Alan Maguire (6):
kunit: move string-stream.h to lib/kunit
kunit: hide unexported try-catch interface in try-catch-impl.h
kunit: allow kunit tests to be loaded as a module
kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
kunit: allow kunit to be loaded as a module
kunit: update documentation to describe module-based build
Documentation/dev-tools/kunit/faq.rst | 3 +-
Documentation/dev-tools/kunit/index.rst | 3 ++
Documentation/dev-tools/kunit/usage.rst | 16 ++++++++++
drivers/base/power/qos-test.c | 2 +-
fs/ext4/Kconfig | 2 +-
fs/ext4/Makefile | 3 +-
fs/ext4/inode-test.c | 4 ++-
include/kunit/assert.h | 3 +-
include/kunit/test.h | 37 ++++++++++++++++------
include/kunit/try-catch.h | 10 ------
kernel/sysctl-test.c | 4 ++-
lib/Kconfig.debug | 4 +--
lib/kunit/Kconfig | 6 ++--
lib/kunit/Makefile | 14 +++++---
lib/kunit/assert.c | 10 ++++++
lib/kunit/{example-test.c => kunit-example-test.c} | 4 ++-
lib/kunit/{test-test.c => kunit-test.c} | 7 ++--
lib/kunit/string-stream-test.c | 5 +--
lib/kunit/string-stream.c | 3 +-
{include => lib}/kunit/string-stream.h | 0
lib/kunit/test.c | 25 ++++++++++++++-
lib/kunit/try-catch-impl.h | 27 ++++++++++++++++
lib/kunit/try-catch.c | 37 +++++-----------------
lib/list-test.c | 4 ++-
24 files changed, 160 insertions(+), 73 deletions(-)
rename lib/kunit/{example-test.c => kunit-example-test.c} (97%)
rename lib/kunit/{test-test.c => kunit-test.c} (98%)
rename {include => lib}/kunit/string-stream.h (100%)
create mode 100644 lib/kunit/try-catch-impl.h
--
1.8.3.1
[Cc Kees in case he knows something about where arch specific tests live
or whether we have a framework for this]
On Mon, Jan 06, 2020 at 07:03:32PM +0100, Amanieu d'Antras wrote:
> On Mon, Jan 6, 2020 at 6:39 PM Will Deacon <will(a)kernel.org> wrote:
> > I also ran the native and compat selftests but, unfortunately, they all
> > pass even without this patch. Do you reckon it would be possible to update
> > them to check the tls pointer?
>
> Here's the program I used for testing on arm64. I considered adding it
> to the selftests but there is no portable way of reading the TLS
> register on all architectures.
I'm not saying you need to do this right now.
It feels like we must've run into the "this is architecture
specific"-and-we-want-to-test-this issue before... Do we have a place
where architecture specific selftests live?
>
> #include <sys/syscall.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <stdint.h>
>
> #define __NR_clone3 435
> struct clone_args {
> uint64_t flags;
> uint64_t pidfd;
> uint64_t child_tid;
> uint64_t parent_tid;
> uint64_t exit_signal;
> uint64_t stack;
> uint64_t stack_size;
> uint64_t tls;
> };
>
> #define USE_CLONE3
>
> int main() {
> printf("Before fork: tp = %p\n", __builtin_thread_pointer());
> #ifdef USE_CLONE3
> struct clone_args args = {
> .flags = CLONE_SETTLS,
> .tls = (uint64_t)__builtin_thread_pointer(),
> };
> int ret = syscall(__NR_clone3, &args, sizeof(args));
> #else
> int ret = syscall(__NR_clone, CLONE_SETTLS, 0, 0,
> __builtin_thread_pointer(), 0);
> #endif
> printf("Fork returned %d, tp = %p\n", ret, __builtin_thread_pointer());
> }
Hello self test developers,
I feel like I reported this years ago but I forget what is going on
here?
The patch 65190f77424d: "selftests/tls: add a test for fragmented
messages" from Nov 27, 2019, leads to the following static checker
warning:
tools/testing/selftests/net/tls.c:292 tls_sendmsg_fragmented()
warn: curly braces intended?
tools/testing/selftests/net/tls.c
299 TEST_F(tls, sendmsg_large)
300 {
301 void *mem = malloc(16384);
302 size_t send_len = 16384;
303 size_t sends = 128;
304 struct msghdr msg;
305 size_t recvs = 0;
306 size_t sent = 0;
307
308 memset(&msg, 0, sizeof(struct msghdr));
309 while (sent++ < sends) {
310 struct iovec vec = { (void *)mem, send_len };
311
312 msg.msg_iov = &vec;
313 msg.msg_iovlen = 1;
314 EXPECT_EQ(sendmsg(self->cfd, &msg, 0), send_len);
315 }
316
317 while (recvs++ < sends)
318 EXPECT_NE(recv(self->fd, mem, send_len, 0), -1);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is a macro (below).
319
320 free(mem);
321 }
tools/testing/selftests/kselftest_harness.h
592 /* Support an optional handler after and ASSERT_* or EXPECT_*. The approach is
593 * not thread-safe, but it should be fine in most sane test scenarios.
594 *
595 * Using __bail(), which optionally abort()s, is the easiest way to early
596 * return while still providing an optional block to the API consumer.
597 */
598 #define OPTIONAL_HANDLER(_assert) \
599 for (; _metadata->trigger; _metadata->trigger = \
600 __bail(_assert, _metadata->no_print, _metadata->step))
601
602 #define __INC_STEP(_metadata) \
603 if (_metadata->passed && _metadata->step < 255) \
604 _metadata->step++;
605
606 #define __EXPECT(_expected, _expected_str, _seen, _seen_str, _t, _assert) do { \
607 /* Avoid multiple evaluation of the cases */ \
608 __typeof__(_expected) __exp = (_expected); \
609 __typeof__(_seen) __seen = (_seen); \
610 if (_assert) __INC_STEP(_metadata); \
611 if (!(__exp _t __seen)) { \
612 unsigned long long __exp_print = (uintptr_t)__exp; \
613 unsigned long long __seen_print = (uintptr_t)__seen; \
614 __TH_LOG("Expected %s (%llu) %s %s (%llu)", \
615 _expected_str, __exp_print, #_t, \
616 _seen_str, __seen_print); \
617 _metadata->passed = 0; \
618 /* Ensure the optional handler is triggered */ \
619 _metadata->trigger = 1; \
620 } \
621 } while (0); OPTIONAL_HANDLER(_assert)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The problem is the OPTIONAL_HANDLER(). Smatch thinks it should be
included inside the do {} while(0) loop.
regards,
dan carpenter
Hi,
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
extends that tracking to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a point of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
Changes since v10: Remaining fixes resulting from Jan Kara's reviews:
* Shifted to using the sign bit in page_dma_pinned() to allow accurate
results even in the overflow case. See the comments in that routine
for details. This allowed getting rid of the new
page_ref_zero_or_close_to_bias_overflow(), in favor of a simple
sign check via "page_ref_count() <= 0").
* Simplified some of the huge_memory.c changes, and simplified a gup.c
WARN invocation.
* Now using a standard -ENOMEM for most try_grab_page() failures.
* Got rid of tabs in the comment headers (I had thought they were
required there, but it's actually the reverse: they are not
allowed there).
* Rebased against 5.5-rc2 and retested.
* Added Jan Kara's reviewed-by tag for patch 23 (the main patch of the
series).
Changes since v9: Fixes resulting from Jan Kara's and Jonathan Corbet's
reviews:
* Removed reviewed-by tags from the "mm/gup: track FOLL_PIN pages" (those
were improperly inherited from the much smaller refactoring patch that
was merged into it).
* Made try_grab_compound_head() and try_grab_page() behavior similar in
their behavior with flags, in order to avoid "gotchas" later.
* follow_trans_huge_pmd(): moved the try_grab_page() to earlier in the
routine, in order to avoid having to undo mlock_vma_page().
* follow_hugetlb_page(): removed a refcount overflow check that is now
extraneous (and weaker than what try_grab_page() provides a few lines
further down).
* Fixed up two Documentation flaws, pointed out by Jonathan Corbet's
review.
Changes since v8:
* Merged the "mm/gup: pass flags arg to __gup_device_* functions" patch
into the "mm/gup: track FOLL_PIN pages" patch, as requested by
Christoph and Jan.
* Changed void grab_page() to bool try_grab_page(), and handled errors
at the call sites. (From Jan's review comments.) try_grab_page()
attempts to avoid page refcount overflows, even when counting up with
GUP_PIN_COUNTING_BIAS increments.
* Fixed a bug that I'd introduced, when changing a BUG() to a WARN().
* Added Jan's reviewed-by tag to the " mm/gup: allow FOLL_FORCE for
get_user_pages_fast()" patch.
* Documentation: pin_user_pages.rst: fixed an incorrect gup_benchmark
invocation, left over from the pin_longterm days, spotted while preparing
this version.
* Rebased onto today's linux.git (-rc1), and re-tested.
Changes since v7:
* Rebased onto Linux 5.5-rc1
* Reworked the grab_page() and try_grab_compound_head(), for API
consistency and less diffs (thanks to Jan Kara's reviews).
* Added Leon Romanovsky's reviewed-by tags for two of the IB-related
patches.
* patch 4 refactoring changes, as mentioned above.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v7, which is here:
https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: ran "ib_write_bw", which exercises the umem.c changes,
but not the other changes.
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (24):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 232 ++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 10 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 155 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 626 +++++++++++++++-----
mm/gup_benchmark.c | 74 ++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 38 +-
mm/memremap.c | 76 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 24 +
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 21 +-
tools/testing/selftests/vm/run_vmtests | 22 +
31 files changed, 1145 insertions(+), 369 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.1
The current kunit execution model is to provide base kunit functionality
and tests built-in to the kernel. The aim of this series is to allow
building kunit itself and tests as modules. This in turn allows a
simple form of selective execution; load the module you wish to test.
In doing so, kunit itself (if also built as a module) will be loaded as
an implicit dependency.
Because this requires a core API modification - if a module delivers
multiple suites, they must be declared with the kunit_test_suites()
macro - we're proposing this patch set as a candidate to be applied to the
test tree before too many kunit consumers appear. We attempt to deal
with existing consumers in patch 3.
Changes since v5:
- fixed fs/ext4/Makefile to remove unneeded conditional compilation
(Iurii, patch 3)
- added Reviewed-by, Acked-by to patches 3, 4, 5 and 6
Changes since v4:
- fixed signoff chain to use Co-developed-by: prior to Knut's signoff
(Stephen, all patches)
- added Reviewed-by, Tested-by for patches 1, 2, 4 and 6
- updated comment describing try-catch-impl.h (Stephen, patch 2)
- fixed MODULE_LICENSEs to be GPL v2 (Stephen, patches 3, 5)
- added __init to kunit_init() (Stephen, patch 5)
Changes since v3:
- removed symbol lookup patch for separate submission later
- removed use of sysctl_hung_task_timeout_seconds (patch 4, as discussed
with Brendan and Stephen)
- disabled build of string-stream-test when CONFIG_KUNIT_TEST=m; this
is to avoid having to deal with symbol lookup issues
- changed string-stream-impl.h back to string-stream.h (Brendan)
- added module build support to new list, ext4 tests
Changes since v2:
- moved string-stream.h header to lib/kunit/string-stream-impl.h (Brendan)
(patch 1)
- split out non-exported interfaces in try-catch-impl.h (Brendan)
(patch 2)
- added kunit_find_symbol() and KUNIT_INIT_SYMBOL to lookup non-exported
symbols (patches 3, 4)
- removed #ifdef MODULE around module licenses (Randy, Brendan, Andy)
(patch 4)
- replaced kunit_test_suite() with kunit_test_suites() rather than
supporting both (Brendan) (patch 4)
- lookup sysctl_hung_task_timeout_secs as kunit may be built as a module
and the symbol may not be available (patch 5)
Alan Maguire (6):
kunit: move string-stream.h to lib/kunit
kunit: hide unexported try-catch interface in try-catch-impl.h
kunit: allow kunit tests to be loaded as a module
kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
kunit: allow kunit to be loaded as a module
kunit: update documentation to describe module-based build
Documentation/dev-tools/kunit/faq.rst | 3 +-
Documentation/dev-tools/kunit/index.rst | 3 ++
Documentation/dev-tools/kunit/usage.rst | 16 ++++++++++
fs/ext4/Kconfig | 2 +-
fs/ext4/Makefile | 3 +-
fs/ext4/inode-test.c | 4 ++-
include/kunit/assert.h | 3 +-
include/kunit/test.h | 35 ++++++++++++++------
include/kunit/try-catch.h | 10 ------
kernel/sysctl-test.c | 4 ++-
lib/Kconfig.debug | 4 +--
lib/kunit/Kconfig | 6 ++--
lib/kunit/Makefile | 14 +++++---
lib/kunit/assert.c | 10 ++++++
lib/kunit/{example-test.c => kunit-example-test.c} | 4 ++-
lib/kunit/{test-test.c => kunit-test.c} | 7 ++--
lib/kunit/string-stream-test.c | 5 +--
lib/kunit/string-stream.c | 3 +-
{include => lib}/kunit/string-stream.h | 0
lib/kunit/test.c | 25 ++++++++++++++-
lib/kunit/try-catch-impl.h | 27 ++++++++++++++++
lib/kunit/try-catch.c | 37 +++++-----------------
lib/list-test.c | 4 ++-
23 files changed, 157 insertions(+), 72 deletions(-)
rename lib/kunit/{example-test.c => kunit-example-test.c} (97%)
rename lib/kunit/{test-test.c => kunit-test.c} (98%)
rename {include => lib}/kunit/string-stream.h (100%)
create mode 100644 lib/kunit/try-catch-impl.h
--
1.8.3.1
From: Shuah Khan <skhan(a)linuxfoundation.org>
[ Upstream commit c65e41538b04e0d64a673828745a00cb68a24371 ]
firmware attempts to load test modules that require root access
and fail. Fix it to check for root uid and exit with skip code
instead.
Before this fix:
selftests: firmware: fw_run_tests.sh
modprobe: ERROR: could not insert 'test_firmware': Operation not permitted
You must have the following enabled in your kernel:
CONFIG_TEST_FIRMWARE=y
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
With this fix:
selftests: firmware: fw_run_tests.sh
skip all tests: must be run as root
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Reviwed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/firmware/fw_lib.sh | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/firmware/fw_lib.sh b/tools/testing/selftests/firmware/fw_lib.sh
index 1cbb12e284a6..8a853ace55a2 100755
--- a/tools/testing/selftests/firmware/fw_lib.sh
+++ b/tools/testing/selftests/firmware/fw_lib.sh
@@ -28,6 +28,12 @@ test_modprobe()
check_mods()
{
+ local uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
+
trap "test_modprobe" EXIT
if [ ! -d $DIR ]; then
modprobe test_firmware
--
2.20.1
From: Shuah Khan <skhan(a)linuxfoundation.org>
[ Upstream commit c65e41538b04e0d64a673828745a00cb68a24371 ]
firmware attempts to load test modules that require root access
and fail. Fix it to check for root uid and exit with skip code
instead.
Before this fix:
selftests: firmware: fw_run_tests.sh
modprobe: ERROR: could not insert 'test_firmware': Operation not permitted
You must have the following enabled in your kernel:
CONFIG_TEST_FIRMWARE=y
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
With this fix:
selftests: firmware: fw_run_tests.sh
skip all tests: must be run as root
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Reviwed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/firmware/fw_lib.sh | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/firmware/fw_lib.sh b/tools/testing/selftests/firmware/fw_lib.sh
index b879305a766d..5b8c0fedee76 100755
--- a/tools/testing/selftests/firmware/fw_lib.sh
+++ b/tools/testing/selftests/firmware/fw_lib.sh
@@ -34,6 +34,12 @@ test_modprobe()
check_mods()
{
+ local uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
+
trap "test_modprobe" EXIT
if [ ! -d $DIR ]; then
modprobe test_firmware
--
2.20.1
Hi Linus,
Please pull the following Kselftest update for Linux 5.5-rc4.
This Kselftest update for Linux 5.5-rc4 consists of:
-- rseq build failures fixes related to glibc 2.30 compatibility
from Mathieu Desnoyers
-- Kunit fixes and cleanups from SeongJae Park
-- Fixes to filesystems/epoll, firmware, and livepatch build failures
and skip handling.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 46cf053efec6a3a5f343fead837777efe8252a46:
Linux 5.5-rc3 (2019-12-22 17:02:23 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.5-rc4
for you to fetch changes up to 2a1f40adfb54ca65dc4c93bad444dd23b800a76e:
rseq/selftests: Clarify rseq_prepare_unload() helper requirements
(2019-12-23 10:52:41 -0700)
----------------------------------------------------------------
linux-kselftest-5.5-rc4
This Kselftest update for Linux 5.5-rc4 consists of:
-- rseq build failures fixes related to glibc 2.30 compatibility
from Mathieu Desnoyers
-- Kunit fixes and cleanups from SeongJae Park
-- Fixes to filesystems/epoll, firmware, and livepatch build failures
and skip handling.
----------------------------------------------------------------
Mathieu Desnoyers (3):
rseq/selftests: Turn off timeout setting
rseq/selftests: Fix: Namespace gettid() for compatibility with
glibc 2.30
rseq/selftests: Clarify rseq_prepare_unload() helper requirements
SeongJae Park (6):
docs/kunit/start: Use in-tree 'kunit_defconfig'
kunit: Remove duplicated defconfig creation
kunit: Create default config in '--build_dir'
kunit: Place 'test.log' under the 'build_dir'
kunit: Rename 'kunitconfig' to '.kunitconfig'
kunit/kunit_tool_test: Test '--build_dir' option run
Shuah Khan (3):
selftests: filesystems/epoll: fix build error
selftests: firmware: Fix it to do root uid check and skip
selftests: livepatch: Fix it to do root uid check and skip
Documentation/dev-tools/kunit/start.rst | 13 +++++--------
tools/testing/kunit/kunit.py | 18 +++++++++++-------
tools/testing/kunit/kunit_kernel.py | 10 +++++-----
tools/testing/kunit/kunit_tool_test.py | 10 +++++++++-
tools/testing/selftests/filesystems/epoll/Makefile | 2 +-
tools/testing/selftests/firmware/fw_lib.sh | 6 ++++++
tools/testing/selftests/livepatch/functions.sh | 15 ++++++++++++++-
tools/testing/selftests/livepatch/test-state.sh | 3 +--
tools/testing/selftests/rseq/param_test.c | 18 ++++++++++--------
tools/testing/selftests/rseq/rseq.h | 12 +++++++-----
tools/testing/selftests/rseq/settings | 1 +
11 files changed, 70 insertions(+), 38 deletions(-)
create mode 100644 tools/testing/selftests/rseq/settings
----------------------------------------------------------------
From: Ido Schimmel <idosch(a)mellanox.com>
[ Upstream commit 65cb13986229cec02635a1ecbcd1e2dd18353201 ]
When creating the second host in h2_create(), two addresses are assigned
to the interface, but only one is deleted. When running the test twice
in a row the following error is observed:
$ ./router_bridge_vlan.sh
TEST: ping [ OK ]
TEST: ping6 [ OK ]
TEST: vlan [ OK ]
$ ./router_bridge_vlan.sh
RTNETLINK answers: File exists
TEST: ping [ OK ]
TEST: ping6 [ OK ]
TEST: vlan [ OK ]
Fix this by deleting the address during cleanup.
Fixes: 5b1e7f9ebd56 ("selftests: forwarding: Test routed bridge interface")
Signed-off-by: Ido Schimmel <idosch(a)mellanox.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/forwarding/router_bridge_vlan.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh b/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh
index fef88eb4b873..fa6a88c50750 100755
--- a/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh
+++ b/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh
@@ -36,7 +36,7 @@ h2_destroy()
{
ip -6 route del 2001:db8:1::/64 vrf v$h2
ip -4 route del 192.0.2.0/28 vrf v$h2
- simple_if_fini $h2 192.0.2.130/28
+ simple_if_fini $h2 192.0.2.130/28 2001:db8:2::2/64
}
router_create()
--
2.20.1
From: Masami Hiramatsu <mhiramat(a)kernel.org>
[ Upstream commit 5cc6c8d4a99d0ee4d5466498e258e593df1d3eb6 ]
Fix multiple kprobe event testcase to work it correctly.
There are 2 bugfixes.
- Since `wc -l FILE` returns not only line number but also
FILE filename, following "if" statement always failed.
Fix this bug by replacing it with 'cat FILE | wc -l'
- Since "while do-done loop" block with pipeline becomes a
subshell, $N local variable is not update outside of
the loop.
Fix this bug by using actual target number (256) instead
of $N.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/kprobe/multiple_kprobes.tc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
index ce361b9d62cf..da298f191086 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
@@ -25,9 +25,9 @@ while read i; do
test $N -eq 256 && break
done
-L=`wc -l kprobe_events`
-if [ $L -ne $N ]; then
- echo "The number of kprobes events ($L) is not $N"
+L=`cat kprobe_events | wc -l`
+if [ $L -ne 256 ]; then
+ echo "The number of kprobes events ($L) is not 256"
exit_fail
fi
--
2.20.1
From: SeongJae Park <sjpark(a)amazon.de>
[ Upstream commit 4eac734486fd431e0756cc5e929f140911a36a53 ]
On an old perl such as v5.10.1, `kselftest/prefix.pl` gives below error
message:
Can't locate object method "autoflush" via package "IO::Handle" at kselftest/prefix.pl line 10.
This commit fixes the error by explicitly specifying the use of the
`IO::Handle` package.
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
Acked-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kselftest/prefix.pl | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kselftest/prefix.pl b/tools/testing/selftests/kselftest/prefix.pl
index ec7e48118183..31f7c2a0a8bd 100755
--- a/tools/testing/selftests/kselftest/prefix.pl
+++ b/tools/testing/selftests/kselftest/prefix.pl
@@ -3,6 +3,7 @@
# Prefix all lines with "# ", unbuffered. Command being piped in may need
# to have unbuffering forced with "stdbuf -i0 -o0 -e0 $cmd".
use strict;
+use IO::Handle;
binmode STDIN;
binmode STDOUT;
--
2.20.1
From: SeongJae Park <sjpark(a)amazon.de>
[ Upstream commit d187801d1a46519d2a322f879f7c8f85c685372e ]
If a timeout failure occurs, kselftest kills the test process and prints
the timeout log. If the test process has killed while printing a log
that ends with new line, the timeout log can be printed in middle of the
test process output so that it can be seems like a comment, as below:
# test_process_log not ok 3 selftests: timers: nsleep-lat # TIMEOUT
This commit avoids such problem by printing one more line before the
TIMEOUT failure log.
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
Acked-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kselftest/runner.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 84de7bc74f2c..a8d20cbb711c 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -79,6 +79,7 @@ run_one()
if [ $rc -eq $skip_rc ]; then \
echo "not ok $test_num $TEST_HDR_MSG # SKIP"
elif [ $rc -eq $timeout_rc ]; then \
+ echo "#"
echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT"
else
echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
--
2.20.1
From: Ido Schimmel <idosch(a)mellanox.com>
[ Upstream commit 65cb13986229cec02635a1ecbcd1e2dd18353201 ]
When creating the second host in h2_create(), two addresses are assigned
to the interface, but only one is deleted. When running the test twice
in a row the following error is observed:
$ ./router_bridge_vlan.sh
TEST: ping [ OK ]
TEST: ping6 [ OK ]
TEST: vlan [ OK ]
$ ./router_bridge_vlan.sh
RTNETLINK answers: File exists
TEST: ping [ OK ]
TEST: ping6 [ OK ]
TEST: vlan [ OK ]
Fix this by deleting the address during cleanup.
Fixes: 5b1e7f9ebd56 ("selftests: forwarding: Test routed bridge interface")
Signed-off-by: Ido Schimmel <idosch(a)mellanox.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/forwarding/router_bridge_vlan.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh b/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh
index fef88eb4b873..fa6a88c50750 100755
--- a/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh
+++ b/tools/testing/selftests/net/forwarding/router_bridge_vlan.sh
@@ -36,7 +36,7 @@ h2_destroy()
{
ip -6 route del 2001:db8:1::/64 vrf v$h2
ip -4 route del 192.0.2.0/28 vrf v$h2
- simple_if_fini $h2 192.0.2.130/28
+ simple_if_fini $h2 192.0.2.130/28 2001:db8:2::2/64
}
router_create()
--
2.20.1
From: Masami Hiramatsu <mhiramat(a)kernel.org>
[ Upstream commit 295c4e21cf27ac9af542140e3e797df9e0cf7b5f ]
Check the return value of setuid() and setgid().
This fixes the following warnings and improves test result.
safesetid-test.c: In function ‘main’:
safesetid-test.c:294:2: warning: ignoring return value of ‘setuid’, declared with attribute warn_unused_result [-Wunused-result]
setuid(NO_POLICY_USER);
^~~~~~~~~~~~~~~~~~~~~~
safesetid-test.c:295:2: warning: ignoring return value of ‘setgid’, declared with attribute warn_unused_result [-Wunused-result]
setgid(NO_POLICY_USER);
^~~~~~~~~~~~~~~~~~~~~~
safesetid-test.c:309:2: warning: ignoring return value of ‘setuid’, declared with attribute warn_unused_result [-Wunused-result]
setuid(RESTRICTED_PARENT);
^~~~~~~~~~~~~~~~~~~~~~~~~
safesetid-test.c:310:2: warning: ignoring return value of ‘setgid’, declared with attribute warn_unused_result [-Wunused-result]
setgid(RESTRICTED_PARENT);
^~~~~~~~~~~~~~~~~~~~~~~~~
safesetid-test.c: In function ‘test_setuid’:
safesetid-test.c:216:3: warning: ignoring return value of ‘setuid’, declared with attribute warn_unused_result [-Wunused-result]
setuid(child_uid);
^~~~~~~~~~~~~~~~~
Fixes: c67e8ec03f3f ("LSM: SafeSetID: add selftest")
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../testing/selftests/safesetid/safesetid-test.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/safesetid/safesetid-test.c b/tools/testing/selftests/safesetid/safesetid-test.c
index 8f40c6ecdad1..0c4d50644c13 100644
--- a/tools/testing/selftests/safesetid/safesetid-test.c
+++ b/tools/testing/selftests/safesetid/safesetid-test.c
@@ -213,7 +213,8 @@ static void test_setuid(uid_t child_uid, bool expect_success)
}
if (cpid == 0) { /* Code executed by child */
- setuid(child_uid);
+ if (setuid(child_uid) < 0)
+ exit(EXIT_FAILURE);
if (getuid() == child_uid)
exit(EXIT_SUCCESS);
else
@@ -291,8 +292,10 @@ int main(int argc, char **argv)
// First test to make sure we can write userns mappings from a user
// that doesn't have any restrictions (as long as it has CAP_SETUID);
- setuid(NO_POLICY_USER);
- setgid(NO_POLICY_USER);
+ if (setuid(NO_POLICY_USER) < 0)
+ die("Error with set uid(%d)\n", NO_POLICY_USER);
+ if (setgid(NO_POLICY_USER) < 0)
+ die("Error with set gid(%d)\n", NO_POLICY_USER);
// Take away all but setid caps
drop_caps(true);
@@ -306,8 +309,10 @@ int main(int argc, char **argv)
die("test_userns failed when it should work\n");
}
- setuid(RESTRICTED_PARENT);
- setgid(RESTRICTED_PARENT);
+ if (setuid(RESTRICTED_PARENT) < 0)
+ die("Error with set uid(%d)\n", RESTRICTED_PARENT);
+ if (setgid(RESTRICTED_PARENT) < 0)
+ die("Error with set gid(%d)\n", RESTRICTED_PARENT);
test_setuid(ROOT_USER, false);
test_setuid(ALLOWED_CHILD1, true);
--
2.20.1
From: Masami Hiramatsu <mhiramat(a)kernel.org>
[ Upstream commit 5cc6c8d4a99d0ee4d5466498e258e593df1d3eb6 ]
Fix multiple kprobe event testcase to work it correctly.
There are 2 bugfixes.
- Since `wc -l FILE` returns not only line number but also
FILE filename, following "if" statement always failed.
Fix this bug by replacing it with 'cat FILE | wc -l'
- Since "while do-done loop" block with pipeline becomes a
subshell, $N local variable is not update outside of
the loop.
Fix this bug by using actual target number (256) instead
of $N.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/kprobe/multiple_kprobes.tc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
index 5862eee91e1d..6e3dbe5f96b7 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
@@ -20,9 +20,9 @@ while read i; do
test $N -eq 256 && break
done
-L=`wc -l kprobe_events`
-if [ $L -ne $N ]; then
- echo "The number of kprobes events ($L) is not $N"
+L=`cat kprobe_events | wc -l`
+if [ $L -ne 256 ]; then
+ echo "The number of kprobes events ($L) is not 256"
exit_fail
fi
--
2.20.1
From: Masami Hiramatsu <mhiramat(a)kernel.org>
[ Upstream commit 25deae098e748d8d36bc35129a66734b8f6925c9 ]
Since dynamic function tracer can be disabled, set_ftrace_filter
can be disappeared. Test cases which depends on it, must check
whether the set_ftrace_filter exists or not before testing
and if not, return as unsupported.
Also, if the function tracer itself is disabled, we can not
set "function" to current_tracer. Test cases must check it
before testing, and return as unsupported.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc | 2 ++
tools/testing/selftests/ftrace/test.d/ftrace/func_cpumask.tc | 5 +++++
2 files changed, 7 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc
index 36fb59f886ea..1a52f2883fe0 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc
@@ -3,6 +3,8 @@
# description: ftrace - stacktrace filter command
# flags: instance
+[ ! -f set_ftrace_filter ] && exit_unsupported
+
echo _do_fork:stacktrace >> set_ftrace_filter
grep -q "_do_fork:stacktrace:unlimited" set_ftrace_filter
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_cpumask.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_cpumask.tc
index 86a1f07ef2ca..71fa3f49e35e 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func_cpumask.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_cpumask.tc
@@ -15,6 +15,11 @@ if [ $NP -eq 1 ] ;then
exit_unresolved
fi
+if ! grep -q "function" available_tracers ; then
+ echo "Function trace is not enabled"
+ exit_unsupported
+fi
+
ORIG_CPUMASK=`cat tracing_cpumask`
do_reset() {
--
2.20.1
Clean up a handful of interrelated warts in the kernel's handling of VMX:
- Enable VMX in IA32_FEATURE_CONTROL during boot instead of on-demand
during KVM load to avoid future contention over IA32_FEATURE_CONTROL.
- Rework VMX feature reporting so that it is accurate and up-to-date,
now and in the future.
- Consolidate code across CPUs that support VMX.
This series stems from two separate but related issues. The first issue,
pointed out by Boris in the SGX enabling series[1], is that the kernel
currently doesn't ensure the IA32_FEATURE_CONTROL MSR is configured during
boot. The second issue is that the kernel's reporting of VMX features is
stale, potentially inaccurate, and difficult to maintain.
v4:
- Rebase to tip/master, 8a1b070333f4 ("Merge branch 'WIP.x86/mm'")
- Rename everything feature control related to IA32_FEAT_CTL. [Boris]
- Minor coding style tweaks [Boris and Jarkko].
- Print VMX feature flags in "vmx flags" to avoid polluting "flags",
but keep printing the current synthetic VMX in "flags" so as not to
break the ABI. [Boris]
- Don't bother printing an error message in the extremely unlikely
event VMX is supported but IA32_FEAT_CTL doesn't exist. [Boris]
- Beef up a few changelogs and comments. [Boris]
- Add a comment in the LMCE code for the new WARN. [Jarkko]
- Check CONFIG_KVM_INTEL instead of CONFIG_KVM when deciding whether
or not to enable VMX.
- Add a patch to introduce X86_FEATURE_MSR_IA32_FEAT_CTL.
- Dropped Jim's Reviewed-by from a few KVM patches due to the above
addition.
v3:
- Rebase to tip/master, ceceaf1f12ba ("Merge branch 'WIP.x86/cleanups'").
- Rename the feature control MSR bit defines [Boris].
- Rewrite the error message displayed when reading feature control MSR
faults on a VMX capable CPU to explicitly state that it's likely a
hardware or hypervisor issue [Boris].
- Collect a Reviewed-by for the LMCE change [Boris].
- Enable VMX in feature control (if it's unlocked) if and only if
KVM is enabled [Paolo].
- Remove a big pile of redudant MSR defines from the KVM selftests that
was discovered when renaming the feature control defines.
- Fix a changelog typoe [Boris].
v2:
- Rebase to latest tip/x86/cpu (1edae1ae6258, "x86/Kconfig: Enforce...)
- Collect Jim's reviews.
- Fix a typo in setting of EPT capabilities [TonyWWang-oc].
- Remove defines for reserved VMX feature flags [Paolo].
- Print the VMX features under "flags" and maintain all existing names
to be backward compatible with the ABI [Paolo].
- Create aggregate APIC features to report FLEXPRIORITY and APICV, so
that the full feature *and* their associated individual features are
printed, e.g. to aid in recognizing why an APIC feature isn't being
used.
- Fix a few copy paste errors in changelogs.
v1 cover letter:
== IA32_FEATURE_CONTROL ==
Lack of IA32_FEATURE_CONTROL configuration during boot isn't a functional
issue in the current kernel as the majority of platforms set and lock
IA32_FEATURE_CONTROL in firmware. And when the MSR is left unlocked, KVM
is the only subsystem that writes IA32_FEATURE_CONTROL. That will change
if/when SGX support is enabled, as SGX will also want to fully enable
itself when IA32_FEATURE_CONTROL is unlocked.
== VMX Feature Reporting ==
VMX features are not enumerated via CPUID, but instead are enumerated
through VMX MSRs. As a result, new VMX features are not automatically
reported via /proc/cpuinfo.
An attempt was made long ago to report interesting and/or meaningful VMX
features by synthesizing select features into a Linux-defined cpufeatures
word. Synthetic feature flags worked for the initial purpose, but the
existence of the synthetic flags was forgotten almost immediately, e.g.
only one new flag (EPT A/D) has been added in the the decade since the
synthetic VMX features were introduced, while VMX and KVM have gained
support for many new features.
Placing the synthetic flags in x86_capability also allows them to be
queried via cpu_has() and company, which is misleading as the flags exist
purely for reporting via /proc/cpuinfo. KVM, the only in-kernel user of
VMX, ignores the flags.
Last but not least, VMX features are reported in /proc/cpuinfo even
when VMX is unusable due to lack of enabling in IA32_FEATURE_CONTROL.
== Caveats ==
All of the testing of non-standard flows was done in a VM, as I don't
have a system that leaves IA32_FEATURE_CONTROL unlocked, or locks it with
VMX disabled.
The Centaur and Zhaoxin changes are somewhat speculative, as I haven't
confirmed they actually support IA32_FEATURE_CONTROL, or that they want to
gain "official" KVM support. I assume they unofficially support KVM given
that both CPUs went through the effort of enumerating VMX features. That
in turn would require them to support IA32_FEATURE_CONTROL since KVM will
fault and refuse to load if the MSR doesn't exist.
[1] https://lkml.kernel.org/r/20190925085156.GA3891@zn.tnic
Sean Christopherson (19):
x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
selftests: kvm: Replace manual MSR defs with common msr-index.h
tools arch x86: Sync msr-index.h from kernel sources
x86/intel: Initialize IA32_FEAT_CTL MSR at boot
x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
x86/centaur: Use common IA32_FEAT_CTL MSR initialization
x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
x86/cpu: Clear VMX feature flag if VMX is not fully enabled
x86/vmx: Introduce VMX_FEATURES_*
x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is
configured
KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
KVM: VMX: Use VMX feature flag to query BIOS enabling
KVM: VMX: Check for full VMX support when verifying CPU compatibility
KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin
CPUs
MAINTAINERS | 2 +-
arch/x86/Kconfig.cpu | 8 +
arch/x86/boot/mkcpustr.c | 1 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 14 +-
arch/x86/include/asm/perf_event.h | 22 +-
arch/x86/include/asm/processor.h | 4 +
arch/x86/include/asm/vmx.h | 105 +--
arch/x86/include/asm/vmxfeatures.h | 86 +++
arch/x86/kernel/cpu/Makefile | 6 +-
arch/x86/kernel/cpu/centaur.c | 35 +-
arch/x86/kernel/cpu/common.c | 3 +
arch/x86/kernel/cpu/cpu.h | 4 +
arch/x86/kernel/cpu/feat_ctl.c | 140 ++++
arch/x86/kernel/cpu/intel.c | 49 +-
arch/x86/kernel/cpu/mce/intel.c | 15 +-
arch/x86/kernel/cpu/mkcapflags.sh | 15 +-
arch/x86/kernel/cpu/proc.c | 15 +
arch/x86/kernel/cpu/zhaoxin.c | 35 +-
arch/x86/kvm/Kconfig | 10 +-
arch/x86/kvm/vmx/nested.c | 4 +-
arch/x86/kvm/vmx/vmx.c | 67 +-
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 2 +-
tools/arch/x86/include/asm/msr-index.h | 30 +-
tools/power/x86/turbostat/turbostat.c | 4 +-
tools/testing/selftests/kvm/Makefile | 4 +-
.../selftests/kvm/include/x86_64/processor.h | 726 +-----------------
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 8 +-
29 files changed, 431 insertions(+), 986 deletions(-)
create mode 100644 arch/x86/include/asm/vmxfeatures.h
create mode 100644 arch/x86/kernel/cpu/feat_ctl.c
--
2.24.0
Patch changelog:
v2:
* Add include <linux/types.h> to openat2.h. [Florian Weimer]
* Move OPEN_HOW_SIZE_* constants out of UAPI. [Florian Weimer]
* Switch from __aligned_u64 to __u64 since it isn't necessary.
[David Laight]
v1: <https://lore.kernel.org/lkml/20191219105533.12508-1-cyphar@cyphar.com/>
While openat2(2) is still not yet in Linus's tree, we can take this
opportunity to iron out some small warts that weren't noticed earlier:
* A fix was suggested by Florian Weimer, to separate the openat2
definitions so glibc can use the header directly. I've put the
maintainership under VFS but let me know if you'd prefer it belong
ot the fcntl folks.
* Having heterogenous field sizes in an extensible struct results in
"padding hole" problems when adding new fields (in addition the
correct error to use for non-zero padding isn't entirely clear ).
The simplest solution is to just copy clone(3)'s model -- always use
u64s. It will waste a little more space in the struct, but it
removes a possible future headache.
Aleksa Sarai (2):
openat2: drop open_how->__padding field
uapi: split openat2(2) definitions from fcntl.h
MAINTAINERS | 1 +
fs/open.c | 2 -
include/linux/fcntl.h | 4 ++
include/uapi/linux/fcntl.h | 37 +-----------------
include/uapi/linux/openat2.h | 39 +++++++++++++++++++
tools/testing/selftests/openat2/helpers.h | 7 ++--
.../testing/selftests/openat2/openat2_test.c | 24 ++++--------
7 files changed, 56 insertions(+), 58 deletions(-)
create mode 100644 include/uapi/linux/openat2.h
base-commit: 912dfe068c43fa13c587b8d30e73d335c5ba7d44
--
2.24.0
While openat2(2) is still not yet in Linus's tree, we can take this
opportunity to iron out some small warts that weren't noticed earlier:
* A fix was suggested by Florian Weimer, to separate the openat2
definitions so glibc can use the header directly. I've put the
maintainership under VFS but let me know if you'd prefer it belong
ot the fcntl folks.
* Having heterogenous field sizes in an extensible struct results in
"padding hole" problems when adding new fields (in addition the
correct error to use for non-zero padding isn't entirely clear ).
The simplest solution is to just copy clone(3)'s model -- always use
u64s. It will waste a little more space in the struct, but it
removes a possible future headache.
Aleksa Sarai (2):
uapi: split openat2(2) definitions from fcntl.h
openat2: drop open_how->__padding field
MAINTAINERS | 1 +
fs/open.c | 2 -
include/uapi/linux/fcntl.h | 37 +----------------
include/uapi/linux/openat2.h | 40 +++++++++++++++++++
tools/testing/selftests/openat2/helpers.h | 3 +-
.../testing/selftests/openat2/openat2_test.c | 24 ++++-------
6 files changed, 51 insertions(+), 56 deletions(-)
create mode 100644 include/uapi/linux/openat2.h
base-commit: 912dfe068c43fa13c587b8d30e73d335c5ba7d44
--
2.24.0
This patchset contains trivial fixes for the kunit documentations and
the wrapper python scripts.
Baseline
--------
This patchset is based on 'kselftest/fixes' branch of
linux-kselftest[1]. A complete tree is available at my repo:
https://github.com/sjp38/linux/tree/kunit_fix/20191205_v6
Version History
---------------
Changes from v5
(https://lore.kernel.org/linux-kselftest/20191205093440.21824-1-sjpark@amazo…):
- Rebased on kselftest/fixes
- Add 'Reviewed-by' and 'Tested-by' from Brendan Higgins
Changes from v4
(https://lore.kernel.org/linux-doc/1575490683-13015-1-git-send-email-sj38.pa…):
- Rebased on Heidi Fahim's patch[2]
- Fix failing kunit_tool_test test
- Add 'build_dir' option test in 'kunit_tool_test.py'
Changes from v3
(https://lore.kernel.org/linux-kselftest/20191204192141.GA247851@google.com):
- Fix the 4th patch, "kunit: Place 'test.log' under the 'build_dir'" to
set default value of 'build_dir' as '' instead of NULL so that kunit
can run even though '--build_dir' option is not given.
Changes from v2
(https://lore.kernel.org/linux-kselftest/1575361141-6806-1-git-send-email-sj…):
- Make 'build_dir' if not exists (missed from v3 by mistake)
Changes from v1
(https://lore.kernel.org/linux-doc/1575242724-4937-1-git-send-email-sj38.par…):
- Remove "docs/kunit/start: Skip wrapper run command" (A similar
approach is ongoing)
- Make 'build_dir' if not exists
SeongJae Park (6):
docs/kunit/start: Use in-tree 'kunit_defconfig'
kunit: Remove duplicated defconfig creation
kunit: Create default config in '--build_dir'
kunit: Place 'test.log' under the 'build_dir'
kunit: Rename 'kunitconfig' to '.kunitconfig'
kunit/kunit_tool_test: Test '--build_dir' option run
Documentation/dev-tools/kunit/start.rst | 13 +++++--------
tools/testing/kunit/kunit.py | 18 +++++++++++-------
tools/testing/kunit/kunit_kernel.py | 10 +++++-----
tools/testing/kunit/kunit_tool_test.py | 10 +++++++++-
4 files changed, 30 insertions(+), 21 deletions(-)
--
2.17.1
From: SeongJae Park <sjpark(a)amazon.de>
This patchset contains trivial fixes for the kunit documentations and
the wrapper python scripts.
Baseline
--------
This patchset is based on 'kselftest/fixes' branch of
linux-kselftest[1]. A complete tree is available at my repo:
https://github.com/sjp38/linux/tree/kunit_fix/20191205_v6
Version History
---------------
Changes from v6
(https://lore.kernel.org/linux-doc/20191212022711.10062-1-sjpark@amazon.de/):
- Rebased on latest kselftest/fixes
- Add 'From: SeongJae Park <sjpark(a)amazon.de>'
Changes from v5
(https://lore.kernel.org/linux-kselftest/20191205093440.21824-1-sjpark@amazo…):
- Rebased on kselftest/fixes
- Add 'Reviewed-by' and 'Tested-by' from Brendan Higgins
Changes from v4
(https://lore.kernel.org/linux-doc/1575490683-13015-1-git-send-email-sj38.pa…):
- Rebased on Heidi Fahim's patch[2]
- Fix failing kunit_tool_test test
- Add 'build_dir' option test in 'kunit_tool_test.py'
Changes from v3
(https://lore.kernel.org/linux-kselftest/20191204192141.GA247851@google.com):
- Fix the 4th patch, "kunit: Place 'test.log' under the 'build_dir'" to
set default value of 'build_dir' as '' instead of NULL so that kunit
can run even though '--build_dir' option is not given.
Changes from v2
(https://lore.kernel.org/linux-kselftest/1575361141-6806-1-git-send-email-sj…):
- Make 'build_dir' if not exists (missed from v3 by mistake)
Changes from v1
(https://lore.kernel.org/linux-doc/1575242724-4937-1-git-send-email-sj38.par…):
- Remove "docs/kunit/start: Skip wrapper run command" (A similar
approach is ongoing)
- Make 'build_dir' if not exists
SeongJae Park (6):
docs/kunit/start: Use in-tree 'kunit_defconfig'
kunit: Remove duplicated defconfig creation
kunit: Create default config in '--build_dir'
kunit: Place 'test.log' under the 'build_dir'
kunit: Rename 'kunitconfig' to '.kunitconfig'
kunit/kunit_tool_test: Test '--build_dir' option run
Documentation/dev-tools/kunit/start.rst | 13 +++++--------
tools/testing/kunit/kunit.py | 18 +++++++++++-------
tools/testing/kunit/kunit_kernel.py | 10 +++++-----
tools/testing/kunit/kunit_tool_test.py | 10 +++++++++-
4 files changed, 30 insertions(+), 21 deletions(-)
--
2.17.1
The current kunit execution model is to provide base kunit functionality
and tests built-in to the kernel. The aim of this series is to allow
building kunit itself and tests as modules. This in turn allows a
simple form of selective execution; load the module you wish to test.
In doing so, kunit itself (if also built as a module) will be loaded as
an implicit dependency.
Because this requires a core API modification - if a module delivers
multiple suites, they must be declared with the kunit_test_suites()
macro - we're proposing this patch set as a candidate to be applied to the
test tree before too many kunit consumers appear. We attempt to deal
with existing consumers in patch 3.
Changes since v6:
- reintroduce kunit_test_suite() definition to handle users in other trees
not yet converted to using kunit_test_suites() (kbuild error when
applying patches to ext4/dev tree)
- modify drivers/base/power/qos-test.c to use kunit_test_suites()
to register suite. We do not convert it to support module build now as
the suite uses a few unexported function; see patch 3 for details.
Changes since v5:
- fixed fs/ext4/Makefile to remove unneeded conditional compilation
(Iurii, patch 3)
- added Reviewed-by, Acked-by to patches 3, 4, 5 and 6
Changes since v4:
- fixed signoff chain to use Co-developed-by: prior to Knut's signoff
(Stephen, all patches)
- added Reviewed-by, Tested-by for patches 1, 2, 4 and 6
- updated comment describing try-catch-impl.h (Stephen, patch 2)
- fixed MODULE_LICENSEs to be GPL v2 (Stephen, patches 3, 5)
- added __init to kunit_init() (Stephen, patch 5)
Changes since v3:
- removed symbol lookup patch for separate submission later
- removed use of sysctl_hung_task_timeout_seconds (patch 4, as discussed
with Brendan and Stephen)
- disabled build of string-stream-test when CONFIG_KUNIT_TEST=m; this
is to avoid having to deal with symbol lookup issues
- changed string-stream-impl.h back to string-stream.h (Brendan)
- added module build support to new list, ext4 tests
Changes since v2:
- moved string-stream.h header to lib/kunit/string-stream-impl.h (Brendan)
(patch 1)
- split out non-exported interfaces in try-catch-impl.h (Brendan)
(patch 2)
- added kunit_find_symbol() and KUNIT_INIT_SYMBOL to lookup non-exported
symbols (patches 3, 4)
- removed #ifdef MODULE around module licenses (Randy, Brendan, Andy)
(patch 4)
- replaced kunit_test_suite() with kunit_test_suites() rather than
supporting both (Brendan) (patch 4)
- lookup sysctl_hung_task_timeout_secs as kunit may be built as a module
and the symbol may not be available (patch 5)
Alan Maguire (6):
kunit: move string-stream.h to lib/kunit
kunit: hide unexported try-catch interface in try-catch-impl.h
kunit: allow kunit tests to be loaded as a module
kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
kunit: allow kunit to be loaded as a module
kunit: update documentation to describe module-based build
Documentation/dev-tools/kunit/faq.rst | 3 +-
Documentation/dev-tools/kunit/index.rst | 3 ++
Documentation/dev-tools/kunit/usage.rst | 16 ++++++++++
drivers/base/power/qos-test.c | 2 +-
fs/ext4/Kconfig | 2 +-
fs/ext4/Makefile | 3 +-
fs/ext4/inode-test.c | 4 ++-
include/kunit/assert.h | 3 +-
include/kunit/test.h | 37 ++++++++++++++++------
include/kunit/try-catch.h | 10 ------
kernel/sysctl-test.c | 4 ++-
lib/Kconfig.debug | 4 +--
lib/kunit/Kconfig | 6 ++--
lib/kunit/Makefile | 14 +++++---
lib/kunit/assert.c | 10 ++++++
lib/kunit/{example-test.c => kunit-example-test.c} | 4 ++-
lib/kunit/{test-test.c => kunit-test.c} | 7 ++--
lib/kunit/string-stream-test.c | 5 +--
lib/kunit/string-stream.c | 3 +-
{include => lib}/kunit/string-stream.h | 0
lib/kunit/test.c | 25 ++++++++++++++-
lib/kunit/try-catch-impl.h | 27 ++++++++++++++++
lib/kunit/try-catch.c | 37 +++++-----------------
lib/list-test.c | 4 ++-
24 files changed, 160 insertions(+), 73 deletions(-)
rename lib/kunit/{example-test.c => kunit-example-test.c} (97%)
rename lib/kunit/{test-test.c => kunit-test.c} (98%)
rename {include => lib}/kunit/string-stream.h (100%)
create mode 100644 lib/kunit/try-catch-impl.h
--
1.8.3.1
Hi Morimoto-san, Karl,
On Wed, Dec 18, 2019 at 6:22 AM Kuninori Morimoto
<kuninori.morimoto.gx(a)renesas.com> wrote:
> From: Kuninori Morimoto <kuninori.morimoto.gx(a)renesas.com>
>
> Current SH will get below warning at strncpy()
>
> In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
> from ${LINUX}/include/linux/string.h:20,
> from ${LINUX}/include/linux/bitmap.h:9,
> from ${LINUX}/include/linux/nodemask.h:95,
> from ${LINUX}/include/linux/mmzone.h:17,
> from ${LINUX}/include/linux/gfp.h:6,
> from ${LINUX}/innclude/linux/slab.h:15,
> from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
> ${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
> ${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
> 80 is above array bounds of 'char[26]' [-Warray-bounds]
> : "0" (__dest), "1" (__src), "r" (__src+__n)
> ~~~~~^~~~
>
> In general, strncpy() should behave like below.
>
> char dest[10];
> char *src = "12345";
>
> strncpy(dest, src, 10);
> // dest = {'1', '2', '3', '4', '5',
> '\0','\0','\0','\0','\0'}
>
> But, current SH strnpy() has 2 issues.
> 1st is it will access to out-of-memory (= src + 10).
I believe this is not correct: the code does not really access memory
beyond the end of the source string. (Recent) gcc just thinks so,
because "__src+__n" is used as a parameter to the routine.
> 2nd is it needs big fixup for it, and maintenance __asm__
> code is difficult.
Yeah, the padding is missing.
> To solve these issues, this patch simply uses generic strncpy()
> instead of architecture specific one.
That will definitely fix the issue, as we assume the generic
implementation is correct ;-)
Now, I've just tried, naively, to enable CONFIG_STRING_SELFTEST=y in my
rts7751r2d build (without your patch), and boot it in qemu:
String selftests succeeded
Woops, turns out lib/test_string.c does not have any testcases for
strncpy()...
So adding test code for the corner cases may be a valuable contribution.
Thanks!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
The design of the original open_how struct layout was such that it
ensured that there would be no un-labelled (and thus potentially
non-zero) padding to avoid issues with struct expansion, as well as
providing a uniform representation on all architectures (to avoid
complications with OPEN_HOW_SIZE versioning).
However, there were a few other desirable features which were not
fulfilled by the previous struct layout:
* Adding new features (other than new flags) should always result in
the struct getting larger. However, by including a padding field, it
was possible for new fields to be added without expanding the
structure. This would somewhat complicate version-number based
checking of feature support.
* A non-zero bit in __padding yielded -EINVAL when it should arguably
have been -E2BIG (because the padding bits are effectively
yet-to-be-used fields). However, the semantics are not entirely clear
because userspace may expect -E2BIG to only signify that the
structure is too big. It's much simpler to just provide the guarantee
that new fields will always result in a struct size increase, and
-E2BIG indicates you're using a field that's too recent for an older
kernel.
* While the alignment for u64s was manually backed by extra padding
fields, some languages (such as Rust) do not currently support
enforcing alignment of struct field members.
* The padding wasted space needlessly, and would very likely not be
used up entirely by future extensions for a long time (because it
couldn't fit a u64).
While none of these outstanding issues are deal-breakers, we can iron
out these warts before openat2(2) lands in Linus's tree. Instead of
using alignment and padding, we simply pack the structure with
__attribute__((packed)). Rust supports #[repr(packed)] and it removes
all of the issues with having explicit padding.
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
fs/open.c | 2 --
include/uapi/linux/fcntl.h | 11 +++++------
tools/testing/selftests/openat2/helpers.h | 11 +++++------
tools/testing/selftests/openat2/openat2_test.c | 18 +-----------------
4 files changed, 11 insertions(+), 31 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 50a46501bcc9..8cdb2b675867 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -993,8 +993,6 @@ static inline int build_open_flags(const struct open_how *how,
return -EINVAL;
if (how->resolve & ~VALID_RESOLVE_FLAGS)
return -EINVAL;
- if (memchr_inv(how->__padding, 0, sizeof(how->__padding)))
- return -EINVAL;
/* Deal with the mode. */
if (WILL_CREATE(flags)) {
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index d886bdb585e4..0e070c7f568a 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -109,17 +109,16 @@
* O_TMPFILE} are set.
*
* @flags: O_* flags.
- * @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
+ * @mode: O_CREAT/O_TMPFILE file mode.
*/
struct open_how {
- __aligned_u64 flags;
+ __u64 flags;
+ __u64 resolve;
__u16 mode;
- __u16 __padding[3]; /* must be zeroed */
- __aligned_u64 resolve;
-};
+} __attribute__((packed));
-#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
+#define OPEN_HOW_SIZE_VER0 18 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
/* how->resolve flags for openat2(2). */
diff --git a/tools/testing/selftests/openat2/helpers.h b/tools/testing/selftests/openat2/helpers.h
index 43ca5ceab6e3..eb1535c8fa2e 100644
--- a/tools/testing/selftests/openat2/helpers.h
+++ b/tools/testing/selftests/openat2/helpers.h
@@ -32,17 +32,16 @@
* O_TMPFILE} are set.
*
* @flags: O_* flags.
- * @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
+ * @mode: O_CREAT/O_TMPFILE file mode.
*/
struct open_how {
- __aligned_u64 flags;
+ __u64 flags;
+ __u64 resolve;
__u16 mode;
- __u16 __padding[3]; /* must be zeroed */
- __aligned_u64 resolve;
-};
+} __attribute__((packed));
-#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
+#define OPEN_HOW_SIZE_VER0 18 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
bool needs_openat2(const struct open_how *how);
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index 0b64fedc008b..cbf95d160b1b 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -40,7 +40,7 @@ struct struct_test {
int err;
};
-#define NUM_OPENAT2_STRUCT_TESTS 10
+#define NUM_OPENAT2_STRUCT_TESTS 7
#define NUM_OPENAT2_STRUCT_VARIATIONS 13
void test_openat2_struct(void)
@@ -57,22 +57,6 @@ void test_openat2_struct(void)
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how_ext) },
- /* Normal struct with broken padding. */
- { .name = "normal struct (non-zero padding[0])",
- .arg.inner.flags = O_RDONLY,
- .arg.inner.__padding = {0xa0, 0x00, 0x00},
- .size = sizeof(struct open_how_ext), .err = -EINVAL },
- { .name = "normal struct (non-zero padding[1])",
- .arg.inner.flags = O_RDONLY,
- .arg.inner.__padding = {0x00, 0x1a, 0x00},
- .size = sizeof(struct open_how_ext), .err = -EINVAL },
- { .name = "normal struct (non-zero padding[2])",
- .arg.inner.flags = O_RDONLY,
- .arg.inner.__padding = {0x00, 0x00, 0xef},
- .size = sizeof(struct open_how_ext), .err = -EINVAL },
-
- /* TODO: Once expanded, check zero-padding. */
-
/* Smaller than version-0 struct. */
{ .name = "zero-sized 'struct'",
.arg.inner.flags = O_RDONLY, .size = 0, .err = -EINVAL },
base-commit: 912dfe068c43fa13c587b8d30e73d335c5ba7d44
--
2.24.0
livepatch test configures the system and debug environment to run
tests. Some of these actions fail without root access and test
dumps several permission denied messages before it exits.
Fix test-state.sh to call setup_config instead of set_dynamic_debug
as suggested by Petr Mladek <pmladek(a)suse.com>
Fix it to check root uid and exit with skip code instead.
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/livepatch/functions.sh | 15 ++++++++++++++-
tools/testing/selftests/livepatch/test-state.sh | 3 +--
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 31eb09e38729..a6e3d5517a6f 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -7,6 +7,9 @@
MAX_RETRIES=600
RETRY_INTERVAL=".1" # seconds
+# Kselftest framework requirement - SKIP code is 4
+ksft_skip=4
+
# log(msg) - write message to kernel log
# msg - insightful words
function log() {
@@ -18,7 +21,16 @@ function log() {
function skip() {
log "SKIP: $1"
echo "SKIP: $1" >&2
- exit 4
+ exit $ksft_skip
+}
+
+# root test
+function is_root() {
+ uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
}
# die(msg) - game over, man
@@ -62,6 +74,7 @@ function set_ftrace_enabled() {
# for verbose livepatching output and turn on
# the ftrace_enabled sysctl.
function setup_config() {
+ is_root
push_config
set_dynamic_debug
set_ftrace_enabled 1
diff --git a/tools/testing/selftests/livepatch/test-state.sh b/tools/testing/selftests/livepatch/test-state.sh
index dc2908c22c26..a08212708115 100755
--- a/tools/testing/selftests/livepatch/test-state.sh
+++ b/tools/testing/selftests/livepatch/test-state.sh
@@ -8,8 +8,7 @@ MOD_LIVEPATCH=test_klp_state
MOD_LIVEPATCH2=test_klp_state2
MOD_LIVEPATCH3=test_klp_state3
-set_dynamic_debug
-
+setup_config
# TEST: Loading and removing a module that modifies the system state
--
2.20.1
Hi,
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
extends that tracking to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a point of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
Changes since v8:
* Merged the "mm/gup: pass flags arg to __gup_device_* functions" patch
into the "mm/gup: track FOLL_PIN pages" patch, as requested by
Christoph and Jan.
* Changed void grab_page() to bool try_grab_page(), and handled errors
at the call sites. (From Jan's review comments.) try_grab_page()
attempts to avoid page refcount overflows, even when counting up with
GUP_PIN_COUNTING_BIAS increments.
* Fixed a bug that I'd introduced, when changing a BUG() to a WARN().
* Added Jan's reviewed-by tag to the " mm/gup: allow FOLL_FORCE for
get_user_pages_fast()" patch.
* Documentation: pin_user_pages.rst: fixed an incorrect gup_benchmark
invocation, left over from the pin_longterm days, spotted while preparing
this version.
* Rebased onto today's linux.git (-rc1), and re-tested.
Changes since v7:
* Rebased onto Linux 5.5-rc1
* Reworked the grab_page() and try_grab_compound_head(), for API
consistency and less diffs (thanks to Jan Kara's reviews).
* Added Leon Romanovsky's reviewed-by tags for two of the IB-related
patches.
* patch 4 refactoring changes, as mentioned above.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v7, which is here:
https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: ran "ib_write_bw", which exercises the umem.c changes,
but not the other changes.
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (24):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 232 ++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 10 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 149 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 598 +++++++++++++++-----
mm/gup_benchmark.c | 74 ++-
mm/huge_memory.c | 26 +-
mm/hugetlb.c | 25 +-
mm/memremap.c | 76 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 24 +
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 21 +-
tools/testing/selftests/vm/run_vmtests | 22 +
31 files changed, 1109 insertions(+), 355 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.0
Hi,
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
extends that tracking to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a point of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
Changes since v9: Fixes resulting from Jan Kara's and Jonathan Corbet's
reviews:
* Removed reviewed-by tags from the "mm/gup: track FOLL_PIN pages" (those
were improperly inherited from the much smaller refactoring patch that
was merged into it).
* Made try_grab_compound_head() and try_grab_page() behavior similar in
their behavior with flags, in order to avoid "gotchas" later.
* follow_trans_huge_pmd(): moved the try_grab_page() to earlier in the
routine, in order to avoid having to undo mlock_vma_page().
* follow_hugetlb_page(): removed a refcount overflow check that is now
extraneous (and weaker than what try_grab_page() provides a few lines
further down).
* Fixed up two Documentation flaws, pointed out by Jonathan Corbet's
review.
Changes since v8:
* Merged the "mm/gup: pass flags arg to __gup_device_* functions" patch
into the "mm/gup: track FOLL_PIN pages" patch, as requested by
Christoph and Jan.
* Changed void grab_page() to bool try_grab_page(), and handled errors
at the call sites. (From Jan's review comments.) try_grab_page()
attempts to avoid page refcount overflows, even when counting up with
GUP_PIN_COUNTING_BIAS increments.
* Fixed a bug that I'd introduced, when changing a BUG() to a WARN().
* Added Jan's reviewed-by tag to the " mm/gup: allow FOLL_FORCE for
get_user_pages_fast()" patch.
* Documentation: pin_user_pages.rst: fixed an incorrect gup_benchmark
invocation, left over from the pin_longterm days, spotted while preparing
this version.
* Rebased onto today's linux.git (-rc1), and re-tested.
Changes since v7:
* Rebased onto Linux 5.5-rc1
* Reworked the grab_page() and try_grab_compound_head(), for API
consistency and less diffs (thanks to Jan Kara's reviews).
* Added Leon Romanovsky's reviewed-by tags for two of the IB-related
patches.
* patch 4 refactoring changes, as mentioned above.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v7, which is here:
https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: ran "ib_write_bw", which exercises the umem.c changes,
but not the other changes.
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (24):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 232 ++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 10 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 149 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 626 +++++++++++++++-----
mm/gup_benchmark.c | 74 ++-
mm/huge_memory.c | 45 +-
mm/hugetlb.c | 38 +-
mm/memremap.c | 76 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 24 +
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 21 +-
tools/testing/selftests/vm/run_vmtests | 22 +
31 files changed, 1147 insertions(+), 377 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.0
On 2019-12-16, Florian Weimer <fweimer(a)redhat.com> wrote:
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 1d338357df8a..58c3a0e543c6 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -93,5 +93,40 @@
> >
> > #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> >
> > +/*
> > + * Arguments for how openat2(2) should open the target path. If @resolve is
> > + * zero, then openat2(2) operates very similarly to openat(2).
> > + *
> > + * However, unlike openat(2), unknown bits in @flags result in -EINVAL rather
> > + * than being silently ignored. @mode must be zero unless one of {O_CREAT,
> > + * O_TMPFILE} are set.
> > + *
> > + * @flags: O_* flags.
> > + * @mode: O_CREAT/O_TMPFILE file mode.
> > + * @resolve: RESOLVE_* flags.
> > + */
> > +struct open_how {
> > + __aligned_u64 flags;
> > + __u16 mode;
> > + __u16 __padding[3]; /* must be zeroed */
> > + __aligned_u64 resolve;
> > +};
> > +
> > +#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
> > +#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
> > +
> > +/* how->resolve flags for openat2(2). */
> > +#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
> > + (includes bind-mounts). */
> > +#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
> > + "magic-links". */
> > +#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
> > + (implies OEXT_NO_MAGICLINKS) */
> > +#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
> > + "..", symlinks, and absolute
> > + paths which escape the dirfd. */
> > +#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
> > + be scoped inside the dirfd
> > + (similar to chroot(2)). */
> >
> > #endif /* _UAPI_LINUX_FCNTL_H */
>
> Would it be possible to move these to a new UAPI header?
>
> In glibc, we currently do not #include <linux/fcntl.h>. We need some of
> the AT_* constants in POSIX mode, and the header is not necessarily
> namespace-clean. If there was a separate header for openat2 support, we
> could use that easily, and we would only have to maintain the baseline
> definitions (which never change).
Sure, (assuming nobody objects) I can move it to "linux/openat2.h".
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
When handling page faults for many vCPUs during demand paging, KVM's MMU
lock becomes highly contended. This series creates a test with a naive
userfaultfd based demand paging implementation to demonstrate that
contention. This test serves both as a functional test of userfaultfd
and a microbenchmark of demand paging performance with a variable number
of vCPUs and memory per vCPU.
The test creates N userfaultfd threads, N vCPUs, and a region of memory
with M pages per vCPU. The N userfaultfd polling threads are each set up
to serve faults on a region of memory corresponding to one of the vCPUs.
Each of the vCPUs is then started, and touches each page of its disjoint
memory region, sequentially. In response to faults, the userfaultfd
threads copy a static buffer into the guest's memory. This creates a
worst case for MMU lock contention as we have removed most of the
contention between the userfaultfd threads and there is no time required
to fetch the contents of guest memory.
This test was run successfully on Intel Haswell, Broadwell, and
Cascadelake hosts with a variety of vCPU counts and memory sizes.
This test was adapted from the dirty_log_test.
The series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
(Thanks to Dmitry Vyukov <dvyukov(a)google.com> for setting up the Gerrit
instance)
Ben Gardon (9):
KVM: selftests: Create a demand paging test
KVM: selftests: Add demand paging content to the demand paging test
KVM: selftests: Add memory size parameter to the demand paging test
KVM: selftests: Pass args to vCPU instead of using globals
KVM: selftests: Support multiple vCPUs in demand paging test
KVM: selftests: Time guest demand paging
KVM: selftests: Add parameter to _vm_create for memslot 0 base paddr
KVM: selftests: Support large VMs in demand paging test
Add static flag
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 4 +-
.../selftests/kvm/demand_paging_test.c | 610 ++++++++++++++++++
tools/testing/selftests/kvm/dirty_log_test.c | 2 +-
.../testing/selftests/kvm/include/kvm_util.h | 3 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
6 files changed, 621 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
--
2.23.0.444.g18eeb5a265-goog
Hi Linus,
Please pull the following Kselftest update for Linux 5.5-rc3.
This Kselftest fixes update for Linux 5.5-rc2 consists of
-- ftrace and safesetid test fixes from Masami Hiramatsu
-- Kunit fixes from Brendan Higgins, Iurii Zaikin, and Heidi Fahim
-- Kselftest framework fixes from SeongJae Park and Michael Ellerman
I was planning to send this for rc2 and ran into kernel.org outage
and decided to wait on it.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit e42617b825f8073569da76dc4510bfa019b1c35a:
Linux 5.5-rc1 (2019-12-08 14:57:55 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.5-rc2
for you to fetch changes up to 4eac734486fd431e0756cc5e929f140911a36a53:
kselftest: Support old perl versions (2019-12-11 10:31:16 -0700)
----------------------------------------------------------------
linux-kselftest-5.5-rc2
This Kselftest fixes update for Linux 5.5-rc2 consists of
-- ftrace and safesetid test fixes from Masami Hiramatsu
-- Kunit fixes from Brendan Higgins, Iurii Zaikin, and Heidi Fahim
-- Kselftest framework fixes from SeongJae Park and Michael Ellerman
----------------------------------------------------------------
Brendan Higgins (2):
Documentation: kunit: fix typos and gramatical errors
Documentation: kunit: add documentation for kunit_tool
Heidi Fahim (1):
kunit: testing kunit: Bug fix in test_run_timeout function
Iurii Zaikin (1):
fs/ext4/inode-test: Fix inode test on 32 bit platforms.
Masami Hiramatsu (7):
selftests/ftrace: Fix to check the existence of set_ftrace_filter
selftests/ftrace: Fix ftrace test cases to check unsupported
selftests/ftrace: Do not to use absolute debugfs path
selftests/ftrace: Fix multiple kprobe testcase
selftests: safesetid: Move link library to LDLIBS
selftests: safesetid: Check the return value of setuid/setgid
selftests: safesetid: Fix Makefile to set correct test program
Michael Ellerman (1):
selftests: Fix dangling documentation references to
kselftest_module.sh
SeongJae Park (2):
kselftest/runner: Print new line in print of timeout log
kselftest: Support old perl versions
Documentation/dev-tools/kselftest.rst | 8 +--
Documentation/dev-tools/kunit/index.rst | 1 +
Documentation/dev-tools/kunit/kunit-tool.rst | 57
++++++++++++++++++++++
Documentation/dev-tools/kunit/start.rst | 13 +++--
Documentation/dev-tools/kunit/usage.rst | 24 ++++-----
fs/ext4/inode-test.c | 2 +-
tools/testing/kunit/kunit_tool_test.py | 2 +-
.../ftrace/test.d/ftrace/func-filter-stacktrace.tc | 2 +
.../selftests/ftrace/test.d/ftrace/func_cpumask.tc | 5 ++
tools/testing/selftests/ftrace/test.d/functions | 5 +-
.../ftrace/test.d/kprobe/multiple_kprobes.tc | 6 +--
.../inter-event/trigger-action-hist-xfail.tc | 4 +-
.../inter-event/trigger-onchange-action-hist.tc | 2 +-
.../inter-event/trigger-snapshot-action-hist.tc | 4 +-
tools/testing/selftests/kselftest/module.sh | 2 +-
tools/testing/selftests/kselftest/prefix.pl | 1 +
tools/testing/selftests/kselftest/runner.sh | 1 +
tools/testing/selftests/safesetid/Makefile | 5 +-
tools/testing/selftests/safesetid/safesetid-test.c | 15 ++++--
19 files changed, 119 insertions(+), 40 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/kunit-tool.rst
----------------------------------------------------------------
livepatch test configures the system and debug environment to run
tests. Some of these actions fail without root access and test
dumps several permission denied messages before it exits.
Fix it to check root uid and exit with skip code instead.
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/livepatch/functions.sh | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 31eb09e38729..014b587692f0 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -7,6 +7,9 @@
MAX_RETRIES=600
RETRY_INTERVAL=".1" # seconds
+# Kselftest framework requirement - SKIP code is 4
+ksft_skip=4
+
# log(msg) - write message to kernel log
# msg - insightful words
function log() {
@@ -18,7 +21,16 @@ function log() {
function skip() {
log "SKIP: $1"
echo "SKIP: $1" >&2
- exit 4
+ exit $ksft_skip
+}
+
+# root test
+function is_root() {
+ uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
}
# die(msg) - game over, man
@@ -45,6 +57,7 @@ function pop_config() {
}
function set_dynamic_debug() {
+ is_root
cat <<-EOF > /sys/kernel/debug/dynamic_debug/control
file kernel/livepatch/* +p
func klp_try_switch_task -p
@@ -62,6 +75,7 @@ function set_ftrace_enabled() {
# for verbose livepatching output and turn on
# the ftrace_enabled sysctl.
function setup_config() {
+ is_root
push_config
set_dynamic_debug
set_ftrace_enabled 1
--
2.20.1
This test only works when [1] is applied, which was rejected.
Basically, the errors are reported and cleared. In this particular case of
tls sockets, following reads will block.
The test case was originally submitted with the rejected patch, but, then,
was included as part of a different patchset, possibly by mistake.
[1] https://lore.kernel.org/netdev/20191007035323.4360-2-jakub.kicinski@netrono…
Thanks Paolo Pisati for pointing out the original patchset where this
appeared.
Cc: Jakub Kicinski <jakub.kicinski(a)netronome.com>
Fixes: 65190f77424d (selftests/tls: add a test for fragmented messages)
Reported-by: Paolo Pisati <paolo.pisati(a)canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
---
tools/testing/selftests/net/tls.c | 28 ----------------------------
1 file changed, 28 deletions(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 13e5ef615026..0ea44d975b6c 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -722,34 +722,6 @@ TEST_F(tls, recv_lowat)
EXPECT_EQ(memcmp(send_mem, recv_mem + 10, 5), 0);
}
-TEST_F(tls, recv_rcvbuf)
-{
- char send_mem[4096];
- char recv_mem[4096];
- int rcv_buf = 1024;
-
- memset(send_mem, 0x1c, sizeof(send_mem));
-
- EXPECT_EQ(setsockopt(self->cfd, SOL_SOCKET, SO_RCVBUF,
- &rcv_buf, sizeof(rcv_buf)), 0);
-
- EXPECT_EQ(send(self->fd, send_mem, 512, 0), 512);
- memset(recv_mem, 0, sizeof(recv_mem));
- EXPECT_EQ(recv(self->cfd, recv_mem, sizeof(recv_mem), 0), 512);
- EXPECT_EQ(memcmp(send_mem, recv_mem, 512), 0);
-
- if (self->notls)
- return;
-
- EXPECT_EQ(send(self->fd, send_mem, 4096, 0), 4096);
- memset(recv_mem, 0, sizeof(recv_mem));
- EXPECT_EQ(recv(self->cfd, recv_mem, sizeof(recv_mem), 0), -1);
- EXPECT_EQ(errno, EMSGSIZE);
-
- EXPECT_EQ(recv(self->cfd, recv_mem, sizeof(recv_mem), 0), -1);
- EXPECT_EQ(errno, EMSGSIZE);
-}
-
TEST_F(tls, bidir)
{
char const *test_str = "test_read";
--
2.24.0
firmware attempts to load test modules that require root access
and fail. Fix it to check for root uid and exit with skip code
instead.
Before this fix:
selftests: firmware: fw_run_tests.sh
modprobe: ERROR: could not insert 'test_firmware': Operation not permitted
You must have the following enabled in your kernel:
CONFIG_TEST_FIRMWARE=y
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
With this fix:
selftests: firmware: fw_run_tests.sh
skip all tests: must be run as root
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/firmware/fw_lib.sh | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/firmware/fw_lib.sh b/tools/testing/selftests/firmware/fw_lib.sh
index b879305a766d..5b8c0fedee76 100755
--- a/tools/testing/selftests/firmware/fw_lib.sh
+++ b/tools/testing/selftests/firmware/fw_lib.sh
@@ -34,6 +34,12 @@ test_modprobe()
check_mods()
{
+ local uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
+
trap "test_modprobe" EXIT
if [ ! -d $DIR ]; then
modprobe test_firmware
--
2.20.1
From: SeongJae Park <sjpark(a)amazon.de>
This patchset contains trivial fixes for the kunit documentations and
the wrapper python scripts.
This patchset is based on 'kselftest/test' branch of linux-kselftest[1]
and depends on Heidi's patch[2]. A complete tree is available at my repo:
https://github.com/sjp38/linux/tree/kunit_fix/20191205_v5
Changes from v4
(https://lore.kernel.org/linux-doc/1575490683-13015-1-git-send-email-sj38.pa…):
- Rebased on Heidi Fahim's patch[2]
- Fix failing kunit_tool_test test
- Add 'build_dir' option test in 'kunit_tool_test.py'
Changes from v3
(https://lore.kernel.org/linux-kselftest/20191204192141.GA247851@google.com):
- Fix the 4th patch, "kunit: Place 'test.log' under the 'build_dir'" to
set default value of 'build_dir' as '' instead of NULL so that kunit
can run even though '--build_dir' option is not given.
Changes from v2
(https://lore.kernel.org/linux-kselftest/1575361141-6806-1-git-send-email-sj…):
- Make 'build_dir' if not exists (missed from v3 by mistake)
Changes from v1
(https://lore.kernel.org/linux-doc/1575242724-4937-1-git-send-email-sj38.par…):
- Remove "docs/kunit/start: Skip wrapper run command" (A similar
approach is ongoing)
- Make 'build_dir' if not exists
SeongJae Park (6):
docs/kunit/start: Use in-tree 'kunit_defconfig'
kunit: Remove duplicated defconfig creation
kunit: Create default config in '--build_dir'
kunit: Place 'test.log' under the 'build_dir'
kunit: Rename 'kunitconfig' to '.kunitconfig'
kunit/kunit_tool_test: Test '--build_dir' option run
Documentation/dev-tools/kunit/start.rst | 13 +++++--------
tools/testing/kunit/kunit.py | 18 +++++++++++-------
tools/testing/kunit/kunit_kernel.py | 10 +++++-----
tools/testing/kunit/kunit_tool_test.py | 10 +++++++++-
4 files changed, 30 insertions(+), 21 deletions(-)
--
[1] git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
[2] "kunit: testing kunit: Bug fix in test_run_timeout function",
https://lore.kernel.org/linux-kselftest/CAFd5g47a7a8q7by+1ALBtepeegLvfkgwvC…)
2.17.1
From: SeongJae Park <sjpark(a)amazon.de>
If a timeout failure occurs, kselftest kills the test process and prints
the timeout log. If the test process has killed while printing a log
that ends with new line, the timeout log can be printed in middle of the
test process output so that it can be seems like a comment, as below:
# test_process_log not ok 3 selftests: timers: nsleep-lat # TIMEOUT
This commit avoids such problem by printing one more line before the
TIMEOUT failure log.
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
---
tools/testing/selftests/kselftest/runner.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 84de7bc74f2c..a8d20cbb711c 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -79,6 +79,7 @@ run_one()
if [ $rc -eq $skip_rc ]; then \
echo "not ok $test_num $TEST_HDR_MSG # SKIP"
elif [ $rc -eq $timeout_rc ]; then \
+ echo "#"
echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT"
else
echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
--
2.17.1
Commit c78fd76f2b67 ("selftests: Move kselftest_module.sh into
kselftest/") moved kselftest_module.sh but missed updating a few
references to the path in documentation.
Fixes: c78fd76f2b67 ("selftests: Move kselftest_module.sh into kselftest/")
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
Documentation/dev-tools/kselftest.rst | 8 ++++----
tools/testing/selftests/kselftest/module.sh | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index ecdfdc9d4b03..61ae13c44f91 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -203,12 +203,12 @@ Test Module
Kselftest tests the kernel from userspace. Sometimes things need
testing from within the kernel, one method of doing this is to create a
test module. We can tie the module into the kselftest framework by
-using a shell script test runner. ``kselftest_module.sh`` is designed
+using a shell script test runner. ``kselftest/module.sh`` is designed
to facilitate this process. There is also a header file provided to
assist writing kernel modules that are for use with kselftest:
- ``tools/testing/kselftest/kselftest_module.h``
-- ``tools/testing/kselftest/kselftest_module.sh``
+- ``tools/testing/kselftest/kselftest/module.sh``
How to use
----------
@@ -247,7 +247,7 @@ Example Module
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- #include "../tools/testing/selftests/kselftest_module.h"
+ #include "../tools/testing/selftests/kselftest/module.h"
KSTM_MODULE_GLOBALS();
@@ -276,7 +276,7 @@ Example test script
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
- $(dirname $0)/../kselftest_module.sh "foo" test_foo
+ $(dirname $0)/../kselftest/module.sh "foo" test_foo
Test Harness
diff --git a/tools/testing/selftests/kselftest/module.sh b/tools/testing/selftests/kselftest/module.sh
index 18e1c7992d30..fb4733faff12 100755
--- a/tools/testing/selftests/kselftest/module.sh
+++ b/tools/testing/selftests/kselftest/module.sh
@@ -9,7 +9,7 @@
#
# #!/bin/sh
# SPDX-License-Identifier: GPL-2.0+
-# $(dirname $0)/../kselftest_module.sh "description" module_name
+# $(dirname $0)/../kselftest/module.sh "description" module_name
#
# Example: tools/testing/selftests/lib/printf.sh
--
2.21.0
Hi Mathieu,
I am seeing rseq test build failure on Linux 5.5-rc1.
gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
param_test.c -lpthread -lrseq -o ...tools/testing/selftests/rseq/param_test
param_test.c:18:21: error: static declaration of ‘gettid’ follows
non-static declaration
18 | static inline pid_t gettid(void)
| ^~~~~~
In file included from /usr/include/unistd.h:1170,
from param_test.c:11:
/usr/include/x86_64-linux-gnu/bits/unistd_ext.h:34:16: note: previous
declaration of ‘gettid’ was here
34 | extern __pid_t gettid (void) __THROW;
| ^~~~~~
make: *** [Makefile:28: ...tools/testing/selftests/rseq/param_test] Error 1
The following obvious change fixes it. However, there could be reason
why this was defined here. If you think this is the right fix, I can
send the patch. I started seeing this with gcc version 9.2.1 20191008
diff --git a/tools/testing/selftests/rseq/param_test.c
b/tools/testing/selftests/rseq/param_test.c
index eec2663261f2..18a0fa1235a7 100644
--- a/tools/testing/selftests/rseq/param_test.c
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -15,11 +15,6 @@
#include <errno.h>
#include <stddef.h>
-static inline pid_t gettid(void)
-{
- return syscall(__NR_gettid);
-}
-
thanks,
-- Shuah
Hi,
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
extends that tracking to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a couple of points of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
b) Contrary to my earlier response to your review, I have not actually
merged patch 23 ("mm/gup: pass flags arg to __gup_device_*
functions") into patch 24 ("mm/gup: track FOLL_PIN pages"). This is
because I suspect that it's better to avoid making patch 24 any larger
and worse to review than it already is. But if you feel strongly
about it, I'll combine them anyway.
Changes since v7:
* Rebased onto Linux 5.5-rc1
* Reworked the grab_page() and try_grab_compound_head(), for API
consistency and less diffs (thanks to Jan Kara's reviews).
* Added Leon Romanovsky's reviewed-by tags for two of the IB-related
patches.
* patch 4 refactoring changes, as mentioned above.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v7, which is here:
https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: ran "ib_write_bw", which exercises the umem.c changes,
but not the other changes.
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (25):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
mm/gup: pass flags arg to __gup_device_* functions
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 233 ++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 12 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 145 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 595 +++++++++++++++-----
mm/gup_benchmark.c | 74 ++-
mm/huge_memory.c | 23 +-
mm/hugetlb.c | 15 +-
mm/memremap.c | 76 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 24 +
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 21 +-
tools/testing/selftests/vm/run_vmtests | 22 +
31 files changed, 1093 insertions(+), 354 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.0
From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
[ Upstream commit c588146378962786ddeec817f7736a53298a7b01 ]
The "path" buf is supposed to contain path + printf msg up to 24 bytes.
It will be cut anyway, but compiler generates truncation warns like:
"
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c: In
function ‘setup_cgroup_environment’:
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:34:
warning: ‘/cgroup.controllers’ directive output may be truncated
writing 19 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:2:
note: ‘snprintf’ output between 20 and 4116 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:34:
warning: ‘/cgroup.subtree_control’ directive output may be truncated
writing 23 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
^~~~~~~~~~~~~~~~~~~~~~~
cgroup_path);
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:2:
note: ‘snprintf’ output between 24 and 4120 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
cgroup_path);
"
In order to avoid warns, lets decrease buf size for cgroup workdir on
24 bytes with assumption to include also "/cgroup.subtree_control" to
the address. The cut will never happen anyway.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Song Liu <songliubraving(a)fb.com>
Link: https://lore.kernel.org/bpf/20191002120404.26962-3-ivan.khoronzhuk@linaro.o…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index cf16948aad4ad..6af24f9a780de 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -44,7 +44,7 @@
*/
int setup_cgroup_environment(void)
{
- char cgroup_workdir[PATH_MAX + 1];
+ char cgroup_workdir[PATH_MAX - 24];
format_cgroup_path(cgroup_workdir, "");
--
2.20.1
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 2ea2612b987ad703235c92be21d4e98ee9c2c67c ]
Currently, with latest llvm trunk, selftest test_progs failed obj
file test_seg6_loop.o with the following error in verifier:
infinite loop detected at insn 76
The byte code sequence looks like below, and noted that alu32 has been
turned off by default for better generated codes in general:
48: w3 = 100
49: *(u32 *)(r10 - 68) = r3
...
; if (tlv.type == SR6_TLV_PADDING) {
76: if w3 == 5 goto -18 <LBB0_19>
...
85: r1 = *(u32 *)(r10 - 68)
; for (int i = 0; i < 100; i++) {
86: w1 += -1
87: if w1 == 0 goto +5 <LBB0_20>
88: *(u32 *)(r10 - 68) = r1
The main reason for verification failure is due to partial spills at
r10 - 68 for induction variable "i".
Current verifier only handles spills with 8-byte values. The above 4-byte
value spill to stack is treated to STACK_MISC and its content is not
saved. For the above example:
w3 = 100
R3_w=inv100 fp-64_w=inv1086626730498
*(u32 *)(r10 - 68) = r3
R3_w=inv100 fp-64_w=inv1086626730498
...
r1 = *(u32 *)(r10 - 68)
R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
fp-64=inv1086626730498
To resolve this issue, verifier needs to be extended to track sub-registers
in spilling, or llvm needs to enhanced to prevent sub-register spilling
in register allocation phase. The former will increase verifier complexity
and the latter will need some llvm "hacking".
Let us workaround this issue by declaring the induction variable as "long"
type so spilling will happen at non sub-register level. We can revisit this
later if sub-register spilling causes similar or other verification issues.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andriin(a)fb.com>
Link: https://lore.kernel.org/bpf/20191117214036.1309510-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/test_seg6_loop.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/test_seg6_loop.c b/tools/testing/selftests/bpf/progs/test_seg6_loop.c
index c4d104428643e..69880c1e7700c 100644
--- a/tools/testing/selftests/bpf/progs/test_seg6_loop.c
+++ b/tools/testing/selftests/bpf/progs/test_seg6_loop.c
@@ -132,8 +132,10 @@ static __always_inline int is_valid_tlv_boundary(struct __sk_buff *skb,
*pad_off = 0;
// we can only go as far as ~10 TLVs due to the BPF max stack size
+ // workaround: define induction variable "i" as "long" instead
+ // of "int" to prevent alu32 sub-register spilling.
#pragma clang loop unroll(disable)
- for (int i = 0; i < 100; i++) {
+ for (long i = 0; i < 100; i++) {
struct sr6_tlv_t tlv;
if (cur_off == *tlv_off)
--
2.20.1
From: Jiri Benc <jbenc(a)redhat.com>
[ Upstream commit 3b054b7133b4ad93671c82e8d6185258e3f1a7a5 ]
When run_kselftests.sh is run, it hangs after test_tc_tunnel.sh. The reason
is test_tc_tunnel.sh ensures the server ('nc -l') is run all the time,
starting it again every time it is expected to terminate. The exception is
the final client_connect: the server is not started anymore, which ensures
no process is kept running after the test is finished.
For a sit test, though, the script is terminated prematurely without the
final client_connect and the 'nc' process keeps running. This in turn causes
the run_one function in kselftest/runner.sh to hang forever, waiting for the
runaway process to finish.
Ensure a remaining server is terminated on cleanup.
Fixes: f6ad6accaa99 ("selftests/bpf: expand test_tc_tunnel with SIT encap")
Signed-off-by: Jiri Benc <jbenc(a)redhat.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Willem de Bruijn <willemb(a)google.com>
Link: https://lore.kernel.org/bpf/60919291657a9ee89c708d8aababc28ebe1420be.157382…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index ff0d31d38061f..7c76b841b17bb 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -62,6 +62,10 @@ cleanup() {
if [[ -f "${infile}" ]]; then
rm "${infile}"
fi
+
+ if [[ -n $server_pid ]]; then
+ kill $server_pid 2> /dev/null
+ fi
}
server_listen() {
@@ -77,6 +81,7 @@ client_connect() {
verify_data() {
wait "${server_pid}"
+ server_pid=
# sha1sum returns two fields [sha1] [filepath]
# convert to bash array and access first elem
insum=($(sha1sum ${infile}))
--
2.20.1
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit b7a0d65d80a0c5034b366392624397a0915b7556 ]
With latest llvm compiler, running test_progs will have the following
verifier failure for test_sysctl_loop1.o:
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
invalid indirect read from stack var_off (0x0; 0xff)+196 size 7
...
libbpf: -- END LOG --
libbpf: failed to load program 'cgroup/sysctl'
libbpf: failed to load object 'test_sysctl_loop1.o'
The related bytecode looks as below:
0000000000000308 LBB0_8:
97: r4 = r10
98: r4 += -288
99: r4 += r7
100: w8 &= 255
101: r1 = r10
102: r1 += -488
103: r1 += r8
104: r2 = 7
105: r3 = 0
106: call 106
107: w1 = w0
108: w1 += -1
109: if w1 > 6 goto -24 <LBB0_5>
110: w0 += w8
111: r7 += 8
112: w8 = w0
113: if r7 != 224 goto -17 <LBB0_8>
And source code:
for (i = 0; i < ARRAY_SIZE(tcp_mem); ++i) {
ret = bpf_strtoul(value + off, MAX_ULONG_STR_LEN, 0,
tcp_mem + i);
if (ret <= 0 || ret > MAX_ULONG_STR_LEN)
return 0;
off += ret & MAX_ULONG_STR_LEN;
}
Current verifier is not able to conclude that register w0 before '+'
at insn 110 has a range of 1 to 7 and thinks it is from 0 - 255. This
leads to more conservative range for w8 at insn 112, and later verifier
complaint.
Let us workaround this issue until we found a compiler and/or verifier
solution. The workaround in this patch is to make variable 'ret' volatile,
which will force a reload and then '&' operation to ensure better value
range. With this patch, I got the below byte code for the loop:
0000000000000328 LBB0_9:
101: r4 = r10
102: r4 += -288
103: r4 += r7
104: w8 &= 255
105: r1 = r10
106: r1 += -488
107: r1 += r8
108: r2 = 7
109: r3 = 0
110: call 106
111: *(u32 *)(r10 - 64) = r0
112: r1 = *(u32 *)(r10 - 64)
113: if w1 s< 1 goto -28 <LBB0_5>
114: r1 = *(u32 *)(r10 - 64)
115: if w1 s> 7 goto -30 <LBB0_5>
116: r1 = *(u32 *)(r10 - 64)
117: w1 &= 7
118: w1 += w8
119: r7 += 8
120: w8 = w1
121: if r7 != 224 goto -21 <LBB0_9>
Insn 117 did the '&' operation and we got more precise value range
for 'w8' at insn 120. The test is happy then:
#3/17 test_sysctl_loop1.o:OK
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Song Liu <songliubraving(a)fb.com>
Link: https://lore.kernel.org/bpf/20191107170045.2503480-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/test_sysctl_loop1.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
index 608a06871572d..d22e438198cf7 100644
--- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
@@ -44,7 +44,10 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx)
unsigned long tcp_mem[TCP_MEM_LOOPS] = {};
char value[MAX_VALUE_STR_LEN];
unsigned char i, off = 0;
- int ret;
+ /* a workaround to prevent compiler from generating
+ * codes verifier cannot handle yet.
+ */
+ volatile int ret;
if (ctx->write)
return 0;
--
2.20.1
From: Masami Hiramatsu <mhiramat(a)kernel.org>
[ Upstream commit 2f3571ea71311bbb2cbb9c3bbefc9c1969a3e889 ]
Currently proc-self-map-files-002.c sets va_max (max test address
of user virtual address) to 4GB, but it is too big for 32bit
arch and 1UL << 32 is overflow on 32bit long.
Also since this value should be enough bigger than vm.mmap_min_addr
(64KB or 32KB by default), 1MB should be enough.
Make va_max 1MB unconditionally.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/proc/proc-self-map-files-002.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-self-map-files-002.c b/tools/testing/selftests/proc/proc-self-map-files-002.c
index 47b7473dedef7..e6aa00a183bcd 100644
--- a/tools/testing/selftests/proc/proc-self-map-files-002.c
+++ b/tools/testing/selftests/proc/proc-self-map-files-002.c
@@ -47,7 +47,11 @@ static void fail(const char *fmt, unsigned long a, unsigned long b)
int main(void)
{
const int PAGE_SIZE = sysconf(_SC_PAGESIZE);
- const unsigned long va_max = 1UL << 32;
+ /*
+ * va_max must be enough bigger than vm.mmap_min_addr, which is
+ * 64KB/32KB by default. (depends on CONFIG_LSM_MMAP_MIN_ADDR)
+ */
+ const unsigned long va_max = 1UL << 20;
unsigned long va;
void *p;
int fd;
--
2.20.1
From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
[ Upstream commit c588146378962786ddeec817f7736a53298a7b01 ]
The "path" buf is supposed to contain path + printf msg up to 24 bytes.
It will be cut anyway, but compiler generates truncation warns like:
"
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c: In
function ‘setup_cgroup_environment’:
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:34:
warning: ‘/cgroup.controllers’ directive output may be truncated
writing 19 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:2:
note: ‘snprintf’ output between 20 and 4116 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:34:
warning: ‘/cgroup.subtree_control’ directive output may be truncated
writing 23 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
^~~~~~~~~~~~~~~~~~~~~~~
cgroup_path);
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:2:
note: ‘snprintf’ output between 24 and 4120 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
cgroup_path);
"
In order to avoid warns, lets decrease buf size for cgroup workdir on
24 bytes with assumption to include also "/cgroup.subtree_control" to
the address. The cut will never happen anyway.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Song Liu <songliubraving(a)fb.com>
Link: https://lore.kernel.org/bpf/20191002120404.26962-3-ivan.khoronzhuk@linaro.o…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index e95c33e333a40..b29a73fe64dbc 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -98,7 +98,7 @@ int enable_all_controllers(char *cgroup_path)
*/
int setup_cgroup_environment(void)
{
- char cgroup_workdir[PATH_MAX + 1];
+ char cgroup_workdir[PATH_MAX - 24];
format_cgroup_path(cgroup_workdir, "");
--
2.20.1
Currently, when some of the KSFT subsystems fails to build, the toplevel
KSFT Makefile just keeps carrying on with the build process.
This behaviour is expected and desirable especially in the context of a CI
system running KSelfTest, since it is not always easy to guarantee that the
most recent and esoteric dependencies are respected across all KSFT TARGETS
in a timely manner.
Unfortunately, as of now, this holds true only if the very last of the
built subsystems could have been successfully compiled: if the last of
those subsystem instead failed to build, such failure is taken as the whole
outcome of the Makefile target and the complete build/install process halts
even though many other preceding subsytems were in fact already built
successfully.
Fix the KSFT Makefile behaviour related to all/install targets in order
to fail as a whole only when the all/install targets have failed for all
of the requested TARGETS, while succeeding when at least one of TARGETS
has been successfully built.
Signed-off-by: Cristian Marussi <cristian.marussi(a)arm.com>
---
This patch is based on ksft/fixes branch from:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
on top of commit (~5.5-rc1):
99e51aa8f701 Documentation: kunit: add documentation for kunit_tool
Building with either:
make kselftest-install \
KSFT_INSTALL_PATH=/tmp/KSFT \
TARGETS="exec arm64 bpf"
make -C tools/testing/selftests install \
KSFT_INSTALL_PATH=/tmp/KSFT \
TARGETS="exec arm64 bpf"
(with 'bpf' not building clean on my setup in the above case)
and veryfying that build/install completes if at least one of TARGETS can
be successfully built, and any successfully built subsystem is installed.
Changes:
-------
V1 --> V2
- rebased on 5.5-rc1
- rewording commit message
- dropped RFC tag
---
tools/testing/selftests/Makefile | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b001c602414b..86b2a3fca04d 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -143,11 +143,13 @@ else
endif
all: khdr
- @for TARGET in $(TARGETS); do \
- BUILD_TARGET=$$BUILD/$$TARGET; \
- mkdir $$BUILD_TARGET -p; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET;\
- done;
+ @ret=1; \
+ for TARGET in $(TARGETS); do \
+ BUILD_TARGET=$$BUILD/$$TARGET; \
+ mkdir $$BUILD_TARGET -p; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET; \
+ ret=$$((ret * $$?)); \
+ done; exit $$ret;
run_tests: all
@for TARGET in $(TARGETS); do \
@@ -196,10 +198,12 @@ ifdef INSTALL_PATH
install -m 744 kselftest/module.sh $(INSTALL_PATH)/kselftest/
install -m 744 kselftest/runner.sh $(INSTALL_PATH)/kselftest/
install -m 744 kselftest/prefix.pl $(INSTALL_PATH)/kselftest/
- @for TARGET in $(TARGETS); do \
+ @ret=1; \
+ for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install; \
- done;
+ ret=$$((ret * $$?)); \
+ done; exit $$ret;
@# Ask all targets to emit their test scripts
echo "#!/bin/sh" > $(ALL_SCRIPT)
--
2.17.1
When reading the codes, I find the definitions of interrupt-window exiting and
nmi-window exiting don't match the names in latest intel SDM.
I have no idea whether it's the historical names, rename them to match the
latest SDM to avoid confusion.
CPU_BASED_USE_TSC_OFFSETING mis-spelling in Patch 3 is found by checkpatch.pl.
Xiaoyao Li (3):
KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW
KVM: VMX: Rename NMI_PENDING to NMI_WINDOW
KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING
arch/x86/include/asm/vmx.h | 6 ++--
arch/x86/include/uapi/asm/vmx.h | 4 +--
arch/x86/kvm/vmx/nested.c | 28 +++++++++----------
arch/x86/kvm/vmx/vmx.c | 20 ++++++-------
tools/arch/x86/include/uapi/asm/vmx.h | 4 +--
.../selftests/kvm/include/x86_64/vmx.h | 8 +++---
.../kvm/x86_64/vmx_tsc_adjust_test.c | 2 +-
7 files changed, 36 insertions(+), 36 deletions(-)
--
2.19.1
When using fragments with size 8 and payload larger than 8000, the backlog
might fill up and packets will be dropped, causing the test to fail. This
happens often enough when conntrack is on during the IPv6 test.
As the larger payload in the test is 10000, using a backlog of 1250 allow
the test to run repeatedly without failure. At least a 1000 runs were
possible with no failures, when usually less than 50 runs were good enough
for showing a failure.
As netdev_max_backlog is not a pernet setting, this sets the backlog to
1000 during exit to prevent disturbing following tests.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Fixes: 4c3510483d26 (selftests: net: ip_defrag: cover new IPv6 defrag behavior)
---
tools/testing/selftests/net/ip_defrag.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/net/ip_defrag.sh b/tools/testing/selftests/net/ip_defrag.sh
index 15d3489ecd9c..c91cfecfa245 100755
--- a/tools/testing/selftests/net/ip_defrag.sh
+++ b/tools/testing/selftests/net/ip_defrag.sh
@@ -12,6 +12,8 @@ setup() {
ip netns add "${NETNS}"
ip -netns "${NETNS}" link set lo up
+ sysctl -w net.core.netdev_max_backlog=1250 >/dev/null 2>&1
+
ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_high_thresh=9000000 >/dev/null 2>&1
ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_low_thresh=7000000 >/dev/null 2>&1
ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_time=1 >/dev/null 2>&1
@@ -30,6 +32,7 @@ setup() {
cleanup() {
ip netns del "${NETNS}"
+ sysctl -w net.core.netdev_max_backlog=1000 >/dev/null 2>&1
}
trap cleanup EXIT
--
2.24.0