Base
====
Since the original series [1] was merged into Andrew's tree, some issues were
noticed. Up to this point, we had been working on fixing what's in Andrew's
tree [2], but at this point we've changed direction enough that a lot of the
fix's delta is undoing what was done in the original series, thereby making it
hard to review.
As suggested by Hugh Dickins and Peter Xu, this series takes a step back. It can
be considered a v3 of the original series [1] - it combines those patches with
the fixes, reordered / broken up to allow for easier review.
The idea is that it will apply cleanly to akpm's tree, *replacing* the following
patches (i.e., drop these first, and then apply this series):
userfaultfd-support-minor-fault-handling-for-shmem.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix-2.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix-3.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix-4.patch
userfaultfd-selftests-use-memfd_create-for-shmem-test-type.patch
userfaultfd-selftests-create-alias-mappings-in-the-shmem-test.patch
userfaultfd-selftests-reinitialize-test-context-in-each-test.patch
userfaultfd-selftests-exercise-minor-fault-handling-shmem-support.patch
Changelog
=========
Changes since the most recent fixup patch [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to lookup the existing page in
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (9):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 5 +-
include/linux/shmem_fs.h | 15 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 122 ++++------
mm/userfaultfd.c | 183 ++++++++++-----
tools/testing/selftests/vm/userfaultfd.c | 280 +++++++++++++++--------
10 files changed, 387 insertions(+), 245 deletions(-)
--
2.31.1.295.g9ea45b61b8-goog
The rp_filter testcase is used to test whether local packets redirected
from dummy1 to lo could pass the checking of rp_filter.
In fact, the packets passed the checking, but the testing process cannot
receive any reply packets, leading to test failure. The reason is that
the device dummy1 lacks ip address, caused the incorrect routing of
reply packets.
This patch adds ip address for dummy1 device.
Signed-off-by: Qiao Ma <mqaio(a)linux.alibaba.com>
---
tools/testing/selftests/net/fib_tests.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 2b5707738609..9a843ca0b913 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -448,6 +448,7 @@ fib_rp_filter_test()
$IP link set dummy0 address 52:54:00:6a:c7:5e
$IP link add dummy1 type dummy
$IP link set dummy1 address 52:54:00:6a:c7:5e
+ $IP address add 198.51.101.1/24 dev dummy1
$IP link set dev dummy1 up
$NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1
$NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1
--
2.18.2
The kernel now has a number of testing and debugging tools, and we've
seen a bit of confusion about what the differences between them are.
Add a basic documentation outlining the testing tools, when to use each,
and how they interact.
This is a pretty quick overview rather than the idealised "kernel
testing guide" that'd probably be optimal, but given the number of times
questions like "When do you use KUnit and when do you use Kselftest?"
are being asked, it seemed worth at least having something. Hopefully
this can form the basis for more detailed documentation later.
Signed-off-by: David Gow <davidgow(a)google.com>
---
Thanks, everyone, for the comments on the doc. I've made a few of the
suggested changes. Please let me know what you think!
-- David
Changes since v1:
https://lore.kernel.org/linux-kselftest/20210410070529.4113432-1-davidgow@g…
- Note KUnit's speed and that one should provide selftests for syscalls
- Mention lockdep as a Dynamic Analysis Tool
- Refer to "Dynamic Analysis Tools" instead of "Sanitizers"
- A number of minor formatting tweaks and rewordings for clarity.
Not changed:
- I haven't included an exhaustive list of differences, advantages, etc,
between KUnit and kselftest: for now, the doc continues to focus on
the difference between 'in-kernel' and 'userspace' testing here.
- Similarly, I'm not linking out to docs defining and describing "Unit"
tests versus "End-to-end" tests. None of the existing documentation
elsewhere quite matches what we do in the kernel perfectly, so it
seems less confusing to focus on the 'in-kernel'/'userspace'
distinction, and leave other definitions as a passing mention for
those who are already familiar with the concepts.
- I haven't linked to any talk videos here: a few of them are linked on
(e.g.) the KUnit webpage, but I wanted to keep the Kernel documentation
more self-contained for now. No objection to adding them in a follow-up
patch if people feel strongly about it, though.
- The link from index.rst to this doc is unchanged. I personally think
that the link is prominent enough there: it's the first link, and
shows up a few times. One possibility if people disagreed would be to
merge this page with the index, but given not all dev-tools are going
to be testing-related, it seemed a bit arrogant. :-)
Documentation/dev-tools/index.rst | 3 +
Documentation/dev-tools/testing-overview.rst | 117 +++++++++++++++++++
2 files changed, 120 insertions(+)
create mode 100644 Documentation/dev-tools/testing-overview.rst
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 1b1cf4f5c9d9..f590e5860794 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -7,6 +7,8 @@ be used to work on the kernel. For now, the documents have been pulled
together without any significant effort to integrate them into a coherent
whole; patches welcome!
+A brief overview of testing-specific tools can be found in :doc:`testing-overview`.
+
.. class:: toc-title
Table of contents
@@ -14,6 +16,7 @@ whole; patches welcome!
.. toctree::
:maxdepth: 2
+ testing-overview
coccinelle
sparse
kcov
diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
new file mode 100644
index 000000000000..ce36a8cdf6b5
--- /dev/null
+++ b/Documentation/dev-tools/testing-overview.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Kernel Testing Guide
+====================
+
+
+There are a number of different tools for testing the Linux kernel, so knowing
+when to use each of them can be a challenge. This document provides a rough
+overview of their differences, and how they fit together.
+
+
+Writing and Running Tests
+=========================
+
+The bulk of kernel tests are written using either the kselftest or KUnit
+frameworks. These both provide infrastructure to help make running tests and
+groups of tests easier, as well as providing helpers to aid in writing new
+tests.
+
+If you're looking to verify the behaviour of the Kernel — particularly specific
+parts of the kernel — then you'll want to use KUnit or kselftest.
+
+
+The Difference Between KUnit and kselftest
+------------------------------------------
+
+KUnit (Documentation/dev-tools/kunit/index.rst) is an entirely in-kernel system
+for "white box" testing: because test code is part of the kernel, it can access
+internal structures and functions which aren't exposed to userspace.
+
+KUnit tests therefore are best written against small, self-contained parts
+of the kernel, which can be tested in isolation. This aligns well with the
+concept of 'unit' testing.
+
+For example, a KUnit test might test an individual kernel function (or even a
+single codepath through a function, such as an error handling case), rather
+than a feature as a whole.
+
+This also makes KUnit tests very fast to build and run, allowing them to be
+run frequently as part of the development process.
+
+There is a KUnit test style guide which may give further pointers in
+Documentation/dev-tools/kunit/style.rst
+
+
+kselftest (Documentation/dev-tools/kselftest.rst), on the other hand, is
+largely implemented in userspace, and tests are normal userspace scripts or
+programs.
+
+This makes it easier to write more complicated tests, or tests which need to
+manipulate the overall system state more (e.g., spawning processes, etc.).
+However, it's not possible to call kernel functions directly from kselftest.
+This means that only kernel functionality which is exposed to userspace somhow
+(e.g. by a syscall, device, filesystem, etc.) can be tested with kselftest. To
+work around this, some tests include a companion kernel module which exposes
+more information or functionality. If a test runs mostly or entirely within the
+kernel, however, KUnit may be the more appropriate tool.
+
+kselftest is therefore suited well to tests of whole features, as these will
+expose an interface to userspace, which can be tested, but not implementation
+details. This aligns well with 'system' or 'end-to-end' testing.
+
+For example, all new system calls should be accompanied by kselftest tests.
+
+Code Coverage Tools
+===================
+
+The Linux Kernel supports two different code coverage measurement tools. These
+can be used to verify that a test is executing particular functions or lines
+of code. This is useful for determining how much of the kernel is being tested,
+and for finding corner-cases which are not covered by the appropriate test.
+
+:doc:`gcov` is GCC's coverage testing tool, which can be used with the kernel
+to get global or per-module coverage. Unlike KCOV, it does not record per-task
+coverage. Coverage data can be read from debugfs, and interpreted using the
+usual gcov tooling.
+
+:doc:`kcov` is a feature which can be built in to the kernel to allow
+capturing coverage on a per-task level. It's therefore useful for fuzzing and
+other situations where information about code executed during, for example, a
+single syscall is useful.
+
+
+Dynamic Analysis Tools
+======================
+
+The kernel also supports a number of dynamic analysis tools, which attempt to
+detect classes of issues when the occur in a running kernel. These typically
+look for undefined behaviour of some kind, such as invalid memory accesses,
+concurrency issues such as data races, or other undefined behaviour like
+integer overflows.
+
+Some of these tools are listed below:
+
+* kmemleak detects possible memory leaks. See
+ Documentation/dev-tools/kmemleak.rst
+* KASAN detects invalid memory accesses such as out-of-bounds and
+ use-after-free errors. See Documentation/dev-tools/kasan.rst
+* UBSAN detects behaviour that is undefined by the C standard, like integer
+ overflows. See Documentation/dev-tools/ubsan.rst
+* KCSAN detects data races. See Documentation/dev-tools/kcsan.rst
+* KFENCE is a low-overhead detector of memory issues, which is much faster than
+ KASAN and can be used in production. See Documentation/dev-tools/kfence.rst
+* lockdep is a locking correctness validator. See
+ Documentation/locking/lockdep-design.rst
+* There are several other pieces of debug instrumentation in the kernel, many
+ of which can be found in lib/Kconfig.debug
+
+These tools tend to test the kernel as a whole, and do not "pass" like
+kselftest or KUnit tests. They can be combined with KUnit or kselftest by
+running tests on a kernel with a sanitizer enabled: you can then be sure
+that none of these errors are occurring during the test.
+
+Some of these tools integrate with KUnit or kselftest and will
+automatically fail tests if an issue is detected.
+
--
2.31.1.295.g9ea45b61b8-goog
This is long overdue.
There are several things that aren't nailed down (in-tree
.kunitconfig's), or partially broken (GCOV on UML), but having them
documented, warts and all, is better than having nothing.
This covers a bunch of the more recent features
* kunit_filter_glob
* kunit.py run --kunitconfig
* slightly more detail on building tests as modules
* CONFIG_KUNIT_DEBUGFS
By my count, the only headline features now not mentioned are the KASAN
integration and KernelCI json output support (kunit.py run --json).
And then it also discusses how to get code coverage reports under UML
and non-UML since this is a question people have repeatedly asked.
Non-UML coverage collection is no different from normal, but we should
probably explicitly call this out.
As for UML, I was able to get it working again with two small hacks.*
E.g. with CONFIG_KUNIT=y && CONFIG_KUNIT_ALL_TESTS=y
Overall coverage rate:
lines......: 15.1% (18294 of 120776 lines)
functions..: 16.8% (1860 of 11050 functions)
Note: this doesn't document --alltests since this is not stable yet.
Hopefully being run more frequently as part of KernelCI will help...
*Using gcc/gcov-6 and not using uml_abort() in os_dump_core().
I've documented these hacks in "Notes" but left TODOs for
brendanhiggins(a)google.com who tracked down the runtime issue in GCC.
To be clear: these are not issues specific to KUnit, but rather to UML.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
v2 -> v3:
* Suggest --make_options=CC=/usr/bin/gcc-6 instead of manually editing
kunit_kernel.py
* update instructions on how to remove uml_abort() call
v1 -> v2:
Fix typos, drop --alltests, changed wordiing on config fragments.
---
Documentation/dev-tools/kunit/index.rst | 1 +
.../dev-tools/kunit/running_tips.rst | 258 ++++++++++++++++++
Documentation/dev-tools/kunit/start.rst | 2 +
3 files changed, 261 insertions(+)
create mode 100644 Documentation/dev-tools/kunit/running_tips.rst
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index 848478838347..7f7cf8d2ab20 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -14,6 +14,7 @@ KUnit - Unit Testing for the Linux Kernel
style
faq
tips
+ running_tips
What is KUnit?
==============
diff --git a/Documentation/dev-tools/kunit/running_tips.rst b/Documentation/dev-tools/kunit/running_tips.rst
new file mode 100644
index 000000000000..e2e9af711d1b
--- /dev/null
+++ b/Documentation/dev-tools/kunit/running_tips.rst
@@ -0,0 +1,258 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Tips For Running KUnit Tests
+============================
+
+Using ``kunit.py run`` ("kunit tool")
+=====================================
+
+Running from any directory
+--------------------------
+
+It can be handy to create a bash function like:
+
+.. code-block:: bash
+
+ function run_kunit() {
+ ( cd "$(git rev-parse --show-toplevel)" && ./tools/testing/kunit/kunit.py run $@ )
+ }
+
+.. note::
+ Early versions of ``kunit.py`` (before 5.6) didn't work unless run from
+ the kernel root, hence the use of a subshell and ``cd``.
+
+Running a subset of tests
+-------------------------
+
+``kunit.py run`` accepts an optional glob argument to filter tests. Currently
+this only matches against suite names, but this may change in the future.
+
+Say that we wanted to run the sysctl tests, we could do so via:
+
+.. code-block:: bash
+
+ $ echo -e 'CONFIG_KUNIT=y\nCONFIG_KUNIT_ALL_TESTS=y' > .kunit/.kunitconfig
+ $ ./tools/testing/kunit/kunit.py run 'sysctl*'
+
+We're paying the cost of building more tests than we need this way, but it's
+easier than fiddling with ``.kunitconfig`` files or commenting out
+``kunit_suite``'s.
+
+However, if we wanted to define a set of tests in a less ad hoc way, the next
+tip is useful.
+
+Defining a set of tests
+-----------------------
+
+``kunit.py run`` (along with ``build``, and ``config``) supports a
+``--kunitconfig`` flag. So if you have a set of tests that you want to run on a
+regular basis (especially if they have other dependencies), you can create a
+specific ``.kunitconfig`` for them.
+
+E.g. kunit has one for its tests:
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit/.kunitconfig
+
+Alternatively, if you're following the convention of naming your
+file ``.kunitconfig``, you can just pass in the dir, e.g.
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit
+
+.. note::
+ This is a relatively new feature (5.12+) so we don't have any
+ conventions yet about on what files should be checked in versus just
+ kept around locally. It's up to you and your maintainer to decide if a
+ config is useful enough to submit (and therefore have to maintain).
+
+.. note::
+ Having ``.kunitconfig`` fragments in a parent and child directory is
+ iffy. There's discussion about adding an "import" statement in these
+ files to make it possible to have a top-level config run tests from all
+ child directories. But that would mean ``.kunitconfig`` files are no
+ longer just simple .config fragments.
+
+ One alternative would be to have kunit tool recursively combine configs
+ automagically, but tests could theoretically depend on incompatible
+ options, so handling that would be tricky.
+
+Generating code coverage reports under UML
+------------------------------------------
+
+.. note::
+ TODO(brendanhiggins(a)google.com): There are various issues with UML and
+ versions of gcc 7 and up. You're likely to run into missing ``.gcda``
+ files or compile errors. We know one `faulty GCC commit
+ <https://github.com/gcc-mirror/gcc/commit/8c9434c2f9358b8b8bad2c1990edf10a21…>`_
+ but not how we'd go about getting this fixed. The compile errors still
+ need some investigation.
+
+.. note::
+ TODO(brendanhiggins(a)google.com): for recent versions of Linux
+ (5.10-5.12, maybe earlier), there's a bug with gcov counters not being
+ flushed in UML. This translates to very low (<1%) reported coverage. This is
+ related to the above issue and can be worked around by replacing the
+ one call to ``uml_abort()`` with a plain ``exit()``.
+
+
+This is different from the "normal" way of getting coverage information that is
+documented in Documentation/dev-tools/gcov.rst.
+
+Instead of enabling ``CONFIG_GCOV_KERNEL=y``, we can set these options:
+
+.. code-block:: none
+
+ CONFIG_DEBUG_KERNEL=y
+ CONFIG_DEBUG_INFO=y
+ CONFIG_GCOV=y
+
+
+Putting it together into a copy-pastable sequence of commands:
+
+.. code-block:: bash
+
+ # Append coverage options to the current config
+ $ echo -e "CONFIG_DEBUG_KERNEL=y\nCONFIG_DEBUG_INFO=y\nCONFIG_GCOV=y" >> .kunit/.kunitconfig
+ $ ./tools/testing/kunit/kunit.py run
+ # Extract the coverage information from the build dir (.kunit/)
+ $ lcov -t "my_kunit_tests" -o coverage.info -c -d .kunit/
+
+ # From here on, it's the same process as with CONFIG_GCOV_KERNEL=y
+ # E.g. can generate an HTML report in a tmp dir like so:
+ $ genhtml -o /tmp/coverage_html coverage.info
+
+
+If your installed version of gcc doesn't work, you can tweak the steps:
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --make_options=CC=/usr/bin/gcc-6
+ $ lcov -t "my_kunit_tests" -o coverage.info -c -d .kunit/ --gcov-tool=/usr/bin/gcov-6
+
+
+Running tests manually
+======================
+
+Running tests without using ``kunit.py run`` is also an important use case.
+Currently it's your only option if you want to test on architectures other than
+UML.
+
+As running the tests under UML is fairly straightforward (configure and compile
+the kernel, run the ``./linux`` binary), this section will focus on testing
+non-UML architectures.
+
+
+Running built-in tests
+----------------------
+
+When setting tests to ``=y``, the tests will run as part of boot and print
+results to dmesg in TAP format. So you just need to add your tests to your
+``.config``, build and boot your kernel as normal.
+
+So if we compiled our kernel with:
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=y
+
+Then we'd see output like this in dmesg signaling the test ran and passed:
+
+.. code-block:: none
+
+ TAP version 14
+ 1..1
+ # Subtest: example
+ 1..1
+ # example_simple_test: initializing
+ ok 1 - example_simple_test
+ ok 1 - example
+
+Running tests as modules
+------------------------
+
+Depending on the tests, you can build them as loadable modules.
+
+For example, we'd change the config options from before to
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=m
+
+Then after booting into our kernel, we can run the test via
+
+.. code-block:: none
+
+ $ modprobe kunit-example-test
+
+This will then cause it to print TAP output to stdout.
+
+.. note::
+ The ``modprobe`` will *not* have a non-zero exit code if any test
+ failed (as of 5.13). But ``kunit.py parse`` would, see below.
+
+.. note::
+ You can set ``CONFIG_KUNIT=m`` as well, however, some features will not
+ work and thus some tests might break. Ideally tests would specify they
+ depend on ``KUNIT=y`` in their ``Kconfig``'s, but this is an edge case
+ most test authors won't think about.
+ As of 5.13, the only difference is that ``current->kunit_test`` will
+ not exist.
+
+Pretty-printing results
+-----------------------
+
+You can use ``kunit.py parse`` to parse dmesg for test output and print out
+results in the same familiar format that ``kunit.py run`` does.
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py parse /var/log/dmesg
+
+
+Retrieving per suite results
+----------------------------
+
+Regardless of how you're running your tests, you can enable
+``CONFIG_KUNIT_DEBUGFS`` to expose per-suite TAP-formatted results:
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=m
+ CONFIG_KUNIT_DEBUGFS=y
+
+The results for each suite will be exposed under
+``/sys/kernel/debug/kunit/<suite>/results``.
+So using our example config:
+
+.. code-block:: bash
+
+ $ modprobe kunit-example-test > /dev/null
+ $ cat /sys/kernel/debug/kunit/example/results
+ ... <TAP output> ...
+
+ # After removing the module, the corresponding files will go away
+ $ modprobe -r kunit-example-test
+ $ cat /sys/kernel/debug/kunit/example/results
+ /sys/kernel/debug/kunit/example/results: No such file or directory
+
+Generating code coverage reports
+--------------------------------
+
+See Documentation/dev-tools/gcov.rst for details on how to do this.
+
+The only vaguely KUnit-specific advice here is that you probably want to build
+your tests as modules. That way you can isolate the coverage from tests from
+other code executed during boot, e.g.
+
+.. code-block:: bash
+
+ # Reset coverage counters before running the test.
+ $ echo 0 > /sys/kernel/debug/gcov/reset
+ $ modprobe kunit-example-test
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 0e65cabe08eb..aa56d7ca6bfb 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -236,5 +236,7 @@ Next Steps
==========
* Check out the :doc:`tips` page for tips on
writing idiomatic KUnit tests.
+* Check out the :doc:`running_tips` page for tips on
+ how to make running KUnit tests easier.
* Optional: see the :doc:`usage` page for a more
in-depth explanation of KUnit.
base-commit: de2fcb3e62013738f22bbb42cbd757d9a242574e
--
2.31.1.295.g9ea45b61b8-goog
This is long overdue.
There are several things that aren't nailed down (in-tree
.kunitconfig's), or partially broken (GCOV on UML), but having them
documented, warts and all, is better than having nothing.
This covers a bunch of the more recent features
* kunit_filter_glob
* kunit.py run --kunitconfig
* slightly more detail on building tests as modules
* CONFIG_KUNIT_DEBUGFS
By my count, the only headline features now not mentioned are the KASAN
integration and KernelCI json output support (kunit.py run --json).
And then it also discusses how to get code coverage reports under UML
and non-UML since this is a question people have repeatedly asked.
Non-UML coverage collection is no different from normal, but we should
probably explicitly call this out.
As for UML, I was able to get it working again with two small hacks.*
E.g. with CONFIG_KUNIT=y && CONFIG_KUNIT_ALL_TESTS=y
Overall coverage rate:
lines......: 15.1% (18294 of 120776 lines)
functions..: 16.8% (1860 of 11050 functions)
Note: this doesn't document --alltests since this is not stable yet.
Hopefully being run more frequently as part of KernelCI will help...
*Using gcc/gcov-6 and not using uml_abort() in os_dump_core().
I've documented these hacks in "Notes" but left TODOs for
brendanhiggins(a)google.com who tracked down the runtime issue in GCC.
To be clear: these are not issues specific to KUnit, but rather to UML.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
Documentation/dev-tools/kunit/index.rst | 1 +
.../dev-tools/kunit/running_tips.rst | 260 ++++++++++++++++++
Documentation/dev-tools/kunit/start.rst | 2 +
3 files changed, 263 insertions(+)
create mode 100644 Documentation/dev-tools/kunit/running_tips.rst
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index 848478838347..7f7cf8d2ab20 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -14,6 +14,7 @@ KUnit - Unit Testing for the Linux Kernel
style
faq
tips
+ running_tips
What is KUnit?
==============
diff --git a/Documentation/dev-tools/kunit/running_tips.rst b/Documentation/dev-tools/kunit/running_tips.rst
new file mode 100644
index 000000000000..52cc62d1c83b
--- /dev/null
+++ b/Documentation/dev-tools/kunit/running_tips.rst
@@ -0,0 +1,260 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Tips For Running KUnit Tests
+============================
+
+Using ``kunit.py run`` ("kunit tool")
+=====================================
+
+Running from any directory
+--------------------------
+
+It can be handy to create a bash function like:
+
+.. code-block:: bash
+
+ function run_kunit() {
+ ( cd "$(git rev-parse --show-toplevel)" && ./tools/testing/kunit/kunit.py run $@ )
+ }
+
+.. note::
+ Early versions of ``kunit.py`` (before 5.6) didn't work unless run from
+ the kernel root, hence the use of a subshell and ``cd``.
+
+Running a subset of tests
+-------------------------
+
+``kunit.py run`` accepts an optional glob argument to filter tests. Currently
+this only matches against suite names, but this may change in the future.
+
+Say that we wanted to run the sysctl tests, we could do so via:
+
+.. code-block:: bash
+
+ $ echo -e 'CONFIG_KUNIT=y\nCONFIG_KUNIT_ALL_TESTS=y' > .kunit/.kunitconfig
+ $ ./tools/testing/kunit/kunit.py run 'sysctl*'
+
+We're paying the cost of building more tests than we need this way, but it's
+easier than fiddling with ``.kunitconfig`` files or commenting out
+``kunit_suite``'s.
+
+However, if we wanted to define a set of tests in a less ad hoc way, the next
+tip is useful.
+
+Defining a set of tests
+-----------------------
+
+``kunit.py run`` (along with ``build``, and ``config``) supports a
+``--kunitconfig`` flag. So if you have a set of tests that you want to run on a
+regular basis (especially if they have other dependencies), you can create a
+specific ``.kunitconfig`` for them.
+
+E.g. kunit has one for its tests:
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit/.kunitconfig
+
+Alternatively, if you're following the convention of naming your
+file ``.kunitconfig``, you can just pass in the dir, e.g.
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit
+
+.. note::
+ This is a relatively new feature (5.12+) so we don't have any
+ conventions yet about on what files should be checked in versus just
+ kept around locally. It's up to you and your maintainer to decide if a
+ config is useful enough to submit (and therefore have to maintain).
+
+.. note::
+ Having ``.kunitconfig`` fragments in a parent and child directory is
+ iffy. There's discussion about adding an "import" statement in these
+ files to make it possible to have a top-level config run tests from all
+ child directories. But that would mean ``.kunitconfig`` files are no
+ longer just simple .config fragments.
+
+ One alternative would be to have kunit tool recursively combine configs
+ automagically, but tests could theoretically depend on incompatible
+ options, so handling that would be tricky.
+
+Generating code coverage reports under UML
+------------------------------------------
+
+.. note::
+ TODO(brendanhiggins(a)google.com): There are various issues with UML and
+ versions of gcc 7 and up. You're likely to run into missing ``.gcda``
+ files or compile errors. We know one `faulty GCC commit
+ <https://github.com/gcc-mirror/gcc/commit/8c9434c2f9358b8b8bad2c1990edf10a21…>`_
+ but not how we'd go about getting this fixed. The compile errors still
+ need some investigation.
+
+.. note::
+ TODO(brendanhiggins(a)google.com): for recent versions of Linux
+ (5.10-5.12, maybe earlier), there's a bug with gcov counters not being
+ flushed in UML. This translates to very low (<1%) reported coverage. This is
+ related to the above issue and can be worked around by replacing the
+ one call to ``uml_abort()`` with a plain ``exit()``.
+
+
+This is different from the "normal" way of getting coverage information that is
+documented in Documentation/dev-tools/gcov.rst.
+
+Instead of enabling ``CONFIG_GCOV_KERNEL=y``, we can set these options:
+
+.. code-block:: none
+
+ CONFIG_DEBUG_KERNEL=y
+ CONFIG_DEBUG_INFO=y
+ CONFIG_GCOV=y
+
+
+Putting it together into a copy-pastable sequence of commands:
+
+.. code-block:: bash
+
+ # Append coverage options to the current config
+ $ echo -e "CONFIG_DEBUG_KERNEL=y\nCONFIG_DEBUG_INFO=y\nCONFIG_GCOV=y" >> .kunit/.kunitconfig
+ $ ./tools/testing/kunit/kunit.py run
+ # Extract the coverage information from the build dir (.kunit/)
+ $ lcov -t "my_kunit_tests" -o coverage.info -c -d .kunit/
+
+ # From here on, it's the same process as with CONFIG_GCOV_KERNEL=y
+ # E.g. can generate an HTML report in a tmp dir like so:
+ $ genhtml -o /tmp/coverage_html coverage.info
+
+
+If your installed version of gcc doesn't work, you can tweak the steps:
+
+.. code-block:: bash
+
+ # need to edit tools/testing/kunit/kunit_kernel.py to call make with 'CC=/usr/bin/gcc-6'
+ $ $EDITOR tools/testing/kunit/kunit_kernel.py
+
+ $ lcov -t "my_kunit_tests" -o coverage.info -c -d .kunit/ --gcov-tool=/usr/bin/gcov-6
+
+
+Running tests manually
+======================
+
+Running tests without using ``kunit.py run`` is also an important use case.
+Currently it's your only option if you want to test on architectures other than
+UML.
+
+As running the tests under UML is fairly straightforward (configure and compile
+the kernel, run the ``./linux`` binary), this section will focus on testing
+non-UML architectures.
+
+
+Running built-in tests
+----------------------
+
+When setting tests to ``=y``, the tests will run as part of boot and print
+results to dmesg in TAP format. So you just need to add your tests to your
+``.config``, build and boot your kernel as normal.
+
+So if we compiled our kernel with:
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=y
+
+Then we'd see output like this in dmesg signaling the test ran and passed:
+
+.. code-block:: none
+
+ TAP version 14
+ 1..1
+ # Subtest: example
+ 1..1
+ # example_simple_test: initializing
+ ok 1 - example_simple_test
+ ok 1 - example
+
+Running tests as modules
+------------------------
+
+Depending on the tests, you can build them as loadable modules.
+
+For example, we'd change the config options from before to
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=m
+
+Then after booting into our kernel, we can run the test via
+
+.. code-block:: none
+
+ $ modprobe kunit-example-test
+
+This will then cause it to print TAP output to stdout.
+
+.. note::
+ The ``modprobe`` will *not* have a non-zero exit code if any test
+ failed (as of 5.13). But ``kunit.py parse`` would, see below.
+
+.. note::
+ You can set ``CONFIG_KUNIT=m`` as well, however, some features will not
+ work and thus some tests might break. Ideally tests would specify they
+ depend on ``KUNIT=y`` in their ``Kconfig``'s, but this is an edge case
+ most test authors won't think about.
+ As of 5.13, the only difference is that ``current->kunit_test`` will
+ not exist.
+
+Pretty-printing results
+-----------------------
+
+You can use ``kunit.py parse`` to parse dmesg for test output and print out
+results in the same familiar format that ``kunit.py run`` does.
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py parse /var/log/dmesg
+
+
+Retrieving per suite results
+----------------------------
+
+Regardless of how you're running your tests, you can enable
+``CONFIG_KUNIT_DEBUGFS`` to expose per-suite TAP-formatted results:
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=m
+ CONFIG_KUNIT_DEBUGFS=y
+
+The results for each suite will be exposed under
+``/sys/kernel/debug/kunit/<suite>/results``.
+So using our example config:
+
+.. code-block:: bash
+
+ $ modprobe kunit-example-test > /dev/null
+ $ cat /sys/kernel/debug/kunit/example/results
+ ... <TAP output> ...
+
+ # After removing the module, the corresponding files will go away
+ $ modprobe -r kunit-example-test
+ $ cat /sys/kernel/debug/kunit/example/results
+ /sys/kernel/debug/kunit/example/results: No such file or directory
+
+Generating code coverage reports
+--------------------------------
+
+See Documentation/dev-tools/gcov.rst for details on how to do this.
+
+The only vaguely KUnit-specific advice here is that you probably want to build
+your tests as modules. That way you can isolate the coverage from tests from
+other code executed during boot, e.g.
+
+.. code-block:: bash
+
+ # Reset coverage counters before running the test.
+ $ echo 0 > /sys/kernel/debug/gcov/reset
+ $ modprobe kunit-example-test
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 0e65cabe08eb..aa56d7ca6bfb 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -236,5 +236,7 @@ Next Steps
==========
* Check out the :doc:`tips` page for tips on
writing idiomatic KUnit tests.
+* Check out the :doc:`running_tips` page for tips on
+ how to make running KUnit tests easier.
* Optional: see the :doc:`usage` page for a more
in-depth explanation of KUnit.
base-commit: de2fcb3e62013738f22bbb42cbd757d9a242574e
--
2.31.1.295.g9ea45b61b8-goog
This is long overdue.
There are several things that aren't nailed down (in-tree
.kunitconfig's), or partially broken (GCOV on UML), but having them
documented, warts and all, is better than having nothing.
This covers a bunch of the more recent features
* kunit_filter_glob
* kunit.py run --kunitconfig
* kunit.py run --alltests
* slightly more detail on building tests as modules
* CONFIG_KUNIT_DEBUGFS
By my count, the only headline features now not mentioned are the KASAN
integration and KernelCI json output support (kunit.py run --json).
And then it also discusses how to get code coverage reports under UML
and non-UML since this is a question people have repeatedly asked.
Non-UML coverage collection is no differnt from normal, but we should
probably explicitly call thsi out.
As for UML, I was able to get it working again with two small hacks.*
E.g. with CONFIG_KUNIT=y && CONFIG_KUNIT_ALL_TESTS=y
Overall coverage rate:
lines......: 15.1% (18294 of 120776 lines)
functions..: 16.8% (1860 of 11050 functions)
*Switching to use gcc/gcov-6 and not using uml_abort().
I've documented these hacks in "Notes" but left TODOs for
brendanhiggins(a)google.com who tracked down the runtime issue in GCC.
To be clear: these are not issues specific to KUnit, but rather to UML.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
Documentation/dev-tools/kunit/index.rst | 1 +
.../dev-tools/kunit/running_tips.rst | 278 ++++++++++++++++++
Documentation/dev-tools/kunit/start.rst | 2 +
3 files changed, 281 insertions(+)
create mode 100644 Documentation/dev-tools/kunit/running_tips.rst
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index 848478838347..7f7cf8d2ab20 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -14,6 +14,7 @@ KUnit - Unit Testing for the Linux Kernel
style
faq
tips
+ running_tips
What is KUnit?
==============
diff --git a/Documentation/dev-tools/kunit/running_tips.rst b/Documentation/dev-tools/kunit/running_tips.rst
new file mode 100644
index 000000000000..d38e665e530f
--- /dev/null
+++ b/Documentation/dev-tools/kunit/running_tips.rst
@@ -0,0 +1,278 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Tips For Running KUnit Tests
+============================
+
+Using ``kunit.py run`` ("kunit tool")
+=====================================
+
+Running from any directory
+--------------------------
+
+It can be handy to create a bash function like:
+
+.. code-block:: bash
+
+ function run_kunit() {
+ ( cd "$(git rev-parse --show-toplevel)" && ./tools/testing/kunit/kunit.py run $@ )
+ }
+
+.. note::
+ Early versions of ``kunit.py`` (before 5.6) didn't work unless run from
+ the kernel root, hence the use of a subshell and ``cd``.
+
+Running a subset of tests
+-------------------------
+
+``kunit.py run`` accepts an optional glob argument to filter tests. Currently
+this only matches against suite names, but this may change in the future.
+
+Say that we wanted to run the sysctl tests, we could do so via:
+
+.. code-block:: bash
+
+ $ echo -e 'CONFIG_KUNIT=y\nCONFIG_KUNIT_ALL_TESTS=y' > .kunit/.kunitconfig
+ $ ./tools/testing/kunit/kunit.py run 'sysctl*'
+
+We're paying the cost of building more tests than we need this way, but it's
+easier than fiddling with ``.kunitconfig`` files or commenting out
+``kunit_suite``'s.
+
+However, if we wanted to define a set of tests in a less ad hoc way, the next
+tip is useful.
+
+Defining a set of tests
+-----------------------
+
+``kunit.py run`` (along with ``build``, and ``config``) supports a
+``--kunitconfig`` flag. So if you have a set of tests that you want to run on a
+regular basis (especially if they have other dependencies), you can create a
+specific ``.kunitconfig`` for them.
+
+E.g. kunit has own for its tests:
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit/.kunitconfig
+
+Alternatively, if you're following the convention of naming your
+file ``.kunitconfig``, you can just pass in the dir, e.g.
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit
+
+.. note::
+ This is a relatively new feature (5.12+) so we don't have any
+ conventions yet about on what files should be checked in versus just
+ kept around locally. But if the tests don't have any dependencies
+ (beyond ``CONFIG_KUNIT``), it's probably not worth writing and
+ maintaining a ``.kunitconfig`` fragment. Running with
+ ``CONFIG_KUNIT_ALL_TESTS=y`` is probably easier.
+
+.. note::
+ Having ``.kunitconfig`` fragments in a parent and child directory is
+ iffy. There's discussion about adding an "import" statement in these
+ files to make it possible to have a top-level config run tests from all
+ child directories. But that would mean ``.kunitconfig`` files are no
+ longer just simple .config fragments.
+
+ One alternative would be to have kunit tool recursively combine configs
+ automagically, but tests could theoretically depend on incompatible
+ options, so handling that would be tricky.
+
+Running with ``allyesconfig``
+-----------------------------
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py run --alltests
+
+This will try and use ``allyesconfig``, or rather ``allyesconfig`` with a list
+of UML-incompatible configs turned off. That list is maintained in
+``tools/testing/kunit/configs/broken_on_uml.config``.
+
+.. note::
+ This will take a *lot* longer to run and might be broken from time to
+ time, especially on -next. It's not recommended to use this unless you
+ need to or are morbidly curious.
+
+Generating code coverage reports under UML
+------------------------------------------
+
+.. note::
+ TODO(brendanhiggins(a)google.com): There are various issues with UML and
+ versions of gcc 7 and up. You're likely to run into missing ``.gcda``
+ files or compile errors. We know one `faulty GCC commit
+ <https://github.com/gcc-mirror/gcc/commit/8c9434c2f9358b8b8bad2c1990edf10a21…>`_
+ but not how we'd go about getting this fixed. The compile errors still
+ need some investigation.
+
+.. note::
+ TODO(brendanhiggins(a)google.com): for recent versions of Linux
+ (5.10-5.12, maybe earlier), there's a bug with gcov counters not being
+ flushed in UML. This translates to very low (<1%) reported coverage. This is
+ related to the above issue and can be worked around by replacing the
+ one call to ``uml_abort()`` with a plain ``exit()``.
+
+
+This is different from the "normal" way of getting coverage information that is
+documented in Documentation/dev-tools/gcov.rst.
+
+Instead of enabling ``CONFIG_GCOV_KERNEL=y``, we can set these options:
+
+.. code-block:: none
+
+ CONFIG_DEBUG_KERNEL=y
+ CONFIG_DEBUG_INFO=y
+ CONFIG_GCOV=y
+
+
+Putting it together into a copy-pastable sequence of commands:
+
+.. code-block:: bash
+
+ # Append coverage options to the current config
+ $ echo -e "CONFIG_DEBUG_KERNEL=y\nCONFIG_DEBUG_INFO=y\nCONFIG_GCOV=y" >> .kunit/.kunitconfig
+ $ ./tools/testing/kunit/kunit.py run
+ # Extract the coverage information from the build dir (.kunit/)
+ $ lcov -t "my_kunit_tests" -o coverage.info -c -d .kunit/
+
+ # From here on, it's the same process as with CONFIG_GCOV_KERNEL=y
+ # E.g. can generate an HTML report in a tmp dir like so:
+ $ genhtml -o /tmp/coverage_html coverage.info
+
+
+If your installed version of gcc doesn't work, you can tweak the steps:
+
+.. code-block:: bash
+
+ # need to edit tools/testing/kunit/kunit_kernel.py to call make with 'CC=/usr/bin/gcc-6'
+ $ $EDITOR tools/testing/kunit/kunit_kernel.py
+
+ $ lcov -t "my_kunit_tests" -o coverage.info -c -d .kunit/ --gcov-tool=/usr/bin/gcov-6
+
+
+Running tests manually
+======================
+
+Running tests without using ``kunit.py run`` is also an important use case.
+Currently it's your only option if you want to test on architectures other than
+UML.
+
+As running the tests under UML is fairly straightforward (configure and compile
+the kernel, run the ``./linux`` binary), this section will focus on testing
+non-UML architectures.
+
+
+Running built-in tests
+----------------------
+
+When setting tests to ``=y``, the tests will run as part of boot and print
+results to dmesg in TAP format. So you just need to add your tests to your
+``.config``, build and boot your kernel as normal.
+
+So if we compiled our kernel with:
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=y
+
+Then we'd see output like this in dmesg signaling the test ran and passed:
+
+.. code-block:: none
+
+ TAP version 14
+ 1..1
+ # Subtest: example
+ 1..1
+ # example_simple_test: initializing
+ ok 1 - example_simple_test
+ ok 1 - example
+
+Running tests as modules
+------------------------
+
+Depending on the tests, you can build them as loadable modules.
+
+For example, we'd change the config options from before to
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=m
+
+Then after booting into our kernel, we can run the test via
+
+.. code-block:: none
+
+ $ modprobe kunit-example-test
+
+This will then cause it to print TAP output to stdout.
+
+.. note::
+ The ``modprobe`` will *not* have a non-zero exit code if any test
+ failed (as of 5.13). But ``kunit.py parse`` would, see below.
+
+.. note::
+ You can set ``CONFIG_KUNIT=m`` as well, however, some features will not
+ work and thus some tests might break. Ideally tests would specify they
+ depend on ``KUNIT=y`` in their ``Kconfig``'s, but this is an edge case
+ most test authors won't think about.
+ As of 5.13, the only difference is that ``current->kunit_test`` will
+ not exist.
+
+Pretty-printing results
+-----------------------
+
+You can use ``kunit.py parse`` to parse dmesg for test output and print out
+results in the same familiar format that ``kunit.py run`` does.
+
+.. code-block:: bash
+
+ $ ./tools/testing/kunit/kunit.py parse /var/log/dmesg
+
+
+Retrieving per suite results
+----------------------------
+
+Regardless of how you're running your tests, you can enable
+``CONFIG_KUNIT_DEBUGFS`` to expose per-suite TAP-formatted results:
+
+.. code-block:: none
+
+ CONFIG_KUNIT=y
+ CONFIG_KUNIT_EXAMPLE_TEST=m
+ CONFIG_KUNIT_DEBUGFS=y
+
+The results for each suite will be exposed under
+``/sys/kernel/debug/kunit/<suite>/results``.
+So using our example config:
+
+.. code-block:: bash
+
+ $ modprobe kunit-example-test > /dev/null
+ $ cat /sys/kernel/debug/kunit/example/results
+ ... <TAP output> ...
+
+ # After removing the module, the corresponding files will go away
+ $ modprobe -r kunit-example-test
+ $ cat /sys/kernel/debug/kunit/example/results
+ /sys/kernel/debug/kunit/example/results: No such file or directory
+
+Generating code coverage reports
+--------------------------------
+
+See Documentation/dev-tools/gcov.rst for details on how to do this.
+
+The only vaguely KUnit-specific advice here is that you probably want to build
+your tests as modules. That way you can isolate the coverage from tests from
+other code executed during boot, e.g.
+
+.. code-block:: bash
+
+ # Reset coverage counters before running the test.
+ $ echo 0 > /sys/kernel/debug/gcov/reset
+ $ modprobe kunit-example-test
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 0e65cabe08eb..aa56d7ca6bfb 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -236,5 +236,7 @@ Next Steps
==========
* Check out the :doc:`tips` page for tips on
writing idiomatic KUnit tests.
+* Check out the :doc:`running_tips` page for tips on
+ how to make running KUnit tests easier.
* Optional: see the :doc:`usage` page for a more
in-depth explanation of KUnit.
base-commit: de2fcb3e62013738f22bbb42cbd757d9a242574e
--
2.31.1.295.g9ea45b61b8-goog
This patchset introduces batched operations for the per-cpu variant of
the array map.
It also removes the percpu macros from 'bpf_util.h'. This change was
suggested by Andrii in a earlier iteration of this patchset.
The tests were updated to reflect all the new changes.
v2 -> v3:
- Remove percpu macros as suggested by Andrii
- Update tests that used the per cpu macros
v1 -> v2:
- Amended a more descriptive commit message
Pedro Tammela (3):
bpf: add batched ops support for percpu array
bpf: selftests: remove percpu macros from bpf_util.h
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
tools/testing/selftests/bpf/bpf_util.h | 7 --
.../bpf/map_tests/array_map_batch_ops.c | 110 +++++++++++++-----
.../bpf/map_tests/htab_map_batch_ops.c | 71 ++++++-----
.../selftests/bpf/prog_tests/map_init.c | 9 +-
tools/testing/selftests/bpf/test_maps.c | 84 +++++++------
6 files changed, 171 insertions(+), 112 deletions(-)
--
2.25.1
The kernel now has a number of testing and debugging tools, and we've
seen a bit of confusion about what the differences between them are.
Add a basic documentation outlining the testing tools, when to use each,
and how they interact.
This is a pretty quick overview rather than the idealised "kernel
testing guide" that'd probably be optimal, but given the number of times
questions like "When do you use KUnit and when do you use Kselftest?"
are being asked, it seemed worth at least having something. Hopefully
this can form the basis for more detailed documentation later.
Signed-off-by: David Gow <davidgow(a)google.com>
---
Documentation/dev-tools/index.rst | 3 +
Documentation/dev-tools/testing-overview.rst | 102 +++++++++++++++++++
2 files changed, 105 insertions(+)
create mode 100644 Documentation/dev-tools/testing-overview.rst
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 1b1cf4f5c9d9..f590e5860794 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -7,6 +7,8 @@ be used to work on the kernel. For now, the documents have been pulled
together without any significant effort to integrate them into a coherent
whole; patches welcome!
+A brief overview of testing-specific tools can be found in :doc:`testing-overview`.
+
.. class:: toc-title
Table of contents
@@ -14,6 +16,7 @@ whole; patches welcome!
.. toctree::
:maxdepth: 2
+ testing-overview
coccinelle
sparse
kcov
diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
new file mode 100644
index 000000000000..8452adcb8608
--- /dev/null
+++ b/Documentation/dev-tools/testing-overview.rst
@@ -0,0 +1,102 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Kernel Testing Guide
+====================
+
+
+There are a number of different tools for testing the Linux kernel, so knowing
+when to use each of them can be a challenge. This document provides a rough
+overview of their differences, and how they fit together.
+
+
+Writing and Running Tests
+=========================
+
+The bulk of kernel tests are written using either the :doc:`kselftest
+<kselftest>` or :doc:`KUnit <kunit/index>` frameworks. These both provide
+infrastructure to help make running tests and groups of tests easier, as well
+as providing helpers to aid in writing new tests.
+
+If you're looking to verify the behaviour of the Kernel — particularly specific
+parts of the kernel — then you'll want to use `KUnit` or `kselftest`.
+
+
+The Difference Between KUnit and kselftest
+------------------------------------------
+
+:doc:`KUnit <kunit/index>` is an entirely in-kernel system for "white box"
+testing: because test code is part of the kernel, it can access internal
+structures and functions which aren't exposed to userspace.
+
+`KUnit` tests therefore are best written against small, self-contained parts
+of the kernel, which can be tested in isolation. This aligns well with the
+concept of Unit testing.
+
+For example, a KUnit test might test an individual kernel function (or even a
+single codepath through a function, such as an error handling case), rather
+than a feature as a whole.
+
+There is a KUnit test style guide which may give further pointers
+
+
+:doc:`kselftest <kselftest>`, on the other hand, is largely implemented in
+userspace, and tests are normal userspace scripts or programs.
+
+This makes it easier to write more complicated tests, or tests which need to
+manipulate the overall system state more (e.g., spawning processes, etc.).
+However, it's not possible to call kernel functions directly unless they're
+exposed to userspace (by a syscall, device, filesystem, etc.) Some tests to
+also provide a kernel module which is loaded by the test, though for tests
+which run mostly or entirely within the kernel, `KUnit` may be the better tool.
+
+`kselftest` is therefore suited well to tests of whole features, as these will
+expose an interface to userspace, which can be tested, but not implementation
+details. This aligns well with 'system' or 'end-to-end' testing.
+
+
+Code Coverage Tools
+===================
+
+The Linux Kernel supports two different code coverage mesurement tools. These
+can be used to verify that a test is executing particular functions or lines
+of code. This is useful for determining how much of the kernel is being tested,
+and for finding corner-cases which are not covered by the appropriate test.
+
+:doc:`kcov` is a feature which can be built in to the kernel to allow
+capturing coverage on a per-task level. It's therefore useful for fuzzing and
+other situations where information about code executed during, for example, a
+single syscall is useful.
+
+:doc:`gcov` is GCC's coverage testing tool, which can be used with the kernel
+to get global or per-module coverage. Unlike KCOV, it does not record per-task
+coverage. Coverage data can be read from debugfs, and interpreted using the
+usual gcov tooling.
+
+
+Sanitizers
+==========
+
+The kernel also supports a number of sanitizers, which attempt to detect
+classes of issues when the occur in a running kernel. These typically
+look for undefined behaviour of some kind, such as invalid memory accesses,
+concurrency issues such as data races, or other undefined behaviour like
+integer overflows.
+
+* :doc:`kmemleak` (Kmemleak) detects possible memory leaks.
+* :doc:`kasan` detects invalid memory accesses such as out-of-bounds and
+ use-after-free errors.
+* :doc:`ubsan` detects behaviour that is undefined by the C standard, like
+ integer overflows.
+* :doc:`kcsan` detects data races.
+* :doc:`kfence` is a low-overhead detector of memory issues, which is much
+ faster than KASAN and can be used in production.
+
+These tools tend to test the kernel as a whole, and do not "pass" like
+kselftest or KUnit tests. They can be combined with KUnit or kselftest by
+running tests on a kernel with a sanitizer enabled: you can then be sure
+that none of these errors are occurring during the test.
+
+Some of these sanitizers integrate with KUnit or kselftest and will
+automatically fail tests if an issue is detected by a sanitizer.
+
--
2.31.1.295.g9ea45b61b8-goog
Changelog
RFC v2-->v3
Based on comments by Doug Smythies,
1. Changed commit log to reflect the test must be run as super user.
2. Added a comment specifying a method to run the test bash script
without recompiling.
3. Enable all the idle states after the experiments are completed so
that the system is in a coherent state after the tests have run
4. Correct the return status of a CPU that cannot be off-lined.
RFC v2: https://lkml.org/lkml/2021/4/1/615
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.
The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will additional latencies of
the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024
Things to keep in mind:
1. This kernel module + bash driver does not guarantee idleness on a
core when the IPI and the Timer is armed. It only invokes sleep and
hopes that the core is idle once the IPI/Timer is invoked onto it.
Hence this program must be run on a completely idle system for best
results
2. Even on a completely idle system, there maybe book-keeping tasks or
jitter tasks that can run on the core we want idle. This can create
outliers in the latency measurement. Thankfully, these outliers
should be large enough to easily weed them out.
3. A userspace only selftest variant was also sent out as RFC based on
suggestions over the previous patchset to simply the kernel
complexeity. However, a userspace only approach had more noise in
the latency measurement due to userspace-kernel interactions
which led to run to run variance and a lesser accurate test.
Another downside of the nature of a userspace program is that it
takes orders of magnitude longer to complete a full system test
compared to the kernel framework.
RFC patch: https://lkml.org/lkml/2020/9/2/356
4. For Intel Systems, the Timer based latencies don't exactly give out
the measure of idle latencies. This is because of a hardware
optimization mechanism that pre-arms a CPU when a timer is set to
wakeup. That doesn't make this metric useless for Intel systems,
it just means that is measuring IPI/Timer responding latency rather
than idle wakeup latencies.
(Source: https://lkml.org/lkml/2020/9/2/610)
For solution to this problem, a hardware based latency analyzer is
devised by Artem Bityutskiy from Intel.
https://youtu.be/Opk92aQyvt0?t=8266https://intel.github.io/wult/
Pratik Rajesh Sampat (2):
cpuidle: Extract IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 157 ++++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 326 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 2 +
7 files changed, 503 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.17.1
This series aims to clarify the behavior of the KVM_GET_EMULATED_CPUID
ioctl, and fix a corner case where -E2BIG is returned when
the nent field of struct kvm_cpuid2 is matching the amount of
emulated entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 extend the x86_64/get_cpuid_test.c selftest to check
the intended behavior of KVM_GET_EMULATED_CPUID.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit(a)redhat.com>
---
v5:
- Better comment in cpuid.c (patch 1)
Emanuele Giuseppe Esposito (4):
KVM: x86: Fix a spurious -E2BIG in KVM_GET_EMULATED_CPUID
Documentation: KVM: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid to processor.h
selftests: KVM: extend get_cpuid_test to include
KVM_GET_EMULATED_CPUID
Documentation/virt/kvm/api.rst | 10 +--
arch/x86/kvm/cpuid.c | 33 ++++---
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 +++++++
.../selftests/kvm/x86_64/get_cpuid_test.c | 90 ++++++++++++++++++-
5 files changed, 142 insertions(+), 25 deletions(-)
--
2.30.2
This series aims to clarify the behavior of the KVM_GET_EMULATED_CPUID
ioctl, and fix a corner case where -E2BIG is returned when
the nent field of struct kvm_cpuid2 is matching the amount of
emulated entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 extend the x86_64/get_cpuid_test.c selftest to check
the intended behavior of KVM_GET_EMULATED_CPUID.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit(a)redhat.com>
---
v4:
- Address nitpicks given in the mailing list
Emanuele Giuseppe Esposito (4):
KVM: x86: Fix a spurious -E2BIG in KVM_GET_EMULATED_CPUID
Documentation: KVM: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid to processor.h
selftests: KVM: extend get_cpuid_test to include
KVM_GET_EMULATED_CPUID
Documentation/virt/kvm/api.rst | 10 +--
arch/x86/kvm/cpuid.c | 33 ++++---
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 +++++++
.../selftests/kvm/x86_64/get_cpuid_test.c | 90 ++++++++++++++++++-
5 files changed, 142 insertions(+), 25 deletions(-)
--
2.30.2
As of commit 359a376081d4 ("kunit: support failure from dynamic analysis
tools"), we can use current->kunit_test to find the current kunit test.
Mention this in tips.rst and give an example of how this can be used in
conjunction with `test->priv` to pass around state and specifically
implement something like mocking.
There's a lot more we could go into on that topic, but given that
example is already longer than every other "tip" on this page, we just
point to the API docs and leave filling in the blanks as an exercise to
the reader.
Also give an example of kunit_fail_current_test().
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
Documentation/dev-tools/kunit/tips.rst | 78 +++++++++++++++++++++++++-
1 file changed, 76 insertions(+), 2 deletions(-)
diff --git a/Documentation/dev-tools/kunit/tips.rst b/Documentation/dev-tools/kunit/tips.rst
index a6ca0af14098..8d8c238f7f79 100644
--- a/Documentation/dev-tools/kunit/tips.rst
+++ b/Documentation/dev-tools/kunit/tips.rst
@@ -78,8 +78,82 @@ Similarly to the above, it can be useful to add test-specific logic.
void test_only_hook(void) { }
#endif
-TODO(dlatypov(a)google.com): add an example of using ``current->kunit_test`` in
-such a hook when it's not only updated for ``CONFIG_KASAN=y``.
+This test-only code can be made more useful by accessing the current kunit
+test, see below.
+
+Accessing the current test
+--------------------------
+
+In some cases, you need to call test-only code from outside the test file, e.g.
+like in the example above or if you're providing a fake implementation of an
+ops struct.
+There is a ``kunit_test`` field in ``task_struct``, so you can access it via
+``current->kunit_test``.
+
+Here's a slightly in-depth example of how one could implement "mocking":
+
+.. code-block:: c
+
+ #include <linux/sched.h> /* for current */
+
+ struct test_data {
+ int foo_result;
+ int want_foo_called_with;
+ };
+
+ static int fake_foo(int arg)
+ {
+ struct kunit *test = current->kunit_test;
+ struct test_data *test_data = test->priv;
+
+ KUNIT_EXPECT_EQ(test, test_data->want_foo_called_with, arg);
+ return test_data->foo_result;
+ }
+
+ static void example_simple_test(struct kunit *test)
+ {
+ /* Assume priv is allocated in the suite's .init */
+ struct test_data *test_data = test->priv;
+
+ test_data->foo_result = 42;
+ test_data->want_foo_called_with = 1;
+
+ /* In a real test, we'd probably pass a pointer to fake_foo somewhere
+ * like an ops struct, etc. instead of calling it directly. */
+ KUNIT_EXPECT_EQ(test, fake_foo(1), 42);
+ }
+
+
+Note: here we're able to get away with using ``test->priv``, but if you wanted
+something more flexible you could use a named ``kunit_resource``, see :doc:`api/test`.
+
+Failing the current test
+------------------------
+
+But sometimes, you might just want to fail the current test. In that case, we
+have ``kunit_fail_current_test(fmt, args...)`` which is defined in ``<kunit/test-bug.h>`` and
+doesn't require pulling in ``<kunit/test.h>``.
+
+E.g. say we had an option to enable some extra debug checks on some data structure:
+
+.. code-block:: c
+
+ #include <kunit/test-bug.h>
+
+ #ifdef CONFIG_EXTRA_DEBUG_CHECKS
+ static void validate_my_data(struct data *data)
+ {
+ if (is_valid(data))
+ return;
+
+ kunit_fail_current_test("data %p is invalid", data);
+
+ /* Normal, non-KUnit, error reporting code here. */
+ }
+ #else
+ static void my_debug_function(void) { }
+ #endif
+
Customizing error messages
--------------------------
base-commit: 0a50438c84363bd37fe18fe432888ae9a074dcab
--
2.31.0.208.g409f899ff0-goog
This patchset introduces batched operations for the per-cpu variant of
the array map.
It also introduces a standard way to define per-cpu values via the
'BPF_PERCPU_TYPE()' macro, which handles the alignment transparently.
This was already implemented in the selftests and was merely refactored
out to libbpf, with some simplifications for reuse.
The tests were updated to reflect all the new changes.
v1 -> v2:
- Amended a more descriptive commit message
Pedro Tammela (3):
bpf: add batched ops support for percpu array
libbpf: selftests: refactor 'BPF_PERCPU_TYPE()' and 'bpf_percpu()'
macros
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
tools/lib/bpf/bpf.h | 10 ++
tools/testing/selftests/bpf/bpf_util.h | 7 --
.../bpf/map_tests/array_map_batch_ops.c | 114 +++++++++++++-----
.../bpf/map_tests/htab_map_batch_ops.c | 48 ++++----
.../selftests/bpf/prog_tests/map_init.c | 5 +-
tools/testing/selftests/bpf/test_maps.c | 16 +--
7 files changed, 133 insertions(+), 69 deletions(-)
--
2.25.1
This series aims to clarify the behavior of the KVM_GET_EMULATED_CPUID
ioctl, and fix a corner case where -E2BIG is returned when
the nent field of struct kvm_cpuid2 is matching the amount of
emulated entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 extend the x86_64/get_cpuid_test.c selftest to check
the intended behavior of KVM_GET_EMULATED_CPUID.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit(a)redhat.com>
---
v3:
- clearer commit message and problem explanation
- pre-initialize the stack variable 'entry' in __do_cpuid_func_emulated
so that the various eax/ebx/ecx are initialized if not set by func.
Emanuele Giuseppe Esposito (4):
KVM: x86: Fix a spurious -E2BIG in KVM_GET_EMULATED_CPUID
Documentation: KVM: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid to processor.h
selftests: KVM: extend get_cpuid_test to include
KVM_GET_EMULATED_CPUID
Documentation/virt/kvm/api.rst | 10 +--
arch/x86/kvm/cpuid.c | 33 ++++---
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 +++++++
.../selftests/kvm/x86_64/get_cpuid_test.c | 90 ++++++++++++++++++-
5 files changed, 142 insertions(+), 25 deletions(-)
--
2.30.2
This patchset introduces batched operations for the per-cpu variant of
the array map.
It also introduces a standard way to define per-cpu values via the
'BPF_PERCPU_TYPE()' macro, which handles the alignment transparently.
This was already implemented in the selftests and was merely refactored
out to libbpf, with some simplifications for reuse.
The tests were updated to reflect all the new changes.
Pedro Tammela (3):
bpf: add batched ops support for percpu array
libbpf: selftests: refactor 'BPF_PERCPU_TYPE()' and 'bpf_percpu()'
macros
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
tools/lib/bpf/bpf.h | 10 ++
tools/testing/selftests/bpf/bpf_util.h | 7 --
.../bpf/map_tests/array_map_batch_ops.c | 114 +++++++++++++-----
.../bpf/map_tests/htab_map_batch_ops.c | 48 ++++----
.../selftests/bpf/prog_tests/map_init.c | 5 +-
tools/testing/selftests/bpf/test_maps.c | 16 +--
7 files changed, 133 insertions(+), 69 deletions(-)
--
2.25.1
Changelog
RFC v1-->v2
The timer based test produces run to run variance on some intel based
systems that sport a mechansim of "C-state pre-wake" which can
pre-wake a CPU from an idle state when timers are armed.
Hence invoking the timer tests is now parameterized for systems and
architectures that don't support pre-wakeup logic and need granular
timer measurements along with IPI results.
This RFC does not yet support treating of CPU 0s idle states differently
especially as reported on Intel systems. More understanding is needed
on systems to determine if only CPU 0 is treated differently of if they
are more CPUs that cannot have its idle state properties changed.
RFC v1: https://lkml.org/lkml/2021/3/15/492
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.
The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will additional latencies of
the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024
Things to keep in mind:
1. This kernel module + bash driver does not guarantee idleness on a
core when the IPI and the Timer is armed. It only invokes sleep and
hopes that the core is idle once the IPI/Timer is invoked onto it.
Hence this program must be run on a completely idle system for best
results
2. Even on a completely idle system, there maybe book-keeping tasks or
jitter tasks that can run on the core we want idle. This can create
outliers in the latency measurement. Thankfully, these outliers
should be large enough to easily weed them out.
3. A userspace only selftest variant was also sent out as RFC based on
suggestions over the previous patchset to simply the kernel
complexeity. However, a userspace only approach had more noise in
the latency measurement due to userspace-kernel interactions
which led to run to run variance and a lesser accurate test.
Another downside of the nature of a userspace program is that it
takes orders of magnitude longer to complete a full system test
compared to the kernel framework.
RFC patch: https://lkml.org/lkml/2020/9/2/356
4. For Intel Systems, the Timer based latencies don't exactly give out
the measure of idle latencies. This is because of a hardware
optimization mechanism that pre-arms a CPU when a timer is set to
wakeup. That doesn't make this metric useless for Intel systems,
it just means that is measuring IPI/Timer responding latency rather
than idle wakeup latencies.
(Source: https://lkml.org/lkml/2020/9/2/610)
For solution to this problem, a hardware based latency analyzer is
devised by Artem Bityutskiy from Intel.
https://youtu.be/Opk92aQyvt0?t=8266https://intel.github.io/wult/
Pratik Rajesh Sampat (2):
cpuidle: Extract IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 157 ++++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 323 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 2 +
7 files changed, 500 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.17.1
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistics data are treated to have some
attributes as below:
* architecture dependent or common
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous,
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte. Clock Cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics date
are still being updated by KVM subsystems while they are read out.
Jing Zhang (4):
KVM: stats: Separate common stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 169 ++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 42 +-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 67 +++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 68 +++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 63 ++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 133 ++++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 71 +++-
include/linux/kvm_host.h | 132 ++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 48 +++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 370 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +
virt/kvm/kvm_main.c | 237 ++++++++++-
24 files changed, 1401 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: f96be2deac9bca3ef5a2b0b66b71fcef8bad586d
--
2.31.0.208.g409f899ff0-goog
The current way to provide a no-op flag to 'bpf_ringbuf_submit()',
'bpf_ringbuf_discard()' and 'bpf_ringbuf_output()' is to provide a '0'
value.
A '0' value might notify the consumer if it already caught up in processing,
so let's provide a more descriptive notation for this value.
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
---
include/uapi/linux/bpf.h | 8 ++++++++
tools/include/uapi/linux/bpf.h | 8 ++++++++
tools/testing/selftests/bpf/progs/ima.c | 2 +-
tools/testing/selftests/bpf/progs/ringbuf_bench.c | 2 +-
tools/testing/selftests/bpf/progs/test_ringbuf.c | 2 +-
tools/testing/selftests/bpf/progs/test_ringbuf_multi.c | 2 +-
6 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 598716742593..100cb2e4c104 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4058,6 +4058,8 @@ union bpf_attr {
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4066,6 +4068,7 @@ union bpf_attr {
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
* Description
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
+ * *flags* must be 0.
* Return
* Valid pointer with *size* bytes of memory available; NULL,
* otherwise.
@@ -4075,6 +4078,8 @@ union bpf_attr {
* Submit reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4085,6 +4090,8 @@ union bpf_attr {
* Discard reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4965,6 +4972,7 @@ enum {
* BPF_FUNC_bpf_ringbuf_output flags.
*/
enum {
+ BPF_RB_MAY_WAKEUP = 0,
BPF_RB_NO_WAKEUP = (1ULL << 0),
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
};
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ab9f2233607c..3d6d324184c0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4058,6 +4058,8 @@ union bpf_attr {
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4066,6 +4068,7 @@ union bpf_attr {
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
* Description
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
+ * *flags* must be 0.
* Return
* Valid pointer with *size* bytes of memory available; NULL,
* otherwise.
@@ -4075,6 +4078,8 @@ union bpf_attr {
* Submit reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4085,6 +4090,8 @@ union bpf_attr {
* Discard reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4959,6 +4966,7 @@ enum {
* BPF_FUNC_bpf_ringbuf_output flags.
*/
enum {
+ BPF_RB_MAY_WAKEUP = 0,
BPF_RB_NO_WAKEUP = (1ULL << 0),
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
};
diff --git a/tools/testing/selftests/bpf/progs/ima.c b/tools/testing/selftests/bpf/progs/ima.c
index 96060ff4ffc6..0f4daced6aad 100644
--- a/tools/testing/selftests/bpf/progs/ima.c
+++ b/tools/testing/selftests/bpf/progs/ima.c
@@ -38,7 +38,7 @@ void BPF_PROG(ima, struct linux_binprm *bprm)
return;
*sample = ima_hash;
- bpf_ringbuf_submit(sample, 0);
+ bpf_ringbuf_submit(sample, BPF_RB_MAY_WAKEUP);
}
return;
diff --git a/tools/testing/selftests/bpf/progs/ringbuf_bench.c b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
index 123607d314d6..808e2e0e3d64 100644
--- a/tools/testing/selftests/bpf/progs/ringbuf_bench.c
+++ b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
@@ -24,7 +24,7 @@ static __always_inline long get_flags()
long sz;
if (!wakeup_data_size)
- return 0;
+ return BPF_RB_MAY_WAKEUP;
sz = bpf_ringbuf_query(&ringbuf, BPF_RB_AVAIL_DATA);
return sz >= wakeup_data_size ? BPF_RB_FORCE_WAKEUP : BPF_RB_NO_WAKEUP;
diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf.c b/tools/testing/selftests/bpf/progs/test_ringbuf.c
index 8ba9959b036b..03a5cbd21356 100644
--- a/tools/testing/selftests/bpf/progs/test_ringbuf.c
+++ b/tools/testing/selftests/bpf/progs/test_ringbuf.c
@@ -21,7 +21,7 @@ struct {
/* inputs */
int pid = 0;
long value = 0;
-long flags = 0;
+long flags = BPF_RB_MAY_WAKEUP;
/* outputs */
long total = 0;
diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c b/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
index edf3b6953533..f33c3fdfb1d6 100644
--- a/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
+++ b/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
@@ -71,7 +71,7 @@ int test_ringbuf(void *ctx)
sample->seq = total;
total += 1;
- bpf_ringbuf_submit(sample, 0);
+ bpf_ringbuf_submit(sample, BPF_RB_MAY_WAKEUP);
return 0;
}
--
2.25.1
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
v2 -> v3:
Try and fail to make kunit_fail_current_test() work on CONFIG_KUNIT=m
s/_/__ on the helper func to match others in test.c
v3 -> v4:
Revert to only enabling kunit_fail_current_test() for CONFIG_KUNIT=y
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 39 +++++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 68 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: a74e6a014c9d4d4161061f770c9b4f98372ac778
--
2.31.0.rc2.261.g7f71774620-goog
This series improves the defensive posture of sysfs's use of seq_file
to gain the vmap guard pages at the end of vmalloc buffers to stop a
class of recurring flaw[1]. The long-term goal is to switch sysfs from
a buffer to using seq_file directly, but this will take time to refactor.
Included is also a Clang fix for NULL arithmetic and an LKDTM test to
validate vmalloc guard pages.
v4:
- fix NULL arithmetic (Arnd)
- add lkdtm test
- reword commit message
v3: https://lore.kernel.org/lkml/20210401022145.2019422-1-keescook@chromium.org/
v2: https://lore.kernel.org/lkml/20210315174851.622228-1-keescook@chromium.org/
v1: https://lore.kernel.org/lkml/20210312205558.2947488-1-keescook@chromium.org/
Thanks!
-Kees
Arnd Bergmann (1):
seq_file: Fix clang warning for NULL pointer arithmetic
Kees Cook (2):
lkdtm/heap: Add vmalloc linear overflow test
sysfs: Unconditionally use vmalloc for buffer
drivers/misc/lkdtm/core.c | 3 ++-
drivers/misc/lkdtm/heap.c | 21 +++++++++++++++++-
drivers/misc/lkdtm/lkdtm.h | 3 ++-
fs/kernfs/file.c | 9 +++++---
fs/seq_file.c | 5 ++++-
fs/sysfs/file.c | 29 +++++++++++++++++++++++++
include/linux/seq_file.h | 6 +++++
tools/testing/selftests/lkdtm/tests.txt | 3 ++-
8 files changed, 71 insertions(+), 8 deletions(-)
--
2.25.1
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
v2 -> v3:
Try and fail to make kunit_fail_current_test() work on CONFIG_KUNIT=m
s/_/__ on the helper func to match others in test.c
v3 -> v4:
Revert to only enabling kunit_fail_current_test() for CONFIG_KUNIT=y
v4 -> v5:
Delete blank line to make checkpatch.pl --strict happy
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 29 +++++++++++++++++++++++++++++
lib/kunit/test.c | 39 +++++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 67 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: 1678e493d530e7977cce34e59a86bb86f3c5631e
--
2.31.0.208.g409f899ff0-goog
This patch set has several miscellaneous fixes to resctrl selftest tool
that are easily visible to user. V1 had fixes to CAT test and CMT test
but they were dropped in V2 because having them here made the patchset
humongous. So, changes to CAT test and CMT test will be posted in another
patchset.
Change Log:
v6:
- Add Tested-by: Babu Moger <babu.moger(a)amd.com>.
- Replace "cat" by CAT_STR etc (Babu).
- Capitalize the first letter of printed message (Babu).
v5:
- Address various comments from Shuah Khan:
1. Move a few fixing patches before cleaning patches.
2. Call kselftest APIs to log test results instead of printf().
3. Add .gitignore to ignore resctrl_tests.
4. Share show_cache_info() in CAT and CMT tests.
5. Define long_mask, cbm_mask, count_of_bits etc as static variables.
v4:
- Address various comments from Shuah Khan:
1. Combine a few patches e.g. a couple of fixing typos patches into one
and a couple of unmounting patches into one etc.
2. Add config file.
3. Remove "Fixes" tags.
4. Change strcmp() to strncmp().
5. Move the global variable fixing patch to the patch 1 so that the
compilation issue is fixed first.
Please note:
- I didn't move the patch of renaming CQM to CMT to the end of the series
because code and commit messages in a few other patches depend on the
new term of "CMT". If move the renaming patch to the end, the previous
patches use the old "CQM" term and code which will be changed soon at
the end of series and will cause more code and explanations.
[v3: https://lkml.org/lkml/2020/10/28/137]
v3:
Address various comments (commit messages, return value on test failure,
print failure info on test failure etc) from Reinette and Tony.
[v2: https://lore.kernel.org/linux-kselftest/cover.1589835155.git.sai.praneeth.p…]
v2:
1. Dropped changes to CAT test and CMT test as they will be posted in a later
series.
2. Added several other fixes
[v1: https://lore.kernel.org/linux-kselftest/cover.1583657204.git.sai.praneeth.p…]
Fenghua Yu (19):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Add config dependencies
selftests/resctrl: Check for resctrl mount point only if resctrl FS is
supported
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Skip the test if requested resctrl feature is not
supported
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Create .gitignore to include resctrl_tests
Reinette Chatre (2):
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Fix a printed message
tools/testing/selftests/resctrl/.gitignore | 2 +
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/README | 4 +-
tools/testing/selftests/resctrl/cache.c | 52 +++++-
tools/testing/selftests/resctrl/cat_test.c | 57 ++----
.../resctrl/{cqm_test.c => cmt_test.c} | 75 +++-----
tools/testing/selftests/resctrl/config | 2 +
tools/testing/selftests/resctrl/fill_buf.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 43 ++---
tools/testing/selftests/resctrl/mbm_test.c | 42 ++---
tools/testing/selftests/resctrl/resctrl.h | 29 +++-
.../testing/selftests/resctrl/resctrl_tests.c | 163 ++++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 95 ++++++----
tools/testing/selftests/resctrl/resctrlfs.c | 134 ++++++++------
14 files changed, 408 insertions(+), 296 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/.gitignore
rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (56%)
create mode 100644 tools/testing/selftests/resctrl/config
--
2.31.0
A 'single_cpu_test' parameter is odd and it does not exist
anymore. Instead there was introduced a 'nr_threads' one.
If it is not set it behaves as the former parameter.
That is why update a "stress mode" according to this change
specifying number of workers which are equal to number of CPUs.
Also update an output of help message based on a new interface.
CC: linux-kselftest(a)vger.kernel.org
CC: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
---
tools/testing/selftests/vm/test_vmalloc.sh | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/vm/test_vmalloc.sh b/tools/testing/selftests/vm/test_vmalloc.sh
index 06d2bb109f06..d73b846736f1 100755
--- a/tools/testing/selftests/vm/test_vmalloc.sh
+++ b/tools/testing/selftests/vm/test_vmalloc.sh
@@ -11,6 +11,7 @@
TEST_NAME="vmalloc"
DRIVER="test_${TEST_NAME}"
+NUM_CPUS=`grep -c ^processor /proc/cpuinfo`
# 1 if fails
exitcode=1
@@ -22,9 +23,9 @@ ksft_skip=4
# Static templates for performance, stressing and smoke tests.
# Also it is possible to pass any supported parameters manualy.
#
-PERF_PARAM="single_cpu_test=1 sequential_test_order=1 test_repeat_count=3"
-SMOKE_PARAM="single_cpu_test=1 test_loop_count=10000 test_repeat_count=10"
-STRESS_PARAM="test_repeat_count=20"
+PERF_PARAM="sequential_test_order=1 test_repeat_count=3"
+SMOKE_PARAM="test_loop_count=10000 test_repeat_count=10"
+STRESS_PARAM="nr_threads=$NUM_CPUS test_repeat_count=20"
check_test_requirements()
{
@@ -58,8 +59,8 @@ run_perfformance_check()
run_stability_check()
{
- echo "Run stability tests. In order to stress vmalloc subsystem we run"
- echo "all available test cases on all available CPUs simultaneously."
+ echo "Run stability tests. In order to stress vmalloc subsystem all"
+ echo "available test cases are run by NUM_CPUS workers simultaneously."
echo "It will take time, so be patient."
modprobe $DRIVER $STRESS_PARAM > /dev/null 2>&1
@@ -92,17 +93,17 @@ usage()
echo "# Shows help message"
echo "./${DRIVER}.sh"
echo
- echo "# Runs 1 test(id_1), repeats it 5 times on all online CPUs"
- echo "./${DRIVER}.sh run_test_mask=1 test_repeat_count=5"
+ echo "# Runs 1 test(id_1), repeats it 5 times by NUM_CPUS workers"
+ echo "./${DRIVER}.sh nr_threads=$NUM_CPUS run_test_mask=1 test_repeat_count=5"
echo
echo -n "# Runs 4 tests(id_1|id_2|id_4|id_16) on one CPU with "
echo "sequential order"
- echo -n "./${DRIVER}.sh single_cpu_test=1 sequential_test_order=1 "
+ echo -n "./${DRIVER}.sh sequential_test_order=1 "
echo "run_test_mask=23"
echo
- echo -n "# Runs all tests on all online CPUs, shuffled order, repeats "
+ echo -n "# Runs all tests by NUM_CPUS workers, shuffled order, repeats "
echo "20 times"
- echo "./${DRIVER}.sh test_repeat_count=20"
+ echo "./${DRIVER}.sh nr_threads=$NUM_CPUS test_repeat_count=20"
echo
echo "# Performance analysis"
echo "./${DRIVER}.sh performance"
--
2.20.1
TL;DR
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit
Per suggestion from Ted [1], we can reduce the amount of typing by
assuming a convention that these files are named '.kunitconfig'.
In the case of [1], we now have
$ ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4
Also add in such a fragment for kunit itself so we can give that as an
example more close to home (and thus less likely to be accidentally
broken).
[1] https://lore.kernel.org/linux-ext4/YCNF4yP1dB97zzwD@mit.edu/
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
lib/kunit/.kunitconfig | 3 +++
tools/testing/kunit/kunit.py | 4 +++-
tools/testing/kunit/kunit_kernel.py | 2 ++
tools/testing/kunit/kunit_tool_test.py | 6 ++++++
4 files changed, 14 insertions(+), 1 deletion(-)
create mode 100644 lib/kunit/.kunitconfig
diff --git a/lib/kunit/.kunitconfig b/lib/kunit/.kunitconfig
new file mode 100644
index 000000000000..9235b7d42d38
--- /dev/null
+++ b/lib/kunit/.kunitconfig
@@ -0,0 +1,3 @@
+CONFIG_KUNIT=y
+CONFIG_KUNIT_TEST=y
+CONFIG_KUNIT_EXAMPLE_TEST=y
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index d5144fcb03ac..5da8fb3762f9 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -184,7 +184,9 @@ def add_common_opts(parser) -> None:
help='Run all KUnit tests through allyesconfig',
action='store_true')
parser.add_argument('--kunitconfig',
- help='Path to Kconfig fragment that enables KUnit tests',
+ help='Path to Kconfig fragment that enables KUnit tests.'
+ ' If given a directory, (e.g. lib/kunit), "/.kunitconfig" '
+ 'will get automatically appended.',
metavar='kunitconfig')
def add_build_opts(parser) -> None:
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index f309a33256cd..89a7d4024e87 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -132,6 +132,8 @@ class LinuxSourceTree(object):
return
if kunitconfig_path:
+ if os.path.isdir(kunitconfig_path):
+ kunitconfig_path = os.path.join(kunitconfig_path, KUNITCONFIG_PATH)
if not os.path.exists(kunitconfig_path):
raise ConfigError(f'Specified kunitconfig ({kunitconfig_path}) does not exist')
else:
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 1ad3049e9069..2e809dd956a7 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -251,6 +251,12 @@ class LinuxSourceTreeTest(unittest.TestCase):
with tempfile.NamedTemporaryFile('wt') as kunitconfig:
tree = kunit_kernel.LinuxSourceTree('', kunitconfig_path=kunitconfig.name)
+ def test_dir_kunitconfig(self):
+ with tempfile.TemporaryDirectory('') as dir:
+ with open(os.path.join(dir, '.kunitconfig'), 'w') as f:
+ pass
+ tree = kunit_kernel.LinuxSourceTree('', kunitconfig_path=dir)
+
# TODO: add more test cases.
base-commit: b12b47249688915e987a9a2a393b522f86f6b7ab
--
2.30.0.617.g56c4b15f3c-goog
This patch set has several miscellaneous fixes to resctrl selftest tool
that are easily visible to user. V1 had fixes to CAT test and CMT test
but they were dropped in V2 because having them here made the patchset
humongous. So, changes to CAT test and CMT test will be posted in another
patchset.
Change Log:
v5:
- Address various comments from Shuah Khan:
1. Move a few fixing patches before cleaning patches.
2. Call kselftest APIs to log test results instead of printf().
3. Add .gitignore to ignore resctrl_tests.
4. Share show_cache_info() in CAT and CMT tests.
5. Define long_mask, cbm_mask, count_of_bits etc as static variables.
v4:
- Address various comments from Shuah Khan:
1. Combine a few patches e.g. a couple of fixing typos patches into one
and a couple of unmounting patches into one etc.
2. Add config file.
3. Remove "Fixes" tags.
4. Change strcmp() to strncmp().
5. Move the global variable fixing patch to the patch 1 so that the
compilation issue is fixed first.
Please note:
- I didn't move the patch of renaming CQM to CMT to the end of the series
because code and commit messages in a few other patches depend on the
new term of "CMT". If move the renaming patch to the end, the previous
patches use the old "CQM" term and code which will be changed soon at
the end of series and will cause more code and explanations.
[v3: https://lkml.org/lkml/2020/10/28/137]
v3:
Address various comments (commit messages, return value on test failure,
print failure info on test failure etc) from Reinette and Tony.
[v2: https://lore.kernel.org/linux-kselftest/cover.1589835155.git.sai.praneeth.p…]
v2:
1. Dropped changes to CAT test and CMT test as they will be posted in a later
series.
2. Added several other fixes
[v1: https://lore.kernel.org/linux-kselftest/cover.1583657204.git.sai.praneeth.p…]
Fenghua Yu (19):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Add config dependencies
selftests/resctrl: Check for resctrl mount point only if resctrl FS is
supported
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Skip the test if requested resctrl feature is not
supported
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Create .gitignore to include resctrl_tests
Reinette Chatre (2):
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Fix a printed message
tools/testing/selftests/resctrl/.gitignore | 2 +
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/README | 4 +-
tools/testing/selftests/resctrl/cache.c | 52 +++++-
tools/testing/selftests/resctrl/cat_test.c | 57 ++----
.../resctrl/{cqm_test.c => cmt_test.c} | 75 +++-----
tools/testing/selftests/resctrl/config | 2 +
tools/testing/selftests/resctrl/fill_buf.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 43 ++---
tools/testing/selftests/resctrl/mbm_test.c | 42 ++---
tools/testing/selftests/resctrl/resctrl.h | 29 +++-
.../testing/selftests/resctrl/resctrl_tests.c | 163 ++++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 95 ++++++----
tools/testing/selftests/resctrl/resctrlfs.c | 134 ++++++++------
14 files changed, 408 insertions(+), 296 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/.gitignore
rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (56%)
create mode 100644 tools/testing/selftests/resctrl/config
--
2.30.1
This series aims to clarify the behavior of
KVM_GET_EMULATED_CPUID and KVM_GET_SUPPORTED
ioctls, and fix a corner case where the nent field of the
struct kvm_cpuid2 is matching the amount of entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 provide a selftest to check KVM_GET_EMULATED_CPUID
accordingly.
Emanuele Giuseppe Esposito (4):
kvm: cpuid: adjust the returned nent field of kvm_cpuid2 for
KVM_GET_SUPPORTED_CPUID and KVM_GET_EMULATED_CPUID
Documentation: kvm: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid
selftests: kvm: add get_emulated_cpuid test
Documentation/virt/kvm/api.rst | 10 +-
arch/x86/kvm/cpuid.c | 6 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 ++++
.../selftests/kvm/x86_64/get_emulated_cpuid.c | 183 ++++++++++++++++++
7 files changed, 229 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/get_emulated_cpuid.c
--
2.30.2
This series aims to clarify the behavior of
KVM_GET_EMULATED_CPUID and KVM_GET_SUPPORTED
ioctls, and fix a corner case where the nent field of the
struct kvm_cpuid2 is matching the amount of entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 provide a selftest to check KVM_GET_EMULATED_CPUID
accordingly.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit(a)redhat.com>
---
v2:
- better fix in cpuid.c, perform the nent check after the switch statement
- fix bug in get_emulated_cpuid.c selftest, each entry needs to have at least
the padding zeroed otherwise it fails.
Emanuele Giuseppe Esposito (4):
kvm: cpuid: adjust the returned nent field of kvm_cpuid2 for
KVM_GET_SUPPORTED_CPUID and KVM_GET_EMULATED_CPUID
Documentation: kvm: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid
selftests: kvm: add get_emulated_cpuid test
Documentation/virt/kvm/api.rst | 10 +-
arch/x86/kvm/cpuid.c | 35 ++--
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 +++
.../selftests/kvm/x86_64/get_emulated_cpuid.c | 198 ++++++++++++++++++
7 files changed, 256 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/get_emulated_cpuid.c
--
2.30.2
From: Ira Weiny <ira.weiny(a)intel.com>
Introduce a new page protection mechanism for supervisor pages, Protection Key
Supervisor (PKS).
Generally PKS enables protections on 'domains' of supervisor pages to limit
supervisor mode access to pages beyond the normal paging protections. PKS
works in a similar fashion to user space pkeys, PKU. As with PKU, supervisor
pkeys are checked in addition to normal paging protections and Access or Writes
can be disabled via a MSR update without TLB flushes when permissions change.
Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
the page table entry for that mapping.
Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.
XSAVE is not supported for the PKRS MSR. Therefore the implementation
saves/restores the MSR across context switches and during exceptions. Nested
exceptions are supported by each exception getting a new PKS state.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
Other keys, (1-15) are allocated by an allocator which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail either because of key exhaustion or due to PKS not being supported on the
CPU instance.
The following are key attributes of PKS.
1) Fast switching of permissions
1a) Prevents access without page table manipulations
1b) No TLB flushes required
2) Works on a per thread basis
PKS is available with 4 and 5 level paging. Like PKRU it consumes 4 bits from
the PTE to store the pkey within the entry.
All code to support PKS is configured via ARCH_ENABLE_SUPERVISOR_PKEYS which
is designed to only be turned on when a user is configured on in the kernel.
Those users must depend on ARCH_HAS_SUPERVISOR_PKEYS to properly work with
other architectures which do not yet support PKS.
Originally this series was submitted as part of a large patch set which
converted the kmap call sites.[1]
Many follow on discussions revealed a few problems. The first of which was
that some callers leak a kmap mapping across threads rather than containing it
to a critical section. Attempts were made to see if these 'global kmaps' could
be supported.[2] However, supporting global kmaps had many problems. Work is
being done in parallel on converting as many kmap calls to the new
kmap_local_page().[3]
Changes from V4 [5]
From kernel test robot <lkp(a)intel.com>
Fix i386 build: pks_init_task not found
Move MSR_IA32_PKRS and INIT_PKRS_VALUE into patch 5 where they are
first 'used'. (Technically nothing is 'used' until the final
test patch. But review wise this is much cleaner.)
From Sean Christoperson
Add documentation details on what happens if the pkey is violated
Change cpu_feature_enabled to be in WARN_ON check
Clean up commit message of patch 6
Fix some checkpatch errors
[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/
[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/
[4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/
[5] https://lore.kernel.org/lkml/20210322053020.2287058-1-ira.weiny@intel.com/
Fenghua Yu (1):
x86/pks: Add PKS kernel API
Ira Weiny (9):
x86/pkeys: Create pkeys_common.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Add additional PKEY helper macros
x86/pks: Add PKS defines and Kconfig options
x86/pks: Add PKS setup code
x86/fault: Adjust WARN_ON for PKey fault
x86/pks: Preserve the PKRS MSR on context switch
x86/entry: Preserve PKRS MSR across exceptions
x86/pks: Add PKS test code
Documentation/core-api/protection-keys.rst | 112 +++-
arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 26 +
arch/x86/entry/common.c | 58 ++
arch/x86/entry/entry_64.S | 22 +-
arch/x86/entry/entry_64_compat.S | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 10 +-
arch/x86/include/asm/pgtable_types.h | 12 +
arch/x86/include/asm/pkeys.h | 4 +
arch/x86/include/asm/pkeys_common.h | 34 +
arch/x86/include/asm/pks.h | 54 ++
arch/x86/include/asm/processor-flags.h | 2 +
arch/x86/include/asm/processor.h | 47 +-
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/head_64.S | 7 +-
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/process_64.c | 2 +
arch/x86/mm/fault.c | 31 +-
arch/x86/mm/pkeys.c | 218 +++++-
include/linux/pgtable.h | 4 +
include/linux/pkeys.h | 34 +
kernel/entry/common.c | 14 +-
lib/Kconfig.debug | 11 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 693 ++++++++++++++++++++
mm/Kconfig | 5 +
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 150 +++++
34 files changed, 1528 insertions(+), 77 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 arch/x86/include/asm/pks.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.28.0.rc0.12.gb6a658bd00c9
From: Mike Rapoport <rppt(a)linux.ibm.com>
Yuri Norov says:
If parameter size is the same for native and compat ABIs, we may
wire a syscall made by compat client to native handler. This is
true for unsigned int, but not true for unsigned long or pointer.
That's why I suggest using unsigned int and so avoid creating compat
entry point.
Use unsigned int as the type of the flags parameter in memfd_secret()
system call.
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
---
@Andrew,
The patch is vs v5.12-rc5-mmots-2021-03-30-23, I'd appreciate if it would
be added as a fixup to the memfd_secret series.
include/linux/syscalls.h | 2 +-
mm/secretmem.c | 2 +-
tools/testing/selftests/vm/memfd_secret.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 49c93c906893..1a1b5d724497 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1050,7 +1050,7 @@ asmlinkage long sys_landlock_create_ruleset(const struct landlock_ruleset_attr _
asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
const void __user *rule_attr, __u32 flags);
asmlinkage long sys_landlock_restrict_self(int ruleset_fd, __u32 flags);
-asmlinkage long sys_memfd_secret(unsigned long flags);
+asmlinkage long sys_memfd_secret(unsigned int flags);
/*
* Architecture-specific system calls
diff --git a/mm/secretmem.c b/mm/secretmem.c
index f2ae3f32a193..3b1ba3991964 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -199,7 +199,7 @@ static struct file *secretmem_file_create(unsigned long flags)
return file;
}
-SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
+SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
{
struct file *file;
int fd, err;
diff --git a/tools/testing/selftests/vm/memfd_secret.c b/tools/testing/selftests/vm/memfd_secret.c
index c878c2b841fc..2462f52e9c96 100644
--- a/tools/testing/selftests/vm/memfd_secret.c
+++ b/tools/testing/selftests/vm/memfd_secret.c
@@ -38,7 +38,7 @@ static unsigned long page_size;
static unsigned long mlock_limit_cur;
static unsigned long mlock_limit_max;
-static int memfd_secret(unsigned long flags)
+static int memfd_secret(unsigned int flags)
{
return syscall(__NR_memfd_secret, flags);
}
--
2.28.0
The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.
For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.
To better support process-wide synchronous self-monitoring, without
events propagating to children that do not share the current process's
shared environment, two pre-requisite patches are added to optionally
restrict inheritance to CLONE_THREAD, and remove events on exec (without
affecting the parent).
Examples how to use these features can be found in the tests added at
the end of the series. In addition to the tests added, the series has
also been subjected to syzkaller fuzzing (focus on 'kernel/events/'
coverage).
Motivation and Example Uses
---------------------------
1. Our immediate motivation is low-overhead sampling-based race
detection for user space [1]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).
2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Other ideas for use-cases we found interesting, but should only
illustrate the range of potential to further motivate the utility (we're
sure there are more):
3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint to entry to the patched routine, then
send signals to threads and check that they are not in the
routine, but without stopping them further. If any of the
threads will enter the routine, it will receive SIGTRAP and
pause.
4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This could be replaced with
a watchpoint, which does not require a whole page nor DTLB
shootdowns.
5. Threads receiving signals on performance events to
throttle/unthrottle themselves.
6. Tracking data flow globally.
Changelog
---------
v3:
* Add patch "perf: Rework perf_event_exit_event()" to beginning of
series, courtesy of Peter Zijlstra.
* Rework "perf: Add support for event removal on exec" based on
the added "perf: Rework perf_event_exit_event()".
* Fix kselftests to work with more recent libc, due to the way it forces
using the kernel's own siginfo_t.
* Add basic perf-tool built-in test.
v2/RFC: https://lkml.kernel.org/r/20210310104139.679618-1-elver@google.com
* Patch "Support only inheriting events if cloned with CLONE_THREAD"
added to series.
* Patch "Add support for event removal on exec" added to series.
* Patch "Add kselftest for process-wide sigtrap handling" added to
series.
* Patch "Add kselftest for remove_on_exec" added to series.
* Implicitly restrict inheriting events if sigtrap, but the child was
cloned with CLONE_CLEAR_SIGHAND, because it is not generally safe if
the child cleared all signal handlers to continue sending SIGTRAP.
* Various minor fixes (see details in patches).
v1/RFC: https://lkml.kernel.org/r/20210223143426.2412737-1-elver@google.com
Pre-series: The discussion at [2] led to the changes in this series. The
approach taken in "Add support for SIGTRAP on perf events" to trigger
the signal was suggested by Peter Zijlstra in [3].
[2] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZ…
[3] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.n…
Marco Elver (10):
perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
perf: Support only inheriting events if cloned with CLONE_THREAD
perf: Add support for event removal on exec
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf: Add support for SIGTRAP on perf events
perf: Add breakpoint information to siginfo on SIGTRAP
selftests/perf_events: Add kselftest for process-wide sigtrap handling
selftests/perf_events: Add kselftest for remove_on_exec
tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
perf test: Add basic stress test for sigtrap handling
Peter Zijlstra (1):
perf: Rework perf_event_exit_event()
arch/m68k/kernel/signal.c | 3 +
arch/x86/kernel/signal_compat.c | 5 +-
fs/signalfd.c | 4 +
include/linux/compat.h | 2 +
include/linux/perf_event.h | 6 +-
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +-
include/uapi/linux/perf_event.h | 5 +-
include/uapi/linux/signalfd.h | 4 +-
kernel/events/core.c | 297 +++++++++++++-----
kernel/fork.c | 2 +-
kernel/signal.c | 11 +
tools/include/uapi/linux/perf_event.h | 5 +-
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 5 +
tools/perf/tests/sigtrap.c | 148 +++++++++
tools/perf/tests/tests.h | 1 +
.../testing/selftests/perf_events/.gitignore | 3 +
tools/testing/selftests/perf_events/Makefile | 6 +
tools/testing/selftests/perf_events/config | 1 +
.../selftests/perf_events/remove_on_exec.c | 260 +++++++++++++++
tools/testing/selftests/perf_events/settings | 1 +
.../selftests/perf_events/sigtrap_threads.c | 206 ++++++++++++
23 files changed, 896 insertions(+), 87 deletions(-)
create mode 100644 tools/perf/tests/sigtrap.c
create mode 100644 tools/testing/selftests/perf_events/.gitignore
create mode 100644 tools/testing/selftests/perf_events/Makefile
create mode 100644 tools/testing/selftests/perf_events/config
create mode 100644 tools/testing/selftests/perf_events/remove_on_exec.c
create mode 100644 tools/testing/selftests/perf_events/settings
create mode 100644 tools/testing/selftests/perf_events/sigtrap_threads.c
--
2.31.0.291.g576ba9dcdaf-goog
Previously, we shared too much of the code with COPY and ZEROPAGE, so we
manipulated things in various invalid ways:
- Previously, we unconditionally called shmem_inode_acct_block. In the
continue case, we're looking up an existing page which would have been
accounted for properly when it was allocated. So doing it twice
results in double-counting, and eventually leaking.
- Previously, we made the pte writable whenever the VMA was writable.
However, for continue, consider this case:
1. A tmpfs file was created
2. The non-UFFD-registered side mmap()-s with MAP_SHARED
3. The UFFD-registered side mmap()-s with MAP_PRIVATE
In this case, even though the UFFD-registered VMA may be writable, we
still want CoW behavior. So, check for this case and don't make the
pte writable.
- The initial pgoff / max_off check isn't necessary, so we can skip past
it. The second one seems likely to be unnecessary too, but keep it
just in case. Modify both checks to use pgoff, as offset is equivalent
and not needed.
- Previously, we unconditionally called ClearPageDirty() in the error
path. In the continue case though, since this is an existing page, it
might have already been dirty before we started touching it. It's very
problematic to clear the bit incorrectly, but not a problem to leave
it - so, just omit the ClearPageDirty() entirely.
- Previously, we unconditionally removed the page from the page cache in
the error path. But in the continue case, we didn't add it - it was
already there because the page is present in some second
(non-UFFD-registered) mapping. So, removing it is invalid.
Because the error handling issues are easy to exercise in the selftest,
make a small modification there to do so.
Finally, refactor shmem_mcopy_atomic_pte a bit. By this point, we've
added a lot of "if (!is_continue)"-s everywhere. It's cleaner to just
check for that mode first thing, and then "goto" down to where the parts
we actually want are. This leaves the code in between cleaner.
Changes since v2:
- Drop the ClearPageDirty() entirely, instead of trying to remember the
old value.
- Modify both pgoff / max_off checks to use pgoff. It's equivalent to
offset, but offset wasn't initialized until the first check (which
we're skipping).
- Keep the second pgoff / max_off check in the continue case.
Changes since v1:
- Refactor to skip ahead with goto, instead of adding several more
"if (!is_continue)".
- Fix unconditional ClearPageDirty().
- Don't pte_mkwrite() when is_continue && !VM_SHARED.
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
mm/shmem.c | 60 +++++++++++++-----------
tools/testing/selftests/vm/userfaultfd.c | 12 +++++
2 files changed, 44 insertions(+), 28 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d2e0e81b7d2e..fbcce850a16e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2377,18 +2377,22 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
struct page *page;
pte_t _dst_pte, *dst_pte;
int ret;
- pgoff_t offset, max_off;
-
- ret = -ENOMEM;
- if (!shmem_inode_acct_block(inode, 1))
- goto out;
+ pgoff_t max_off;
+ int writable;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, pgoff);
if (!page)
- goto out_unacct_blocks;
- } else if (!*pagep) {
+ goto out;
+ goto install_ptes;
+ }
+
+ ret = -ENOMEM;
+ if (!shmem_inode_acct_block(inode, 1))
+ goto out;
+
+ if (!*pagep) {
page = shmem_alloc_page(gfp, info, pgoff);
if (!page)
goto out_unacct_blocks;
@@ -2415,30 +2419,29 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
*pagep = NULL;
}
- if (!is_continue) {
- VM_BUG_ON(PageSwapBacked(page));
- VM_BUG_ON(PageLocked(page));
- __SetPageLocked(page);
- __SetPageSwapBacked(page);
- __SetPageUptodate(page);
- }
+ VM_BUG_ON(PageSwapBacked(page));
+ VM_BUG_ON(PageLocked(page));
+ __SetPageLocked(page);
+ __SetPageSwapBacked(page);
+ __SetPageUptodate(page);
ret = -EFAULT;
- offset = linear_page_index(dst_vma, dst_addr);
max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
+ if (unlikely(pgoff >= max_off))
goto out_release;
- /* If page wasn't already in the page cache, add it. */
- if (!is_continue) {
- ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
- gfp & GFP_RECLAIM_MASK, dst_mm);
- if (ret)
- goto out_release;
- }
+ ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
+ gfp & GFP_RECLAIM_MASK, dst_mm);
+ if (ret)
+ goto out_release;
+install_ptes:
_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
- if (dst_vma->vm_flags & VM_WRITE)
+ /* For CONTINUE on a non-shared VMA, don't pte_mkwrite for CoW. */
+ writable = is_continue && !(dst_vma->vm_flags & VM_SHARED)
+ ? 0
+ : dst_vma->vm_flags & VM_WRITE;
+ if (writable)
_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
else {
/*
@@ -2455,7 +2458,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
ret = -EFAULT;
max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
+ if (unlikely(pgoff >= max_off))
goto out_release_unlock;
ret = -EEXIST;
@@ -2485,13 +2488,14 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
return ret;
out_release_unlock:
pte_unmap_unlock(dst_pte, ptl);
- ClearPageDirty(page);
- delete_from_page_cache(page);
+ if (!is_continue)
+ delete_from_page_cache(page);
out_release:
unlock_page(page);
put_page(page);
out_unacct_blocks:
- shmem_inode_unacct_blocks(inode, 1);
+ if (!is_continue)
+ shmem_inode_unacct_blocks(inode, 1);
goto out;
}
#endif /* CONFIG_USERFAULTFD */
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f6c86b036d0f..d8541a59dae5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
static void continue_range(int ufd, __u64 start, __u64 len)
{
struct uffdio_continue req;
+ int ret;
req.range.start = start;
req.range.len = len;
@@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+ /*
+ * Error handling within the kernel for continue is subtly different
+ * from copy or zeropage, so it may be a source of bugs. Trigger an
+ * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+ */
+ req.mapped = 0;
+ ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+ if (ret >= 0 || req.mapped != -EEXIST)
+ err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+ ret, req.mapped);
}
static void *locking_thread(void *arg)
--
2.31.0.291.g576ba9dcdaf-goog
Good Day Sir/Ms,
We are please to invite you or your company to quote the
following item listed below:
Product/Model No: A702TH FYNE PRESSURE REGULATOR
Model Number: A702TH
Qty. 30 units
Compulsory,Kindly send your quotation to:
quotation(a)pfizerbvsupply.com
for immediate approval.
Kind Regards,
Albert Bourla
PFIZER B.V Supply Chain Manager
Tel: +31(0)208080 880
ADDRESS: Rivium Westlaan 142, 2909 LD
Capelle aan den IJssel, Netherlands
From: Ira Weiny <ira.weiny(a)intel.com>
Introduce a new page protection mechanism for supervisor pages, Protection Key
Supervisor (PKS).
Generally PKS enables protections on 'domains' of supervisor pages to limit
supervisor mode access to pages beyond the normal paging protections. PKS
works in a similar fashion to user space pkeys, PKU. As with PKU, supervisor
pkeys are checked in addition to normal paging protections and Access or Writes
can be disabled via a MSR update without TLB flushes when permissions change.
Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
the page table entry for that mapping.
Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.
XSAVE is not supported for the PKRS MSR. Therefore the implementation
saves/restores the MSR across context switches and during exceptions. Nested
exceptions are supported by each exception getting a new PKS state.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
Other keys, (1-15) are allocated by an allocator which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail either because of key exhaustion or due to PKS not being supported on the
CPU instance.
The following are key attributes of PKS.
1) Fast switching of permissions
1a) Prevents access without page table manipulations
1b) No TLB flushes required
2) Works on a per thread basis
PKS is available with 4 and 5 level paging. Like PKRU it consumes 4 bits from
the PTE to store the pkey within the entry.
All code to support PKS is configured via ARCH_ENABLE_SUPERVISOR_PKEYS which
is designed to only be turned on when a user is configured on in the kernel.
Those users must depend on ARCH_HAS_SUPERVISOR_PKEYS to properly work with
other architectures which do not yet support PKS.
Originally this series was submitted as part of a large patch set which
converted the kmap call sites.[1]
Many follow on discussions revealed a few problems. The first of which was
that some callers leak a kmap mapping across threads rather than containing it
to a critical section. Attempts were made to see if these 'global kmaps' could
be supported.[2] However, supporting global kmaps had many problems. Work is
being done in parallel on converting as many kmap calls to the new
kmap_local_page().[3]
Changes from V3 [4]
Add ARCH_ENABLE_SUPERVISOR_PKEYS config which is selected by kernel
users to add the functionality to the core. However, they should only
select this if ARCH_HAS_SUPERVISOR_PKEYS is available.
Clean up test code for context switching
Adjust for extended_pt_regs
Reduce output unless --debug is specified
Address internal review comments from Dan Williams and Dave Hansen
Help with macros and assembly coding
Change names of various functions
Clean up documentation
Move all #ifdefery into header files.
Clean up cover letter.
Make extended_pt_regs handling a macro rather than coding
around every call to C
Add macross for PKS shift/mask
New patch : x86/pks: Add additional PKEY helper macros
Preserve pkrs_cache as static when PKS_TEST is not configured
Remove unnecessary pr_* prints
Clarify pks_key_alloc flags parameter
Change CONFIG_PKS_TESTING to CONFIG_PKS_TEST
Clean up test code separation from main code in fault.c
Remove module boilerplate from test code
Clean up all commit messages
Address comments from Thomas Gleixner
Provide a warning and fallback to no protection if a global
mapping is requested.
Fix context switch. Fix where pks_sched_in() is called.
Fix test to actually do a context switch
Remove unecessary noinstr's
From Andy Lutomirski
Use extended_pt_regs idea to stash pks values on the stack
Drop patches 5/10 and 7/10
And use extended_pt_regs to print pkey info on fault
Adjust tests
Comments from Randy Dunlap:
Fix gramatical errors in doc
Clean up kernel docs
Rebase to 5.12
[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/
[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/
[4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/
</proposed cover letter>
Fenghua Yu (1):
x86/pks: Add PKS kernel API
Ira Weiny (9):
x86/pkeys: Create pkeys_common.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Add additional PKEY helper macros
x86/pks: Add PKS defines and Kconfig options
x86/pks: Add PKS setup code
x86/fault: Adjust WARN_ON for PKey fault
x86/pks: Preserve the PKRS MSR on context switch
x86/entry: Preserve PKRS MSR across exceptions
x86/pks: Add PKS test code
Documentation/core-api/protection-keys.rst | 111 +++-
arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 26 +
arch/x86/entry/common.c | 58 ++
arch/x86/entry/entry_64.S | 22 +-
arch/x86/entry/entry_64_compat.S | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 10 +-
arch/x86/include/asm/pgtable_types.h | 12 +
arch/x86/include/asm/pkeys.h | 4 +
arch/x86/include/asm/pkeys_common.h | 34 +
arch/x86/include/asm/pks.h | 54 ++
arch/x86/include/asm/processor-flags.h | 2 +
arch/x86/include/asm/processor.h | 43 +-
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/head_64.S | 7 +-
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/process_64.c | 2 +
arch/x86/mm/fault.c | 27 +-
arch/x86/mm/pkeys.c | 218 +++++-
include/linux/pgtable.h | 4 +
include/linux/pkeys.h | 34 +
kernel/entry/common.c | 14 +-
lib/Kconfig.debug | 11 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 693 ++++++++++++++++++++
mm/Kconfig | 5 +
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 150 +++++
34 files changed, 1519 insertions(+), 77 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 arch/x86/include/asm/pks.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.28.0.rc0.12.gb6a658bd00c9
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 41ddc353ebb8..f78651e9b030 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3466,8 +3466,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
Previously, we shared too much of the code with COPY and ZEROPAGE, so we
manipulated things in various invalid ways:
- Previously, we unconditionally called shmem_inode_acct_block. In the
continue case, we're looking up an existing page which would have been
accounted for properly when it was allocated. So doing it twice
results in double-counting, and eventually leaking.
- Previously, we made the pte writable whenever the VMA was writable.
However, for continue, consider this case:
1. A tmpfs file was created
2. The non-UFFD-registered side mmap()-s with MAP_SHARED
3. The UFFD-registered side mmap()-s with MAP_PRIVATE
In this case, even though the UFFD-registered VMA may be writable, we
still want CoW behavior. So, check for this case and don't make the
pte writable.
- The offset / max_off checking doesn't necessarily hurt anything, but
it's not needed in the CONTINUE case, so skip it.
- Previously, we unconditionally called ClearPageDirty() in the error
path. In the continue case though, since this is an existing page, it
might have already been dirty before we started touching it. So,
remember whether or not it was dirty before we set_page_dirty(), and
only clear the bit if it wasn't dirty before.
- Previously, we unconditionally removed the page from the page cache in
the error path. But in the continue case, we didn't add it - it was
already there because the page is present in some second
(non-UFFD-registered) mapping. So, removing it is invalid.
Because the error handling issues are easy to exercise in the selftest,
make a small modification there to do so.
Finally, refactor shmem_mcopy_atomic_pte a bit. By this point, we've
added a lot of "if (!is_continue)"-s everywhere. It's cleaner to just
check for that mode first thing, and then "goto" down to where the parts
we actually want are. This leaves the code in between cleaner.
Changes since v1:
- Refactor to skip ahead with goto, instead of adding several more
"if (!is_continue)".
- Fix unconditional ClearPageDirty().
- Don't pte_mkwrite() when is_continue && !VM_SHARED.
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
mm/shmem.c | 67 ++++++++++++++----------
tools/testing/selftests/vm/userfaultfd.c | 12 +++++
2 files changed, 51 insertions(+), 28 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d2e0e81b7d2e..8ab1f1f29987 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2378,17 +2378,22 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
pte_t _dst_pte, *dst_pte;
int ret;
pgoff_t offset, max_off;
-
- ret = -ENOMEM;
- if (!shmem_inode_acct_block(inode, 1))
- goto out;
+ int writable;
+ bool was_dirty;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, pgoff);
if (!page)
- goto out_unacct_blocks;
- } else if (!*pagep) {
+ goto out;
+ goto install_ptes;
+ }
+
+ ret = -ENOMEM;
+ if (!shmem_inode_acct_block(inode, 1))
+ goto out;
+
+ if (!*pagep) {
page = shmem_alloc_page(gfp, info, pgoff);
if (!page)
goto out_unacct_blocks;
@@ -2415,13 +2420,11 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
*pagep = NULL;
}
- if (!is_continue) {
- VM_BUG_ON(PageSwapBacked(page));
- VM_BUG_ON(PageLocked(page));
- __SetPageLocked(page);
- __SetPageSwapBacked(page);
- __SetPageUptodate(page);
- }
+ VM_BUG_ON(PageSwapBacked(page));
+ VM_BUG_ON(PageLocked(page));
+ __SetPageLocked(page);
+ __SetPageSwapBacked(page);
+ __SetPageUptodate(page);
ret = -EFAULT;
offset = linear_page_index(dst_vma, dst_addr);
@@ -2429,16 +2432,18 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
if (unlikely(offset >= max_off))
goto out_release;
- /* If page wasn't already in the page cache, add it. */
- if (!is_continue) {
- ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
- gfp & GFP_RECLAIM_MASK, dst_mm);
- if (ret)
- goto out_release;
- }
+ ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
+ gfp & GFP_RECLAIM_MASK, dst_mm);
+ if (ret)
+ goto out_release;
+install_ptes:
_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
- if (dst_vma->vm_flags & VM_WRITE)
+ /* For CONTINUE on a non-shared VMA, don't pte_mkwrite for CoW. */
+ writable = is_continue && !(dst_vma->vm_flags & VM_SHARED)
+ ? 0
+ : dst_vma->vm_flags & VM_WRITE;
+ if (writable)
_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
else {
/*
@@ -2448,15 +2453,18 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
* unconditionally before unlock_page(), but doing it
* only if VM_WRITE is not set is faster.
*/
+ was_dirty = PageDirty(page);
set_page_dirty(page);
}
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
- ret = -EFAULT;
- max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
- goto out_release_unlock;
+ if (!is_continue) {
+ ret = -EFAULT;
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ if (unlikely(offset >= max_off))
+ goto out_release_unlock;
+ }
ret = -EEXIST;
if (!pte_none(*dst_pte))
@@ -2485,13 +2493,16 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
return ret;
out_release_unlock:
pte_unmap_unlock(dst_pte, ptl);
- ClearPageDirty(page);
- delete_from_page_cache(page);
+ if (!was_dirty)
+ ClearPageDirty(page);
+ if (!is_continue)
+ delete_from_page_cache(page);
out_release:
unlock_page(page);
put_page(page);
out_unacct_blocks:
- shmem_inode_unacct_blocks(inode, 1);
+ if (!is_continue)
+ shmem_inode_unacct_blocks(inode, 1);
goto out;
}
#endif /* CONFIG_USERFAULTFD */
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f6c86b036d0f..d8541a59dae5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
static void continue_range(int ufd, __u64 start, __u64 len)
{
struct uffdio_continue req;
+ int ret;
req.range.start = start;
req.range.len = len;
@@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+ /*
+ * Error handling within the kernel for continue is subtly different
+ * from copy or zeropage, so it may be a source of bugs. Trigger an
+ * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+ */
+ req.mapped = 0;
+ ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+ if (ret >= 0 || req.mapped != -EEXIST)
+ err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+ ret, req.mapped);
}
static void *locking_thread(void *arg)
--
2.31.0.291.g576ba9dcdaf-goog
From: Andre Przywara <andre.przywara(a)arm.com>
[ Upstream commit 7011d72588d16a9e5f5d85acbc8b10019809599c ]
The "First Fault Register" (FFR) is an SVE register that mimics a
predicate register, but clears bits when a load or store fails to handle
an element of a vector. The supposed usage scenario is to initialise
this register (using SETFFR), then *read* it later on to learn about
elements that failed to load or store. Explicit writes to this register
using the WRFFR instruction are only supposed to *restore* values
previously read from the register (for context-switching only).
As the manual describes, this register holds only certain values, it:
"... contains a monotonic predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining
bit positions."
Any other value is UNPREDICTABLE and is not supposed to be "restored"
into the register.
The SVE test currently tries to write a signature pattern into the
register, which is *not* a canonical FFR value. Apparently the existing
setups treat UNPREDICTABLE as "read-as-written", but a new
implementation actually only stores canonical values. As a consequence,
the sve-test fails immediately when comparing the FFR value:
-----------
# ./sve-test
Vector length: 128 bits
PID: 207
Mismatch: PID=207, iteration=0, reg=48
Expected [cf00]
Got [0f00]
Aborted
-----------
Fix this by only populating the FFR with proper canonical values.
Effectively the requirement described above limits us to 17 unique
values over 16 bits worth of FFR, so we condense our signature down to 4
bits (2 bits from the PID, 2 bits from the generation) and generate the
canonical pattern from it. Any bits describing elements above the
minimum 128 bit are set to 0.
This aligns the FFR usage to the architecture and fixes the test on
microarchitectures implementing FFR in a more restricted way.
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
Reviwed-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210319120128.29452-1-andre.przywara@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-test.S | 22 ++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index f95074c9b48b..07f14e279a90 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -284,16 +284,28 @@ endfunction
// Set up test pattern in the FFR
// x0: pid
// x2: generation
+//
+// We need to generate a canonical FFR value, which consists of a number of
+// low "1" bits, followed by a number of zeros. This gives us 17 unique values
+// per 16 bits of FFR, so we create a 4 bit signature out of the PID and
+// generation, and use that as the initial number of ones in the pattern.
+// We fill the upper lanes of FFR with zeros.
// Beware: corrupts P0.
function setup_ffr
mov x4, x30
- bl pattern
+ and w0, w0, #0x3
+ bfi w0, w2, #2, #2
+ mov w1, #1
+ lsl w1, w1, w0
+ sub w1, w1, #1
+
ldr x0, =ffrref
- ldr x1, =scratch
- rdvl x2, #1
- lsr x2, x2, #3
- bl memcpy
+ strh w1, [x0], 2
+ rdvl x1, #1
+ lsr x1, x1, #3
+ sub x1, x1, #2
+ bl memclr
mov x0, #0
ldr x1, =ffrref
--
2.30.1
From: David Gow <davidgow(a)google.com>
[ Upstream commit 7421b1a4d10c633ca5f14c8236d3e2c1de07e52b ]
The first argument to namedtuple() should match the name of the type,
which wasn't the case for KconfigEntryBase.
Fixing this is enough to make mypy show no python typing errors again.
Fixes 97752c39bd ("kunit: kunit_tool: Allow .kunitconfig to disable config items")
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
Acked-by: Brendan Higgins <brendanhiggins(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_config.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index 02ffc3a3e5dc..b30e9d6db6b4 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -12,7 +12,7 @@ import re
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$'
-KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
+KconfigEntryBase = collections.namedtuple('KconfigEntryBase', ['name', 'value'])
class KconfigEntry(KconfigEntryBase):
--
2.30.1
From: Andre Przywara <andre.przywara(a)arm.com>
[ Upstream commit 7011d72588d16a9e5f5d85acbc8b10019809599c ]
The "First Fault Register" (FFR) is an SVE register that mimics a
predicate register, but clears bits when a load or store fails to handle
an element of a vector. The supposed usage scenario is to initialise
this register (using SETFFR), then *read* it later on to learn about
elements that failed to load or store. Explicit writes to this register
using the WRFFR instruction are only supposed to *restore* values
previously read from the register (for context-switching only).
As the manual describes, this register holds only certain values, it:
"... contains a monotonic predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining
bit positions."
Any other value is UNPREDICTABLE and is not supposed to be "restored"
into the register.
The SVE test currently tries to write a signature pattern into the
register, which is *not* a canonical FFR value. Apparently the existing
setups treat UNPREDICTABLE as "read-as-written", but a new
implementation actually only stores canonical values. As a consequence,
the sve-test fails immediately when comparing the FFR value:
-----------
# ./sve-test
Vector length: 128 bits
PID: 207
Mismatch: PID=207, iteration=0, reg=48
Expected [cf00]
Got [0f00]
Aborted
-----------
Fix this by only populating the FFR with proper canonical values.
Effectively the requirement described above limits us to 17 unique
values over 16 bits worth of FFR, so we condense our signature down to 4
bits (2 bits from the PID, 2 bits from the generation) and generate the
canonical pattern from it. Any bits describing elements above the
minimum 128 bit are set to 0.
This aligns the FFR usage to the architecture and fixes the test on
microarchitectures implementing FFR in a more restricted way.
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
Reviwed-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210319120128.29452-1-andre.przywara@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-test.S | 22 ++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index 9210691aa998..e3e08d9c7020 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -284,16 +284,28 @@ endfunction
// Set up test pattern in the FFR
// x0: pid
// x2: generation
+//
+// We need to generate a canonical FFR value, which consists of a number of
+// low "1" bits, followed by a number of zeros. This gives us 17 unique values
+// per 16 bits of FFR, so we create a 4 bit signature out of the PID and
+// generation, and use that as the initial number of ones in the pattern.
+// We fill the upper lanes of FFR with zeros.
// Beware: corrupts P0.
function setup_ffr
mov x4, x30
- bl pattern
+ and w0, w0, #0x3
+ bfi w0, w2, #2, #2
+ mov w1, #1
+ lsl w1, w1, w0
+ sub w1, w1, #1
+
ldr x0, =ffrref
- ldr x1, =scratch
- rdvl x2, #1
- lsr x2, x2, #3
- bl memcpy
+ strh w1, [x0], 2
+ rdvl x1, #1
+ lsr x1, x1, #3
+ sub x1, x1, #2
+ bl memclr
mov x0, #0
ldr x1, =ffrref
--
2.30.1
From: David Gow <davidgow(a)google.com>
[ Upstream commit 7421b1a4d10c633ca5f14c8236d3e2c1de07e52b ]
The first argument to namedtuple() should match the name of the type,
which wasn't the case for KconfigEntryBase.
Fixing this is enough to make mypy show no python typing errors again.
Fixes 97752c39bd ("kunit: kunit_tool: Allow .kunitconfig to disable config items")
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
Acked-by: Brendan Higgins <brendanhiggins(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_config.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index bdd60230764b..27fe086d2d0d 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -13,7 +13,7 @@ from typing import List, Set
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$'
-KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
+KconfigEntryBase = collections.namedtuple('KconfigEntryBase', ['name', 'value'])
class KconfigEntry(KconfigEntryBase):
--
2.30.1
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 41ddc353ebb8..f78651e9b030 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3466,8 +3466,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
Hi,
This v5 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Change logs:
v4->v5:
- Use synchronization(sem_wait) for time measurement
- Add a new patch about TEST_ASSERT(patch 4)
- Address Andrew Jones's comments for v4 series
- Add Andrew Jones's R-b tags in some patches
- v4: https://lore.kernel.org/lkml/20210302125751.19080-1-wangyanan55@huawei.com/
v3->v4:
- Add a helper to get system default hugetlb page size
- Add tags of Reviewed-by of Ben in the patches
- v3: https://lore.kernel.org/lkml/20210301065916.11484-1-wangyanan55@huawei.com/
v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic micro to get hugetlb page sizes
- Some changes for suggestions about v2 series
- v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/
v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series
- v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
---
Yanan Wang (10):
tools headers: sync headers of asm-generic/hugetlb_encode.h
tools headers: Add a macro to get HUGETLB page sizes for mmap
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Print the errno besides error-string in TEST_ASSERT
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: Add a helper to get system default hugetlb page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
include/uapi/linux/mman.h | 2 +
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/include/uapi/linux/mman.h | 2 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 512 ++++++++++++++++++
tools/testing/selftests/kvm/lib/assert.c | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 59 +-
tools/testing/selftests/kvm/lib/test_util.c | 163 +++++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
14 files changed, 739 insertions(+), 61 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.23.0
Previously, in the error path, we unconditionally removed the page from
the page cache. But in the continue case, we didn't add it - it was
already there because the page is used by a second (non-UFFD-registered)
mapping. So, in that case, it's incorrect to remove it as the other
mapping may still use it normally.
For this error handling failure, trivially exercise it in the
userfaultfd selftest, to detect this kind of bug in the future.
Also, we previously were unconditionally calling shmem_inode_acct_block.
In the continue case, however, this is incorrect, because we would have
already accounted for the RAM usage when the page was originally
allocated (since at this point it's already in the page cache). So,
doing it in the continue case causes us to double-count.
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
mm/shmem.c | 15 ++++++++++-----
tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++++
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d2e0e81b7d2e..5ac8ea737004 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2379,9 +2379,11 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
int ret;
pgoff_t offset, max_off;
- ret = -ENOMEM;
- if (!shmem_inode_acct_block(inode, 1))
- goto out;
+ if (!is_continue) {
+ ret = -ENOMEM;
+ if (!shmem_inode_acct_block(inode, 1))
+ goto out;
+ }
if (is_continue) {
ret = -EFAULT;
@@ -2389,6 +2391,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
if (!page)
goto out_unacct_blocks;
} else if (!*pagep) {
+ ret = -ENOMEM;
page = shmem_alloc_page(gfp, info, pgoff);
if (!page)
goto out_unacct_blocks;
@@ -2486,12 +2489,14 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
out_release_unlock:
pte_unmap_unlock(dst_pte, ptl);
ClearPageDirty(page);
- delete_from_page_cache(page);
+ if (!is_continue)
+ delete_from_page_cache(page);
out_release:
unlock_page(page);
put_page(page);
out_unacct_blocks:
- shmem_inode_unacct_blocks(inode, 1);
+ if (!is_continue)
+ shmem_inode_unacct_blocks(inode, 1);
goto out;
}
#endif /* CONFIG_USERFAULTFD */
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f6c86b036d0f..d8541a59dae5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
static void continue_range(int ufd, __u64 start, __u64 len)
{
struct uffdio_continue req;
+ int ret;
req.range.start = start;
req.range.len = len;
@@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+ /*
+ * Error handling within the kernel for continue is subtly different
+ * from copy or zeropage, so it may be a source of bugs. Trigger an
+ * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+ */
+ req.mapped = 0;
+ ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+ if (ret >= 0 || req.mapped != -EEXIST)
+ err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+ ret, req.mapped);
}
static void *locking_thread(void *arg)
--
2.31.0.291.g576ba9dcdaf-goog
From: Colin Ian King <colin.king(a)canonical.com>
There is a spelling mistake in a comment. Fix it.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/timers/clocksource-switch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/timers/clocksource-switch.c b/tools/testing/selftests/timers/clocksource-switch.c
index bfc974b4572d..2d66abd877e6 100644
--- a/tools/testing/selftests/timers/clocksource-switch.c
+++ b/tools/testing/selftests/timers/clocksource-switch.c
@@ -3,7 +3,7 @@
* (C) Copyright IBM 2012
* Licensed under the GPLv2
*
- * NOTE: This is a meta-test which quickly changes the clocksourc and
+ * NOTE: This is a meta-test which quickly changes the clocksource and
* then uses other tests to detect problems. Thus this test requires
* that the inconsistency-check and nanosleep tests be present in the
* same directory it is run from.
--
2.30.2
Currently the following command produces an error message:
linux# make kselftest TARGETS=bpf O=/mnt/linux-build
# selftests: bpf: test_libbpf.sh
# ./test_libbpf.sh: line 23: ./test_libbpf_open: No such file or directory
# test_libbpf: failed at file test_l4lb.o
# selftests: test_libbpf [FAILED]
The error message might not affect the return code of make, therefore
one needs to grep make output in order to detect it.
This is not the only instance of the same underlying problem; any test
with more than one element in $(TEST_PROGS) fails the same way. Another
example:
linux# make O=/mnt/linux-build TARGETS=splice kselftest
[...]
# ./short_splice_read.sh: 15: ./splice_read: not found
# FAIL: /sys/module/test_module/sections/.init.text 2
not ok 2 selftests: splice: short_splice_read.sh # exit=1
The current logic prepends $(OUTPUT) only to the first member of
$(TEST_PROGS). After that, run_one() does
cd `dirname $TEST`
For all tests except the first one, `dirname $TEST` is ., which means
they cannot access the files generated in $(OUTPUT).
Fix by using $(addprefix) to prepend $(OUTPUT)/ to each member of
$(TEST_PROGS).
Fixes: 1a940687e424 ("selftests: lib.mk: copy test scripts and test files for make O=dir run")
Signed-off-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
---
v1->v2:
- Append / to $(OUTPUT).
- Use $(addprefix) instead of $(foreach).
v2->v3:
- Split the patch in two.
- Improve the commit message.
v3: https://lore.kernel.org/linux-kselftest/20191024121347.22189-1-iii@linux.ib…
v3->v4:
- Drop the first patch.
- Add a note regarding make return code to the commit message.
v4: https://lore.kernel.org/linux-kselftest/20191115150428.61131-1-iii@linux.ib…
v4->v5:
- Add another reproducer to the commit message.
tools/testing/selftests/lib.mk | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5ce26d548e4..be17462fe146 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -74,7 +74,8 @@ ifdef building_out_of_srctree
rsync -aq $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \
fi
@if [ "X$(TEST_PROGS)" != "X" ]; then \
- $(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(OUTPUT)/$(TEST_PROGS)) ; \
+ $(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) \
+ $(addprefix $(OUTPUT)/,$(TEST_PROGS))) ; \
else \
$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS)); \
fi
--
2.29.2
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
Knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v6" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Version 4
https://lore.kernel.org/lkml/20210227150956.6022-1-john.wood@gmx.com/
Version 5
https://lore.kernel.org/kernel-hardening/20210227153013.6747-1-john.wood@gm…
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Changelog v4 -> v5
------------------
- Fix some typos (Randy Dunlap).
Changelog v5 -> v6
------------------
- Fix a reported deadlock (kernel test robot).
- Add high level details to the documentation (Andi Kleen).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 278 ++++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1107 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2219 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
sched.h has been included at line 33, so remove the
duplicate one at line 36.
inttypes.h has been included at line 19, so remove the
duplicate one at line 23.
pthread.h has been included at line 17,so remove the
duplicate one at line 20.
Signed-off-by: Wan Jiabing <wanjiabing(a)vivo.com>
---
tools/testing/selftests/powerpc/mm/tlbie_test.c | 1 -
tools/testing/selftests/powerpc/tm/tm-poison.c | 1 -
tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c | 1 -
3 files changed, 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/mm/tlbie_test.c b/tools/testing/selftests/powerpc/mm/tlbie_test.c
index f85a0938ab25..48344a74b212 100644
--- a/tools/testing/selftests/powerpc/mm/tlbie_test.c
+++ b/tools/testing/selftests/powerpc/mm/tlbie_test.c
@@ -33,7 +33,6 @@
#include <sched.h>
#include <time.h>
#include <stdarg.h>
-#include <sched.h>
#include <pthread.h>
#include <signal.h>
#include <sys/prctl.h>
diff --git a/tools/testing/selftests/powerpc/tm/tm-poison.c b/tools/testing/selftests/powerpc/tm/tm-poison.c
index 29e5f26af7b9..27c083a03d1f 100644
--- a/tools/testing/selftests/powerpc/tm/tm-poison.c
+++ b/tools/testing/selftests/powerpc/tm/tm-poison.c
@@ -20,7 +20,6 @@
#include <sched.h>
#include <sys/types.h>
#include <signal.h>
-#include <inttypes.h>
#include "tm.h"
diff --git a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
index e2a0c07e8362..9ef37a9836ac 100644
--- a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
+++ b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
@@ -17,7 +17,6 @@
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>
-#include <pthread.h>
#include "tm.h"
#include "utils.h"
--
2.25.1
This fix is analogous to Peter Xu's fix for hugetlb [0]. If we don't
put_page() after getting the page out of the page cache, we leak the
reference.
The fix can be verified by checking /proc/meminfo and running the
userfaultfd selftest in shmem mode. Without the fix, we see MemFree /
MemAvailable steadily decreasing with each run of the test. With the
fix, memory is correctly freed after the test program exits.
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
mm/shmem.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index ef8c9f5e92fc..d2e0e81b7d2e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1831,6 +1831,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
if (page && vma && userfaultfd_minor(vma)) {
unlock_page(page);
+ put_page(page);
*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
return 0;
}
--
2.31.0.rc2.261.g7f71774620-goog
string.h has been included at line 15.So we remove the
duplicate one at line 17.
Signed-off-by: Wan Jiabing <wanjiabing(a)vivo.com>
---
tools/testing/selftests/mincore/mincore_selftest.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/mincore/mincore_selftest.c b/tools/testing/selftests/mincore/mincore_selftest.c
index 5a1e85ff5d32..e54106643337 100644
--- a/tools/testing/selftests/mincore/mincore_selftest.c
+++ b/tools/testing/selftests/mincore/mincore_selftest.c
@@ -14,7 +14,6 @@
#include <sys/mman.h>
#include <string.h>
#include <fcntl.h>
-#include <string.h>
#include "../kselftest.h"
#include "../kselftest_harness.h"
--
2.25.1
The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.
For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.
To better support process-wide synchronous self-monitoring, without
events propagating to children that do not share the current process's
shared environment, two pre-requisite patches are added to optionally
restrict inheritance to CLONE_THREAD, and remove events on exec (without
affecting the parent).
Examples how to use these features can be found in the two kselftests at
the end of the series. The kselftests verify and stress test the basic
functionality.
The discussion at [1] led to the changes proposed in this series. The
approach taken in patch "Add support for SIGTRAP on perf events" to use
'event_limit' to trigger the signal was kindly suggested by Peter
Zijlstra in [2].
[1] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZ…
[2] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.n…
Motivation and example uses:
1. Our immediate motivation is low-overhead sampling-based race
detection for user space [3]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).
[3] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.
Other ideas for use-cases we found interesting, but should only
illustrate the range of potential to further motivate the utility (we're
sure there are more):
3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint to entry to the patched routine, then
send signals to threads and check that they are not in the
routine, but without stopping them further. If any of the
threads will enter the routine, it will receive SIGTRAP and
pause.
4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This could be replaced with
a watchpoint, which does not require a whole page nor DTLB
shootdowns.
5. Threads receiving signals on performance events to
throttle/unthrottle themselves.
6. Tracking data flow globally.
---
v2:
* Patch "Support only inheriting events if cloned with CLONE_THREAD"
added to series.
* Patch "Add support for event removal on exec" added to series.
* Patch "Add kselftest for process-wide sigtrap handling" added to
series.
* Patch "Add kselftest for remove_on_exec" added to series.
* Implicitly restrict inheriting events if sigtrap, but the child was
cloned with CLONE_CLEAR_SIGHAND, because it is not generally safe if
the child cleared all signal handlers to continue sending SIGTRAP.
* Various minor fixes (see details in patches).
v1: https://lkml.kernel.org/r/20210223143426.2412737-1-elver@google.com
Marco Elver (8):
perf/core: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
perf/core: Support only inheriting events if cloned with CLONE_THREAD
perf/core: Add support for event removal on exec
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf/core: Add support for SIGTRAP on perf events
perf/core: Add breakpoint information to siginfo on SIGTRAP
selftests/perf: Add kselftest for process-wide sigtrap handling
selftests/perf: Add kselftest for remove_on_exec
arch/m68k/kernel/signal.c | 3 +
arch/x86/kernel/signal_compat.c | 5 +-
fs/signalfd.c | 4 +
include/linux/compat.h | 2 +
include/linux/perf_event.h | 5 +-
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +-
include/uapi/linux/perf_event.h | 5 +-
include/uapi/linux/signalfd.h | 4 +-
kernel/events/core.c | 130 ++++++++-
kernel/fork.c | 2 +-
kernel/signal.c | 11 +
.../testing/selftests/perf_events/.gitignore | 3 +
tools/testing/selftests/perf_events/Makefile | 6 +
tools/testing/selftests/perf_events/config | 1 +
.../selftests/perf_events/remove_on_exec.c | 256 ++++++++++++++++++
tools/testing/selftests/perf_events/settings | 1 +
.../selftests/perf_events/sigtrap_threads.c | 202 ++++++++++++++
18 files changed, 632 insertions(+), 15 deletions(-)
create mode 100644 tools/testing/selftests/perf_events/.gitignore
create mode 100644 tools/testing/selftests/perf_events/Makefile
create mode 100644 tools/testing/selftests/perf_events/config
create mode 100644 tools/testing/selftests/perf_events/remove_on_exec.c
create mode 100644 tools/testing/selftests/perf_events/settings
create mode 100644 tools/testing/selftests/perf_events/sigtrap_threads.c
--
2.30.1.766.gb4fecdf3b7-goog
When trying to run the arm64 MTE (Memory Tagging Extension) selftests
on a model with the new FEAT_MTE3 capability, the MTE feature detection
failed, because it was overzealously checking for one exact feature
version only (0b0010). Trying to fix that (patch 06/11) led me into the
rabbit hole of userland tool compilation, which triggered patches
01-05/11, to let me actually compile the selftests on an arm64
machine running Ubuntu 20.04. Before I actually fixed that, I tried some
other compiler and distro; patches 07 and 08 are my witnesses.
Then I got brave and tried clang: entering patches 09/11 and 10/11.
Eventually I tried to run the whole thing on that model again, and,
you guessed it, patch 11/11 concludes this apparent "2 minute job".
Eventually I can now compile the mte selftests on Ubuntu 20.04 with both
the native gcc and clang without warnings, also with some custom made
cross compiler. And they even run now!
Please have a look, also you may try to compile it on your setup, if you
feel adventurous:
$ make -C tools/testing/selftests TARGETS=arm64 ARM64_SUBTARGETS=mte
Cheers,
Andre
Andre Przywara (11):
kselftest/arm64: mte: Fix compilation with native compiler
kselftest/arm64: mte: Fix pthread linking
kselftest/arm64: mte: ksm_options: Fix fscanf warning
kselftest/arm64: mte: user_mem: Fix write() warning
kselftest/arm64: mte: common: Fix write() warnings
kselftest/arm64: mte: Fix MTE feature detection
kselftest/arm64: mte: Use cross-compiler if specified
kselftest/arm64: mte: Output warning about failing compiler
kselftest/arm64: mte: Makefile: Fix clang compilation
kselftest/arm64: mte: Fix clang warning
kselftest/arm64: mte: Report filename on failing temp file creation
tools/testing/selftests/arm64/mte/Makefile | 15 +++++--
.../selftests/arm64/mte/check_ksm_options.c | 5 ++-
.../selftests/arm64/mte/check_user_mem.c | 3 +-
.../selftests/arm64/mte/mte_common_util.c | 39 +++++++++++--------
4 files changed, 39 insertions(+), 23 deletions(-)
--
2.17.5
Hi Linus,
Please pull the following KUnit fixes update for Linux 5.12-rc5.
This KUnit update for Linux 5.12-rc5 consists of two fixes to kunit
tool from David Gow.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:
Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-kunit-fixes-5.12-rc5.1
for you to fetch changes up to 7fd53f41f771d250eb08db08650940f017e37c26:
kunit: tool: Disable PAGE_POISONING under --alltests (2021-03-11
14:37:37 -0700)
----------------------------------------------------------------
linux-kselftest-kunit-fixes-5.12-rc5.1
This KUnit update for Linux 5.12-rc5 consists of two fixes to kunit
tool from David Gow.
----------------------------------------------------------------
David Gow (2):
kunit: tool: Fix a python tuple typing error
kunit: tool: Disable PAGE_POISONING under --alltests
tools/testing/kunit/configs/broken_on_uml.config | 2 ++
tools/testing/kunit/kunit_config.py | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------
sched.h has been included at line 33.
So we remove the duplicate one at line 36.
Signed-off-by: Wan Jiabing <wanjiabing(a)vivo.com>
---
tools/testing/selftests/powerpc/mm/tlbie_test.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/powerpc/mm/tlbie_test.c b/tools/testing/selftests/powerpc/mm/tlbie_test.c
index f85a0938ab25..48344a74b212 100644
--- a/tools/testing/selftests/powerpc/mm/tlbie_test.c
+++ b/tools/testing/selftests/powerpc/mm/tlbie_test.c
@@ -33,7 +33,6 @@
#include <sched.h>
#include <time.h>
#include <stdarg.h>
-#include <sched.h>
#include <pthread.h>
#include <signal.h>
#include <sys/prctl.h>
--
2.25.1
inttypes.h has been included at line 19.
So we remove the duplicate one at line 23.
Signed-off-by: Wan Jiabing <wanjiabing(a)vivo.com>
---
tools/testing/selftests/powerpc/tm/tm-poison.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/powerpc/tm/tm-poison.c b/tools/testing/selftests/powerpc/tm/tm-poison.c
index 29e5f26af7b9..27c083a03d1f 100644
--- a/tools/testing/selftests/powerpc/tm/tm-poison.c
+++ b/tools/testing/selftests/powerpc/tm/tm-poison.c
@@ -20,7 +20,6 @@
#include <sched.h>
#include <sys/types.h>
#include <signal.h>
-#include <inttypes.h>
#include "tm.h"
--
2.25.1
pthread.h has been included at line 17.
So we remove the duplicate one at line 20.
Signed-off-by: Wan Jiabing <wanjiabing(a)vivo.com>
---
tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
index e2a0c07e8362..9ef37a9836ac 100644
--- a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
+++ b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
@@ -17,7 +17,6 @@
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>
-#include <pthread.h>
#include "tm.h"
#include "utils.h"
--
2.25.1
The patch itself is straightforward thanks to the infrastructure that is
already in-place.
The tests follows the other '*_map_batch_ops' tests with minor tweaks.
v1 -> v2:
Fixes for checkpatch warnings
Pedro Tammela (2):
bpf: add support for batched operations in LPM trie maps
bpf: selftests: add tests for batched ops in LPM trie maps
kernel/bpf/lpm_trie.c | 3 +
.../map_tests/lpm_trie_map_batch_ops.c (new) | 158 ++++++++++++++++++
2 files changed, 161 insertions(+)
create mode 100644 tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
--
2.25.1
The "First Fault Register" (FFR) is an SVE register that mimics a
predicate register, but clears bits when a load or store fails to handle
an element of a vector. The supposed usage scenario is to initialise
this register (using SETFFR), then *read* it later on to learn about
elements that failed to load or store. Explicit writes to this register
using the WRFFR instruction are only supposed to *restore* values
previously read from the register (for context-switching only).
As the manual describes, this register holds only certain values, it:
"... contains a monotonic predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining
bit positions."
Any other value is UNPREDICTABLE and is not supposed to be "restored"
into the register.
The SVE test currently tries to write a signature pattern into the
register, which is *not* a canonical FFR value. Apparently the existing
setups treat UNPREDICTABLE as "read-as-written", but a new
implementation actually only stores canonical values. As a consequence,
the sve-test fails immediately when comparing the FFR value:
-----------
# ./sve-test
Vector length: 128 bits
PID: 207
Mismatch: PID=207, iteration=0, reg=48
Expected [cf00]
Got [0f00]
Aborted
-----------
Fix this by only populating the FFR with proper canonical values.
Effectively the requirement described above limits us to 17 unique
values over 16 bits worth of FFR, so we condense our signature down to 4
bits (2 bits from the PID, 2 bits from the generation) and generate the
canonical pattern from it. Any bits describing elements above the
minimum 128 bit are set to 0.
This aligns the FFR usage to the architecture and fixes the test on
microarchitectures implementing FFR in a more restricted way.
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
---
tools/testing/selftests/arm64/fp/sve-test.S | 22 ++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index 9210691aa998..e3e08d9c7020 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -284,16 +284,28 @@ endfunction
// Set up test pattern in the FFR
// x0: pid
// x2: generation
+//
+// We need to generate a canonical FFR value, which consists of a number of
+// low "1" bits, followed by a number of zeros. This gives us 17 unique values
+// per 16 bits of FFR, so we create a 4 bit signature out of the PID and
+// generation, and use that as the initial number of ones in the pattern.
+// We fill the upper lanes of FFR with zeros.
// Beware: corrupts P0.
function setup_ffr
mov x4, x30
- bl pattern
+ and w0, w0, #0x3
+ bfi w0, w2, #2, #2
+ mov w1, #1
+ lsl w1, w1, w0
+ sub w1, w1, #1
+
ldr x0, =ffrref
- ldr x1, =scratch
- rdvl x2, #1
- lsr x2, x2, #3
- bl memcpy
+ strh w1, [x0], 2
+ rdvl x1, #1
+ lsr x1, x1, #3
+ sub x1, x1, #2
+ bl memclr
mov x0, #0
ldr x1, =ffrref
--
2.25.1
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.
The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will additional latencies of
the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024
Things to keep in mind:
1. This kernel module + bash driver does not guarantee idleness on a
core when the IPI and the Timer is armed. It only invokes sleep and
hopes that the core is idle once the IPI/Timer is invoked onto it.
Hence this program must be run on a completely idle system for best
results
2. Even on a completely idle system, there maybe book-keeping tasks or
jitter tasks that can run on the core we want idle. This can create
outliers in the latency measurement. Thankfully, these outliers
should be large enough to easily weed them out.
3. A userspace only selftest variant was also sent out as RFC based on
suggestions over the previous patchset to simply the kernel
complexeity. However, a userspace only approach had more noise in
the latency measurement due to userspace-kernel interactions
which led to run to run variance and a lesser accurate test.
Another downside of the nature of a userspace program is that it
takes orders of magnitude longer to complete a full system test
compared to the kernel framework.
RFC patch: https://lkml.org/lkml/2020/9/2/356
4. For Intel Systems, the Timer based latencies don't exactly give out
the measure of idle latencies. This is because of a hardware
optimization mechanism that pre-arms a CPU when a timer is set to
wakeup. That doesn't make this metric useless for Intel systems,
it just means that is measuring IPI/Timer responding latency rather
than idle wakeup latencies.
(Source: https://lkml.org/lkml/2020/9/2/610)
For solution to this problem, a hardware based latency analyzer is
devised by Artem Bityutskiy from Intel.
https://youtu.be/Opk92aQyvt0?t=8266https://intel.github.io/wult/
Pratik Rajesh Sampat (2):
cpuidle: Extract IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 157 ++++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 316 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 2 +
7 files changed, 493 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.17.1
Hi,
This v4 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
Links of v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/
Links of v3: https://lore.kernel.org/lkml/20210301065916.11484-1-wangyanan55@huawei.com/
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Change logs:
v3->v4:
- Add a helper to get system default hugetlb page size
- Add tags of Reviewed-by of Ben in the patches
v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic micro to get hugetlb page sizes
- Some changes for suggestions about v2 series
v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series
---
Yanan Wang (9):
tools headers: sync headers of asm-generic/hugetlb_encode.h
tools headers: Add a macro to get HUGETLB page sizes for mmap
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: Add a helper to get system default hugetlb page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
include/uapi/linux/mman.h | 2 +
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/include/uapi/linux/mman.h | 2 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++-
tools/testing/selftests/kvm/lib/test_util.c | 122 ++++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
12 files changed, 659 insertions(+), 59 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.19.1
The patch itself is straightforward thanks to the infrastructure that is
already in-place.
The tests follows the other '*_map_batch_ops' tests with minor tweaks.
Pedro Tammela (2):
bpf: add support for batched operations in LPM trie maps
bpf: selftests: add tests for batched ops in LPM trie maps
kernel/bpf/lpm_trie.c | 3 +
.../map_tests/lpm_trie_map_batch_ops.c (new) | 158 ++++++++++++++++++
2 files changed, 161 insertions(+)
create mode 100644 tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
--
2.25.1
s/verfied/verified/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar(a)gmail.com>
---
tools/testing/selftests/net/forwarding/fib_offload_lib.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/fib_offload_lib.sh b/tools/testing/selftests/net/forwarding/fib_offload_lib.sh
index 66496659bea7..e134a5f529c9 100644
--- a/tools/testing/selftests/net/forwarding/fib_offload_lib.sh
+++ b/tools/testing/selftests/net/forwarding/fib_offload_lib.sh
@@ -224,7 +224,7 @@ fib_ipv4_plen_test()
ip -n $ns link set dev dummy1 up
# Add two routes with the same key and different prefix length and
- # make sure both are in hardware. It can be verfied that both are
+ # make sure both are in hardware. It can be verified that both are
# sharing the same leaf by checking the /proc/net/fib_trie
ip -n $ns route add 192.0.2.0/24 dev dummy1
ip -n $ns route add 192.0.2.0/25 dev dummy1
--
2.26.2
On Tue, Mar 16, 2021 at 09:42:50PM +0100, Mickaël Salaün wrote:
> From: Mickaël Salaün <mic(a)linux.microsoft.com>
>
> Test all Landlock system calls, ptrace hooks semantic and filesystem
> access-control with multiple layouts.
>
> Test coverage for security/landlock/ is 93.6% of lines. The code not
> covered only deals with internal kernel errors (e.g. memory allocation)
> and race conditions.
>
> Cc: James Morris <jmorris(a)namei.org>
> Cc: Jann Horn <jannh(a)google.com>
> Cc: Kees Cook <keescook(a)chromium.org>
> Cc: Serge E. Hallyn <serge(a)hallyn.com>
> Cc: Shuah Khan <shuah(a)kernel.org>
> Signed-off-by: Mickaël Salaün <mic(a)linux.microsoft.com>
> Reviewed-by: Vincent Dagonneau <vincent.dagonneau(a)ssi.gouv.fr>
> Link: https://lore.kernel.org/r/20210316204252.427806-11-mic@digikod.net
This is terrific. I love the coverage. How did you measure this, BTW?
To increase it into memory allocation failures, have you tried
allocation fault injection:
https://www.kernel.org/doc/html/latest/fault-injection/fault-injection.html
> [...]
> +TEST(inconsistent_attr) {
> + const long page_size = sysconf(_SC_PAGESIZE);
> + char *const buf = malloc(page_size + 1);
> + struct landlock_ruleset_attr *const ruleset_attr = (void *)buf;
> +
> + ASSERT_NE(NULL, buf);
> +
> + /* Checks copy_from_user(). */
> + ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 0, 0));
> + /* The size if less than sizeof(struct landlock_attr_enforce). */
> + ASSERT_EQ(EINVAL, errno);
> + ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 1, 0));
> + ASSERT_EQ(EINVAL, errno);
Almost everywhere you're using ASSERT instead of EXPECT. Is this correct
(in the sense than as soon as an ASSERT fails the rest of the test is
skipped)? I do see you using EXPECT is some places, but I figured I'd
ask about the intention here.
> +/*
> + * TEST_F_FORK() is useful when a test drop privileges but the corresponding
> + * FIXTURE_TEARDOWN() requires them (e.g. to remove files from a directory
> + * where write actions are denied). For convenience, FIXTURE_TEARDOWN() is
> + * also called when the test failed, but not when FIXTURE_SETUP() failed. For
> + * this to be possible, we must not call abort() but instead exit smoothly
> + * (hence the step print).
> + */
Hm, interesting. I think this should be extracted into a separate patch
and added to the test harness proper.
Could this be solved with TEARDOWN being called on SETUP failure?
> +#define TEST_F_FORK(fixture_name, test_name) \
> + static void fixture_name##_##test_name##_child( \
> + struct __test_metadata *_metadata, \
> + FIXTURE_DATA(fixture_name) *self, \
> + const FIXTURE_VARIANT(fixture_name) *variant); \
> + TEST_F(fixture_name, test_name) \
> + { \
> + int status; \
> + const pid_t child = fork(); \
> + if (child < 0) \
> + abort(); \
> + if (child == 0) { \
> + _metadata->no_print = 1; \
> + fixture_name##_##test_name##_child(_metadata, self, variant); \
> + if (_metadata->skip) \
> + _exit(255); \
> + if (_metadata->passed) \
> + _exit(0); \
> + _exit(_metadata->step); \
> + } \
> + if (child != waitpid(child, &status, 0)) \
> + abort(); \
> + if (WIFSIGNALED(status) || !WIFEXITED(status)) { \
> + _metadata->passed = 0; \
> + _metadata->step = 1; \
> + return; \
> + } \
> + switch (WEXITSTATUS(status)) { \
> + case 0: \
> + _metadata->passed = 1; \
> + break; \
> + case 255: \
> + _metadata->passed = 1; \
> + _metadata->skip = 1; \
> + break; \
> + default: \
> + _metadata->passed = 0; \
> + _metadata->step = WEXITSTATUS(status); \
> + break; \
> + } \
> + } \
This looks like a subset of __wait_for_test()? Could __TEST_F_IMPL() be
updated instead to do this? (Though the fork overhead might not be great
for everyone.)
--
Kees Cook
From: Dave Hansen <dave.hansen(a)linux.intel.com>
The SGX device file (/dev/sgx_enclave) is unusual in that it requires
execute permissions. It has to be both "chmod +x" *and* be on a
filesystem without 'noexec'.
In the future, udev and systemd should get updates to set up systems
automatically. But, for now, nobody's systems do this automatically,
and everybody gets error messages like this when running ./test_sgx:
0x0000000000000000 0x0000000000002000 0x03
0x0000000000002000 0x0000000000001000 0x05
0x0000000000003000 0x0000000000003000 0x03
mmap() failed, errno=1.
That isn't very user friendly, even for forgetful kernel developers.
Further, the test case is rather haphazard about its use of fprintf()
versus perror().
Improve the error messages. Use perror() where possible. Lastly,
do some sanity checks on opening and mmap()ing the device file so
that we can get a decent error message out to the user.
Now, if your user doesn't have permission, you'll get the following:
$ ls -l /dev/sgx_enclave
crw------- 1 root root 10, 126 Mar 18 11:29 /dev/sgx_enclave
$ ./test_sgx
Unable to open /dev/sgx_enclave: Permission denied
If you then 'chown dave:dave /dev/sgx_enclave' (or whatever), but
you leave execute permissions off, you'll get:
$ ls -l /dev/sgx_enclave
crw------- 1 dave dave 10, 126 Mar 18 11:29 /dev/sgx_enclave
$ ./test_sgx
no execute permissions on device file
If you fix that with "chmod ug+x /dev/sgx" but you leave /dev as
noexec, you'll get this:
$ mount | grep "/dev .*noexec"
udev on /dev type devtmpfs (rw,nosuid,noexec,...)
$ ./test_sgx
ERROR: mmap for exec: Operation not permitted
mmap() succeeded for PROT_READ, but failed for PROT_EXEC
check that user has execute permissions on /dev/sgx_enclave and
that /dev does not have noexec set: 'mount | grep "/dev .*noexec"'
That can be fixed with:
mount -o remount,noexec /devESC
Hopefully, the combination of better error messages and the search
engines indexing this message will help people fix their systems
until we do this properly.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: x86(a)kernel.org
Cc: linux-sgx(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
---
b/tools/testing/selftests/sgx/load.c | 66 +++++++++++++++++++++++++++--------
b/tools/testing/selftests/sgx/main.c | 2 -
2 files changed, 53 insertions(+), 15 deletions(-)
diff -puN tools/testing/selftests/sgx/load.c~sgx-selftest-err-rework tools/testing/selftests/sgx/load.c
--- a/tools/testing/selftests/sgx/load.c~sgx-selftest-err-rework 2021-03-18 12:18:38.649828215 -0700
+++ b/tools/testing/selftests/sgx/load.c 2021-03-18 12:40:46.388824904 -0700
@@ -45,19 +45,19 @@ static bool encl_map_bin(const char *pat
fd = open(path, O_RDONLY);
if (fd == -1) {
- perror("open()");
+ perror("enclave executable open()");
return false;
}
ret = stat(path, &sb);
if (ret) {
- perror("stat()");
+ perror("enclave executable stat()");
goto err;
}
bin = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (bin == MAP_FAILED) {
- perror("mmap()");
+ perror("enclave executable mmap()");
goto err;
}
@@ -90,8 +90,7 @@ static bool encl_ioc_create(struct encl
ioc.src = (unsigned long)secs;
rc = ioctl(encl->fd, SGX_IOC_ENCLAVE_CREATE, &ioc);
if (rc) {
- fprintf(stderr, "SGX_IOC_ENCLAVE_CREATE failed: errno=%d\n",
- errno);
+ perror("SGX_IOC_ENCLAVE_CREATE failed");
munmap((void *)secs->base, encl->encl_size);
return false;
}
@@ -116,31 +115,69 @@ static bool encl_ioc_add_pages(struct en
rc = ioctl(encl->fd, SGX_IOC_ENCLAVE_ADD_PAGES, &ioc);
if (rc < 0) {
- fprintf(stderr, "SGX_IOC_ENCLAVE_ADD_PAGES failed: errno=%d.\n",
- errno);
+ perror("SGX_IOC_ENCLAVE_ADD_PAGES failed");
return false;
}
return true;
}
+
+
bool encl_load(const char *path, struct encl *encl)
{
+ const char device_path[] = "/dev/sgx_enclave";
Elf64_Phdr *phdr_tbl;
off_t src_offset;
Elf64_Ehdr *ehdr;
+ struct stat sb;
+ void *ptr;
int i, j;
int ret;
+ int fd = -1;
memset(encl, 0, sizeof(*encl));
- ret = open("/dev/sgx_enclave", O_RDWR);
- if (ret < 0) {
- fprintf(stderr, "Unable to open /dev/sgx_enclave\n");
+ fd = open(device_path, O_RDWR);
+ if (fd < 0) {
+ perror("Unable to open /dev/sgx_enclave");
+ goto err;
+ }
+
+ ret = stat(device_path, &sb);
+ if (ret) {
+ perror("device file stat()");
+ goto err;
+ }
+
+ /*
+ * This just checks if the /dev file has these permission
+ * bits set. It does not check that the current user is
+ * the owner or in the owning group.
+ */
+ if (!(sb.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH))) {
+ fprintf(stderr, "no execute permissions on device file\n");
+ goto err;
+ }
+
+ ptr = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
+ if (ptr == (void *)-1) {
+ perror("mmap for read");
+ goto err;
+ }
+ munmap(ptr, PAGE_SIZE);
+
+ ptr = mmap(NULL, PAGE_SIZE, PROT_EXEC, MAP_SHARED, fd, 0);
+ if (ptr == (void *)-1) {
+ perror("ERROR: mmap for exec");
+ fprintf(stderr, "mmap() succeeded for PROT_READ, but failed for PROT_EXEC\n");
+ fprintf(stderr, "check that user has execute permissions on %s and\n", device_path);
+ fprintf(stderr, "that /dev does not have noexec set: 'mount | grep \"/dev .*noexec\"'\n");
goto err;
}
+ munmap(ptr, PAGE_SIZE);
- encl->fd = ret;
+ encl->fd = fd;
if (!encl_map_bin(path, encl))
goto err;
@@ -217,6 +254,8 @@ bool encl_load(const char *path, struct
return true;
err:
+ if (fd != -1)
+ close(fd);
encl_delete(encl);
return false;
}
@@ -229,7 +268,7 @@ static bool encl_map_area(struct encl *e
area = mmap(NULL, encl_size * 2, PROT_NONE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (area == MAP_FAILED) {
- perror("mmap");
+ perror("reservation mmap()");
return false;
}
@@ -268,8 +307,7 @@ bool encl_build(struct encl *encl)
ioc.sigstruct = (uint64_t)&encl->sigstruct;
ret = ioctl(encl->fd, SGX_IOC_ENCLAVE_INIT, &ioc);
if (ret) {
- fprintf(stderr, "SGX_IOC_ENCLAVE_INIT failed: errno=%d\n",
- errno);
+ perror("SGX_IOC_ENCLAVE_INIT failed");
return false;
}
diff -puN tools/testing/selftests/sgx/main.c~sgx-selftest-err-rework tools/testing/selftests/sgx/main.c
--- a/tools/testing/selftests/sgx/main.c~sgx-selftest-err-rework 2021-03-18 12:18:38.652828215 -0700
+++ b/tools/testing/selftests/sgx/main.c 2021-03-18 12:18:38.657828215 -0700
@@ -195,7 +195,7 @@ int main(int argc, char *argv[], char *e
addr = mmap((void *)encl.encl_base + seg->offset, seg->size,
seg->prot, MAP_SHARED | MAP_FIXED, encl.fd, 0);
if (addr == MAP_FAILED) {
- fprintf(stderr, "mmap() failed, errno=%d.\n", errno);
+ perror("mmap() segment failed");
exit(KSFT_FAIL);
}
}
_
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 07e644885bf6727a48db109fad053cb43f3c9859 ]
We track if sve-ptrace encountered a failure in a variable but don't
actually use that value when we exit the program, do so.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210309190304.39169-1-broonie@kernel.org
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index b2282be6f938..612d3899614a 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -332,5 +332,5 @@ int main(void)
ksft_print_cnts();
- return 0;
+ return ret;
}
--
2.30.1
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 07e644885bf6727a48db109fad053cb43f3c9859 ]
We track if sve-ptrace encountered a failure in a variable but don't
actually use that value when we exit the program, do so.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210309190304.39169-1-broonie@kernel.org
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index b2282be6f938..612d3899614a 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -332,5 +332,5 @@ int main(void)
ksft_print_cnts();
- return 0;
+ return ret;
}
--
2.30.1
This patchset extends IOCTL interface to retrieve KVM statistics data
in aggregated binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for userspace telemetry applications to pull the
statistics data periodically for large scale systems.
The capability is indicated by KVM_CAP_STATS_BINARY_FORM.
Ioctl KVM_STATS_GET_INFO is used to get the information about VM or
vCPU statistics data (The number of supported statistics data which is
used for buffer allocation).
Ioctl KVM_STATS_GET_NAMES is used to get the list of name strings of
all supported statistics data.
Ioctl KVM_STATS_GET_DATA is used to get the aggregated statistics data
per VM or vCPU in the same order as the list of name strings. This is
the ioctl which would be called periodically to retrieve statistics
data per VM or vCPU.
Jing Zhang (4):
KVM: stats: Separate statistics name strings from debugfs code
KVM: stats: Define APIs for aggregated stats retrieval in binary
format
KVM: stats: Add ioctl commands to pull statistics in binary format
KVM: selftests: Add selftest for KVM binary form statistics interface
Documentation/virt/kvm/api.rst | 79 +++++
arch/arm64/kvm/guest.c | 47 ++-
arch/mips/kvm/mips.c | 114 +++++--
arch/powerpc/kvm/book3s.c | 107 ++++--
arch/powerpc/kvm/booke.c | 84 +++--
arch/s390/kvm/kvm-s390.c | 320 ++++++++++++------
arch/x86/kvm/x86.c | 127 ++++---
include/linux/kvm_host.h | 30 +-
include/uapi/linux/kvm.h | 60 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 89 +++++
virt/kvm/kvm_main.c | 115 +++++++
13 files changed, 935 insertions(+), 241 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: 357ad203d45c0f9d76a8feadbd5a1c5d460c638b
--
2.30.1.766.gb4fecdf3b7-goog
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
This v5 version has changed a lot from the v2. Basically the application
crash period is now compute on an on-going basis using an exponential
moving average (EMA), a detection of a brute force attack through the
"execve" system call has been added and the crossing of the commented
privilege bounds are taken into account. Also, the fine tune has also been
removed and now, all this kind of attacks are detected without
administrator intervention.
In the v2 version Kees Cook suggested to study if the statistical data
shared by all the fork hierarchy processes can be tracked in some other
way. Specifically the question was if this info can be hold by the family
hierarchy of the mm struct. After studying this hierarchy I think it is not
suitable for the Brute LSM since they are totally copied on fork() and in
this case we want that they are shared. So I leave this road.
So, knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v4" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Version 4
https://lore.kernel.org/lkml/20210227150956.6022-1-john.wood@gmx.com/
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Changelog v4 -> v5
------------------
- Fix some typos (Randy Dunlap).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 224 +++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1102 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2160 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
We track if sve-ptrace encountered a failure in a variable but don't
actually use that value when we exit the program, do so.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index b2282be6f938..612d3899614a 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -332,5 +332,5 @@ int main(void)
ksft_print_cnts();
- return 0;
+ return ret;
}
--
2.20.1
Base
====
This series is based on top of my series which adds minor fault handling for
hugetlbfs [1]. (And, therefore, it is based on 5.12-rc1 and Peter Xu's series
for disabling huge pmd sharing as well.)
[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Changelog
=========
v1->v2:
- For UFFDIO_CONTINUE, don't mess with page flags. Just use find_lock_page to
get a locked page from the page cache, instead of doing __SetPageLocked.
This fixes a VM_BUG_ON v1 hit when handling minor faults for THP-backed
shmem (a tmpfs mounted with huge=always).
Overview
========
See my original series linked above for a detailed overview of minor fault
handling in general. The feature in this series works exactly like the
hugetblfs version (from userspace's perspective).
I'm sending this as a separate series because:
- The original minor fault handling series has a full set of R-Bs, and seems
close to being merged. So, it seems reasonable to start looking at this next
step, which extends the basic functionality.
- shmem is different enough that this series may require some additional work
before it's ready, and I don't want to delay the original series
unnecessarily by bundling them together.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."
Axel Rasmussen (5):
userfaultfd: support minor fault handling for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
fs/userfaultfd.c | 6 +-
include/linux/shmem_fs.h | 26 +-
include/uapi/linux/userfaultfd.h | 4 +-
mm/memory.c | 8 +-
mm/shmem.c | 92 +++----
mm/userfaultfd.c | 27 +-
tools/testing/selftests/vm/userfaultfd.c | 322 +++++++++++++++--------
7 files changed, 295 insertions(+), 190 deletions(-)
--
2.30.1.766.gb4fecdf3b7-goog
Hi,
This patch series introduces the futex2 syscalls.
* What happened to the current futex()?
For some years now, developers have been trying to add new features to
futex, but maintainers have been reluctant to accept then, given the
multiplexed interface full of legacy features and tricky to do big
changes. Some problems that people tried to address with patchsets are:
NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2].
NUMA, for instance, just doesn't fit the current API in a reasonable
way. Considering that, it's not possible to merge new features into the
current futex.
** The NUMA problem
At the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty when doing it.
** The 32bit sized futex problem
Embedded systems or anything with memory constrains would benefit of
using smaller sizes for the futex userspace integer. Also, a mutex
implementation can be done using just three values, so 8 bits is enough
for various scenarios.
** The wait on multiple problem
The use case lies in the Wine implementation of the Windows NT interface
WaitMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signal,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side, and quickly go to sleep on the
consumer side is essential for good performance of those running over Wine.
[0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/
[1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/
[2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.…
* The solution
As proposed by Peter Zijlstra and Florian Weimer[3], a new interface
is required to solve this, which must be designed with those features in
mind. futex2() is that interface. As opposed to the current multiplexed
interface, the new one should have one syscall per operation. This will
allow the maintainability of the API if it gets extended, and will help
users with type checking of arguments.
In particular, the new interface is extended to support the ability to
wait on any of a list of futexes at a time, which could be seen as a
vectored extension of the FUTEX_WAIT semantics.
[3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-…
* The interface
The new interface can be seen in details in the following patches, but
this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- Supports variable sized futexes (8bits, 16bits and 32bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node would like to operate
* Implementation
The internal implementation follows a similar design to the original futex.
Given that we want to replicate the same external behavior of current
futex, this should be somewhat expected. For some functions, like the
init and the code to get a shared key, I literally copied code and
comments from kernel/futex.c. I decided to do so instead of exposing the
original function as a public function since in that way we can freely
modify our implementation if required, without any impact on old futex.
Also, the comments precisely describes the details and corner cases of
the implementation.
Each patch contains a brief description of implementation, but patch 6
"docs: locking: futex2: Add documentation" adds a more complete document
about it.
* The patchset
This patchset can be also found at my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
- Patch 1: Implements wait/wake, and the basics foundations of futex2
- Patches 2-4: Implement the remaining features (shared, waitv, requeue).
- Patch 5: Adds the x86_x32 ABI handling. I kept it in a separated
patch since I'm not sure if x86_x32 is still a thing, or if it should
return -ENOSYS.
- Patch 6: Add a documentation file which details the interface and
the internal implementation.
- Patches 7-13: Selftests for all operations along with perf
support for futex2.
- Patch 14: While working on porting glibc for futex2, I found out
that there's a futex_wake() call at the user thread exit path, if
that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In
order to make pthreads work with futex2, it was required to add
this patch. Note that this is more a proof-of-concept of what we
will need to do in future, rather than part of the interface and
shouldn't be merged as it is.
* Testing:
This patchset provides selftests for each operation and their flags.
Along with that, the following work was done:
** Stability
To stress the interface in "real world scenarios":
- glibc[4]: nptl's low level locking was modified to use futex2 API
(except for robust and PI things). All relevant nptl/ tests passed.
- Wine[5]: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU's loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- Full GNU/Linux distro: I installed the modified glibc in my host
machine, so all pthread's programs would use futex2(). After tweaking
systemd[6] to allow futex2() calls at seccomp, everything worked as
expected (web browsers do some syscall sandboxing and need some
configuration as well).
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
** Performance
- For comparing futex() and futex2() performance, I used the artificial
benchmarks implemented at perf (wake, wake-parallel, hash and
requeue). The setup was 200 runs for each test and using 8, 80, 800,
8000 for the number of threads, Note that for this test, I'm not using
patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained
at "The patchset" section.
- For the first three ones, I measured an average of 4% gain in
performance. This is not a big step, but it shows that the new
interface is at least comparable in performance with the current one.
- For requeue, I measured an average of 21% decrease in performance
compared to the original futex implementation. This is expected given
the new design with individual flags. The performance trade-offs are
explained at patch 4 ("futex2: Implement requeue operation").
[4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2
[5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13
[6] https://gitlab.collabora.com/tonyk/systemd
* FAQ
** "Where's the code for NUMA and FUTEX_8/16?"
The current code is already complex enough to take some time for
review, so I believe it's better to split that work out to a future
iteration of this patchset. Besides that, this RFC is the core part of the
infrastructure, and the following features will not pose big design
changes to it, the work will be more about wiring up the flags and
modifying some functions.
** "And what's about FUTEX_64?"
By supporting 64 bit futexes, the kernel structure for futex would
need to have a 64 bit field for the value, and that could defeat one of
the purposes of having different sized futexes in the first place:
supporting smaller ones to decrease memory usage. This might be
something that could be disabled for 32bit archs (and even for
CONFIG_BASE_SMALL).
Which use case would benefit for FUTEX_64? Does it worth the trade-offs?
** "Where's the PI/robust stuff?"
As said by Peter Zijlstra at [3], all those new features are related to
the "simple" futex interface, that doesn't use PI or robust. Do we want
to have this complexity at futex2() and if so, should it be part of
this patchset or can it be future work?
Thanks,
André
* Changelog
Changes from v1:
- Unified futex_set_timer_and_wait and __futex_wait code
- Dropped _carefull from linked list function calls
- Fixed typos on docs patch
- uAPI flags are now added as features are introduced, instead of all flags
in patch 1
- Removed struct futex_single_waiter in favor of an anon struct
v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@collabora.…
André Almeida (13):
futex2: Implement wait and wake functions
futex2: Add support for shared futexes
futex2: Implement vectorized wait
futex2: Implement requeue operation
futex2: Add compatibility entry point for x86_x32 ABI
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 198 +++
Documentation/locking/index.rst | 1 +
MAINTAINERS | 2 +-
arch/arm/tools/syscall.tbl | 4 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 8 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
fs/inode.c | 1 +
include/linux/compat.h | 23 +
include/linux/fs.h | 1 +
include/linux/syscalls.h | 18 +
include/uapi/asm-generic/unistd.h | 14 +-
include/uapi/linux/futex.h | 31 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex2.c | 1239 +++++++++++++++++
kernel/sys_ni.c | 6 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/include/uapi/asm-generic/unistd.h | 11 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 4 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 +
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 8 +-
.../futex/functional/futex2_requeue.c | 164 +++
.../selftests/futex/functional/futex2_wait.c | 209 +++
.../selftests/futex/functional/futex2_waitv.c | 157 +++
.../futex/functional/futex_wait_timeout.c | 58 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 121 ++
38 files changed, 2527 insertions(+), 53 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.30.1
From: Aaron Lewis <aaronlewis(a)google.com>
[ Upstream commit 6528fc0a11de3d16339cf17639e2f69a68fcaf4d ]
The vcpu mmap area may consist of more than just the kvm_run struct.
Allocate enough space for the entire vcpu mmap area. Without this, on
x86, the PIO page, for example, will be missing. This is problematic
when dealing with an unhandled exception from the guest as the exception
vector will be incorrectly reported as 0x0.
Message-Id: <20210210165035.3712489-1-aaronlewis(a)google.com>
Reviewed-by: Andrew Jones <drjones(a)redhat.com>
Co-developed-by: Steve Rutherford <srutherford(a)google.com>
Signed-off-by: Aaron Lewis <aaronlewis(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index fa5a90e6c6f0..859a0b57c683 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -21,6 +21,8 @@
#define KVM_UTIL_PGS_PER_HUGEPG 512
#define KVM_UTIL_MIN_PFN 2
+static int vcpu_mmap_sz(void);
+
/* Aligns x up to the next multiple of size. Size must be a power of 2. */
static void *align(void *x, size_t size)
{
@@ -509,7 +511,7 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct vcpu *vcpu)
vcpu->dirty_gfns = NULL;
}
- ret = munmap(vcpu->state, sizeof(*vcpu->state));
+ ret = munmap(vcpu->state, vcpu_mmap_sz());
TEST_ASSERT(ret == 0, "munmap of VCPU fd failed, rc: %i "
"errno: %i", ret, errno);
close(vcpu->fd);
@@ -978,7 +980,7 @@ void vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpuid)
TEST_ASSERT(vcpu_mmap_sz() >= sizeof(*vcpu->state), "vcpu mmap size "
"smaller than expected, vcpu_mmap_sz: %i expected_min: %zi",
vcpu_mmap_sz(), sizeof(*vcpu->state));
- vcpu->state = (struct kvm_run *) mmap(NULL, sizeof(*vcpu->state),
+ vcpu->state = (struct kvm_run *) mmap(NULL, vcpu_mmap_sz(),
PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd, 0);
TEST_ASSERT(vcpu->state != MAP_FAILED, "mmap vcpu_state failed, "
"vcpu id: %u errno: %i", vcpuid, errno);
--
2.30.1
Fix the following coccicheck warnings:
./tools/testing/selftests/bpf/test_sockmap.c:735:35-37: WARNING !A || A
&& B is equivalent to !A || B.
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 427ca00..eefd445 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -732,7 +732,7 @@ static int sendmsg_test(struct sockmap_options *opt)
* socket is not a valid test. So in this case lets not
* enable kTLS but still run the test.
*/
- if (!txmsg_redir || (txmsg_redir && txmsg_ingress)) {
+ if (!txmsg_redir || txmsg_ingress) {
err = sockmap_init_ktls(opt->verbose, rx_fd);
if (err)
return err;
--
1.8.3.1
On Tue, Mar 02, 2021 at 01:49:41PM +0800, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: cfe92ab6a3ea700c08ba673b46822d51f38d6b40 ("[PATCH v5 2/8] security/brute: Define a LSM and manage statistical data")
> url: https://github.com/0day-ci/linux/commits/John-Wood/Fork-brute-force-attack-…
> base: https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-f93256fb_2019-08-28
> with following parameters:
>
> group: ["group-00", "group-01", "group-02", "group-03", "group-04"]
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +-------------------------------------------------------------------------+------------+------------+
> | | 1d53b7aac6 | cfe92ab6a3 |
> +-------------------------------------------------------------------------+------------+------------+
> | WARNING:inconsistent_lock_state | 0 | 6 |
> | inconsistent{IN-SOFTIRQ-W}->{SOFTIRQ-ON-W}usage | 0 | 6 |
> +-------------------------------------------------------------------------+------------+------------+
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <oliver.sang(a)intel.com>
>
>
> [ 116.852721] ================================
> [ 116.853120] WARNING: inconsistent lock state
> [ 116.853120] 5.11.0-rc7-00013-gcfe92ab6a3ea #1 Tainted: G S
> [ 116.853120] --------------------------------
>
> [...]
Thanks for the report. I will work on this for the next version.
> Thanks,
> Oliver Sang
Thanks,
John Wood
Hi,
This v3 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
Links of v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Change logs:
v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic micro to get hugetlb page sizes
- Some changes for suggestions about v2 series
v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series
---
Yanan Wang (7):
tools headers: sync headers of asm-generic/hugetlb_encode.h
tools headers: Add a macro to get HUGETLB page sizes for mmap
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
include/uapi/linux/mman.h | 2 +
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/include/uapi/linux/mman.h | 2 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++-
tools/testing/selftests/kvm/lib/test_util.c | 92 +++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
12 files changed, 628 insertions(+), 60 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.19.1
From: John Stultz <john.stultz(a)linaro.org>
[ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ]
Copied in from somewhere else, the makefile was including
the kerne's usr/include dir, which caused the asm/ioctl.h file
to be used.
Unfortunately, that file has different values for _IOC_SIZEBITS
and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then
causes the _IOCW macros to give the wrong ioctl numbers,
specifically for DMA_BUF_IOCTL_SYNC.
This patch simply removes the extra include from the Makefile
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Brian Starkey <brian.starkey(a)arm.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Laura Abbott <labbott(a)kernel.org>
Cc: Hridya Valsaraju <hridya(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: Daniel Mentz <danielmentz(a)google.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kselftest(a)vger.kernel.org
Fixes: a8779927fd86c ("kselftests: Add dma-heap test")
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile
index 607c2acd20829..604b43ece15f5 100644
--- a/tools/testing/selftests/dmabuf-heaps/Makefile
+++ b/tools/testing/selftests/dmabuf-heaps/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include
+CFLAGS += -static -O3 -Wl,-no-as-needed -Wall
TEST_GEN_PROGS = dmabuf-heap
--
2.27.0
From: John Stultz <john.stultz(a)linaro.org>
[ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ]
Copied in from somewhere else, the makefile was including
the kerne's usr/include dir, which caused the asm/ioctl.h file
to be used.
Unfortunately, that file has different values for _IOC_SIZEBITS
and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then
causes the _IOCW macros to give the wrong ioctl numbers,
specifically for DMA_BUF_IOCTL_SYNC.
This patch simply removes the extra include from the Makefile
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Brian Starkey <brian.starkey(a)arm.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Laura Abbott <labbott(a)kernel.org>
Cc: Hridya Valsaraju <hridya(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: Daniel Mentz <danielmentz(a)google.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kselftest(a)vger.kernel.org
Fixes: a8779927fd86c ("kselftests: Add dma-heap test")
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile
index 607c2acd20829..604b43ece15f5 100644
--- a/tools/testing/selftests/dmabuf-heaps/Makefile
+++ b/tools/testing/selftests/dmabuf-heaps/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include
+CFLAGS += -static -O3 -Wl,-no-as-needed -Wall
TEST_GEN_PROGS = dmabuf-heap
--
2.27.0
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
This v4 version has changed a lot from the v2. Basically the application
crash period is now compute on an on-going basis using an exponential
moving average (EMA), a detection of a brute force attack through the
"execve" system call has been added and the crossing of the commented
privilege bounds are taken into account. Also, the fine tune has also been
removed and now, all this kind of attacks are detected without
administrator intervention.
In the v2 version Kees Cook suggested to study if the statistical data
shared by all the fork hierarchy processes can be tracked in some other
way. Specifically the question was if this info can be hold by the family
hierarchy of the mm struct. After studying this hierarchy I think it is not
suitable for the Brute LSM since they are totally copied on fork() and in
this case we want that they are shared. So I leave this road.
So, knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v4" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 224 +++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1102 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2160 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
The first argument to namedtuple() should match the name of the type,
which wasn't the case for KconfigEntryBase.
Fixing this is enough to make mypy show no python typing errors again.
Fixes 97752c39bd ("kunit: kunit_tool: Allow .kunitconfig to disable config items")
Signed-off-by: David Gow <davidgow(a)google.com>
---
tools/testing/kunit/kunit_config.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index 0b550cbd667d..1e2683dcc0e7 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -13,7 +13,7 @@ from typing import List, Set
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$'
-KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
+KconfigEntryBase = collections.namedtuple('KconfigEntryBase', ['name', 'value'])
class KconfigEntry(KconfigEntryBase):
--
2.30.0.617.g56c4b15f3c-goog
kunit_tool maintains a list of config options which are broken under
UML, which we exclude from an otherwise 'make ARCH=um allyesconfig'
build used to run all tests with the --alltests option.
Something in UML allyesconfig is causing segfaults when page poisining
is enabled (and is poisoning with a non-zero value). Previously, this
didn't occur, as allyesconfig enabled the CONFIG_PAGE_POISONING_ZERO
option, which worked around the problem by zeroing memory. This option
has since been removed, and memory is now poisoned with 0xAA, which
triggers segfaults in many different codepaths, preventing UML from
booting.
Note that we have to disable both CONFIG_PAGE_POISONING and
CONFIG_DEBUG_PAGEALLOC, as the latter will 'select' the former on
architectures (such as UML) which don't implement __kernel_map_pages().
Ideally, we'd fix this properly by tracking down the real root cause,
but since this is breaking KUnit's --alltests feature, it's worth
disabling there in the meantime so the kernel can boot to the point
where tests can actually run.
Fixes: f289041ed4 ("mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO")
Signed-off-by: David Gow <davidgow(a)google.com>
---
As described above, 'make ARCH=um allyesconfig' is broken. KUnit has
been maintaining a list of configs to force-disable for this in
tools/testing/kunit/configs/broken_on_uml.config. The kernels we've
built with this have broken since CONFIG_PAGE_POISONING_ZERO was
removed, panic-ing on startup with:
<0>[ 0.100000][ T11] Kernel panic - not syncing: Segfault with no mm
<4>[ 0.100000][ T11] CPU: 0 PID: 11 Comm: kdevtmpfs Not tainted 5.11.0-rc7-00003-g63381dc6f5f1-dirty #4
<4>[ 0.100000][ T11] Stack:
<4>[ 0.100000][ T11] 677d3d40 677d3f10 0000000e 600c0bc0
<4>[ 0.100000][ T11] 677d3d90 603c99be 677d3d90 62529b93
<4>[ 0.100000][ T11] 603c9ac0 677d3f10 62529b00 603c98a0
<4>[ 0.100000][ T11] Call Trace:
<4>[ 0.100000][ T11] [<600c0bc0>] ? set_signals+0x0/0x60
<4>[ 0.100000][ T11] [<603c99be>] lookup_mnt+0x11e/0x220
<4>[ 0.100000][ T11] [<62529b93>] ? down_write+0x93/0x180
<4>[ 0.100000][ T11] [<603c9ac0>] ? lock_mount+0x0/0x160
<4>[ 0.100000][ T11] [<62529b00>] ? down_write+0x0/0x180
<4>[ 0.100000][ T11] [<603c98a0>] ? lookup_mnt+0x0/0x220
<4>[ 0.100000][ T11] [<603c8160>] ? namespace_unlock+0x0/0x1a0
<4>[ 0.100000][ T11] [<603c9b25>] lock_mount+0x65/0x160
<4>[ 0.100000][ T11] [<6012f360>] ? up_write+0x0/0x40
<4>[ 0.100000][ T11] [<603cbbd2>] do_new_mount_fc+0xd2/0x220
<4>[ 0.100000][ T11] [<603eb560>] ? vfs_parse_fs_string+0x0/0xa0
<4>[ 0.100000][ T11] [<603cbf04>] do_new_mount+0x1e4/0x260
<4>[ 0.100000][ T11] [<603ccae9>] path_mount+0x1c9/0x6e0
<4>[ 0.100000][ T11] [<603a9f4f>] ? getname_kernel+0xaf/0x1a0
<4>[ 0.100000][ T11] [<603ab280>] ? kern_path+0x0/0x60
<4>[ 0.100000][ T11] [<600238ee>] 0x600238ee
<4>[ 0.100000][ T11] [<62523baa>] devtmpfsd+0x52/0xb8
<4>[ 0.100000][ T11] [<62523b58>] ? devtmpfsd+0x0/0xb8
<4>[ 0.100000][ T11] [<600fffd8>] kthread+0x1d8/0x200
<4>[ 0.100000][ T11] [<600a4ea6>] new_thread_handler+0x86/0xc0
Disabling PAGE_POISONING fixes this. The issue can't be repoduced with
just PAGE_POISONING, there's clearly something (or several things) also
enabled by allyesconfig which contribute. Ideally, we'd track these down
and fix this at its root cause, but in the meantime it'd be nice to
disable PAGE_POISONING so we can at least get the kernel to boot. One
way would be to add a 'depends on !UML' or similar, but since
PAGE_POISONING does seem to work in the non-allyesconfig case, adding it
to our list of broken configs seemed the better choice.
Thoughts?
(Note that to reproduce this, you'll want to run
./tools/testing/kunit/kunit.py run --alltests --raw_output
It also depends on a couple of other fixes which are not upstream yet:
https://www.spinics.net/lists/linux-rtc/msg08294.htmlhttps://lore.kernel.org/linux-i3c/20210127040636.1535722-1-davidgow@google.…
Cheers,
-- David
tools/testing/kunit/configs/broken_on_uml.config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/kunit/configs/broken_on_uml.config b/tools/testing/kunit/configs/broken_on_uml.config
index a7f0603d33f6..690870043ac0 100644
--- a/tools/testing/kunit/configs/broken_on_uml.config
+++ b/tools/testing/kunit/configs/broken_on_uml.config
@@ -40,3 +40,5 @@
# CONFIG_RESET_BRCMSTB_RESCAL is not set
# CONFIG_RESET_INTEL_GW is not set
# CONFIG_ADI_AXI_ADC is not set
+# CONFIG_DEBUG_PAGEALLOC is not set
+# CONFIG_PAGE_POISONING is not set
--
2.30.0.478.g8a0d178c01-goog
Hi,
This v2 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
Yanan Wang (7):
tools include: sync head files of mmap flag encodings about hugetlb
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 58 +--
tools/testing/selftests/kvm/lib/test_util.c | 92 +++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
10 files changed, 623 insertions(+), 60 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.19.1
Base
====
This series is based on top of my series which adds minor fault handling for
hugetlbfs [1]. (And, therefore, it is based on linux-next/akpm and Peter Xu's
series for disabling huge pmd sharing as well.)
[1] https://lore.kernel.org/patchwork/cover/1384095/
Overview
========
See my original series linked above for a detailed overview of minor fault
handling in general. The feature in this series works exactly like the
hugetblfs version (from userspace's perspective).
I'm sending this as a separate series because:
- The original minor fault handling series has been through several rounds of
review and seems close to being merged, so it seems reasonable to start
looking at this next step.
- shmem is different enough that this series may require some additional work
before it's ready, and I don't want to delay the original series
unnecessarily by bundling them together.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android JVM garbage collector using this feature (a paper
describing a somewhat similar approach: https://arxiv.org/pdf/1902.04738.pdf).
Axel Rasmussen (5):
userfaultfd: support minor fault handling for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
fs/userfaultfd.c | 6 +-
include/linux/shmem_fs.h | 26 +-
include/uapi/linux/userfaultfd.h | 4 +-
mm/memory.c | 8 +-
mm/shmem.c | 88 +++----
mm/userfaultfd.c | 27 +-
tools/testing/selftests/vm/userfaultfd.c | 322 +++++++++++++++--------
7 files changed, 293 insertions(+), 188 deletions(-)
--
2.30.0.617.g56c4b15f3c-goog
Hi,
This patch series mainly fixes race condition issues, explains specific
lock rules and improves code related to concurrent calls of
hook_sb_delete() and release_inode(). I exploited these races to
validate the fixes. Userspace tests are also improved along with some
commit messages and comments. Serge Hallyn's review is taken into
account and his Acked-by are added to the corresponding patches.
The SLOC count is 1328 for security/landlock/ and 2539 for
tools/testing/selftest/landlock/ . Test coverage for security/landlock/
is 93.6% of lines. The code not covered only deals with internal kernel
errors (e.g. memory allocation) and race conditions. This series is
being fuzzed by syzkaller (which may cover internal kernel errors), and
patches are on their way: https://github.com/google/syzkaller/pull/2380
The compiled documentation is available here:
https://landlock.io/linux-doc/landlock-v29/userspace-api/landlock.html
This series can be applied on top of v5.11-7592-g1a3a9ffb27bb (Linus's
master branch from Sunday). This can be tested with
CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending
"landlock," to CONFIG_LSM. This patch series can be found in a Git
repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v29
This patch series seems ready for upstream and I would really appreciate
final reviews.
# Landlock LSM
The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock is a
stackable LSM [1], it makes possible to create safe security sandboxes
as new security layers in addition to the existing system-wide
access-controls. This kind of sandbox is expected to help mitigate the
security impact of bugs or unexpected/malicious behaviors in user-space
applications. Landlock empowers any process, including unprivileged
ones, to securely restrict themselves.
Landlock is inspired by seccomp-bpf but instead of filtering syscalls
and their raw arguments, a Landlock rule can restrict the use of kernel
objects like file hierarchies, according to the kernel semantic.
Landlock also takes inspiration from other OS sandbox mechanisms: XNU
Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This series
still addresses multiple use cases, especially with the combined use of
seccomp-bpf: applications with built-in sandboxing, init systems,
security sandbox tools and security-oriented APIs [2].
Previous version:
https://lore.kernel.org/lkml/20210202162710.657398-1-mic@digikod.net/
[1] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler…
[2] https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.n…
Casey Schaufler (1):
LSM: Infrastructure management of the superblock
Mickaël Salaün (11):
landlock: Add object management
landlock: Add ruleset and domain management
landlock: Set up the security framework and manage credentials
landlock: Add ptrace restrictions
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
selftests/landlock: Add user space tests
samples/landlock: Add a sandbox manager example
landlock: Add user and kernel documentation
Documentation/security/index.rst | 1 +
Documentation/security/landlock.rst | 79 +
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/landlock.rst | 307 ++
MAINTAINERS | 15 +
arch/Kconfig | 7 +
arch/alpha/kernel/syscalls/syscall.tbl | 3 +
arch/arm/tools/syscall.tbl | 3 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 6 +
arch/ia64/kernel/syscalls/syscall.tbl | 3 +
arch/m68k/kernel/syscalls/syscall.tbl | 3 +
arch/microblaze/kernel/syscalls/syscall.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 3 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 3 +
arch/parisc/kernel/syscalls/syscall.tbl | 3 +
arch/powerpc/kernel/syscalls/syscall.tbl | 3 +
arch/s390/kernel/syscalls/syscall.tbl | 3 +
arch/sh/kernel/syscalls/syscall.tbl | 3 +
arch/sparc/kernel/syscalls/syscall.tbl | 3 +
arch/um/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 3 +
arch/x86/entry/syscalls/syscall_64.tbl | 3 +
arch/xtensa/kernel/syscalls/syscall.tbl | 3 +
fs/super.c | 1 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
include/linux/syscalls.h | 7 +
include/uapi/asm-generic/unistd.h | 8 +-
include/uapi/linux/landlock.h | 128 +
kernel/sys_ni.c | 5 +
samples/Kconfig | 7 +
samples/Makefile | 1 +
samples/landlock/.gitignore | 1 +
samples/landlock/Makefile | 13 +
samples/landlock/sandboxer.c | 238 ++
security/Kconfig | 11 +-
security/Makefile | 2 +
security/landlock/Kconfig | 21 +
security/landlock/Makefile | 4 +
security/landlock/common.h | 20 +
security/landlock/cred.c | 46 +
security/landlock/cred.h | 58 +
security/landlock/fs.c | 686 +++++
security/landlock/fs.h | 56 +
security/landlock/limits.h | 21 +
security/landlock/object.c | 67 +
security/landlock/object.h | 91 +
security/landlock/ptrace.c | 120 +
security/landlock/ptrace.h | 14 +
security/landlock/ruleset.c | 473 +++
security/landlock/ruleset.h | 165 +
security/landlock/setup.c | 40 +
security/landlock/setup.h | 18 +
security/landlock/syscalls.c | 445 +++
security/security.c | 51 +-
security/selinux/hooks.c | 58 +-
security/selinux/include/objsec.h | 6 +
security/selinux/ss/services.c | 3 +-
security/smack/smack.h | 6 +
security/smack/smack_lsm.c | 35 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/landlock/.gitignore | 2 +
tools/testing/selftests/landlock/Makefile | 24 +
tools/testing/selftests/landlock/base_test.c | 219 ++
tools/testing/selftests/landlock/common.h | 183 ++
tools/testing/selftests/landlock/config | 7 +
tools/testing/selftests/landlock/fs_test.c | 2724 +++++++++++++++++
.../testing/selftests/landlock/ptrace_test.c | 337 ++
tools/testing/selftests/landlock/true.c | 5 +
72 files changed, 6827 insertions(+), 77 deletions(-)
create mode 100644 Documentation/security/landlock.rst
create mode 100644 Documentation/userspace-api/landlock.rst
create mode 100644 include/uapi/linux/landlock.h
create mode 100644 samples/landlock/.gitignore
create mode 100644 samples/landlock/Makefile
create mode 100644 samples/landlock/sandboxer.c
create mode 100644 security/landlock/Kconfig
create mode 100644 security/landlock/Makefile
create mode 100644 security/landlock/common.h
create mode 100644 security/landlock/cred.c
create mode 100644 security/landlock/cred.h
create mode 100644 security/landlock/fs.c
create mode 100644 security/landlock/fs.h
create mode 100644 security/landlock/limits.h
create mode 100644 security/landlock/object.c
create mode 100644 security/landlock/object.h
create mode 100644 security/landlock/ptrace.c
create mode 100644 security/landlock/ptrace.h
create mode 100644 security/landlock/ruleset.c
create mode 100644 security/landlock/ruleset.h
create mode 100644 security/landlock/setup.c
create mode 100644 security/landlock/setup.h
create mode 100644 security/landlock/syscalls.c
create mode 100644 tools/testing/selftests/landlock/.gitignore
create mode 100644 tools/testing/selftests/landlock/Makefile
create mode 100644 tools/testing/selftests/landlock/base_test.c
create mode 100644 tools/testing/selftests/landlock/common.h
create mode 100644 tools/testing/selftests/landlock/config
create mode 100644 tools/testing/selftests/landlock/fs_test.c
create mode 100644 tools/testing/selftests/landlock/ptrace_test.c
create mode 100644 tools/testing/selftests/landlock/true.c
base-commit: 31caf8b2a847214be856f843e251fc2ed2cd1075
--
2.30.0
The top-level kselftest directory is not called kselftest, but
selftests.
Signed-off-by: Antonio Terceiro <antonio.terceiro(a)linaro.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
---
Documentation/dev-tools/kselftest.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index a901def730d9..dcefee707ccd 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -239,8 +239,8 @@ using a shell script test runner. ``kselftest/module.sh`` is designed
to facilitate this process. There is also a header file provided to
assist writing kernel modules that are for use with kselftest:
-- ``tools/testing/kselftest/kselftest_module.h``
-- ``tools/testing/kselftest/kselftest/module.sh``
+- ``tools/testing/selftests/kselftest_module.h``
+- ``tools/testing/selftests/kselftest/module.sh``
How to use
----------
--
2.30.1
This is an optimization of a set of sgx-related codes, each of which
is independent of the patch. Because the second and third patches have
conflicting dependencies, these patches are put together.
---
v5 changes:
* Remove the two patches with no actual value
* Typo fix in commit message
v4 changes:
* Improvements suggested by review
v3 changes:
* split free_cnt count and spin lock optimization into two patches
v2 changes:
* review suggested changes
Tianjia Zhang (3):
selftests/x86: Use getauxval() to simplify the code in sgx
x86/sgx: Allows ioctl PROVISION to execute before CREATE
x86/sgx: Remove redundant if conditions in sgx_encl_create
arch/x86/kernel/cpu/sgx/driver.c | 1 +
arch/x86/kernel/cpu/sgx/ioctl.c | 8 ++++----
tools/testing/selftests/sgx/main.c | 24 ++++--------------------
3 files changed, 9 insertions(+), 24 deletions(-)
--
2.19.1.3.ge56e4f7
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
This v3 version has changed a lot from the v2. Basically the application
crash period is now compute on an on-going basis using an exponential
moving average (EMA), a detection of a brute force attack through the
"execve" system call has been added and the crossing of the commented
privilege bounds are taken into account. Also, the fine tune has also been
removed and now, all this kind of attacks are detected without
administrator intervention.
In the v2 version Kees Cook suggested to study if the statistical data
shared by all the fork hierarchy processes can be tracked in some other
way. Specifically the question was if this info can be hold by the family
hierarchy of the mm struct. After studying this hierarchy I think it is not
suitable for the Brute LSM since they are totally copied on fork() and in
this case we want that they are shared. So I leave this road.
So, knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v3" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 224 +++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1102 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2160 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
@Andrew, this is based on v5.11-rc5-mmotm-2021-01-27-23-30, with secretmem
and related patches dropped from there, I can rebase whatever way you
prefer.
This is an implementation of "secret" mappings backed by a file descriptor.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call The desired protection mode for the
memory is configured using flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will be present only in the page table of the owning mm.
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants
mappings.
Additionally, in the future the secret mappings may be used as a mean to
protect guest memory in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…
that does two things: the first is act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
Hiding secret memory mappings behind an anonymous file allows usage of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which affects
the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to have
secretmem disabled by default with the ability of a system administrator to
enable it at boot time.
In addition, there is also a long term goal to improve management of the
direct map.
[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux…
v17:
* Remove pool of large pages backing secretmem allocations, per Michal Hocko
* Add secretmem pages to unevictable LRU, per Michal Hocko
* Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko
* Make secretmem an opt-in feature that is disabled by default
v16:
* Fix memory leak intorduced in v15
* Clean the data left from previous page user before handing the page to
the userspace
v15: https://lore.kernel.org/lkml/20210120180612.1058-1-rppt@kernel.org
* Add riscv/Kconfig update to disable set_memory operations for nommu
builds (patch 3)
* Update the code around add_to_page_cache() per Matthew's comments
(patches 6,7)
* Add fixups for build/checkpatch errors discovered by CI systems
v14: https://lore.kernel.org/lkml/20201203062949.5484-1-rppt@kernel.org
* Finally s/mod_node_page_state/mod_lruvec_page_state/
v13: https://lore.kernel.org/lkml/20201201074559.27742-1-rppt@kernel.org
* Added Reviewed-by, thanks Catalin and David
* s/mod_node_page_state/mod_lruvec_page_state/ as Shakeel suggested
Older history:
v12: https://lore.kernel.org/lkml/20201125092208.12544-1-rppt@kernel.org
v11: https://lore.kernel.org/lkml/20201124092556.12009-1-rppt@kernel.org
v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
rfc-v0: https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@kernel.o…
Arnd Bergmann (1):
arm64: kfence: fix header inclusion
Mike Rapoport (9):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
riscv/Kconfig: make direct map manipulation options depend on MMU
set_memory: allow set_direct_map_*_noflush() for multiple pages
set_memory: allow querying whether set_direct_map_*() is actually enabled
mm: introduce memfd_secret system call to create "secret" memory areas
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call where relevant
secretmem: test: add basic selftest for memfd_secret(2)
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/cacheflush.h | 6 -
arch/arm64/include/asm/kfence.h | 2 +-
arch/arm64/include/asm/set_memory.h | 17 ++
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/mm/mmu.c | 6 +-
arch/arm64/mm/pageattr.c | 23 +-
arch/riscv/Kconfig | 4 +-
arch/riscv/include/asm/set_memory.h | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/riscv/mm/pageattr.c | 8 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/set_memory.h | 4 +-
arch/x86/mm/pat/set_memory.c | 8 +-
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/secretmem.h | 30 +++
include/linux/set_memory.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/magic.h | 1 +
kernel/power/hibernate.c | 5 +-
kernel/power/snapshot.c | 4 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 3 +
mm/Makefile | 1 +
mm/gup.c | 10 +
mm/internal.h | 3 +
mm/mlock.c | 3 +-
mm/mmap.c | 5 +-
mm/secretmem.c | 261 +++++++++++++++++++
mm/vmalloc.c | 5 +-
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 17 ++
39 files changed, 726 insertions(+), 53 deletions(-)
create mode 100644 arch/arm64/include/asm/set_memory.h
create mode 100644 include/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
--
2.28.0
Hi Linus,
Please pull the following KUnit update for Linux 5.12-rc1.
This KUnit update for Linux 5.12-rc1 consists of consists of:
-- support for filtering test suites using glob from Daniel Latypov.
"kunit_filter.glob" command line option is passed to the UML
kernel, which currently only supports filtering by suite name.
This support allows running different subsets of tests, e.g.
$ ./tools/testing/kunit/kunit.py build
$ ./tools/testing/kunit/kunit.py exec 'list*'
$ ./tools/testing/kunit/kunit.py exec 'kunit*'
-- several fixes and cleanups also from Daniel Latypov.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 92bf22614b21a2706f4993b278017e437f7785b3:
Linux 5.11-rc7 (2021-02-07 13:57:38 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-kunit-5.12-rc1
for you to fetch changes up to 7af29141a31a2a2350589471c8979ff5f22fb9b7:
kunit: tool: fix unintentional statefulness in run_kernel()
(2021-02-08 16:10:22 -0700)
----------------------------------------------------------------
linux-kselftest-kunit-5.12-rc1
This KUnit update for Linux 5.12-rc1 consists of consists of:
-- support for filtering test suites using glob from Daniel Latypov.
"kunit_filter.glob" command line option is passed to the UML
kernel, which currently only supports filtering by suite name.
This support allows running different subsets of tests, e.g.
$ ./tools/testing/kunit/kunit.py build
$ ./tools/testing/kunit/kunit.py exec 'list*'
$ ./tools/testing/kunit/kunit.py exec 'kunit*'
-- several fixes and cleanups also from Daniel Latypov.
----------------------------------------------------------------
Daniel Latypov (12):
kunit: tool: fix unit test cleanup handling
kunit: tool: stop using bare asserts in unit test
kunit: tool: use `with open()` in unit test
minor: kunit: tool: fix unit test so it can run from non-root dir
kunit: tool: simplify kconfig is_subset_of() logic
KUnit: Docs: make start.rst example Kconfig follow style.rst
Documentation: kunit: add tips.rst for small examples
kunit: make kunit_tool accept optional path to .kunitconfig fragment
kunit: don't show `1 == 1` in failed assertion messages
kunit: add kunit.filter_glob cmdline option to filter suites
kunit: tool: add support for filtering suites by glob
kunit: tool: fix unintentional statefulness in run_kernel()
Documentation/dev-tools/kunit/index.rst | 2 +
Documentation/dev-tools/kunit/start.rst | 7 +-
Documentation/dev-tools/kunit/tips.rst | 115 ++++++++++++++++++
lib/kunit/Kconfig | 1 +
lib/kunit/assert.c | 39 +++++-
lib/kunit/executor.c | 93 +++++++++++++--
tools/testing/kunit/kunit.py | 30 +++--
tools/testing/kunit/kunit_config.py | 13 +-
tools/testing/kunit/kunit_kernel.py | 18 ++-
tools/testing/kunit/kunit_tool_test.py | 204
+++++++++++++++++---------------
10 files changed, 390 insertions(+), 132 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/tips.rst
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 5.12-rc1.
This Kselftest update for Linux 5.12-rc1 consists of:
- dmabuf-heaps test fixes and cleanups from John Stultz.
- seccomp test fix to accept any valid fd in user_notification_addfd.
- Minor fixes to breakpoints and vDSO tests.
- Minor code cleanups to ipc and x86 tests.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 92bf22614b21a2706f4993b278017e437f7785b3:
Linux 5.11-rc7 (2021-02-07 13:57:38 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-next-5.12-rc1
for you to fetch changes up to e0c0840a46db9d50ba7391082d665d74f320c39f:
selftests/seccomp: Accept any valid fd in user_notification_addfd
(2021-02-09 17:39:01 -0700)
----------------------------------------------------------------
linux-kselftest-next-5.12-rc1
This Kselftest update for Linux 5.12-rc1 consists of:
- dmabuf-heaps test fixes and cleanups from John Stultz.
- seccomp test fix to accept any valid fd in user_notification_addfd.
- Minor fixes to breakpoints and vDSO tests.
- Minor code cleanups to ipc and x86 tests.
----------------------------------------------------------------
John Stultz (5):
kselftests: dmabuf-heaps: Fix Makefile's inclusion of the
kernel's usr/include dir
kselftests: dmabuf-heaps: Add clearer checks on DMABUF_BEGIN/END_SYNC
kselftests: dmabuf-heaps: Softly fail if don't find a vgem device
kselftests: dmabuf-heaps: Cleanup test output
kselftests: dmabuf-heaps: Add extra checking that allocated
buffers are zeroed
Seth Forshee (1):
selftests/seccomp: Accept any valid fd in user_notification_addfd
Tiezhu Yang (1):
selftests: breakpoints: Use correct error messages in
breakpoint_test_arm64.c
Tobias Klauser (2):
selftests/vDSO: fix ABI selftest on riscv
selftests/timens: add futex binary to .gitignore
Yang Li (2):
selftests/ipc: remove unneeded semicolon
selftests/x86/ldt_gdt: remove unneeded semicolon
.../selftests/breakpoints/breakpoint_test_arm64.c | 4 +-
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 149
++++++++++++++++-----
tools/testing/selftests/ipc/msgque.c | 6 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 +-
tools/testing/selftests/timens/.gitignore | 1 +
tools/testing/selftests/vDSO/vdso_config.h | 4 +-
tools/testing/selftests/x86/ldt_gdt.c | 2 +-
8 files changed, 132 insertions(+), 44 deletions(-)
----------------------------------------------------------------
Hi,
This patch series fixes a corner-case with non-overlapping access rights
coming from different layers. This is now handled in a generic way and
verified with new tests. A stricter check is enforced for
landlock_add_rule(2) to forbid useless rules. Finally, the previous
landlock_enforce_ruleset_self(2) is renamed to
landlock_restrict_self(2), which is more consistent.
The SLOC count is 1314 for security/landlock/ and 2484 for
tools/testing/selftest/landlock/ . Test coverage for security/landlock/
is 94.7% of lines. The code not covered only deals with internal kernel
errors (e.g. memory allocation) and race conditions. This series is
being fuzzed by syzkaller, and patches are on their way:
https://github.com/google/syzkaller/pull/2380
The compiled documentation is available here:
https://landlock.io/linux-doc/landlock-v28/userspace-api/landlock.html
This series can be applied on top of v5.11-rc6 . This can be tested
with CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending
"landlock," to CONFIG_LSM. This patch series can be found in a Git
repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v28
This patch series seems ready for upstream and I would really appreciate
final reviews.
# Landlock LSM
The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock is a
stackable LSM [1], it makes possible to create safe security sandboxes
as new security layers in addition to the existing system-wide
access-controls. This kind of sandbox is expected to help mitigate the
security impact of bugs or unexpected/malicious behaviors in user-space
applications. Landlock empowers any process, including unprivileged
ones, to securely restrict themselves.
Landlock is inspired by seccomp-bpf but instead of filtering syscalls
and their raw arguments, a Landlock rule can restrict the use of kernel
objects like file hierarchies, according to the kernel semantic.
Landlock also takes inspiration from other OS sandbox mechanisms: XNU
Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This series
still addresses multiple use cases, especially with the combined use of
seccomp-bpf: applications with built-in sandboxing, init systems,
security sandbox tools and security-oriented APIs [2].
Previous version:
https://lore.kernel.org/lkml/20210121205119.793296-1-mic@digikod.net/
[1] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler…
[2] https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.n…
Casey Schaufler (1):
LSM: Infrastructure management of the superblock
Mickaël Salaün (11):
landlock: Add object management
landlock: Add ruleset and domain management
landlock: Set up the security framework and manage credentials
landlock: Add ptrace restrictions
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
selftests/landlock: Add user space tests
samples/landlock: Add a sandbox manager example
landlock: Add user and kernel documentation
Documentation/security/index.rst | 1 +
Documentation/security/landlock.rst | 79 +
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/landlock.rst | 307 ++
MAINTAINERS | 15 +
arch/Kconfig | 7 +
arch/alpha/kernel/syscalls/syscall.tbl | 3 +
arch/arm/tools/syscall.tbl | 3 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 6 +
arch/ia64/kernel/syscalls/syscall.tbl | 3 +
arch/m68k/kernel/syscalls/syscall.tbl | 3 +
arch/microblaze/kernel/syscalls/syscall.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 3 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 3 +
arch/parisc/kernel/syscalls/syscall.tbl | 3 +
arch/powerpc/kernel/syscalls/syscall.tbl | 3 +
arch/s390/kernel/syscalls/syscall.tbl | 3 +
arch/sh/kernel/syscalls/syscall.tbl | 3 +
arch/sparc/kernel/syscalls/syscall.tbl | 3 +
arch/um/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 3 +
arch/x86/entry/syscalls/syscall_64.tbl | 3 +
arch/xtensa/kernel/syscalls/syscall.tbl | 3 +
fs/super.c | 1 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 3 +
include/linux/security.h | 4 +
include/linux/syscalls.h | 7 +
include/uapi/asm-generic/unistd.h | 8 +-
include/uapi/linux/landlock.h | 128 +
kernel/sys_ni.c | 5 +
samples/Kconfig | 7 +
samples/Makefile | 1 +
samples/landlock/.gitignore | 1 +
samples/landlock/Makefile | 13 +
samples/landlock/sandboxer.c | 238 ++
security/Kconfig | 11 +-
security/Makefile | 2 +
security/landlock/Kconfig | 21 +
security/landlock/Makefile | 4 +
security/landlock/common.h | 20 +
security/landlock/cred.c | 46 +
security/landlock/cred.h | 58 +
security/landlock/fs.c | 627 ++++
security/landlock/fs.h | 56 +
security/landlock/limits.h | 21 +
security/landlock/object.c | 67 +
security/landlock/object.h | 91 +
security/landlock/ptrace.c | 120 +
security/landlock/ptrace.h | 14 +
security/landlock/ruleset.c | 473 +++
security/landlock/ruleset.h | 165 +
security/landlock/setup.c | 40 +
security/landlock/setup.h | 18 +
security/landlock/syscalls.c | 444 +++
security/security.c | 51 +-
security/selinux/hooks.c | 58 +-
security/selinux/include/objsec.h | 6 +
security/selinux/ss/services.c | 3 +-
security/smack/smack.h | 6 +
security/smack/smack_lsm.c | 35 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/landlock/.gitignore | 2 +
tools/testing/selftests/landlock/Makefile | 24 +
tools/testing/selftests/landlock/base_test.c | 219 ++
tools/testing/selftests/landlock/common.h | 169 ++
tools/testing/selftests/landlock/config | 6 +
tools/testing/selftests/landlock/fs_test.c | 2664 +++++++++++++++++
.../testing/selftests/landlock/ptrace_test.c | 314 ++
tools/testing/selftests/landlock/true.c | 5 +
72 files changed, 6668 insertions(+), 77 deletions(-)
create mode 100644 Documentation/security/landlock.rst
create mode 100644 Documentation/userspace-api/landlock.rst
create mode 100644 include/uapi/linux/landlock.h
create mode 100644 samples/landlock/.gitignore
create mode 100644 samples/landlock/Makefile
create mode 100644 samples/landlock/sandboxer.c
create mode 100644 security/landlock/Kconfig
create mode 100644 security/landlock/Makefile
create mode 100644 security/landlock/common.h
create mode 100644 security/landlock/cred.c
create mode 100644 security/landlock/cred.h
create mode 100644 security/landlock/fs.c
create mode 100644 security/landlock/fs.h
create mode 100644 security/landlock/limits.h
create mode 100644 security/landlock/object.c
create mode 100644 security/landlock/object.h
create mode 100644 security/landlock/ptrace.c
create mode 100644 security/landlock/ptrace.h
create mode 100644 security/landlock/ruleset.c
create mode 100644 security/landlock/ruleset.h
create mode 100644 security/landlock/setup.c
create mode 100644 security/landlock/setup.h
create mode 100644 security/landlock/syscalls.c
create mode 100644 tools/testing/selftests/landlock/.gitignore
create mode 100644 tools/testing/selftests/landlock/Makefile
create mode 100644 tools/testing/selftests/landlock/base_test.c
create mode 100644 tools/testing/selftests/landlock/common.h
create mode 100644 tools/testing/selftests/landlock/config
create mode 100644 tools/testing/selftests/landlock/fs_test.c
create mode 100644 tools/testing/selftests/landlock/ptrace_test.c
create mode 100644 tools/testing/selftests/landlock/true.c
base-commit: 1048ba83fb1c00cd24172e23e8263972f6b5d9ac
--
2.30.0
Hi,
This patch series introduces the futex2 syscalls.
* What happened to the current futex()?
For some years now, developers have been trying to add new features to
futex, but maintainers have been reluctant to accept then, given the
multiplexed interface full of legacy features and tricky to do big
changes. Some problems that people tried to address with patchsets are:
NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2].
NUMA, for instance, just doesn't fit the current API in a reasonable
way. Considering that, it's not possible to merge new features into the
current futex.
** The NUMA problem
At the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty when doing it.
** The 32bit sized futex problem
Embedded systems or anything with memory constrains would benefit of
using smaller sizes for the futex userspace integer. Also, a mutex
implementation can be done using just three values, so 8 bits is enough
for various scenarios.
** The wait on multiple problem
The use case lies in the Wine implementation of the Windows NT interface
WaitMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signal,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side, and quickly go to sleep on the
consumer side is essential for good performance of those running
over Wine.
[0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/
[1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/
[2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.…
* The solution
As proposed by Peter Zijlstra and Florian Weimer[3], a new interface
is required to solve this, which must be designed with those features in
mind. futex2() is that interface. As opposed to the current multiplexed
interface, the new one should have one syscall per operation. This will
allow the maintainability of the API if it gets extended, and will help
users with type checking of arguments.
In particular, the new interface is extended to support the ability to
wait on any of a list of futexes at a time, which could be seen as a
vectored extension of the FUTEX_WAIT semantics.
[3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-…
* The interface
The new interface can be seen in details in the following patches, but
this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- Supports variable sized futexes (8bits, 16bits and 32bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node would like to operate
* Implementation
The internal implementation follows a similar design to the original futex.
Given that we want to replicate the same external behavior of current
futex, this should be somewhat expected. For some functions, like the
init and the code to get a shared key, I literally copied code and
comments from kernel/futex.c. I decided to do so instead of exposing the
original function as a public function since in that way we can freely
modify our implementation if required, without any impact on old futex.
Also, the comments precisely describes the details and corner cases of
the implementation.
Each patch contains a brief description of implementation, but patch 6
"docs: locking: futex2: Add documentation" adds a more complete document
about it.
* The patchset
This patchset can be also found at my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2
- Patch 1: Implements wait/wake, and the basics foundations of futex2
- Patches 2-4: Implement the remaining features (shared, waitv, requeue).
- Patch 5: Adds the x86_x32 ABI handling. I kept it in a separated
patch since I'm not sure if x86_x32 is still a thing, or if it should
return -ENOSYS.
- Patch 6: Add a documentation file which details the interface and
the internal implementation.
- Patches 7-13: Selftests for all operations along with perf
support for futex2.
- Patch 14: While working on porting glibc for futex2, I found out
that there's a futex_wake() call at the user thread exit path, if
that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In
order to make pthreads work with futex2, it was required to add
this patch. Note that this is more a proof-of-concept of what we
will need to do in future, rather than part of the interface and
shouldn't be merged as it is.
* Testing:
This patchset provides selftests for each operation and their flags.
Along with that, the following work was done:
** Stability
To stress the interface in "real world scenarios":
- glibc[4]: nptl's low level locking was modified to use futex2 API
(except for robust and PI things). All relevant nptl/ tests passed.
- Wine[5]: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU's loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- Full GNU/Linux distro: I installed the modified glibc in my host
machine, so all pthread's programs would use futex2(). After tweaking
systemd[6] to allow futex2() calls at seccomp, everything worked as
expected (web browsers do some syscall sandboxing and need some
configuration as well).
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
** Performance
- For comparing futex() and futex2() performance, I used the artificial
benchmarks implemented at perf (wake, wake-parallel, hash and
requeue). The setup was 200 runs for each test and using 8, 80, 800,
8000 for the number of threads, Note that for this test, I'm not using
patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained
at "The patchset" section.
- For the first three ones, I measured an average of 4% gain in
performance. This is not a big step, but it shows that the new
interface is at least comparable in performance with the current one.
- For requeue, I measured an average of 21% decrease in performance
compared to the original futex implementation. This is expected given
the new design with individual flags. The performance trade-offs are
explained at patch 4 ("futex2: Implement requeue operation").
[4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2
[5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13
[6] https://gitlab.collabora.com/tonyk/systemd
* FAQ
** "Where's the code for NUMA and FUTEX_8/16?"
The current code is already complex enough to take some time for
review, so I believe it's better to split that work out to a future
iteration of this patchset. Besides that, this RFC is the core part of the
infrastructure, and the following features will not pose big design
changes to it, the work will be more about wiring up the flags and
modifying some functions.
** "And what's about FUTEX_64?"
By supporting 64 bit futexes, the kernel structure for futex would
need to have a 64 bit field for the value, and that could defeat one of
the purposes of having different sized futexes in the first place:
supporting smaller ones to decrease memory usage. This might be
something that could be disabled for 32bit archs (and even for
CONFIG_BASE_SMALL).
Which use case would benefit for FUTEX_64? Does it worth the trade-offs?
** "Where's the PI/robust stuff?"
As said by Peter Zijlstra at [3], all those new features are related to
the "simple" futex interface, that doesn't use PI or robust. Do we want
to have this complexity at futex2() and if so, should it be part of
this patchset or can it be future work?
Thanks,
André
André Almeida (13):
futex2: Implement wait and wake functions
futex2: Add support for shared futexes
futex2: Implement vectorized wait
futex2: Implement requeue operation
futex2: Add compatibility entry point for x86_x32 ABI
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 198 +++
Documentation/locking/index.rst | 1 +
MAINTAINERS | 2 +-
arch/arm/tools/syscall.tbl | 4 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 4 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
fs/inode.c | 1 +
include/linux/compat.h | 23 +
include/linux/fs.h | 1 +
include/linux/syscalls.h | 18 +
include/uapi/asm-generic/unistd.h | 14 +-
include/uapi/linux/futex.h | 56 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex2.c | 1255 +++++++++++++++++
kernel/sys_ni.c | 6 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/include/uapi/asm-generic/unistd.h | 11 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 3 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 +
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 8 +-
.../futex/functional/futex2_requeue.c | 164 +++
.../selftests/futex/functional/futex2_wait.c | 209 +++
.../selftests/futex/functional/futex2_waitv.c | 157 +++
.../futex/functional/futex_wait_timeout.c | 58 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 121 ++
38 files changed, 2563 insertions(+), 53 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.30.1
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
v2 -> v3:
Fix kunit_fail_current_test() so it works w/ CONFIG_KUNIT=m
s/_/__ on the helper func to match others in test.c
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 37 +++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 66 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: 1e0d27fce010b0a4a9e595506b6ede75934c31be
--
2.30.0.478.g8a0d178c01-goog
pipe() and socketpair() have different behavior wrt edge-triggered
read epoll, in that no event is generated when data is written into
a non-empty pipe, but an event is generated if socketpair() is used
instead.
This simple modification of the epoll2 testlet from
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
(it just adds a second write) shows the different behavior.
The testlet passes with pipe() but fails with socketpair() with 5.10.
They both fail with 4.19.
Is it fair to assume that 5.10 pipe's behavior is the correct one?
Thanks,
Francesco Ruggeri
/*
* t0
* | (ew)
* e0
* | (et)
* s0
*/
TEST(epoll2)
{
int efd;
int sfd[2];
struct epoll_event e;
ASSERT_EQ(socketpair(AF_UNIX, SOCK_STREAM, 0, sfd), 0);
//ASSERT_EQ(pipe(sfd), 0);
efd = epoll_create(1);
ASSERT_GE(efd, 0);
e.events = EPOLLIN | EPOLLET;
ASSERT_EQ(epoll_ctl(efd, EPOLL_CTL_ADD, sfd[0], &e), 0);
ASSERT_EQ(write(sfd[1], "w", 1), 1);
EXPECT_EQ(epoll_wait(efd, &e, 1, 0), 1);
ASSERT_EQ(write(sfd[1], "w", 1), 1);
EXPECT_EQ(epoll_wait(efd, &e, 1, 0), 0);
close(efd);
close(sfd[0]);
close(sfd[1]);
}
This is an optimization of a set of sgx-related codes, each of which
is independent of the patch. Because the second and third patches have
conflicting dependencies, these patches are put together.
---
v3 changes:
* split free_cnt count and spin lock optimization into two patches
v2 changes:
* review suggested changes
Tianjia Zhang (5):
selftests/x86: Simplify the code to get vdso base address in sgx
x86/sgx: Optimize the locking range in sgx_sanitize_section()
x86/sgx: Optimize the free_cnt count in sgx_epc_section
x86/sgx: Allows ioctl PROVISION to execute before CREATE
x86/sgx: Remove redundant if conditions in sgx_encl_create
arch/x86/kernel/cpu/sgx/driver.c | 1 +
arch/x86/kernel/cpu/sgx/ioctl.c | 9 +++++----
arch/x86/kernel/cpu/sgx/main.c | 13 +++++--------
tools/testing/selftests/sgx/main.c | 24 ++++--------------------
4 files changed, 15 insertions(+), 32 deletions(-)
--
2.19.1.3.ge56e4f7
Changelog
---------
v11
- Another build fix reported by robot on i386: moved is_pinnable_page()
below set_page_section() in linux/mm.h
v10
- Fixed !CONFIG_MMU compiler issues by adding is_zero_pfn() stub.
v9
- Renamed gpf_to_alloc_flags() to gfp_to_alloc_flags_cma(); thanks Lecopzer
Chen for noticing.
- Fixed warning reported scripts/checkpatch.pl:
"Logical continuations should be on the previous line"
v8
- Added reviewed by's from John Hubbard
- Fixed subjects for selftests patches
- Moved zero page check inside is_pinnable_page() as requested by Jason
Gunthorpe.
v7
- Added reviewed-by's
- Fixed a compile bug on non-mmu builds reported by robot
v6
Small update, but I wanted to send it out quicker, as it removes a
controversial patch and replaces it with something sane.
- Removed forcing FOLL_WRITE for longterm gup, instead added a patch to
skip zero pages during migration.
- Added reviewed-by's and minor log changes.
v5
- Added the following patches to the beginning of series, which are fixes
to the other existing problems with CMA migration code:
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors also at the beginning of series
mm/gup: do not allow zero page for pinned pages
- remove .gfp_mask/.reclaim_idx changes from mm/vmscan.c
- update movable zone header comment in patch 8 instead of patch 3, fix
the comment
- Added acked, sign-offs
- Updated commit logs based on feedback
- Addressed issues reported by Michal and Jason.
- Remove:
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100
Instead: fail on the first migration failure, and retry isolation
forever as their failures are transient.
- In self-set addressed some of the comments from John Hubbard, updated
commit logs, and added comments. Renamed gup->flags with gup->test_flags.
v4
- Address page migration comments. New patch:
mm/gup: limit number of gup migration failures, honor failures
Implements the limiting number of retries for migration failures, and
also check for isolation failures.
Added a test case into gup_test to verify that pages never long-term
pinned in a movable zone, and also added tests to fault both in kernel
and in userland.
v3
- Merged with linux-next, which contains clean-up patch from Jason,
therefore this series is reduced by two patches which did the same
thing.
v2
- Addressed all review comments
- Added Reviewed-by's.
- Renamed PF_MEMALLOC_NOMOVABLE to PF_MEMALLOC_PIN
- Added is_pinnable_page() to check if page can be longterm pinned
- Fixed gup fast path by checking is_in_pinnable_zone()
- rename cma_page_list to movable_page_list
- add a admin-guide note about handling pinned pages in ZONE_MOVABLE,
updated caveat about pinned pages from linux/mmzone.h
- Move current_gfp_context() to fast-path
---------
When page is pinned it cannot be moved and its physical address stays
the same until pages is unpinned.
This is useful functionality to allows userland to implementation DMA
access. For example, it is used by vfio in vfio_pin_pages().
However, this functionality breaks memory hotplug/hotremove assumptions
that pages in ZONE_MOVABLE can always be migrated.
This patch series fixes this issue by forcing new allocations during
page pinning to omit ZONE_MOVABLE, and also to migrate any existing
pages from ZONE_MOVABLE during pinning.
It uses the same scheme logic that is currently used by CMA, and extends
the functionality for all allocations.
For more information read the discussion [1] about this problem.
[1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQS…
Previous versions:
v1
https://lore.kernel.org/lkml/20201202052330.474592-1-pasha.tatashin@soleen.…
v2
https://lore.kernel.org/lkml/20201210004335.64634-1-pasha.tatashin@soleen.c…
v3
https://lore.kernel.org/lkml/20201211202140.396852-1-pasha.tatashin@soleen.…
v4
https://lore.kernel.org/lkml/20201217185243.3288048-1-pasha.tatashin@soleen…
v5
https://lore.kernel.org/lkml/20210119043920.155044-1-pasha.tatashin@soleen.…
v6
https://lore.kernel.org/lkml/20210120014333.222547-1-pasha.tatashin@soleen.…
v7
https://lore.kernel.org/lkml/20210122033748.924330-1-pasha.tatashin@soleen.…
v8
https://lore.kernel.org/lkml/20210125194751.1275316-1-pasha.tatashin@soleen…
v9
https://lore.kernel.org/lkml/20210201153827.444374-1-pasha.tatashin@soleen.…
v10
https://lore.kernel.org/lkml/20210211162427.618913-1-pasha.tatashin@soleen.…
Pavel Tatashin (14):
mm/gup: don't pin migrated cma pages in movable zone
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN
mm: apply per-task gfp constraints in fast path
mm: honor PF_MEMALLOC_PIN for all movable pages
mm/gup: do not migrate zero page
mm/gup: migrate pinned pages out of movable zone
memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning
mm/gup: change index type to long as it counts pages
mm/gup: longterm pin migration cleanup
selftests/vm: gup_test: fix test flag
selftests/vm: gup_test: test faulting in kernel, and verify pinnable
pages
.../admin-guide/mm/memory-hotplug.rst | 9 +
include/linux/migrate.h | 1 +
include/linux/mm.h | 19 ++
include/linux/mmzone.h | 13 +-
include/linux/pgtable.h | 12 ++
include/linux/sched.h | 2 +-
include/linux/sched/mm.h | 27 +--
include/trace/events/migrate.h | 3 +-
mm/gup.c | 174 ++++++++----------
mm/gup_test.c | 29 +--
mm/gup_test.h | 3 +-
mm/hugetlb.c | 4 +-
mm/page_alloc.c | 33 ++--
tools/testing/selftests/vm/gup_test.c | 36 +++-
14 files changed, 208 insertions(+), 157 deletions(-)
--
2.25.1
Musl libc does not define the glibc specific macro __GLIBC_PREREQ(), but
it has the clock_adjtime() function. Assume that a libc implementation
which does not define __GLIBC_PREREQ at all still implements
clock_adjtime().
This fixes a build problem with musl libc because the __GLIBC_PREREQ()
macro is missing.
Fixes: 42e1358e103d ("ptp: In the testptp utility, use clock_adjtime from glibc when available")
Signed-off-by: Hauke Mehrtens <hauke(a)hauke-m.de>
---
tools/testing/selftests/ptp/testptp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/ptp/testptp.c b/tools/testing/selftests/ptp/testptp.c
index f7911aaeb007..ecffe2c78543 100644
--- a/tools/testing/selftests/ptp/testptp.c
+++ b/tools/testing/selftests/ptp/testptp.c
@@ -38,6 +38,7 @@
#define NSEC_PER_SEC 1000000000LL
/* clock_adjtime is not available in GLIBC < 2.14 */
+#ifdef __GLIBC_PREREQ
#if !__GLIBC_PREREQ(2, 14)
#include <sys/syscall.h>
static int clock_adjtime(clockid_t id, struct timex *tx)
@@ -45,6 +46,7 @@ static int clock_adjtime(clockid_t id, struct timex *tx)
return syscall(__NR_clock_adjtime, id, tx);
}
#endif
+#endif /* __GLIBC_PREREQ */
static void show_flag_test(int rq_index, unsigned int flags, int err)
{
--
2.20.1
This is an optimization of a set of sgx-related codes, each of which
is independent of the patch. Because the second and third patches have
conflicting dependencies, these patches are put together.
---
v4 changes:
* Improvements suggested by review
v3 changes:
* split free_cnt count and spin lock optimization into two patches
v2 changes:
* review suggested changes
Tianjia Zhang (5):
selftests/x86: Use getauxval() to simplify the code in sgx
x86/sgx: Reduce the locking range in sgx_sanitize_section()
x86/sgx: Optimize the free_cnt count in sgx_epc_section
x86/sgx: Allows ioctl PROVISION to execute before CREATE
x86/sgx: Remove redundant if conditions in sgx_encl_create
arch/x86/kernel/cpu/sgx/driver.c | 1 +
arch/x86/kernel/cpu/sgx/ioctl.c | 8 ++++----
arch/x86/kernel/cpu/sgx/main.c | 13 +++++--------
tools/testing/selftests/sgx/main.c | 24 ++++--------------------
4 files changed, 14 insertions(+), 32 deletions(-)
--
2.19.1.3.ge56e4f7
Changelog
---------
v10
- Fixed !CONFIG_MMU compiler issues by adding is_zero_pfn() stub.
v9
- Renamed gpf_to_alloc_flags() to gfp_to_alloc_flags_cma(); thanks Lecopzer
Chen for noticing.
- Fixed warning reported scripts/checkpatch.pl:
"Logical continuations should be on the previous line"
v8
- Added reviewed by's from John Hubbard
- Fixed subjects for selftests patches
- Moved zero page check inside is_pinnable_page() as requested by Jason
Gunthorpe.
v7
- Added reviewed-by's
- Fixed a compile bug on non-mmu builds reported by robot
v6
Small update, but I wanted to send it out quicker, as it removes a
controversial patch and replaces it with something sane.
- Removed forcing FOLL_WRITE for longterm gup, instead added a patch to
skip zero pages during migration.
- Added reviewed-by's and minor log changes.
v5
- Added the following patches to the beginning of series, which are fixes
to the other existing problems with CMA migration code:
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors also at the beginning of series
mm/gup: do not allow zero page for pinned pages
- remove .gfp_mask/.reclaim_idx changes from mm/vmscan.c
- update movable zone header comment in patch 8 instead of patch 3, fix
the comment
- Added acked, sign-offs
- Updated commit logs based on feedback
- Addressed issues reported by Michal and Jason.
- Remove:
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100
Instead: fail on the first migration failure, and retry isolation
forever as their failures are transient.
- In self-set addressed some of the comments from John Hubbard, updated
commit logs, and added comments. Renamed gup->flags with gup->test_flags.
v4
- Address page migration comments. New patch:
mm/gup: limit number of gup migration failures, honor failures
Implements the limiting number of retries for migration failures, and
also check for isolation failures.
Added a test case into gup_test to verify that pages never long-term
pinned in a movable zone, and also added tests to fault both in kernel
and in userland.
v3
- Merged with linux-next, which contains clean-up patch from Jason,
therefore this series is reduced by two patches which did the same
thing.
v2
- Addressed all review comments
- Added Reviewed-by's.
- Renamed PF_MEMALLOC_NOMOVABLE to PF_MEMALLOC_PIN
- Added is_pinnable_page() to check if page can be longterm pinned
- Fixed gup fast path by checking is_in_pinnable_zone()
- rename cma_page_list to movable_page_list
- add a admin-guide note about handling pinned pages in ZONE_MOVABLE,
updated caveat about pinned pages from linux/mmzone.h
- Move current_gfp_context() to fast-path
---------
When page is pinned it cannot be moved and its physical address stays
the same until pages is unpinned.
This is useful functionality to allows userland to implementation DMA
access. For example, it is used by vfio in vfio_pin_pages().
However, this functionality breaks memory hotplug/hotremove assumptions
that pages in ZONE_MOVABLE can always be migrated.
This patch series fixes this issue by forcing new allocations during
page pinning to omit ZONE_MOVABLE, and also to migrate any existing
pages from ZONE_MOVABLE during pinning.
It uses the same scheme logic that is currently used by CMA, and extends
the functionality for all allocations.
For more information read the discussion [1] about this problem.
[1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQS…
Previous versions:
v1
https://lore.kernel.org/lkml/20201202052330.474592-1-pasha.tatashin@soleen.…
v2
https://lore.kernel.org/lkml/20201210004335.64634-1-pasha.tatashin@soleen.c…
v3
https://lore.kernel.org/lkml/20201211202140.396852-1-pasha.tatashin@soleen.…
v4
https://lore.kernel.org/lkml/20201217185243.3288048-1-pasha.tatashin@soleen…
v5
https://lore.kernel.org/lkml/20210119043920.155044-1-pasha.tatashin@soleen.…
v6
https://lore.kernel.org/lkml/20210120014333.222547-1-pasha.tatashin@soleen.…
v7
https://lore.kernel.org/lkml/20210122033748.924330-1-pasha.tatashin@soleen.…
v8
https://lore.kernel.org/lkml/20210125194751.1275316-1-pasha.tatashin@soleen…
v9
https://lore.kernel.org/lkml/20210201153827.444374-1-pasha.tatashin@soleen.…
Pavel Tatashin (14):
mm/gup: don't pin migrated cma pages in movable zone
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN
mm: apply per-task gfp constraints in fast path
mm: honor PF_MEMALLOC_PIN for all movable pages
mm/gup: do not migrate zero page
mm/gup: migrate pinned pages out of movable zone
memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning
mm/gup: change index type to long as it counts pages
mm/gup: longterm pin migration cleanup
selftests/vm: gup_test: fix test flag
selftests/vm: gup_test: test faulting in kernel, and verify pinnable
pages
.../admin-guide/mm/memory-hotplug.rst | 9 +
include/linux/migrate.h | 1 +
include/linux/mm.h | 19 ++
include/linux/mmzone.h | 13 +-
include/linux/pgtable.h | 12 ++
include/linux/sched.h | 2 +-
include/linux/sched/mm.h | 27 +--
include/trace/events/migrate.h | 3 +-
mm/gup.c | 174 ++++++++----------
mm/gup_test.c | 29 +--
mm/gup_test.h | 3 +-
mm/hugetlb.c | 4 +-
mm/page_alloc.c | 33 ++--
tools/testing/selftests/vm/gup_test.c | 36 +++-
14 files changed, 208 insertions(+), 157 deletions(-)
--
2.25.1
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3b53c73580c5..28bb26cd1f67 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3434,8 +3434,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3b53c73580c5..28bb26cd1f67 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3434,8 +3434,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
From: Willem de Bruijn <willemb(a)google.com>
FIXTURE_VARIANT data is passed to FIXTURE_SETUP and TEST_F as variant.
In some cases, the variant will change the setup, such that expections
also change on teardown. Also pass variant to FIXTURE_TEARDOWN.
The new FIXTURE_TEARDOWN logic is identical to that in FIXTURE_SETUP,
right above.
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
For one use of this see tentative
tools/testing/selftests/filesystems/selectpoll.c kselftest at
https://github.com/wdebruij/linux-next-mirror/commit/12b4d183ac9140c1360637…
---
tools/testing/selftests/kselftest_harness.h | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index f19804df244c..6a27e79278e8 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -283,7 +283,9 @@
#define FIXTURE_TEARDOWN(fixture_name) \
void fixture_name##_teardown( \
struct __test_metadata __attribute__((unused)) *_metadata, \
- FIXTURE_DATA(fixture_name) __attribute__((unused)) *self)
+ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
+ const FIXTURE_VARIANT(fixture_name) \
+ __attribute__((unused)) *variant)
/**
* FIXTURE_VARIANT(fixture_name) - Optionally called once per fixture
@@ -298,9 +300,9 @@
* ...
* };
*
- * Defines type of constant parameters provided to FIXTURE_SETUP() and TEST_F()
- * as *variant*. Variants allow the same tests to be run with different
- * arguments.
+ * Defines type of constant parameters provided to FIXTURE_SETUP(), TEST_F() and
+ * FIXTURE_TEARDOWN as *variant*. Variants allow the same tests to be run with
+ * different arguments.
*/
#define FIXTURE_VARIANT(fixture_name) struct _fixture_variant_##fixture_name
@@ -382,7 +384,7 @@
if (!_metadata->passed) \
return; \
fixture_name##_##test_name(_metadata, &self, variant->data); \
- fixture_name##_teardown(_metadata, &self); \
+ fixture_name##_teardown(_metadata, &self, variant->data); \
} \
static struct __test_metadata \
_##fixture_name##_##test_name##_object = { \
--
2.29.2.576.ga3fc446d84-goog
We're working on a user space control plane for the BPF sk_lookup
hook [1]. The hook attaches to a network namespace and allows
control over which socket receives a new connection / packet.
Roughly, applications can give a socket to our user space component
to participate in custom bind semantics. This creates an edge case
where an application can provide us with a socket that lives in
a different network namespace than our BPF sk_lookup program.
We'd like to return an error in this case.
Additionally, we have some user space state that is tied to the
network namespace. We currently use the inode of the nsfs entry
in a directory name, but this is suffers from inode reuse.
I'm proposing to fix both of these issues by adding a new
SO_NETNS_COOKIE socket option as well as a NS_GET_COOKIE ioctl.
Using these we get a stable, unique identifier for a network
namespace and check whether a socket belongs to the "correct"
namespace.
NS_GET_COOKIE could be renamed to NS_GET_NET_COOKIE. I kept the
name generic because it seems like other namespace types could
benefit from a cookie as well.
I'm trying to land this via the bpf tree since this is where the
netns cookie originated, please let me know if this isn't
appropriate.
1: https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html
Cc: bpf(a)vger.kernel.org
Cc: linux-alpha(a)vger.kernel.org
Cc: linux-api(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-mips(a)vger.kernel.org
Cc: linux-parisc(a)vger.kernel.org
Cc: netdev(a)vger.kernel.org
Cc: sparclinux(a)vger.kernel.org
Lorenz Bauer (4):
net: add SO_NETNS_COOKIE socket option
nsfs: add an ioctl to discover the network namespace cookie
tools/testing: add test for NS_GET_COOKIE
tools/testing: add a selftest for SO_NETNS_COOKIE
arch/alpha/include/uapi/asm/socket.h | 2 +
arch/mips/include/uapi/asm/socket.h | 2 +
arch/parisc/include/uapi/asm/socket.h | 2 +
arch/sparc/include/uapi/asm/socket.h | 2 +
fs/nsfs.c | 9 +++
include/linux/sock_diag.h | 20 ++++++
include/net/net_namespace.h | 11 ++++
include/uapi/asm-generic/socket.h | 2 +
include/uapi/linux/nsfs.h | 2 +
net/core/filter.c | 9 ++-
net/core/sock.c | 7 +++
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/so_netns_cookie.c | 61 +++++++++++++++++++
tools/testing/selftests/nsfs/.gitignore | 1 +
tools/testing/selftests/nsfs/Makefile | 2 +-
tools/testing/selftests/nsfs/netns.c | 57 +++++++++++++++++
17 files changed, 185 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/net/so_netns_cookie.c
create mode 100644 tools/testing/selftests/nsfs/netns.c
--
2.27.0
Hi,
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm.
The following explains what we can exactly do through this test.
And a RFC is sent for comments, thanks.
The function guest_code() is designed to cover conditions where a single vcpu
or multiple vcpus access guest pages within the same memory range, in three
VM stages(before dirty-logging, during dirty-logging, after dirty-logging).
Besides, the backing source memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings or
block mappings can be chosen by users to be created in the test.
If use of ANONYMOUS memory is specified, kvm will create page mappings for the
tested memory region before dirty-logging, and update attributes of the page
mappings from RO to RW during dirty-logging. If use of THP/HUGETLB memory is
specified, kvm will create block mappings for the tested memory region before
dirty-logging, and split the blcok mappings into page mappings during
dirty-logging, and coalesce the page mappings back into block mappings after
dirty-logging is stopped.
So in summary, as a performance tester, this test can present the performance
of kvm creating/updating normal page mappings, or the performance of kvm
creating/splitting/recovering block mappings, through execution time.
When we need to coalesce the page mappings back to block mappings after dirty
logging is stopped, we have to firstly invalidate *all* the TLB entries for the
page mappings right before installation of the block entry, because a TLB conflict
abort error could occur if we can't invalidate the TLB entries fully. We have
hit this TLB conflict twice on aarch64 software implementation and fixed it.
As this test can imulate process from dirty-logging enabled to dirty-logging
stopped of a VM with block mappings, so it can also reproduce this TLB conflict
abort due to inadequate TLB invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Here are some test examples of this test:
platform: HiSilicon Kunpeng920 (aarch64, FWB not supported)
host kernel: Linux mainline
(1) Based on v5.11-rc6
cmdline: ./kvm_page_table_test -m 4 -t 0 -g 4K -s 1G -v 1
(1 vcpu, 1G memory, page mappings(granule 4K))
KVM_CREATE_MAPPINGS: 0.8196s 0.8260s 0.8258s 0.8169s 0.8190s
KVM_UPDATE_MAPPINGS: 1.1930s 1.1949s 1.1940s 1.1934s 1.1946s
cmdline: ./kvm_page_table_test -m 4 -t 0 -g 4K -s 1G -v 20
(20 vcpus, 1G memory, page mappings(granule 4K))
KVM_CREATE_MAPPINGS: 23.4028s 23.8015s 23.6702s 23.9437s 22.1646s
KVM_UPDATE_MAPPINGS: 16.9550s 16.4734s 16.8300s 16.9621s 16.9402s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 1
(1 vcpu, 20G memory, block mappings(granule 1G))
KVM_CREATE_MAPPINGS: 3.7040s 3.7053s 3.7047s 3.7061s 3.7068s
KVM_ADJUST_MAPPINGS: 2.8264s 2.8266s 2.8272s 2.8259s 2.8283s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
(20 vcpus, 20G memory, block mappings(granule 1G))
KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s
KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s
(2) I have post a patch series to improve efficiency of stage2 page table code,
so test the performance changes.
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
(20 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s
After patch: KVM_CREATE_MAPPINGS: 3.7022s 3.7031s 3.7028s 3.7012s 3.7024s
Before patch: KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s
After patch: KVM_ADJUST_MAPPINGS: 0.3008s 0.3004s 0.2974s 0.2917s 0.2900s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
(40 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 104.560s 104.556s 104.554s 104.556s 104.550s
After patch: KVM_CREATE_MAPPINGS: 3.7011s 3.7103s 3.7005s 3.7024s 3.7106s
Before patch: KVM_ADJUST_MAPPINGS: 103.931s 103.936s 103.927s 103.942s 103.927s
After patch: KVM_ADJUST_MAPPINGS: 0.3541s 0.3694s 0.3656s 0.3693s 0.3687s
---
Yanan Wang (2):
KVM: selftests: Add a macro to get string of vm_mem_backing_src_type
KVM: selftests: Add a test for kvm page table code
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_page_table_test.c | 518 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 8 +
4 files changed, 532 insertions(+)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.23.0
Only older versions of the RISC-V GCC toolchain define __riscv__. Check
for __riscv as well, which is used by newer GCC toolchains. Also set
VDSO_32BIT based on __riscv_xlen.
Before (on riscv64):
$ ./vdso_test_abi
[vDSO kselftest] VDSO_VERSION: LINUX_4
Could not find __vdso_gettimeofday
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_REALTIME [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_BOOTTIME [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_TAI [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_REALTIME_COARSE [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_MONOTONIC [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_MONOTONIC_RAW [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_MONOTONIC_COARSE [PASS]
Could not find __vdso_time
After (on riscv32):
$ ./vdso_test_abi
[vDSO kselftest] VDSO_VERSION: LINUX_4.15
The time is 1612449376.015086
The time is 1612449376.18340784
The resolution is 0 1
clock_id: CLOCK_REALTIME [PASS]
The time is 774.842586182
The resolution is 0 1
clock_id: CLOCK_BOOTTIME [PASS]
The time is 1612449376.22536565
The resolution is 0 1
clock_id: CLOCK_TAI [PASS]
The time is 1612449376.20885172
The resolution is 0 4000000
clock_id: CLOCK_REALTIME_COARSE [PASS]
The time is 774.845491269
The resolution is 0 1
clock_id: CLOCK_MONOTONIC [PASS]
The time is 774.849534200
The resolution is 0 1
clock_id: CLOCK_MONOTONIC_RAW [PASS]
The time is 774.842139684
The resolution is 0 4000000
clock_id: CLOCK_MONOTONIC_COARSE [PASS]
Could not find __vdso_time
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/vDSO/vdso_config.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vDSO/vdso_config.h b/tools/testing/selftests/vDSO/vdso_config.h
index 6a6fe8d4ff55..6188b16827d1 100644
--- a/tools/testing/selftests/vDSO/vdso_config.h
+++ b/tools/testing/selftests/vDSO/vdso_config.h
@@ -47,10 +47,12 @@
#elif defined(__x86_64__)
#define VDSO_VERSION 0
#define VDSO_NAMES 1
-#elif defined(__riscv__)
+#elif defined(__riscv__) || defined(__riscv)
#define VDSO_VERSION 5
#define VDSO_NAMES 1
+#if __riscv_xlen == 32
#define VDSO_32BIT 1
+#endif
#else /* nds32 */
#define VDSO_VERSION 4
#define VDSO_NAMES 1
--
2.30.0
This test expects fds to have specific values, which works fine
when the test is run standalone. However, the kselftest runner
consumes a couple of extra fds for redirection when running
tests, so the test fails when run via kselftest.
Change the test to pass on any valid fd number.
Signed-off-by: Seth Forshee <seth.forshee(a)canonical.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 26c72f2b61b1..9338df6f4ca8 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4019,18 +4019,14 @@ TEST(user_notification_addfd)
/* Verify we can set an arbitrary remote fd */
fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
- /*
- * The child has fds 0(stdin), 1(stdout), 2(stderr), 3(memfd),
- * 4(listener), so the newly allocated fd should be 5.
- */
- EXPECT_EQ(fd, 5);
+ EXPECT_GE(fd, 0);
EXPECT_EQ(filecmp(getpid(), pid, memfd, fd), 0);
/* Verify we can set an arbitrary remote fd with large size */
memset(&big, 0x0, sizeof(big));
big.addfd = addfd;
fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_BIG, &big);
- EXPECT_EQ(fd, 6);
+ EXPECT_GE(fd, 0);
/* Verify we can set a specific remote fd */
addfd.newfd = 42;
--
2.29.2
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 36 ++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 65 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: 1e0d27fce010b0a4a9e595506b6ede75934c31be
--
2.30.0.478.g8a0d178c01-goog
Sincerely hope I did this right; first time contributor, learning the
patch workflow. I took the opportunity to fix two .gitignore files that
were leaving stale worktree/index outputs after running `make kselftest`
against recent mainline (76c057c84d286140c6c416c3b4ba832cd1d8984e).
Thanks for your time!
XFAIL was removed in commit 9847d24af95c ("selftests/harness: Refactor
XFAIL into SKIP") and its use in close_range_test was already replaced
by commit 1d44d0dd61b6 ("selftests: core: use SKIP instead of XFAIL in
close_range_test.c"). However, commit 23afeaeff3d9 ("selftests: core:
add tests for CLOSE_RANGE_CLOEXEC") introduced usage of XFAIL in
TEST(close_range_cloexec). Use SKIP there as well.
Cc: Giuseppe Scrivano <gscrivan(a)redhat.com>
Fixes: 23afeaeff3d9 ("selftests: core: add tests for CLOSE_RANGE_CLOEXEC")
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/core/close_range_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c
index 87e16d65d9d7..670fb30d62f6 100644
--- a/tools/testing/selftests/core/close_range_test.c
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -241,7 +241,7 @@ TEST(close_range_cloexec)
fd = open("/dev/null", O_RDONLY);
ASSERT_GE(fd, 0) {
if (errno == ENOENT)
- XFAIL(return, "Skipping test since /dev/null does not exist");
+ SKIP(return, "Skipping test since /dev/null does not exist");
}
open_fds[i] = fd;
@@ -250,9 +250,9 @@ TEST(close_range_cloexec)
ret = sys_close_range(1000, 1000, CLOSE_RANGE_CLOEXEC);
if (ret < 0) {
if (errno == ENOSYS)
- XFAIL(return, "close_range() syscall not supported");
+ SKIP(return, "close_range() syscall not supported");
if (errno == EINVAL)
- XFAIL(return, "close_range() doesn't support CLOSE_RANGE_CLOEXEC");
+ SKIP(return, "close_range() doesn't support CLOSE_RANGE_CLOEXEC");
}
/* Ensure the FD_CLOEXEC bit is set also with a resource limit in place. */
--
2.29.0
Copied in from somewhere else, the makefile was including
the kerne's usr/include dir, which caused the asm/ioctl.h file
to be used.
Unfortunately, that file has different values for _IOC_SIZEBITS
and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then
causes the _IOCW macros to give the wrong ioctl numbers,
specifically for DMA_BUF_IOCTL_SYNC.
This patch simply removes the extra include from the Makefile
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Brian Starkey <brian.starkey(a)arm.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Laura Abbott <labbott(a)kernel.org>
Cc: Hridya Valsaraju <hridya(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: Daniel Mentz <danielmentz(a)google.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kselftest(a)vger.kernel.org
Fixes: a8779927fd86c ("kselftests: Add dma-heap test")
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
---
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile
index 607c2acd2082..604b43ece15f 100644
--- a/tools/testing/selftests/dmabuf-heaps/Makefile
+++ b/tools/testing/selftests/dmabuf-heaps/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include
+CFLAGS += -static -O3 -Wl,-no-as-needed -Wall
TEST_GEN_PROGS = dmabuf-heap
--
2.25.1
When using `kunit.py run` to run tests, users must populate a
`kunitconfig` file to select the options the tests are hidden behind and
all their dependencies.
The patch [1] to allow specifying a path to kunitconfig promises to make
this nicer as we can have checked in files corresponding to different
sets of tests.
But it's still annoying
1) when trying to run a subet of tests
2) when you want to run tests that don't have such a pre-existing
kunitconfig and selecting all the necessary options is tricky.
This patch series aims to alleviate both:
1) `kunit.py run 'my-suite-*'`
I.e. use my current kunitconfig, but just run suites that match this glob
2) `kunit.py run --alltests 'my-suite-*'`
I.e. use allyesconfig so I don't have to worry about writing a
kunitconfig at all.
See the first commit message for more details and discussion about
future work.
This patch series also includes a bugfix for a latent bug that can't be
triggered right now but has worse consequences as a result of the
changes needed to plumb in this suite name glob.
[1] https://lore.kernel.org/linux-kselftest/20210201205514.3943096-1-dlatypov@g…
---
v1 -> v2:
Fix free of `suites` subarray in suite_set.
Found by Dan Carpenter and kernel test robot.
v2 -> v3:
Add MODULE_PARM_DESC() for kunit.filter_glob.
v3 -> v4:
Rebase on top of kunit_tool_test.py and typing fixes for merging.
Daniel Latypov (3):
kunit: add kunit.filter_glob cmdline option to filter suites
kunit: tool: add support for filtering suites by glob
kunit: tool: fix unintentional statefulness in run_kernel()
lib/kunit/Kconfig | 1 +
lib/kunit/executor.c | 93 +++++++++++++++++++++++---
tools/testing/kunit/kunit.py | 21 ++++--
tools/testing/kunit/kunit_kernel.py | 6 +-
tools/testing/kunit/kunit_tool_test.py | 15 +++--
5 files changed, 115 insertions(+), 21 deletions(-)
base-commit: aa919f3b019d0e10e0c035598546b30cca7bcb19
--
2.30.0.478.g8a0d178c01-goog