Remove some of the outmoded "Why KUnit" rationale -- which focuses
significantly on UML -- and update the Getting Started guide to mention
running tests without the kunit_tool wrapper.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an attempt at resolving some of the issues with the KUnit
documentation pointed out here:
https://lore.kernel.org/linux-kselftest/CABVgOSkiLi0UNijH1xTSvmsJEE5+ocCZ7n…
There's definitely room for further work on the KUnit documentation
(e.g., adding more information around the environment tests run in), but
this hopefully is better than nothing as a starting point.
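As a quick reference for what the documented flow exercises, a minimal KUnit
test in the style of the in-tree example test looks roughly like the sketch
below; the suite name and the trivial assertion are illustrative only and not
part of this patch.

#include <kunit/test.h>

/* A trivial test case; each KUNIT_EXPECT_*() result becomes a TAP line. */
static void example_add_test(struct kunit *test)
{
	KUNIT_EXPECT_EQ(test, 3, 1 + 2);
}

/* The cases that make up the suite; the array is {}-terminated. */
static struct kunit_case example_test_cases[] = {
	KUNIT_CASE(example_add_test),
	{}
};

static struct kunit_suite example_test_suite = {
	.name = "example",
	.test_cases = example_test_cases,
};

/* Runs at boot when built in, or at load time when built as a module. */
kunit_test_suite(example_test_suite);

Whether run under kunit_tool or on a regular kernel, the results end up as TAP
output in the kernel log, which is what the updated start.rst below describes.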
Documentation/dev-tools/kunit/index.rst | 60 ++++---------------
Documentation/dev-tools/kunit/start.rst | 80 +++++++++++++++++++++----
2 files changed, 78 insertions(+), 62 deletions(-)
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index d16a4d2c3a41..6064cd14dfad 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -17,63 +17,23 @@ What is KUnit?
==============
KUnit is a lightweight unit testing and mocking framework for the Linux kernel.
-These tests are able to be run locally on a developer's workstation without a VM
-or special hardware.
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
cases, grouping related test cases into test suites, providing common
infrastructure for running tests, and much more.
-Get started now: :doc:`start`
-
-Why KUnit?
-==========
-
-A unit test is supposed to test a single unit of code in isolation, hence the
-name. A unit test should be the finest granularity of testing and as such should
-allow all possible code paths to be tested in the code under test; this is only
-possible if the code under test is very small and does not have any external
-dependencies outside of the test's control like hardware.
-
-Outside of KUnit, there are no testing frameworks currently
-available for the kernel that do not require installing the kernel on a test
-machine or in a VM and all require tests to be written in userspace running on
-the kernel; this is true for Autotest, and kselftest, disqualifying
-any of them from being considered unit testing frameworks.
+KUnit consists of a kernel component, which provides a set of macros for easily
+writing unit tests. Tests written against KUnit will run on kernel boot if
+built-in, or when loaded if built as a module. These tests write out results to
+the kernel log in `TAP <https://testanything.org/>`_ format.
-KUnit addresses the problem of being able to run tests without needing a virtual
-machine or actual hardware with User Mode Linux. User Mode Linux is a Linux
-architecture, like ARM or x86; however, unlike other architectures it compiles
-to a standalone program that can be run like any other program directly inside
-of a host operating system; to be clear, it does not require any virtualization
-support; it is just a regular program.
+To make running these tests (and reading the results) easier, KUnit offers
+:doc:`kunit_tool <kunit-tool>`, which builds a `User Mode Linux
+<http://user-mode-linux.sourceforge.net>`_ kernel, runs it, and parses the test
+results. This provides a quick way of running KUnit tests during development.
-Alternatively, kunit and kunit tests can be built as modules and tests will
-run when the test module is loaded.
-
-KUnit is fast. Excluding build time, from invocation to completion KUnit can run
-several dozen tests in only 10 to 20 seconds; this might not sound like a big
-deal to some people, but having such fast and easy to run tests fundamentally
-changes the way you go about testing and even writing code in the first place.
-Linus himself said in his `git talk at Google
-<https://gist.github.com/lorn/1272686/revisions#diff-53c65572127855f1b003db4…>`_:
-
- "... a lot of people seem to think that performance is about doing the
- same thing, just doing it faster, and that is not true. That is not what
- performance is all about. If you can do something really fast, really
- well, people will start using it differently."
-
-In this context Linus was talking about branching and merging,
-but this point also applies to testing. If your tests are slow, unreliable, are
-difficult to write, and require a special setup or special hardware to run,
-then you wait a lot longer to write tests, and you wait a lot longer to run
-tests; this means that tests are likely to break, unlikely to test a lot of
-things, and are unlikely to be rerun once they pass. If your tests are really
-fast, you run them all the time, every time you make a change, and every time
-someone sends you some code. Why trust that someone ran all their tests
-correctly on every change when you can just run them yourself in less time than
-it takes to read their test log?
+Get started now: :doc:`start`
How do I use it?
================
@@ -81,3 +41,5 @@ How do I use it?
* :doc:`start` - for new users of KUnit
* :doc:`usage` - for a more detailed explanation of KUnit features
* :doc:`api/index` - for the list of KUnit APIs used for testing
+* :doc:`kunit-tool` - for more information on the kunit_tool helper script
+* :doc:`faq` - for answers to some common questions about KUnit
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 4e1d24db6b13..e1c5ce80ce12 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -9,11 +9,10 @@ Installing dependencies
KUnit has the same dependencies as the Linux kernel. As long as you can build
the kernel, you can run KUnit.
-KUnit Wrapper
-=============
-Included with KUnit is a simple Python wrapper that helps format the output to
-easily use and read KUnit output. It handles building and running the kernel, as
-well as formatting the output.
+Running tests with the KUnit Wrapper
+====================================
+Included with KUnit is a simple Python wrapper which runs tests under User Mode
+Linux, and formats the test results.
The wrapper can be run with:
@@ -21,22 +20,42 @@ The wrapper can be run with:
./tools/testing/kunit/kunit.py run --defconfig
-For more information on this wrapper (also called kunit_tool) checkout the
+For more information on this wrapper (also called kunit_tool) check out the
:doc:`kunit-tool` page.
Creating a .kunitconfig
-=======================
-The Python script is a thin wrapper around Kbuild. As such, it needs to be
-configured with a ``.kunitconfig`` file. This file essentially contains the
-regular Kernel config, with the specific test targets as well.
-
+-----------------------
+If you want to run a specific set of tests (rather than those listed in the
+KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
+This file essentially contains the regular Kernel config, with the specific
+test targets as well. The ``.kunitconfig`` should also contain any other config
+options required by the tests.
+
+A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
.. code-block:: bash
cd $PATH_TO_LINUX_REPO
cp arch/um/configs/kunit_defconfig .kunitconfig
-Verifying KUnit Works
----------------------
+You can then add any other Kconfig options you wish, e.g.:
+.. code-block:: none
+
+ CONFIG_LIST_KUNIT_TEST=y
+
+:doc:`kunit_tool <kunit-tool>` will ensure that all config options set in
+``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
+It'll warn you if you haven't included the dependencies of the options you're
+using.
+
+.. note::
+ Note that removing something from the ``.kunitconfig`` will not trigger a
+ rebuild of the ``.config`` file: the configuration is only updated if the
+ ``.kunitconfig`` is not a subset of ``.config``. This means that you can use
+ other tools (such as make menuconfig) to adjust other config options.
+
+
+Running the tests
+-----------------
To make sure that everything is set up correctly, simply invoke the Python
wrapper from your kernel repo:
@@ -62,6 +81,41 @@ followed by a list of tests that are run. All of them should be passing.
Because it is building a lot of sources for the first time, the
``Building KUnit kernel`` step may take a while.
+Running tests without the KUnit Wrapper
+=======================================
+
+If you'd rather not use the KUnit Wrapper (if, for example, you need to
+integrate with other systems, or use an architecture other than UML), KUnit can
+be included in any kernel, and the results read out and parsed manually.
+
+.. note::
+ KUnit is not designed for use in a production system, and it's possible that
+ tests may reduce the stability or security of the system.
+
+
+
+Configuring the kernel
+----------------------
+
+In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
+Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
+menuconfig). From there, you can enable any KUnit tests you want: they usually
+have config options ending in ``_KUNIT_TEST``.
+
+KUnit and KUnit tests can be compiled as modules: in this case the tests in a
+module will be run when the module is loaded.
+
+Running the tests
+-----------------
+
+Build and run your kernel as usual. Test output will be written to the kernel
+log in `TAP <https://testanything.org/>`_ format.
+
+.. note::
+ It's possible that there will be other lines and/or data interspersed in the
+ TAP output.
+
+
Writing your first test
=======================
--
2.25.0.341.g760bfbb309-goog
From: Randy Dunlap <rdunlap(a)infradead.org>
Fix Documentation warning due to missing a blank line after a directive:
linux/Documentation/dev-tools/kunit/usage.rst:553: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 3 supplied.
.. code-block:: bash
modprobe example-test
Fixes: 6ae2bfd3df06 ("kunit: update documentation to describe module-based build")
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Alan Maguire <alan.maguire(a)oracle.com>
Cc: Knut Omang <knut.omang(a)oracle.com>
Cc: Brendan Higgins <brendanhiggins(a)google.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: kunit-dev(a)googlegroups.com
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
---
Documentation/dev-tools/kunit/usage.rst | 1 +
1 file changed, 1 insertion(+)
--- lnx-56-rc1.orig/Documentation/dev-tools/kunit/usage.rst
+++ lnx-56-rc1/Documentation/dev-tools/kunit/usage.rst
@@ -551,6 +551,7 @@ options to your ``.config``:
Once the kernel is built and installed, a simple
.. code-block:: bash
+
modprobe example-test
...will run the tests.
[Resend the v9 patch set to Shuah Khan and linux-kselftest mailing list.
No code and commit message change.]
With more and more resctrl features being added by Intel, AMD and ARM, a
test tool is becoming increasingly useful to validate that both hardware
and software functionality works as expected.
We introduce a resctrl selftest to cover resctrl features on Intel, AMD
and ARM architectures. It first implements MBM (Memory Bandwidth
Monitoring), MBA (Memory Bandwidth Allocation), L3 CAT (Cache Allocation
Technology), and CQM (Cache QoS Monitoring) tests. We will enhance
the selftest tool to include more functionality tests in the future.
The tool has been tested on Intel RDT, AMD QoS and ARM MPAM and is
in tools/testing/selftests/resctrl in order to have generic test code
base for all architectures.
The selftest tool we are introducing here provides a convenient
tool which does automatic resctrl testing, is easily available in kernel
tree, and covers Intel RDT, AMD QoS and ARM MPAM.
There is an existing resctrl test suite, 'intel_cmt_cat', but its major
purpose is to test Intel RDT hardware by writing and reading MSR
registers. It does access the resctrl file system, but that functionality
is very limited, it doesn't support automated testing, and a lot of
manual verification is involved.
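For context, the tests drive the resctrl file system directly (creating a
control group, writing its schemata, and assigning tasks) and compare the
resulting counters against perf IMC readings. A rough userspace sketch of that
kind of resctrl interaction is below; the group name, the MBA percentage, and
the assumption that resctrl is already mounted at /sys/fs/resctrl are
illustrative and not taken from the selftests themselves.

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a resctrl group, cap it to 50% memory bandwidth on domain 0 (MBA),
 * and move the current task into it. Assumes resctrl is mounted at
 * /sys/fs/resctrl and MBA is supported; error handling is minimal.
 */
int main(void)
{
	FILE *f;

	if (mkdir("/sys/fs/resctrl/demo_grp", 0755) && errno != EEXIST) {
		perror("mkdir");
		return 1;
	}

	f = fopen("/sys/fs/resctrl/demo_grp/schemata", "w");
	if (!f)
		return 1;
	fprintf(f, "MB:0=50\n");		/* throttle domain 0 to 50% */
	fclose(f);

	f = fopen("/sys/fs/resctrl/demo_grp/tasks", "w");
	if (!f)
		return 1;
	fprintf(f, "%d\n", getpid());	/* assign this task to the group */
	fclose(f);

	return 0;
}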
Changelog:
v9:
- Per Boris suggestion, add Co-developed-by in each patch to make it
clear who contributed to the patch set.
v8:
Update code per comments from Andre Przywara from ARM:
- Change Makefile and remove inline assembly code to build and test the
tool on ARM
- Change the output to TAP format because the format is both readable by
human and other test tools.
- Detect resctrl feature from /proc/cpuinfo instead of dmesg to support
generic detection on all architectures.
- Fix a few coding issues.
v7:
- Fix a few warnings when compiling patches separately, pointed out by Babu
v6:
- Fix a benchmark reading optimized out issue in newer GCC.
- Fix a few coding style issues.
- Re-arrange code among patches for cleaner code. No change in patch
structure.
v5:
- Based the v4 patches submitted by Fenghua Yu and added changes to support
AMD.
- Renamed the function get_sock_num to get_resource_id to generalize it:
Intel uses the socket number for schemata while AMD uses the L3 index id.
- Added the code to detect vendor.
- Disabled a few tests for AMD where the test results are not clear.
Also, AMD does not have an IMC.
- Fixed a few compile issues.
- Some cleanup to make each patch independent.
- Tested the patches on an AMD system. Fenghua, I need your help to test on
an Intel box. Please feel free to change and resubmit if something is
broken.
v4:
- address comments from Balu and Randy
- Add CAT and CQM tests
v3:
- Change code based on comments from Babu Moger
- Remove some unnecessary code and use a pipe to communicate between processes
v2:
- Change code based on comments from Babu Moger
- Clean up other places.
Babu Moger (3):
selftests/resctrl: Add vendor detection mechanism
selftests/resctrl: Use cache index3 id for AMD schemata masks
selftests/resctrl: Disable MBA and MBM tests for AMD
Fenghua Yu (6):
selftests/resctrl: Add README for resctrl tests
selftests/resctrl: Add MBM test
selftests/resctrl: Add MBA test
selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest
selftests/resctrl: Add Cache Allocation Technology (CAT) selftest
selftests/resctrl: Add the test in MAINTAINERS
Sai Praneeth Prakhya (4):
selftests/resctrl: Add basic resctrl file system operations and data
selftests/resctrl: Read memory bandwidth from perf IMC counter and
from resctrl file system
selftests/resctrl: Add callback to start a benchmark
selftests/resctrl: Add built in benchmark
MAINTAINERS | 1 +
tools/testing/selftests/resctrl/Makefile | 17 +
tools/testing/selftests/resctrl/README | 53 ++
tools/testing/selftests/resctrl/cache.c | 272 +++++++
tools/testing/selftests/resctrl/cat_test.c | 250 ++++++
tools/testing/selftests/resctrl/cqm_test.c | 176 +++++
tools/testing/selftests/resctrl/fill_buf.c | 213 +++++
tools/testing/selftests/resctrl/mba_test.c | 171 ++++
tools/testing/selftests/resctrl/mbm_test.c | 145 ++++
tools/testing/selftests/resctrl/resctrl.h | 107 +++
.../testing/selftests/resctrl/resctrl_tests.c | 202 +++++
tools/testing/selftests/resctrl/resctrl_val.c | 744 ++++++++++++++++++
tools/testing/selftests/resctrl/resctrlfs.c | 722 +++++++++++++++++
13 files changed, 3073 insertions(+)
create mode 100644 tools/testing/selftests/resctrl/Makefile
create mode 100644 tools/testing/selftests/resctrl/README
create mode 100644 tools/testing/selftests/resctrl/cache.c
create mode 100644 tools/testing/selftests/resctrl/cat_test.c
create mode 100644 tools/testing/selftests/resctrl/cqm_test.c
create mode 100644 tools/testing/selftests/resctrl/fill_buf.c
create mode 100644 tools/testing/selftests/resctrl/mba_test.c
create mode 100644 tools/testing/selftests/resctrl/mbm_test.c
create mode 100644 tools/testing/selftests/resctrl/resctrl.h
create mode 100644 tools/testing/selftests/resctrl/resctrl_tests.c
create mode 100644 tools/testing/selftests/resctrl/resctrl_val.c
create mode 100644 tools/testing/selftests/resctrl/resctrlfs.c
--
2.19.1
Hi,
Here's another update after another round of reviews from Kirill and Jan.
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git track_user_pages_v5
============================================================
Changes since v4:
* Added documentation about the huge page behavior of the new
/proc/vmstat items.
* Added a missing mod_node_page_state() call to put_compound_head().
* Fixed a tracepoint call in page_ref_sub_return().
* Added a trailing underscore to a URL in pin_user_pages.rst, to fix
a broken generated link.
* Added ACKs and reviewed-by's from Jan Kara and Kirill Shutemov.
* Rebased onto today's linux.git, and
* I am experimenting here with "git format-patch --base=<commit>".
This generated the "base-commit:" tag you'll see at the end of this
cover letter. I was inspired to do so after trying out a new
get-lore-mbox.py tool (it's very nice), mentioned in a recent LWN
article (https://lwn.net/Articles/811528/ ). That tool relies on the
base-commit tag for some things.
============================================================
Changes since v3:
* Rebased onto latest linux.git
* Added ACKs and reviewed-by's from Kirill Shutemov and Jan Kara.
* /proc/vmstat:
* Renamed items, after realizing that I hate the previous names:
nr_foll_pin_requested --> nr_foll_pin_acquired
nr_foll_pin_returned --> nr_foll_pin_released
* Removed the CONFIG_DEBUG_VM guard, and collapsed away a wrapper
routine: now just calls mod_node_page_state() directly.
* Tweaked the WARN_ON_ONCE() statements in mm/hugetlb.c to be more
informative, and added comments above them as well.
* Fixed gup_benchmark: signed int --> unsigned long.
* One or two minor formatting changes.
============================================================
Changes since v2:
* Rebased onto linux.git, because the akpm tree for 5.6 has been merged.
* Split the tracking patch into even more patches, as requested.
* Merged Matthew Wilcox's dump_page() changes into mine, as part of the
first patch.
* Renamed: page_dma_pinned() --> page_maybe_dma_pinned(), in response to
Kirill Shutemov's review.
* Moved a WARN to the top of a routine, and fixed a typo in the commit
description of patch #7, also as suggested by Kirill.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 7 and 8), given that patch 7 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 7 and not patch 8, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on its own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
FOLL_PIN support is now in the main linux tree. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Other interesting changes:
* dump_page(): added one or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
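As an illustration of that opt-in conversion, a call site might end up looking
roughly like the kernel-side sketch below; the helper name, buffer layout and
surrounding driver context are hypothetical, and only the
pin_user_pages()/unpin_user_pages() calls come from the FOLL_PIN API.

#include <linux/mm.h>
#include <linux/sched.h>

/*
 * Hypothetical driver-style helper (not from this series): pin a user
 * buffer with FOLL_PIN semantics instead of get_user_pages()/put_page().
 * A real driver would also handle a short pin (pinned < nr_pages).
 */
static int demo_pin_user_buffer(unsigned long user_addr, unsigned long nr_pages,
				struct page **pages)
{
	long pinned;

	down_read(&current->mm->mmap_sem);
	pinned = pin_user_pages(user_addr, nr_pages, FOLL_WRITE, pages, NULL);
	up_read(&current->mm->mmap_sem);
	if (pinned < 0)
		return pinned;

	/* ... set up and run the DMA against 'pages' here ... */

	/* Drop the pins (not put_page()) once the DMA has completed. */
	unpin_user_pages(pages, pinned);
	return 0;
}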
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (12):
mm: dump_page(): better diagnostics for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm: introduce page_ref_sub_return()
mm/gup: pass gup flags to two more routines
mm/gup: require FOLL_GET for get_user_pages_fast()
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm: dump_page(): better diagnostics for huge pinned pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 86 ++--
include/linux/mm.h | 108 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 9 +
mm/debug.c | 61 ++-
mm/gup.c | 452 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 60 ++-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 752 insertions(+), 180 deletions(-)
base-commit: 90568ecf561540fa330511e21fcd823b0c3829c6
--
2.25.0
Hello,
This patchset implements a new futex operation, called FUTEX_WAIT_MULTIPLE,
which allows a thread to wait on several futexes at the same time, and be
awoken by any of them.
The use case lies in the Wine implementation of the Windows NT interface
WaitForMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signal,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side, and quickly go to sleep on the
consumer side is essential for good performance of those running over Wine.
Since this API exposes a mechanism to wait on multiple objects, and we
might have multiple waiters for each of these events (an M->N
relationship), the current Linux interfaces fell short in our performance
evaluation of large M,N scenarios. We experimented, for instance, with
eventfd, which has the performance problems discussed below. We also
experimented with userspace solutions, like making each consumer wait on
a condition variable guarding the entire list of objects and then
signaling multiple condition variables on the producer side, but this is
prohibitively expensive: we either need to signal many condition
variables or share one condition variable among multiple waiters and
then check for the signaled event in userspace, which means dealing with
frequent false-positive wake-ups.
The natural interface to implement the behavior we want, also
considering that one of the waitable objects is a mutex itself, would be
the futex interface. Therefore, this patchset proposes a mechanism for
a thread to wait on multiple futexes at once, and wake up on the first
futex that was awoken.
In particular, using futexes in our Wine use case reduced the CPU
utilization by 4% for the game Beat Saber and by 1.5% for the game
Shadow of Tomb Raider, both running over Proton (a Wine based solution
for Windows emulation), when compared to the eventfd interface. This
implementation also doesn't rely on file descriptors, so it doesn't risk
overflowing the resource.
In time, we are also proposing modifications to glibc and libpthread to
make this feature available for Linux native multithreaded applications
using libpthread, which can benefit from the behavior of waiting on any
of a group of futexes.
Technically, the existing FUTEX_WAIT implementation can be easily
reworked by using futex_wait_multiple() with a count of one, and I
have a patch showing how it works. I'm not proposing it, since
the futex code is so tricky that I'd be more comfortable having
FUTEX_WAIT_MULTIPLE run upstream for a couple of development cycles
before considering modifying FUTEX_WAIT.
The patch series includes an extensive set of kselftests validating
the behavior of the interface. We also implemented support[1] in
Syzkaller and survived the fuzz testing.
Finally, if you'd rather pull directly a branch with this set you can
find it here:
https://gitlab.collabora.com/tonyk/linux/commits/futex-dev
=== Performance of eventfd ===
Polling on several eventfd contexts with semaphore semantics would
provide us with the exact semantics we are looking for. However, as
shown below, in a scenario with sufficient producers and consumers, the
eventfd interface itself becomes a bottleneck, in particular because
each thread will compete to acquire a sequence of waitqueue locks for
each eventfd context in the poll list. In addition, in the uncontended
case, where the producer is ready for consumption, eventfd still
requires going into the kernel on the consumer side.
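Concretely, the eventfd pattern we are comparing against is roughly the
consumer sketch below: each waitable object is a semaphore-mode eventfd, and
the consumer polls the whole set and reads from whichever becomes ready, so
both the wake-up and the consumption funnel through ctx->wqh.lock in the
kernel. The descriptor count and loop structure are illustrative, not the
actual benchmark.

#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define NOBJS 8

int main(void)
{
	struct pollfd pfd[NOBJS];
	uint64_t val;
	int i;

	/* One semaphore-mode eventfd per waitable object. */
	for (i = 0; i < NOBJS; i++) {
		pfd[i].fd = eventfd(0, EFD_SEMAPHORE);
		pfd[i].events = POLLIN;
	}

	/*
	 * Consumer loop: sleep until any object is signaled, then consume
	 * one count from the first ready descriptor. Both poll() and read()
	 * take the per-context waitqueue lock in the kernel.
	 */
	for (;;) {
		if (poll(pfd, NOBJS, -1) <= 0)
			break;
		for (i = 0; i < NOBJS; i++) {
			if (pfd[i].revents & POLLIN) {
				if (read(pfd[i].fd, &val, sizeof(val)) < 0)
					perror("read");
				break;
			}
		}
	}
	return 0;
}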
When a write or a read operation on an eventfd file succeeds, it will try
to wake up all threads that are waiting to perform some operation on
the file. The lock (ctx->wqh.lock) that guards access to the file value
(ctx->count) is the same lock used to control access to the waitqueue. When
all those threads wake, they will compete to get this lock. Along
with that, poll() also manipulates the waitqueue and needs to hold
this same lock. This lock is especially hard to acquire when poll() calls
poll_freewait(), where it tries to free all waitqueues associated with
this poll. While doing that, it will compete with the many read and
write operations that have been woken.
In our use case, with a huge number of parallel reads, writes and polls,
this lock is a bottleneck and hurts the performance of applications. Our
futex implementation, however, decreases spin lock acquisitions by more
than 80% in some user applications.
Finally, eventfd operates on file descriptors, which is a limited
resource that has shown its limitations in our use cases. Despite the
Windows interface not waiting on more than 64 objects at once, we still
have multiple waiters at the same time, and we were easily able to
exhaust the FD limits on applications like games.
The RFC for this patch can be found here:
https://lkml.org/lkml/2019/7/30/1399
Thanks,
André
[1] https://github.com/andrealmeid/syzkaller/tree/futex-wait-multiple
Gabriel Krisman Bertazi (4):
futex: Implement mechanism to wait on any of several futexes
selftests: futex: Add FUTEX_WAIT_MULTIPLE timeout test
selftests: futex: Add FUTEX_WAIT_MULTIPLE wouldblock test
selftests: futex: Add FUTEX_WAIT_MULTIPLE wake up test
include/uapi/linux/futex.h | 20 +
kernel/futex.c | 356 +++++++++++++++++-
.../selftests/futex/functional/.gitignore | 1 +
.../selftests/futex/functional/Makefile | 3 +-
.../futex/functional/futex_wait_multiple.c | 173 +++++++++
.../futex/functional/futex_wait_timeout.c | 38 +-
.../futex/functional/futex_wait_wouldblock.c | 28 +-
.../testing/selftests/futex/functional/run.sh | 3 +
.../selftests/futex/include/futextest.h | 22 +
9 files changed, 635 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/futex/functional/futex_wait_multiple.c
--
2.25.0
Commit 5f70bde26a48 ("selftests: fix build behaviour on targets' failures")
added logic to track build failures of individual targets. However, it
does exactly the opposite of what a distro kernel needs: we create a RPM
package with a selected set of selftests and we need the build to fail if
build of any of the targets fail.
Both use cases are valid. A distribution kernel is in control of what is
included in the kernel and what is being built; any error needs to be
flagged and acted upon. A CI system that tries to build as many tests as
possible on the best effort basis is not really interested in a failure here
and there.
Support both use cases by introducing a FORCE_TARGETS variable. It is
switched off by default to make life for CI systems easier, distributions
can easily switch it on while building their packages.
Reported-by: Yauheni Kaliuta <yauheni.kaliuta(a)redhat.com>
Signed-off-by: Jiri Benc <jbenc(a)redhat.com>
---
tools/testing/selftests/Makefile | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 5182d6078cbc..97fca70d2cd6 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -74,6 +74,12 @@ ifneq ($(SKIP_TARGETS),)
override TARGETS := $(TMP)
endif
+# User can set FORCE_TARGETS to 1 to require all targets to be successfully
+# built; make will fail if any of the targets cannot be built. If
+# FORCE_TARGETS is not set (the default), make will succeed if at least one
+# of the targets gets built.
+FORCE_TARGETS ?=
+
# Clear LDFLAGS and MAKEFLAGS if called from main
# Makefile to avoid test build failures when test
# Makefile doesn't have explicit build rules.
@@ -148,7 +154,8 @@ all: khdr
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
mkdir $$BUILD_TARGET -p; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \
+ $(if $(FORCE_TARGETS),|| exit); \
ret=$$((ret * $$?)); \
done; exit $$ret;
@@ -202,7 +209,8 @@ ifdef INSTALL_PATH
@ret=1; \
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install \
+ $(if $(FORCE_TARGETS),|| exit); \
ret=$$((ret * $$?)); \
done; exit $$ret;
--
2.18.1
With some shells, the command constructed for installing the bpf selftests
becomes too large due to the long list of files:
make[1]: execvp: /bin/sh: Argument list too long
make[1]: *** [../lib.mk:73: install] Error 127
Currently, each of the file lists is replicated three times in the command:
in the shell 'if' condition, in the 'echo' and in the 'rsync'. Reduce that
by one instance by using make conditionals and separate the echo and rsync
into two shell commands. (One would be inclined to just remove the '@' at
the beginning of the rsync command and let 'make' echo it by itself;
unfortunately, it appears that the '@' in the front of mkdir silences output
also for the following commands.)
Also, separate handling of each of the lists to its own shell command.
The semantics of the makefile is unchanged before and after the patch. The
ability of individual test directories to override INSTALL_RULE is retained.
Reported-by: Yauheni Kaliuta <yauheni.kaliuta(a)redhat.com>
Tested-by: Yauheni Kaliuta <yauheni.kaliuta(a)redhat.com>
Signed-off-by: Jiri Benc <jbenc(a)redhat.com>
---
tools/testing/selftests/lib.mk | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 1c8a1963d03f..3ed0134a764d 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -83,17 +83,20 @@ else
$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_PROGS))
endif
+define INSTALL_SINGLE_RULE
+ $(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH))
+ $(if $(INSTALL_LIST),@echo rsync -a $(INSTALL_LIST) $(INSTALL_PATH)/)
+ $(if $(INSTALL_LIST),@rsync -a $(INSTALL_LIST) $(INSTALL_PATH)/)
+endef
+
define INSTALL_RULE
- @if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)" != "X" ]; then \
- mkdir -p ${INSTALL_PATH}; \
- echo "rsync -a $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(INSTALL_PATH)/"; \
- rsync -a $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(INSTALL_PATH)/; \
- fi
- @if [ "X$(TEST_GEN_PROGS)$(TEST_CUSTOM_PROGS)$(TEST_GEN_PROGS_EXTENDED)$(TEST_GEN_FILES)" != "X" ]; then \
- mkdir -p ${INSTALL_PATH}; \
- echo "rsync -a $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(INSTALL_PATH)/"; \
- rsync -a $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(INSTALL_PATH)/; \
- fi
+ $(eval INSTALL_LIST = $(TEST_PROGS)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_FILES)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_GEN_PROGS)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_CUSTOM_PROGS)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_GEN_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(TEST_GEN_FILES)) $(INSTALL_SINGLE_RULE)
endef
install: all
--
2.18.1
Hi,
Here's an update after the latest round of reviews from Kirill and Jan.
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git track_user_pages_v4
I'm particularly pleased that the /proc/vmstat items are now available
in all configurations, seeing as how I was looking (in vain) for that
information during a recent investigation of a remotely reported
problem.
============================================================
Changes since v3:
* Rebased onto latest linux.git
* Added ACKs and reviewed-by's from Kirill Shutemov and Jan Kara.
* /proc/vmstat:
* Renamed items, after realizing that I hate the previous names:
nr_foll_pin_requested --> nr_foll_pin_acquired
nr_foll_pin_returned --> nr_foll_pin_released
* Removed the CONFIG_DEBUG_VM guard, and collapsed away a wrapper
routine: now just calls mod_node_page_state() directly.
* Tweaked the WARN_ON_ONCE() statements in mm/hugetlb.c to be more
informative, and added comments above them as well.
* Fixed gup_benchmark: signed int --> unsigned long.
* One or two minor formatting changes.
============================================================
Changes since v2:
* Rebased onto linux.git, because the akpm tree for 5.6 has been merged.
* Split the tracking patch into even more patches, as requested.
* Merged Matthew Wilcox's dump_page() changes into mine, as part of the
first patch.
* Renamed: page_dma_pinned() --> page_maybe_dma_pinned(), in response to
Kirill Shutemov's review.
* Moved a WARN to the top of a routine, and fixed a typo in the commit
description of patch #7, also as suggested by Kirill.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 7 and 8), given that patch 7 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 7 and not patch 8, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on its own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
FOLL_PIN support is now in the main linux tree. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Other interesting changes:
* dump_page(): added one or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (12):
mm: dump_page(): better diagnostics for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm: introduce page_ref_sub_return()
mm/gup: pass gup flags to two more routines
mm/gup: require FOLL_GET for get_user_pages_fast()
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm: dump_page(): better diagnostics for huge pinned pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 59 ++-
include/linux/mm.h | 108 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 9 +
mm/debug.c | 61 ++-
mm/gup.c | 448 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 60 ++-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 721 insertions(+), 180 deletions(-)
--
2.25.0
Matthew,
I've merged in your dump_page() ideas, and also factored things out
into a new __dump_tail_page() routine, in order to save a few
indentation levels, mainly.
Kirill, thanks for your review comments. I've applied them, and I think
splitting this up as you recommended really makes it a lot better, and
easier to spot problems.
============================================================
Changes since v2:
* Rebased onto linux.git, because the akpm tree for 5.6 has been merged.
* Split the tracking patch into even more patches, as requested.
* Merged Matthew Wilcox's dump_page() changes into mine, as part of the
first patch.
* Renamed: page_dma_pinned() --> page_maybe_dma_pinned(), in response to
Kirill Shutemov's review.
* Moved a WARN to the top of a routine, and fixed a typo in the commit
description of patch #7, also as suggested by Kirill.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 4 and 5), given that patch 4 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 4 and not patch 5, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on its own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
It is based on today's (Jan 28) linux-next (branch: akpm),
commit 280e9cb00b41 ("drivers/media/platform/sti/delta/delta-ipc.c: fix
read buffer overflow")
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git
track_user_pages_v2_linux-next_akpm_28Jan2020
FOLL_PIN support is (so far) in mmotm and linux-next. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Here's the last reviewed version of the tracking patch (v11):
https://lore.kernel.org/r/20191216222537.491123-1-jhubbard@nvidia.com
Jan Kara had provided a reviewed-by tag for that, but I've had to remove
it (again) here, due to having changed the patch "a little bit", in
order to add the feature described above.
Other interesting changes:
* dump_page(): added one or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (12):
mm: dump_page(): better diagnostics for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm: introduce page_ref_sub_return()
mm/gup: pass gup flags to two more routines
mm/gup: require FOLL_GET for get_user_pages_fast()
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm: dump_page(): better diagnostics for huge pinned pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 53 +--
include/linux/mm.h | 108 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/debug.c | 60 ++-
mm/gup.c | 459 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 44 +-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 715 insertions(+), 175 deletions(-)
--
2.25.0
From: SeongJae Park <sjpark(a)amazon.de>
When closing a connection, the two acks that are required to change the
closing socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to the flow below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
The acks in lines 06 and 08 are those two acks. If the line 08 packet is
processed before the line 06 packet, it will just be ignored as it is not
an expected packet, and the later processing of the line 06 packet will
change the status of Process A to FIN_WAIT_2. But, as Process A has already
handled the line 08 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will be left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if Process B sends a SYN to Process A for reconnection using
the same port, Process A will respond with an ACK for the last flow,
which has no increased sequence number. Thus, Process B will send an RST,
wait for TCP_TIMEOUT_INIT (one second by default), and then retry the
connection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump trace of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
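For reference, the latency spike is easy to spot from userspace by timing
connect() across repeated connect/close cycles, which is the idea behind the
selftest in this series. The sketch below only shows that measurement; the
loopback address and the port are placeholders, and actually hitting the race
additionally requires the same-port reconnect conditions described above.

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr;
	struct timespec t0, t1;
	long ms;
	int i, fd;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(12345);		/* placeholder port */
	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

	for (i = 0; i < 1000; i++) {
		fd = socket(AF_INET, SOCK_STREAM, 0);
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (connect(fd, (struct sockaddr *)&addr, sizeof(addr))) {
			close(fd);
			continue;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		ms = (t1.tv_sec - t0.tv_sec) * 1000 +
		     (t1.tv_nsec - t0.tv_nsec) / 1000000;
		if (ms > 500)	/* a ~1s connect() hints at the resent SYN */
			printf("iteration %d: connect took %ld ms\n", i, ms);
		close(fd);
	}
	return 0;
}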
Patchset Organization
---------------------
The first patch fixes a trivial nit. The second one fixes the problem by
adjusting the resend delay of the SYN in this case. Finally, the third
patch adds a user space test to reproduce this problem.
The patches are based on the v5.5. You can also clone the complete git
tree:
$ git clone git://github.com/sjp38/linux -b patches/finack_lat/v1
The web is also available:
https://github.com/sjp38/linux/tree/patches/finack_lat/v1
SeongJae Park (3):
net/ipv4/inet_timewait_sock: Fix inconsistent comments
tcp: Reduce SYN resend delay if a suspicous ACK is received
selftests: net: Add FIN_ACK processing order related latency spike
test
net/ipv4/inet_timewait_sock.c | 1 +
net/ipv4/tcp_input.c | 6 +-
tools/testing/selftests/net/.gitignore | 2 +
tools/testing/selftests/net/Makefile | 2 +
tools/testing/selftests/net/fin_ack_lat.sh | 42 ++++++++++
.../selftests/net/fin_ack_lat_accept.c | 49 +++++++++++
.../selftests/net/fin_ack_lat_connect.c | 81 +++++++++++++++++++
7 files changed, 182 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
create mode 100644 tools/testing/selftests/net/fin_ack_lat_accept.c
create mode 100644 tools/testing/selftests/net/fin_ack_lat_connect.c
--
2.17.1
From: SeongJae Park <sjpark(a)amazon.de>
When closing a connection, the two acks that are required to change the
closing socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to the flow below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
In some cases, such as a socket with the LINGER option applied, the FIN and
FIN/ACK will be substituted with RST and RST/ACK, but there is no difference
in the main logic.
The acks in lines 06 and 08 are those two acks. If the line 08 packet is
processed before the line 06 packet, it will just be ignored as it is not
an expected packet, and the later processing of the line 06 packet will
change the status of Process A to FIN_WAIT_2. But, as Process A has already
handled the line 08 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will be left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if Process B sends a SYN to Process A for reconnection using
the same port, Process A will respond with an ACK for the last flow,
which has no increased sequence number. Thus, Process B will send an RST,
wait for TCP_TIMEOUT_INIT (one second by default), and then retry the
connection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump trace of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
Patchset Organization
---------------------
The first patch fixes the problem by adjusting the first resend delay of
the SYN in this case. The second one adds a user space test to reproduce
this problem.
The patches are based on the v5.5. You can also clone the complete git
tree:
$ git clone git://github.com/sjp38/linux -b patches/finack_lat/v3
The web is also available:
https://github.com/sjp38/linux/tree/patches/finack_lat/v3
Patchset History
----------------
From v2
(https://lore.kernel.org/linux-kselftest/20200201071859.4231-1-sj38.park@gma…)
- Use TCP_TIMEOUT_MIN as reduced delay (Neal Cardwall)
- Add Reviewed-by and Signed-off-by from Eric Dumazet
From v1
(https://lore.kernel.org/linux-kselftest/20200131122421.23286-1-sjpark@amazo…)
- Drop the trivial comment fix patch (Eric Dumazet)
- Limit the delay adjustment to only the first SYN resend (Eric Dumazet)
- selftest: Avoid use of hard-coded port number (Eric Dumazet)
- Explain RST/ACK and FIN/ACK has no big difference (Neal Cardwell)
SeongJae Park (2):
tcp: Reduce SYN resend delay if a suspicous ACK is received
selftests: net: Add FIN_ACK processing order related latency spike
test
net/ipv4/tcp_input.c | 8 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 2 +
tools/testing/selftests/net/fin_ack_lat.c | 151 +++++++++++++++++++++
tools/testing/selftests/net/fin_ack_lat.sh | 35 +++++
5 files changed, 196 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/net/fin_ack_lat.c
create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
--
2.17.1
From: SeongJae Park <sjpark(a)amazon.de>
When closing a connection, the two acks that are required to change the
closing socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to the flow below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
In some cases, such as a socket with the LINGER option applied, the FIN and
FIN/ACK will be substituted with RST and RST/ACK, but there is no difference
in the main logic.
The acks in lines 06 and 08 are those two acks. If the line 08 packet is
processed before the line 06 packet, it will just be ignored as it is not
an expected packet, and the later processing of the line 06 packet will
change the status of Process A to FIN_WAIT_2. But, as Process A has already
handled the line 08 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will be left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if Process B sends a SYN to Process A for reconnection using
the same port, Process A will respond with an ACK for the last flow,
which has no increased sequence number. Thus, Process B will send an RST,
wait for TCP_TIMEOUT_INIT (one second by default), and then retry the
connection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump trace of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
Patchset Organization
---------------------
The first patch fixes the problem by adjusting the first resend delay of
the SYN in this case. The second one adds a user space test to reproduce
this problem.
The patches are based on the v5.5. You can also clone the complete git
tree:
$ git clone git://github.com/sjp38/linux -b patches/finack_lat/v2
The web is also available:
https://github.com/sjp38/linux/tree/patches/finack_lat/v2
Patchset History
----------------
From v1
(https://lore.kernel.org/linux-kselftest/20200131122421.23286-1-sjpark@amazo…)
- Drop the trivial comment fix patch (Eric Dumazet)
- Limit the delay adjustment to only the first SYN resend (Eric Dumazet)
- selftest: Avoid use of hard-coded port number (Eric Dumazet)
- Explain RST/ACK and FIN/ACK has no big difference (Neal Cardwell)
SeongJae Park (2):
tcp: Reduce SYN resend delay if a suspicious ACK is received
selftests: net: Add FIN_ACK processing order related latency spike
test
net/ipv4/tcp_input.c | 8 +-
tools/testing/selftests/net/.gitignore | 2 +
tools/testing/selftests/net/Makefile | 2 +
tools/testing/selftests/net/fin_ack_lat.c | 151 +++++++++++++++++++++
tools/testing/selftests/net/fin_ack_lat.sh | 35 +++++
5 files changed, 197 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/net/fin_ack_lat.c
create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
--
2.17.1
Fix a missing newline in a code block that was causing a warning:
Documentation/dev-tools/kunit/usage.rst:553: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 3 supplied.
.. code-block:: bash
modprobe example-test
Signed-off-by: Brendan Higgins <brendanhiggins(a)google.com>
---
Documentation/dev-tools/kunit/usage.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 7cd56a1993b14..607758a66a99c 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -551,6 +551,7 @@ options to your ``.config``:
Once the kernel is built and installed, a simple
.. code-block:: bash
+
modprobe example-test
...will run the tests.
--
2.25.0.341.g760bfbb309-goog
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
While running the ftracetests, the pid filter test failed because the
instance "foo" existed, and it was using it to rerun the test under a
instance named foo. The collision caused the test to fail as the mkdir
failed as the name already existed.
As of commit b5b77be812de7 ("selftests: ftrace: Allow some tests to be run
in a tracing instance") all a selftest needs to do to be tested in an
instance is to set the "instance" flag. There's no reason a selftest needs
to create an instance to run its test in an instance directly.
Remove the open coded testing in an instance for the pid filter test and
have it set the "instance" flag instead.
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
.../selftests/ftrace/test.d/ftrace/func-filter-pid.tc | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
index 64cfcc75e3c1..f2ee1e889e13 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
@@ -1,6 +1,7 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: ftrace - function pid filters
+# flags: instance
# Make sure that function pid matching filter works.
# Also test it on an instance directory
@@ -96,13 +97,6 @@ do_test() {
}
do_test
-
-mkdir instances/foo
-cd instances/foo
-do_test
-cd ../../
-rmdir instances/foo
-
do_reset
exit 0
--
2.20.1
OK, as requested, I've split the tracking patch into 6 smaller patches,
and it should be *much* easier to understand and review now.
============================================================
Changes since v1:
* Split the tracking patch into 6 smaller patches
* Rebased onto today's linux-next/akpm (there weren't any conflicts).
* Fixed an "unsigned int" vs. "int" problem in gup_benchmark, reported
by Nathan Chancellor. (I don't see it in my local builds, probably
because they use gcc, but an LLVM test found the mismatch.)
* Fixed a huge page pincount problem (add/subtract vs.
increment/decrement), spotted by Jan Kara.
============================================================
There is a reasonable case to be made for merging two of the patches
(patches 4 and 5), given that patch 4 provides tracking that has upper
limits on the number of pins that can be done with huge pages. Let me
know if anyone wants those merged, but unless there is some weird chance
of someone grabbing patch 4 and not patch 5, I don't really see the
need. Meanwhile, it's easier to review in this form.
Also, patch 3 has been revived. Earlier reviewers asked for it to be
merged into the tracking patch (one cannot please everyone, heh), but
now it's back out on its own.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
It is based on today's (Jan 28) linux-next (branch: akpm),
commit 280e9cb00b41 ("drivers/media/platform/sti/delta/delta-ipc.c: fix
read buffer overflow")
There is a git repo and branch, for convenience in reviewing:
git@github.com:johnhubbard/linux.git
track_user_pages_v2_linux-next_akpm_28Jan2020
FOLL_PIN support is (so far) in mmotm and linux-next. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Here's the last reviewed version of the tracking patch (v11):
https://lore.kernel.org/r/20191216222537.491123-1-jhubbard@nvidia.com
Jan Kara had provided a reviewed-by tag for that, but I've had to remove
it (again) here, due to having changed the patch "a little bit", in
order to add the feature described above.
Other interesting changes:
* dump_page(): added one or two new things to report for compound
pages: the head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
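As a rough illustration only (the driver context and the function name
below are invented; the pin_user_pages()/unpin_user_pages() signatures are
as proposed by this work), such a call-site conversion might look like:

#include <linux/mm.h>

/* Hypothetical call site: pin a user buffer for DMA, then release it.
 * Previously this would have used get_user_pages() plus put_page().
 */
static int example_pin_user_buffer(unsigned long uaddr,
				   unsigned long nr_pages,
				   struct page **pages)
{
	long pinned;

	pinned = pin_user_pages(uaddr, nr_pages,
				FOLL_WRITE | FOLL_LONGTERM, pages, NULL);
	if (pinned < 0)
		return pinned;

	/* ... set up and run DMA against the pinned pages here ... */

	/* Release with the matching unpin call, not put_page(). */
	unpin_user_pages(pages, pinned);
	return 0;
}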
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (8):
mm: dump_page: print head page's refcount, for compound pages
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 47 +--
include/linux/mm.h | 109 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/debug.c | 22 +-
mm/gup.c | 460 ++++++++++++++++-----
mm/gup_benchmark.c | 71 +++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 44 +-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 681 insertions(+), 167 deletions(-)
--
2.25.0
When kunit tests are run on native (i.e. non-UML) environments, the results
of test execution are often intermixed with dmesg output. This patch
series attempts to solve this by providing a debugfs representation
of the results of the last test run, available as
/sys/kernel/debug/kunit/<testsuite>/results
In addition, we provide a way to re-run the tests and show results via
/sys/kernel/debug/kunit/<testsuite>/run
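As an illustration only (the suite name "example" is an assumption, and
these files exist only once this series is applied), the results file can
be consumed from userspace along these lines:

#include <stdio.h>

/* Dump the TAP-formatted results of the named suite's last run. */
static void dump_kunit_results(const char *suite)
{
	char path[256], line[512];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/kernel/debug/kunit/%s/results", suite);
	f = fopen(path, "r");
	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
}

int main(void)
{
	dump_kunit_results("example");
	return 0;
}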
Changes since v1:
- trimmed unneeded include files in lib/kunit/debugfs.c (Greg)
- renamed global debugfs functions to be prefixed with kunit_ (Greg)
- removed error checking for debugfs operations (Greg)
Alan Maguire (3):
kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
kunit: add "run" debugfs file to run suites, display results
kunit: update documentation to describe debugfs representation
Documentation/dev-tools/kunit/usage.rst | 19 +++++
include/kunit/test.h | 21 +++--
lib/kunit/Makefile | 3 +-
lib/kunit/debugfs.c | 137 ++++++++++++++++++++++++++++++++
lib/kunit/debugfs.h | 16 ++++
lib/kunit/test.c | 85 +++++++++++++++-----
6 files changed, 254 insertions(+), 27 deletions(-)
create mode 100644 lib/kunit/debugfs.c
create mode 100644 lib/kunit/debugfs.h
--
1.8.3.1
## TL;DR
This patchset adds a centralized executor to dispatch tests, rather than
relying on late_initcall to schedule each test suite separately, along
with a couple of new features that depend on it.
## What am I trying to do?
Conceptually, I am trying to provide a mechanism by which test suites
can be grouped together so that they can be reasoned about collectively.
Two of the last three patches in this series add features which depend
on this:
PATCH 5/7 Prints out a test plan right before KUnit tests are run[1];
this is valuable because it makes it possible for a test
harness to detect whether the number of tests run matches the
number of tests expected to be run, ensuring that no tests
silently failed.
PATCH 6/7 Add a new kernel command-line option which allows the user to
specify that the kernel poweroff, halt, or reboot after
completing all KUnit tests; this is very handy for running
KUnit tests on UML or a VM so that the UML/VM process exits
cleanly immediately after running all tests without needing a
special initramfs.
In addition, by dispatching tests from a single location, we can
guarantee that all KUnit tests run after late_init is complete, which
was a concern during the initial KUnit patchset review (this has not
been a problem in practice, but resolving it with certainty is nevertheless
desirable).
Other use cases for this exist, but the above features should give an
idea of the value this could provide.
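To make the dispatch mechanism concrete, here is a rough sketch of the
linker-section idea (the section symbols, struct layout, and helper names
are assumptions for illustration, not the actual KUnit implementation):

/* Each suite is emitted into a dedicated linker section; a single
 * executor walks that section once, after late_init.
 */
struct kunit_suite;
int kunit_run_tests(struct kunit_suite *suite);

/* Start/end markers assumed to be provided by the linker script. */
extern struct kunit_suite *__test_suites_start[];
extern struct kunit_suite *__test_suites_end[];

static void example_run_all_suites(void)
{
	struct kunit_suite **suite;

	for (suite = __test_suites_start; suite < __test_suites_end; suite++)
		kunit_run_tests(*suite);
}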
Alan Maguire (1):
kunit: test: create a single centralized executor for all tests
Brendan Higgins (5):
vmlinux.lds.h: add linker section for KUnit test suites
arch: um: add linker section for KUnit test suites
init: main: add KUnit to kernel init
kunit: test: add test plan to KUnit TAP format
Documentation: Add kunit_shutdown to kernel-parameters.txt
David Gow (1):
kunit: Add 'kunit_shutdown' option
.../admin-guide/kernel-parameters.txt | 7 ++
arch/um/include/asm/common.lds.S | 4 +
include/asm-generic/vmlinux.lds.h | 8 ++
include/kunit/test.h | 82 ++++++++++++-------
init/main.c | 4 +
lib/kunit/Makefile | 3 +-
lib/kunit/executor.c | 71 ++++++++++++++++
lib/kunit/test.c | 11 ---
tools/testing/kunit/kunit_kernel.py | 2 +-
tools/testing/kunit/kunit_parser.py | 76 ++++++++++++++---
.../test_is_test_passed-all_passed.log | 1 +
.../test_data/test_is_test_passed-crash.log | 1 +
.../test_data/test_is_test_passed-failure.log | 1 +
13 files changed, 217 insertions(+), 54 deletions(-)
create mode 100644 lib/kunit/executor.c
--
2.25.0.341.g760bfbb309-goog
Memory protection keys enable an application to protect its address
space from inadvertent access by its own code.
This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to an arch-neutral directory
and enhance their test coverage. A minimal usage sketch of the API is
included below.
Tested on powerpc64 and x86_64 (Skylake-SP).
Link to development branch:
https://github.com/sandip4n/linux/tree/pkey-selftests
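For context, here is a minimal usage sketch of the pkey API these tests
exercise (assuming a glibc that provides the pkey_* wrappers; x86 shown,
error handling trimmed):

#define _GNU_SOURCE
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int pkey = pkey_alloc(0, 0);

	if (p == MAP_FAILED || pkey < 0)
		return 1;

	/* Tag the mapping with the key, then revoke access via the key. */
	pkey_mprotect(p, len, PROT_READ | PROT_WRITE, pkey);
	pkey_set(pkey, PKEY_DISABLE_ACCESS);

	/* Touching *p here would fault with SEGV_PKUERR, even though the
	 * underlying mmap() permissions still allow read/write.
	 */

	pkey_set(pkey, 0);	/* restore access */
	p[0] = 1;		/* fine again */

	pkey_free(pkey);
	munmap(p, len);
	return 0;
}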
Changelog
---------
Link to previous version (v16):
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=153824
v17:
(1) Fixed issues with i386 builds when running on x86_64
based on feedback from Dave.
(2) Replaced patch 6 from previous version with patch 7.
This addresses u64 format specifier related concerns
that Michael had raised in v15.
v16:
(1) Rebased on top of latest master.
(2) Switched to u64 instead of using an arch-dependent
pkey_reg_t type for references to the pkey register
based on suggestions from Dave, Michal and Michael.
(3) Removed build time determination of page size based
on suggestion from Michael.
(4) Fixed comment before the definition of __page_o_noops()
from patch 13 ("selftests/vm/pkeys: Introduce powerpc
support").
v15:
(1) Rebased on top of latest master.
(2) Addressed review comments from Dave Hansen.
(3) Moved code for getting or setting pkey bits to new
helpers. These changes replace patch 7 of v14.
(4) Added a fix which ensures that the correct count of
reserved keys is used across different platforms.
(5) Added a fix which ensures that the correct page size
is used as powerpc supports both 4K and 64K pages.
v14:
(1) Incorporated another round of comments from Dave Hansen.
v13:
(1) Incorporated comments from Dave Hansen.
(2) Added one more test for correct pkey-0 behavior.
v12:
(1) Fixed the offset of the pkey field in the siginfo structure for
x86_64 and powerpc, and tried to use the actual field
if the headers have it defined.
v11:
(1) Fixed a deadlock in the ptrace testcase.
v10 and prior:
(1) Moved the testcase to arch neutral directory.
(2) Split the changes into incremental patches.
Desnes A. Nunes do Rosario (1):
selftests/vm/pkeys: Fix number of reserved powerpc pkeys
Ram Pai (16):
selftests/x86/pkeys: Move selftests to arch-neutral directory
selftests/vm/pkeys: Rename all references to pkru to a generic name
selftests/vm/pkeys: Move generic definitions to header file
selftests/vm/pkeys: Fix pkey_disable_clear()
selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
selftests/vm/pkeys: Introduce generic pkey abstractions
selftests/vm/pkeys: Introduce powerpc support
selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: Improve checks to determine pkey support
selftests/vm/pkeys: Associate key on a mapped page and detect access
violation
selftests/vm/pkeys: Associate key on a mapped page and detect write
violation
selftests/vm/pkeys: Detect write violation on a mapped
access-denied-key page
selftests/vm/pkeys: Introduce a sub-page allocator
selftests/vm/pkeys: Test correct behaviour of pkey-0
selftests/vm/pkeys: Override access right definitions on powerpc
Sandipan Das (5):
selftests: vm: pkeys: Fix multilib builds for x86
selftests: vm: pkeys: Use sane types for pkey register
selftests: vm: pkeys: Add helpers for pkey bits
selftests: vm: pkeys: Use the correct huge page size
selftests: vm: pkeys: Use the correct page size on powerpc
Thiago Jung Bauermann (2):
selftests/vm/pkeys: Move some definitions to arch-specific header
selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 50 ++
tools/testing/selftests/vm/pkey-helpers.h | 225 ++++++
tools/testing/selftests/vm/pkey-powerpc.h | 136 ++++
tools/testing/selftests/vm/pkey-x86.h | 181 +++++
.../selftests/{x86 => vm}/protection_keys.c | 696 ++++++++++--------
tools/testing/selftests/x86/.gitignore | 1 -
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/pkey-helpers.h | 219 ------
9 files changed, 979 insertions(+), 532 deletions(-)
create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h
create mode 100644 tools/testing/selftests/vm/pkey-x86.h
rename tools/testing/selftests/{x86 => vm}/protection_keys.c (74%)
delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
--
2.17.1
Hi Linus,
Please pull the following Kselftest update for Linux 5.6-rc1
This Kselftest update for Linux 5.6-rc1 consists of several fixes to the
framework and individual tests. In addition, it enables LKDTM tests by
adding an lkdtm target to the kselftest Makefile.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit c79f46a282390e0f5b306007bf7b11a46d529538:
Linux 5.5-rc5 (2020-01-05 14:23:27 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.6-rc1
for you to fetch changes up to af4ddd607dff7aabd466a4a878e01b9f592a75ab:
selftests/ftrace: fix glob selftest (2020-01-28 13:36:48 -0700)
----------------------------------------------------------------
linux-kselftest-5.6-rc1
This Kselftest update for Linux 5.6-rc1 consists of several fixes to the
framework and individual tests. In addition, it enables LKDTM tests by
adding an lkdtm target to the kselftest Makefile.
----------------------------------------------------------------
Cristian Marussi (1):
selftests: fix build behaviour on targets' failures
Dan Carpenter (1):
selftests: Uninitialized variable in test_cgcore_proc_migration()
Kees Cook (1):
selftests/lkdtm: Add tests for LKDTM targets
Matthieu Baerts (1):
selftests: settings: tests can be in subsubdirs
Miroslav Benes (2):
selftests/livepatch: Replace set_dynamic_debug() with
setup_config() in README
selftests/livepatch: Remove unused local variable in
set_ftrace_enabled()
Siddhesh Poyarekar (1):
kselftest: Minimise dependency of get_size on C library interfaces
Sven Schnelle (1):
selftests/ftrace: fix glob selftest
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 19 +++--
tools/testing/selftests/cgroup/test_core.c | 2 +-
.../ftrace/test.d/ftrace/func-filter-glob.tc | 2 +-
tools/testing/selftests/kselftest/runner.sh | 2 +-
tools/testing/selftests/livepatch/README | 2 +-
tools/testing/selftests/livepatch/functions.sh | 1 -
tools/testing/selftests/lkdtm/Makefile | 12 +++
tools/testing/selftests/lkdtm/config | 1 +
tools/testing/selftests/lkdtm/run.sh | 92 ++++++++++++++++++++++
tools/testing/selftests/lkdtm/tests.txt | 71 +++++++++++++++++
tools/testing/selftests/size/get_size.c | 24 ++++--
12 files changed, 211 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/lkdtm/Makefile
create mode 100644 tools/testing/selftests/lkdtm/config
create mode 100755 tools/testing/selftests/lkdtm/run.sh
create mode 100644 tools/testing/selftests/lkdtm/tests.txt
----------------------------------------------------------------
Leon Romanovsky:
If you get a chance, I'd love to have this short series (or even just
the first patch; the others are just selftests) run through your test
suite that was previously choking on my earlier v11 patchset. The huge
page pincount limitations are removed, so I'm expecting a perfect test
run this time!
Everyone:
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
It is based on today's (Jan 24) mmotm. There is a git repo and branch,
for convenience in reviewing:
git@github.com:johnhubbard/linux.git track_user_pages_v1_mmotm_24Jan2020
FOLL_PIN support is (so far) in mmotm and linux-next. However, the
patch to use FOLL_PIN to track pages was *not* submitted, because Leon
saw an RDMA test suite failure that involved (I think) page refcount
overflows when huge pages were used.
This patch definitively solves that kind of overflow problem, by adding
an exact pincount, for compound pages (of order > 1), in the 3rd struct
page of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Here's the last reviewed version of the tracking patch (v11):
https://lore.kernel.org/r/20191216222537.491123-1-jhubbard@nvidia.com
Jan Kara had provided a reviewed-by tag for that, but I've had to remove
it (again) here, due to having changed the patch "a little bit", in
order to add the feature described above.
Other interesting changes:
* dump_page(): added one or two new things to report for compound
pages: the head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1)
and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Next steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages() https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
John Hubbard (3):
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/pin_user_pages.rst | 48 ++-
include/linux/mm.h | 109 ++++-
include/linux/mm_types.h | 7 +-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/debug.c | 22 +-
mm/gup.c | 467 ++++++++++++++++-----
mm/gup_benchmark.c | 70 ++-
mm/huge_memory.c | 29 +-
mm/hugetlb.c | 44 +-
mm/page_alloc.c | 2 +
mm/rmap.c | 6 +
mm/vmstat.c | 2 +
tools/testing/selftests/vm/gup_benchmark.c | 15 +-
tools/testing/selftests/vm/run_vmtests | 22 +
15 files changed, 678 insertions(+), 177 deletions(-)
--
2.25.0
On Mon, Jan 27, 2020 at 6:23 AM Sven Schnelle <svens(a)linux.ibm.com> wrote:
>
> Hi Steve,
>
> On Wed, Jan 08, 2020 at 09:11:55AM -0500, Steven Rostedt wrote:
> >
> > Shuah,
> >
> > Want to take this through your tree?
> >
> > https://lore.kernel.org/r/20200108074043.21580-1-svens@linux.ibm.com
> >
> > Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
>
> As Shuah didn't reply, can you push that through your tree?
>
Hi Sven,
Did you run get_maintainer.pl on this patch? You didn't send this to my
email address listed in the MAINTAINERS file and also didn't cc
linux-kselftest.
I just happened to notice this now. Please resend with Steve's
Reviewed-by tag to the recipients suggested by get_maintainer.pl.
I will take this through the kselftest tree.
thanks,
-- Shuah
test.d/ftrace/func-filter-glob.tc is failing on s390 because it has
ARCH_INLINE_SPIN_LOCK and friends set to 'y', so the usual
__raw_spin_lock symbol isn't in the ftrace function list. Change
'*aw*lock' to '*pin*lock', which should hopefully match some of the
locking functions on all platforms.
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
---
Changes in v4:
- rebase to latest master
Changes in v3:
change '*spin*lock' to '*pin*lock' to not match the beginning
Changes in v2:
use '*spin*lock' instead of '*ktime*ns'
.../testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
index 27a54a17da65..f4e92afab14b 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
@@ -30,7 +30,7 @@ ftrace_filter_check '*schedule*' '^.*schedule.*$'
ftrace_filter_check 'schedule*' '^schedule.*$'
# filter by *mid*end
-ftrace_filter_check '*aw*lock' '.*aw.*lock$'
+ftrace_filter_check '*pin*lock' '.*pin.*lock$'
# filter by start*mid*
ftrace_filter_check 'mutex*try*' '^mutex.*try.*'
--
2.17.1
Commit 852c8cbf34d3 (selftests/kselftest/runner.sh: Add 45 second
timeout per test) adds support for a new per-test-directory "settings"
file. But this only works for tests that are not in sub-subdirectories, e.g.
- tools/testing/selftests/rtc (rtc) is OK,
- tools/testing/selftests/net/mptcp (net/mptcp) is not.
We have to increase the timeout for the net/mptcp tests, which are not
upstreamed yet, but this fix is valid for other tests if they need to add
a "settings" file; see the full list with:
tools/testing/selftests/*/*/**/Makefile
Note that this patch changes the header text printed at the end of the
execution, but this text is modified only for the tests that are in
sub-subdirectories, e.g.
ok 1 selftests: net/mptcp: mptcp_connect.sh
Before we had:
ok 1 selftests: mptcp: mptcp_connect.sh
But showing the full target name is probably better, just in case a
sub-subdirectory has the same name as another one in another subdirectory.
Fixes: 852c8cbf34d3 (selftests/kselftest/runner.sh: Add 45 second timeout per test)
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
tools/testing/selftests/kselftest/runner.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 84de7bc74f2c..0d7a89901ef7 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -90,7 +90,7 @@ run_one()
run_many()
{
echo "TAP version 13"
- DIR=$(basename "$PWD")
+ DIR="${PWD#${BASE_DIR}/}"
test_num=0
total=$(echo "$@" | wc -w)
echo "1..$total"
--
2.20.1
When handling page faults for many vCPUs during demand paging, KVM's MMU
lock becomes highly contended. This series creates a test with a naive
userfaultfd based demand paging implementation to demonstrate that
contention. This test serves both as a functional test of userfaultfd
and a microbenchmark of demand paging performance with a variable number
of vCPUs and memory per vCPU.
The test creates N userfaultfd threads, N vCPUs, and a region of memory
with M pages per vCPU. The N userfaultfd polling threads are each set up
to serve faults on a region of memory corresponding to one of the vCPUs.
Each of the vCPUs is then started, and touches each page of its disjoint
memory region, sequentially. In response to faults, the userfaultfd
threads copy a static buffer into the guest's memory. This creates a
worst case for MMU lock contention as we have removed most of the
contention between the userfaultfd threads and there is no time required
to fetch the contents of guest memory.
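For orientation, here is a minimal sketch of such a userfaultfd polling
loop (this is not the test itself; error handling is trimmed, a 4 KiB page
size is assumed, and the uffd is assumed to be already registered for the
vCPU's memory region):

#include <poll.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#define PAGE_SIZE 4096

static char copy_buf[PAGE_SIZE];	/* static buffer copied into the guest */

static void handle_faults(int uffd)
{
	struct uffd_msg msg;
	struct uffdio_copy copy;
	struct pollfd pfd = { .fd = uffd, .events = POLLIN };

	for (;;) {
		if (poll(&pfd, 1, -1) <= 0)
			break;
		if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
			continue;
		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		/* Resolve the fault by copying the static buffer in. */
		copy.dst = msg.arg.pagefault.address & ~(PAGE_SIZE - 1UL);
		copy.src = (uint64_t)(unsigned long)copy_buf;
		copy.len = PAGE_SIZE;
		copy.mode = 0;
		ioctl(uffd, UFFDIO_COPY, &copy);
	}
}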
This test was run successfully on Intel Haswell, Broadwell, and
Cascadelake hosts with a variety of vCPU counts and memory sizes.
This test was adapted from the dirty_log_test.
The series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
(Thanks to Dmitry Vyukov <dvyukov(a)google.com> for setting up the Gerrit
instance)
v4 (Responding to feedback from Andrew Jones, Peter Xu, and Peter Shier):
- Tested this revision by running
demand_paging_test
at each commit in the series on an Intel Haswell machine. Ran
demand_paging_test -u -v 8 -b 8M -d 10
on the same machine at the last commit in the series.
- Readded partial aarch64 support, though aarch64 and s390 remain
untested
- Implemented pipefd polling to reduce UFFD thread exit latency
- Added variable unit input for memory size so users can pass command
line arguments of the form -b 24M instead of the raw number of bytes
- Moved a missing break from a patch later in the series to an earlier
one
- Moved to syncing per-vCPU global variables to guest and looking up
per-vcpu arguments based on a single CPU ID passed to each guest
vCPU. This allows for future patches to pass more than the supported
number of arguments for each arch to the vCPUs.
- Implemented vcpu_args_set for s390 and aarch64 [UNTESTED]
- Changed vm_create to always allocate memslot 0 at 4G instead of only
when the number of pages required is large.
- Changed vcpu_wss to vcpu_memory_size for clarity.
Ben Gardon (10):
KVM: selftests: Create a demand paging test
KVM: selftests: Add demand paging content to the demand paging test
KVM: selftests: Add configurable demand paging delay
KVM: selftests: Add memory size parameter to the demand paging test
KVM: selftests: Pass args to vCPU in global vCPU args struct
KVM: selftests: Add support for vcpu_args_set to aarch64 and s390x
KVM: selftests: Support multiple vCPUs in demand paging test
KVM: selftests: Time guest demand paging
KVM: selftests: Stop memslot creation in KVM internal memslot region
KVM: selftests: Move memslot 0 above KVM internal memslots
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 5 +-
.../selftests/kvm/demand_paging_test.c | 680 ++++++++++++++++++
.../testing/selftests/kvm/include/test_util.h | 2 +
.../selftests/kvm/lib/aarch64/processor.c | 33 +
tools/testing/selftests/kvm/lib/kvm_util.c | 27 +-
.../selftests/kvm/lib/s390x/processor.c | 35 +
tools/testing/selftests/kvm/lib/test_util.c | 61 ++
8 files changed, 839 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
create mode 100644 tools/testing/selftests/kvm/lib/test_util.c
--
2.25.0.341.g760bfbb309-goog
The current stable LLVM BPF backend fails to compile the BPF selftests
due to a compiler bug. The bug has been fixed in trunk, but that fix
hasn't landed in the binary packages I'm using yet (Fedora arm64).
Without this workaround the tests don't compile for me.
This patch triggers a preprocessor warning on LLVM versions that
definitely have the bug. The check may be conservative (i.e., I'm not sure
whether 9.1 will have the fix), but it should at least make the current set
of stable releases work together.
See https://reviews.llvm.org/D69438 for more information on the fix. I
obtained the workaround from
https://lore.kernel.org/linux-kselftest/aed8eda7-df20-069b-ea14-f0662898456…
Fixes: 20a9ad2e7136 ("selftests/bpf: add CO-RE relocs array tests")
Signed-off-by: Palmer Dabbelt <palmerdabbelt(a)google.com>
---
.../testing/selftests/bpf/progs/test_core_reloc_arrays.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c b/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
index 89951b684282..e5eafdab80a4 100644
--- a/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
@@ -43,15 +43,23 @@ int test_core_arrays(void *ctx)
/* in->a[2] */
if (CORE_READ(&out->a2, &in->a[2]))
return 1;
+#if defined(__clang__) && (__clang_major__ < 10) && (__clang_minor__ < 1)
+# warning "clang 9.0 SEGVs on multidimensional arrays, see https://reviews.llvm.org/D69438"
+#else
/* in->b[1][2][3] */
if (CORE_READ(&out->b123, &in->b[1][2][3]))
return 1;
+#endif
/* in->c[1].c */
if (CORE_READ(&out->c1c, &in->c[1].c))
return 1;
+#if defined(__clang__) && (__clang_major__ < 10) && (__clang_minor__ < 1)
+# warning "clang 9.0 SEGVs on multidimensional arrays, see https://reviews.llvm.org/D69438"
+#else
/* in->d[0][0].d */
if (CORE_READ(&out->d00d, &in->d[0][0].d))
return 1;
+#endif
return 0;
}
--
2.25.0.341.g760bfbb309-goog
A few of the lists used in the linked-list KUnit tests (the
for_each_entry{,_reverse} tests) are declared 'static', and so are
not reinitialised if the test runs multiple times. This was not a
problem when KUnit tests were run once on startup, but when tests can
be run manually (e.g. from debugfs[1]), this is no longer the
case.
Making these lists no longer 'static' causes the lists to be
reinitialised, and the test passes each time it is run. While there may
be some value in testing that initialising static lists works, the
for_each_entry_* tests are unlikely to be the right place for it.
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/list-test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/list-test.c b/lib/list-test.c
index 76babb1df889..ee09505df16f 100644
--- a/lib/list-test.c
+++ b/lib/list-test.c
@@ -659,7 +659,7 @@ static void list_test_list_for_each_prev_safe(struct kunit *test)
static void list_test_list_for_each_entry(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
@@ -680,7 +680,7 @@ static void list_test_list_for_each_entry(struct kunit *test)
static void list_test_list_for_each_entry_reverse(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
--
2.25.0.341.g760bfbb309-goog
The reuseport tests currently suffer from a race condition: FIN
packets count towards DROP_ERR_SKB_DATA, since they don't contain
a valid struct cmd. Tests will spuriously fail depending on whether
check_results is called before or after the FIN is processed.
Exit the BPF program early if FIN is set.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
---
.../selftests/bpf/progs/test_select_reuseport_kern.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
index d69a1f2bbbfd..26e77dcc7e91 100644
--- a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
@@ -113,6 +113,12 @@ int _select_by_skb_data(struct sk_reuseport_md *reuse_md)
data_check.skb_ports[0] = th->source;
data_check.skb_ports[1] = th->dest;
+ if (th->fin)
+ /* The connection is being torn down at the end of a
+ * test. It can't contain a cmd, so return early.
+ */
+ return SK_PASS;
+
if ((th->doff << 2) + sizeof(*cmd) > data_check.len)
GOTO_DONE(DROP_ERR_SKB_DATA);
if (bpf_skb_load_bytes(reuse_md, th->doff << 2, &cmd_copy,
--
2.20.1
Currently, there are a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
---
.../selftests/bpf/prog_tests/select_reuseport.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index e7e56929751c..098bcae5f827 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -33,7 +33,7 @@
#define REUSEPORT_ARRAY_SIZE 32
static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
-static enum result expected_results[NR_RESULTS];
+static __u32 expected_results[NR_RESULTS];
static int sk_fds[REUSEPORT_ARRAY_SIZE];
static int reuseport_array = -1, outer_map = -1;
static int select_by_skb_data_prog;
@@ -697,7 +697,19 @@ static void setup_per_test(int type, sa_family_t family, bool inany,
static void cleanup_per_test(bool no_inner_map)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1
The reuseport tests currently suffer from a race condition: FIN
packets count towards DROP_ERR_SKB_DATA, since they don't contain
a valid struct cmd. Tests will spuriously fail depending on whether
check_results is called before or after the FIN is processed.
Exit the BPF program early if FIN is set.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
---
.../selftests/bpf/progs/test_select_reuseport_kern.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
index d69a1f2bbbfd..26e77dcc7e91 100644
--- a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
@@ -113,6 +113,12 @@ int _select_by_skb_data(struct sk_reuseport_md *reuse_md)
data_check.skb_ports[0] = th->source;
data_check.skb_ports[1] = th->dest;
+ if (th->fin)
+ /* The connection is being torn down at the end of a
+ * test. It can't contain a cmd, so return early.
+ */
+ return SK_PASS;
+
if ((th->doff << 2) + sizeof(*cmd) > data_check.len)
GOTO_DONE(DROP_ERR_SKB_DATA);
if (bpf_skb_load_bytes(reuse_md, th->doff << 2, &cmd_copy,
--
2.20.1
Currently, there are a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
---
.../selftests/bpf/prog_tests/select_reuseport.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index 09a536af139a..0bab8b1ca1c3 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -699,7 +699,19 @@ static void setup_per_test(int type, sa_family_t family, bool inany,
static void cleanup_per_test(bool no_inner_map)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1