When running the ftrace selftests, 2 failures and 6 unresolved
cases were observed. The failures can be avoided by setting
a sysctl prior to test execution (fixed in patch 1), and the
unresolved cases result from the absence of test modules (which
are only built if the relevant CONFIG options are set) or of
required programs (fixed in patch 2).
These seem more like "unsupported" than "unresolved" errors,
since for the ftrace tests "unresolved" cases cause the test
(and thus kselftest) to report failure. With these changes
in place, the unresolved cases become unsupported and the
test failures disappear, resulting in the ftracetest program
exiting with "ok" status.
Alan Maguire (2):
ftrace/selftests: workaround cgroup RT scheduling issues
ftrace/selftest: absence of modules/programs should trigger
unsupported errors
tools/testing/selftests/ftrace/ftracetest | 23 ++++++++++++++++++++++
.../ftrace/test.d/direct/ftrace-direct.tc | 2 +-
.../ftrace/test.d/direct/kprobe-direct.tc | 2 +-
.../selftests/ftrace/test.d/event/trace_printk.tc | 2 +-
.../ftrace/test.d/ftrace/func_mod_trace.tc | 2 +-
.../ftrace/test.d/kprobe/kprobe_module.tc | 2 +-
.../selftests/ftrace/test.d/selftest/bashisms.tc | 2 +-
7 files changed, 29 insertions(+), 6 deletions(-)
--
1.8.3.1
When kunit tests are run on native (i.e. non-UML) environments, the results
of test execution are often intermixed with dmesg output. This patch
series attempts to solve this by providing a debugfs representation
of the results of the last test run, available as
/sys/kernel/debug/kunit/<testsuite>/results
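As a rough sketch of what the kernel side of such a file can look like
(an illustration using the standard debugfs/seq_file API, not necessarily
the exact code in patch 1; results_show(), kunit_debugfs_create_suite()
and the idea that the suite log is a plain string are assumptions):

    /* Hypothetical sketch: expose a suite's saved log as a debugfs file. */
    #include <linux/debugfs.h>
    #include <linux/seq_file.h>
    #include <kunit/test.h>

    static struct dentry *kunit_debugfs_root;

    static int results_show(struct seq_file *seq, void *v)
    {
            struct kunit_suite *suite = seq->private;

            /* Replay the log captured during the last run of the suite. */
            seq_puts(seq, suite->log);
            return 0;
    }
    DEFINE_SHOW_ATTRIBUTE(results);

    static void kunit_debugfs_create_suite(struct kunit_suite *suite)
    {
            struct dentry *dir;

            dir = debugfs_create_dir(suite->name, kunit_debugfs_root);
            debugfs_create_file("results", 0444, dir, suite, &results_fops);
    }

With something like this in place, reading
/sys/kernel/debug/kunit/<testsuite>/results replays the suite's output
without digging through dmesg.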
Changes since v2:
- renamed kunit_status2str() to kunit_status_to_string() and made it
static inline in include/kunit/test.h (Brendan)
- added a log string to struct kunit_suite and kunit_case, with the log
pointer in struct kunit pointing at the case log. This allows us
to collect kunit_[err|info|warning]() messages at the same time
as we printk() them. It largely solves the problem of sharing
log messages between test execution and debugfs, since we
just print the suite log (which contains the test suite preamble)
followed by the individual test logs. The only exception is the
suite-level status, which we cannot store in the suite log, as
doing so would print the suite's status before its results.
(Brendan, patch 1)
- dropped debugfs-based kunit run patch for now so as not to cause
problems with tests currently under development (Brendan)
- fixed doc issues with code block (Brendan, patch 3)
Changes since v1:
- trimmed unneeded include files in lib/kunit/debugfs.c (Greg)
- renamed global debugfs functions to be prefixed with kunit_ (Greg)
- removed error checking for debugfs operations (Greg)
Alan Maguire (2):
kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
kunit: update documentation to describe debugfs representation
Documentation/dev-tools/kunit/usage.rst | 12 +++
include/kunit/test.h | 52 ++++++++++--
lib/kunit/Makefile | 3 +-
lib/kunit/debugfs.c | 105 ++++++++++++++++++++++++
lib/kunit/debugfs.h | 16 ++++
lib/kunit/kunit-test.c | 4 +-
lib/kunit/test.c | 136 ++++++++++++++++++++++++--------
7 files changed, 286 insertions(+), 42 deletions(-)
create mode 100644 lib/kunit/debugfs.c
create mode 100644 lib/kunit/debugfs.h
--
1.8.3.1
Implement a small fix so that the script changes its working directory
to the kernel root from which kunit.py is run. This enables the user to
run kunit from any working directory. Using os.path.join on every file
path was considered originally, but it is more error prone as we would
have to find all file path usages and modify them accordingly. Using
os.chdir instead ensures that the entire script runs from the kernel
root.
Signed-off-by: Heidi Fahim <heidifahim(a)google.com>
---
tools/testing/kunit/kunit.py | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index e59eb9e7f923..3cc7be7b28a0 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -35,6 +35,13 @@ def create_default_kunitconfig():
shutil.copyfile('arch/um/configs/kunit_defconfig',
kunit_kernel.kunitconfig_path)
+def get_kernel_root_path():
+ parts = sys.argv[0] if not __file__ else __file__
+ parts = os.path.realpath(parts).split('tools/testing/kunit')
+ if len(parts) != 2:
+ sys.exit(1)
+ return parts[0]
+
def run_tests(linux: kunit_kernel.LinuxSourceTree,
request: KunitRequest) -> KunitResult:
config_start = time.time()
@@ -114,6 +121,9 @@ def main(argv, linux=None):
cli_args = parser.parse_args(argv)
if cli_args.subcommand == 'run':
+ if get_kernel_root_path():
+ os.chdir(get_kernel_root_path())
+
if cli_args.build_dir:
if not os.path.exists(cli_args.build_dir):
os.mkdir(cli_args.build_dir)
--
2.25.0.341.g760bfbb309-goog
There are four patches here, but only one of them actually does anything.
The first patch fixes a BPF selftests build failure on my machine and has
already been sent to the list separately. The remaining three are staged
so that the refactorings that don't change any functionality are pulled
out from the patch that is the whole point of the series: two cleanups,
and then the idea itself.
Maybe this is an odd thing to say in a cover letter, but I'm not actually sure
this patch set is a good idea. The issue of extra moves after calls came up as
I was reviewing some unrelated performance optimizations to the RISC-V BPF JIT.
I figured I'd take a whack at performing the optimization in the context of the
arm64 port just to get a breath of fresh air, and I'm not convinced I like the
results.
That said, I think I would accept something like this for the RISC-V port
because we're already doing a multi-pass optimization for shrinking function
addresses, so it's not as much extra complexity over there. If we do that,
we should probably start pulling some of this code into the shared BPF
compiler, but we're also opening the doors to more complicated BPF JIT
optimizations.
Given that the BPF JIT appears to have been designed explicitly to be
simple/fast as opposed to perform complex optimization, I'm not sure this is a
sane way to move forward.
I figured I'd send the patch set out as more of a question than anything else.
Specifically:
* How should I go about measuring the performance of these sorts of
optimizations? I'd like to balance the time it takes to run the JIT with the
time spent executing the program, but I don't have any feel for what real BPF
programs look like or have any benchmark suite to run. Is there something
out there this should be benchmarked against? (I'd also like to know, so I
can run those benchmarks on the RISC-V port.)
* Is this the sort of thing that makes sense in a BPF JIT? I guess I've just
realized I turned "review this patch" into a way bigger rabbit hole than I
really want to go down...
I worked on top of 5.4 for these, but trivially different versions of the
patches applied on Linus' master a few days ago when I tried. LMK if those
aren't sane places to start from over here; I'm new to both arm64 and BPF,
so I might be a bit lost.
[PATCH 1/4] selftests/bpf: Elide a check for LLVM versions that can't
[PATCH 2/4] arm64: bpf: Convert bpf2a64 to a function
[PATCH 3/4] arm64: bpf: Split the read and write halves of dst
[PATCH 4/4] arm64: bpf: Elide some moves to a0 after calls
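For context on the "extra moves after calls" mentioned above: the arm64
JIT maps BPF r0 to a callee register (x7 in its bpf2a64 table), so every
helper call is followed by a move out of the return-value register; when
the next BPF instruction immediately copies r0 somewhere else, that move
is dead. A hand-written illustration (not actual JIT output, and the
exact register choices are an assumption based on the current bpf2a64
mapping):

    /*
     * BPF:                arm64 JIT today:      with the elision:
     *   call helper         blr  <helper>         blr  <helper>
     *                       mov  x7, x0           (elided)
     *   r6 = r0             mov  x19, x7          mov  x19, x0
     */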
When kunit tests are run on native (i.e. non-UML) environments, the results
of test execution are often intermixed with dmesg output. This patch
series attempts to solve this by providing a debugfs representation
of the results of the last test run, available as
/sys/kernel/debug/kunit/<testsuite>/results
Changes since v3:
- added CONFIG_KUNIT_DEBUGFS to support conditional compilation of debugfs
representation, including string logging (Frank, patch 1)
- removed unneeded NULL check for test_case in kunit_suite_for_each_test_case()
(Frank, patch 1)
- added kunit log test to verify logging multiple strings works
(Frank, patch 2)
- rephrased description of results file (Frank, patch 3)
Changes since v2:
- renamed kunit_status2str() to kunit_status_to_string() and made it
static inline in include/kunit/test.h (Brendan)
- added a log string to struct kunit_suite and kunit_case, with the log
pointer in struct kunit pointing at the case log. This allows us
to collect kunit_[err|info|warning]() messages at the same time
as we printk() them (a sketch of this dual logging appears after
the changelog below). It largely solves the problem of sharing
log messages between test execution and debugfs, since we
just print the suite log (which contains the test suite preamble)
followed by the individual test logs. The only exception is the
suite-level status, which we cannot store in the suite log, as
doing so would print the suite's status before its results.
(Brendan, patch 1)
- dropped debugfs-based kunit run patch for now so as not to cause
problems with tests currently under development (Brendan)
- fixed doc issues with code block (Brendan, patch 3)
Changes since v1:
- trimmed unneeded include files in lib/kunit/debugfs.c (Greg)
- renamed global debugfs functions to be prefixed with kunit_ (Greg)
- removed error checking for debugfs operations (Greg)
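As referenced in the v2 changelog above, the dual logging can be sketched
roughly as follows (kunit_log_append() and the fixed-size log buffer are
illustrative assumptions about the implementation, not necessarily its
exact shape):

    /* Hypothetical sketch: printk() a message and append it to a saved log. */
    #include <linux/kernel.h>
    #include <linux/string.h>

    #define KUNIT_LOG_SIZE 512

    static void kunit_log_append(char *log, const char *fmt, ...)
    {
            va_list args;
            size_t used;

            if (!log)
                    return;
            used = strlen(log);
            va_start(args, fmt);
            vsnprintf(log + used, KUNIT_LOG_SIZE - used, fmt, args);
            va_end(args);
    }

    /* Each kunit_err()/kunit_info()/kunit_warning() does both at once. */
    #define kunit_log(lvl, test_or_suite, fmt, ...)                        \
            do {                                                           \
                    printk(lvl fmt, ##__VA_ARGS__);                        \
                    kunit_log_append((test_or_suite)->log,                 \
                                     fmt, ##__VA_ARGS__);                  \
            } while (0)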
Alan Maguire (3):
kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
kunit: add log test
kunit: update documentation to describe debugfs representation
Documentation/dev-tools/kunit/usage.rst | 13 ++++
include/kunit/test.h | 54 +++++++++++---
lib/kunit/Kconfig | 8 +++
lib/kunit/Makefile | 4 ++
lib/kunit/debugfs.c | 116 ++++++++++++++++++++++++++++++
lib/kunit/debugfs.h | 30 ++++++++
lib/kunit/kunit-test.c | 31 +++++++-
lib/kunit/test.c | 122 ++++++++++++++++++++++++--------
8 files changed, 336 insertions(+), 42 deletions(-)
create mode 100644 lib/kunit/debugfs.c
create mode 100644 lib/kunit/debugfs.h
--
1.8.3.1
Remove some of the outmoded "Why KUnit" rationale, and move some
UML-specific information to the kunit_tool page. Also update the Getting
Started guide to mention running tests without the kunit_tool wrapper.
Signed-off-by: David Gow <davidgow(a)google.com>
---
Thanks: I've added a note to the "Why KUnit" section which I think
better reflects both the UML is optional, and that it can be used both
with and without kunit_tool.
Changelog:
v3:
- Added a note that KUnit can be used with UML, both with and without
kunit_tool, to replace the section moved to the kunit_tool page.
v2:
https://lore.kernel.org/linux-kselftest/f99a3d4d-ad65-5fd1-3407-db33f378b1f…
- Reinstated the "Why Kunit?" section, minus the comparison with other
testing frameworks (covered in the FAQ), and the description of UML.
- Moved the description of UML into to kunit_tool page.
- Tidied up the wording around how KUnit is built and run to make it work
without the UML description.
v1:
https://lore.kernel.org/linux-kselftest/9c703dea-a9e1-94e2-c12d-3cb0a09e75a…
- Initial patch
Documentation/dev-tools/kunit/index.rst | 41 ++++++----
Documentation/dev-tools/kunit/kunit-tool.rst | 7 ++
Documentation/dev-tools/kunit/start.rst | 80 ++++++++++++++++----
3 files changed, 100 insertions(+), 28 deletions(-)
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index d16a4d2c3a41..9e77d0c90ff4 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -17,14 +17,23 @@ What is KUnit?
==============
KUnit is a lightweight unit testing and mocking framework for the Linux kernel.
-These tests are able to be run locally on a developer's workstation without a VM
-or special hardware.
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
cases, grouping related test cases into test suites, providing common
infrastructure for running tests, and much more.
+KUnit consists of a kernel component, which provides a set of macros for easily
+writing unit tests. Tests written against KUnit will run on kernel boot if
+built-in, or when loaded if built as a module. These tests write out results to
+the kernel log in `TAP <https://testanything.org/>`_ format.
+
+To make running these tests (and reading the results) easier, KUnit offers
+:doc:`kunit_tool <kunit-tool>`, which builds a `User Mode Linux
+<http://user-mode-linux.sourceforge.net>`_ kernel, runs it, and parses the test
+results. This provides a quick way of running KUnit tests during development,
+without requiring a virtual machine or separate hardware.
+
Get started now: :doc:`start`
Why KUnit?
@@ -36,22 +45,21 @@ allow all possible code paths to be tested in the code under test; this is only
possible if the code under test is very small and does not have any external
dependencies outside of the test's control like hardware.
-Outside of KUnit, there are no testing frameworks currently
-available for the kernel that do not require installing the kernel on a test
-machine or in a VM and all require tests to be written in userspace running on
-the kernel; this is true for Autotest, and kselftest, disqualifying
-any of them from being considered unit testing frameworks.
-
-KUnit addresses the problem of being able to run tests without needing a virtual
-machine or actual hardware with User Mode Linux. User Mode Linux is a Linux
-architecture, like ARM or x86; however, unlike other architectures it compiles
-to a standalone program that can be run like any other program directly inside
-of a host operating system; to be clear, it does not require any virtualization
-support; it is just a regular program.
+KUnit provides a common framework for unit tests within the kernel.
-Alternatively, kunit and kunit tests can be built as modules and tests will
+KUnit tests can be run on most kernel configurations, and most tests are
+architecture independent. All built-in KUnit tests run on kernel startup.
+Alternatively, KUnit and KUnit tests can be built as modules and tests will
run when the test module is loaded.
+.. note::
+
+ KUnit can also run tests without needing a virtual machine or actual
+ hardware under User Mode Linux. User Mode Linux is a Linux architecture,
+ like ARM or x86, which compiles the kernel as a Linux executable. KUnit
+ can be used with UML either by building with ``ARCH=um`` (like any other
+ architecture), or by using :doc:`kunit_tool <kunit-tool>`.
+
KUnit is fast. Excluding build time, from invocation to completion KUnit can run
several dozen tests in only 10 to 20 seconds; this might not sound like a big
deal to some people, but having such fast and easy to run tests fundamentally
@@ -75,9 +83,12 @@ someone sends you some code. Why trust that someone ran all their tests
correctly on every change when you can just run them yourself in less time than
it takes to read their test log?
+
How do I use it?
================
* :doc:`start` - for new users of KUnit
* :doc:`usage` - for a more detailed explanation of KUnit features
* :doc:`api/index` - for the list of KUnit APIs used for testing
+* :doc:`kunit-tool` - for more information on the kunit_tool helper script
+* :doc:`faq` - for answers to some common questions about KUnit
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index 50d46394e97e..949af2da81e5 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -12,6 +12,13 @@ the Linux kernel as UML (`User Mode Linux
<http://user-mode-linux.sourceforge.net/>`_), running KUnit tests, parsing
the test results and displaying them in a user friendly manner.
+kunit_tool addresses the problem of being able to run tests without needing a
+virtual machine or actual hardware with User Mode Linux. User Mode Linux is a
+Linux architecture, like ARM or x86; however, unlike other architectures it
+compiles the kernel as a standalone Linux executable that can be run like any
+other program directly inside of a host operating system. To be clear, it does
+not require any virtualization support: it is just a regular program.
+
What is a kunitconfig?
======================
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 4e1d24db6b13..e1c5ce80ce12 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -9,11 +9,10 @@ Installing dependencies
KUnit has the same dependencies as the Linux kernel. As long as you can build
the kernel, you can run KUnit.
-KUnit Wrapper
-=============
-Included with KUnit is a simple Python wrapper that helps format the output to
-easily use and read KUnit output. It handles building and running the kernel, as
-well as formatting the output.
+Running tests with the KUnit Wrapper
+====================================
+Included with KUnit is a simple Python wrapper which runs tests under User Mode
+Linux, and formats the test results.
The wrapper can be run with:
@@ -21,22 +20,42 @@ The wrapper can be run with:
./tools/testing/kunit/kunit.py run --defconfig
-For more information on this wrapper (also called kunit_tool) checkout the
+For more information on this wrapper (also called kunit_tool) check out the
:doc:`kunit-tool` page.
Creating a .kunitconfig
-=======================
-The Python script is a thin wrapper around Kbuild. As such, it needs to be
-configured with a ``.kunitconfig`` file. This file essentially contains the
-regular Kernel config, with the specific test targets as well.
-
+-----------------------
+If you want to run a specific set of tests (rather than those listed in the
+KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
+This file essentially contains the regular Kernel config, with the specific
+test targets as well. The ``.kunitconfig`` should also contain any other config
+options required by the tests.
+
+A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
.. code-block:: bash
cd $PATH_TO_LINUX_REPO
cp arch/um/configs/kunit_defconfig .kunitconfig
-Verifying KUnit Works
----------------------
+You can then add any other Kconfig options you wish, e.g.:
+.. code-block:: none
+
+ CONFIG_LIST_KUNIT_TEST=y
+
+:doc:`kunit_tool <kunit-tool>` will ensure that all config options set in
+``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
+It'll warn you if you haven't included the dependencies of the options you're
+using.
+
+.. note::
+ Note that removing something from the ``.kunitconfig`` will not trigger a
+ rebuild of the ``.config`` file: the configuration is only updated if the
+ ``.kunitconfig`` is not a subset of ``.config``. This means that you can use
+ other tools (such as make menuconfig) to adjust other config options.
+
+
+Running the tests
+-----------------
To make sure that everything is set up correctly, simply invoke the Python
wrapper from your kernel repo:
@@ -62,6 +81,41 @@ followed by a list of tests that are run. All of them should be passing.
Because it is building a lot of sources for the first time, the
``Building KUnit kernel`` step may take a while.
+Running tests without the KUnit Wrapper
+=======================================
+
+If you'd rather not use the KUnit Wrapper (if, for example, you need to
+integrate with other systems, or use an architecture other than UML), KUnit can
+be included in any kernel, and the results read out and parsed manually.
+
+.. note::
+ KUnit is not designed for use in a production system, and it's possible that
+ tests may reduce the stability or security of the system.
+
+
+
+Configuring the kernel
+----------------------
+
+In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
+Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
+menuconfig). From there, you can enable any KUnit tests you want: they usually
+have config options ending in ``_KUNIT_TEST``.
+
+KUnit and KUnit tests can be compiled as modules: in this case the tests in a
+module will be run when the module is loaded.
+
+Running the tests
+-----------------
+
+Build and run your kernel as usual. Test output will be written to the kernel
+log in `TAP <https://testanything.org/>`_ format.
+
+.. note::
+ It's possible that there will be other lines and/or data interspersed in the
+ TAP output.
+
+
Writing your first test
=======================
--
2.25.0.265.gbab2e86ba0-goog
There is an important use-case which is not possible with the
"rseq" (Restartable Sequences) system call, which was left as
future work.
That use-case is to modify user-space per-cpu data structures
belonging to specific CPUs which may be brought offline and
online again by CPU hotplug. This can be used by memory
allocators to migrate free memory pools when CPUs are brought
offline, or by ring buffer consumers to target specific per-CPU
buffers, even when CPUs are brought offline.
A few rather complex prior attempts were made to solve this.
Those were based on in-kernel interpreters (cpu_opv, do_on_cpu).
That complexity was generally frowned upon, even by their author.
This patch fulfills this use-case in a refreshingly simple way:
it introduces a "pin_on_cpu" system call, which allows user-space
threads to pin themselves on a specific CPU (which needs to be
present in the thread's allowed cpu mask), and then clear this
pinned state.
"But this can already be done with sched_setaffinity", some
would rightfully reply. However, there is a significant twist
in the way pin_on_cpu deals with CPU hotplug compared to the
allowed cpu mask.
When all CPUs part of the thread's allowed cpu mask are offlined,
this mask is effectively reset to include all CPUs. This behavior
is completely incompatible with modifying per-cpu data structures:
the updates then become racy between concurrent CPUs trying to
update the given per-cpu data.
Conversely, all threads pinned on a given CPU with pin_on_cpu are
guaranteed to be scheduled on the same runqueue when that CPU is
offline. If that CPU is brought back online, the CPU hotplug
scheduler hooks are responsible for migrating back the tasks to
their pinned CPU.
For instance, this allows implementing the following user-space library
API for incrementing a per-cpu counter for a specific cpu number
received as parameter:
static inline __attribute__((always_inline))
int percpu_addv(intptr_t *v, intptr_t count, int cpu)
{
        int ret;

        ret = rseq_addv(v, count, cpu);
check:
        if (rseq_unlikely(ret)) {
                pin_on_cpu_set(cpu);
                ret = rseq_addv(v, count, percpu_current_cpu());
                pin_on_cpu_clear();
                goto check;
        }
        return 0;
}
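The pin_on_cpu_set()/pin_on_cpu_clear() helpers used above are not
provided by any libc; a minimal sketch of them as raw syscall wrappers,
using the syscall number and command values defined by this patch, could
look like:

        #define _GNU_SOURCE
        #include <unistd.h>
        #include <sys/syscall.h>

        #ifndef __NR_pin_on_cpu
        #define __NR_pin_on_cpu 436     /* from the syscall tables below */
        #endif

        /* Command values from include/uapi/linux/sched.h in this patch. */
        #define PIN_ON_CPU_CMD_SET      (1 << 0)
        #define PIN_ON_CPU_CMD_CLEAR    (1 << 1)

        static inline int pin_on_cpu_set(int cpu)
        {
                /* Pin the calling thread to @cpu (flags must be 0). */
                return syscall(__NR_pin_on_cpu, PIN_ON_CPU_CMD_SET, 0, cpu);
        }

        static inline int pin_on_cpu_clear(void)
        {
                /* Clear the pinned state; the cpu argument is ignored. */
                return syscall(__NR_pin_on_cpu, PIN_ON_CPU_CMD_CLEAR, 0, 0);
        }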
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Watson <davejwatson(a)fb.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: "H . Peter Anvin" <hpa(a)zytor.com>
Cc: Chris Lameter <cl(a)linux.com>
Cc: Russell King <linux(a)arm.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Maurer <bmaurer(a)fb.com>
Cc: linux-api(a)vger.kernel.org
Cc: Andy Lutomirski <luto(a)amacapital.net>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/exec.c | 1 +
include/linux/sched.h | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/sched.h | 6 +
init/init_task.c | 1 +
kernel/sched/core.c | 321 +++++++++++++++++++++++--
kernel/sched/deadline.c | 54 +++--
kernel/sched/fair.c | 19 ++
kernel/sched/rt.c | 15 +-
kernel/sched/sched.h | 28 +++
kernel/sys_ni.c | 1 +
14 files changed, 413 insertions(+), 42 deletions(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 15908eb9b17e..0b1081a9b872 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -440,3 +440,4 @@
433 i386 fspick sys_fspick __ia32_sys_fspick
434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
435 i386 clone3 sys_clone3 __ia32_sys_clone3
+436 i386 pin_on_cpu sys_pin_on_cpu __ia32_sys_pin_on_cpu
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c29976eca4a8..90f9b3cab88d 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -357,6 +357,7 @@
433 common fspick __x64_sys_fspick
434 common pidfd_open __x64_sys_pidfd_open
435 common clone3 __x64_sys_clone3/ptregs
+436 common pin_on_cpu __x64_sys_pin_on_cpu
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/exec.c b/fs/exec.c
index c27231234764..6d882dbdd1e3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1827,6 +1827,7 @@ static int __do_execve_file(int fd, struct filename *filename,
current->fs->in_exec = 0;
current->in_execve = 0;
rseq_execve(current);
+ current->pinned_cpu = -1;
acct_update_integrals(current);
task_numa_free(current, false);
free_bprm(bprm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7f0bb6fff27c..ac0cac7b8d1d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -651,6 +651,7 @@ struct task_struct {
/* Current CPU: */
unsigned int cpu;
#endif
+ int pinned_cpu;
unsigned int wakee_flips;
unsigned long wakee_flip_decay_ts;
struct task_struct *last_wakee;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index be0d0cf788ba..46fee5af99e3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags)
asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
siginfo_t __user *info,
unsigned int flags);
+asmlinkage long sys_pin_on_cpu(int cmd, int flags, int cpu);
/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 1fc8faa6e973..43b0c956cc3c 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -851,8 +851,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
__SYSCALL(__NR_clone3, sys_clone3)
#endif
+#define __NR_pin_on_cpu 436
+__SYSCALL(__NR_pin_on_cpu, sys_pin_on_cpu)
+
#undef __NR_syscalls
-#define __NR_syscalls 436
+#define __NR_syscalls 437
/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 25b4fa00bad1..590cdc613698 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -114,4 +114,10 @@ struct clone_args {
SCHED_FLAG_KEEP_ALL | \
SCHED_FLAG_UTIL_CLAMP)
+enum pin_on_cpu_cmd {
+ PIN_ON_CPU_CMD_QUERY = 0,
+ PIN_ON_CPU_CMD_SET = (1 << 0),
+ PIN_ON_CPU_CMD_CLEAR = (1 << 1),
+};
+
#endif /* _UAPI_LINUX_SCHED_H */
diff --git a/init/init_task.c b/init/init_task.c
index 9e5cbe5eab7b..9aabce589cc7 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -88,6 +88,7 @@ struct task_struct init_task
.tasks = LIST_HEAD_INIT(init_task.tasks),
#ifdef CONFIG_SMP
.pushable_tasks = PLIST_NODE_INIT(init_task.pushable_tasks, MAX_PRIO),
+ .pinned_cpu = -1,
#endif
#ifdef CONFIG_CGROUP_SCHED
.sched_task_group = &root_task_group,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8dacda4b0362..6ca904d6e0ef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -52,6 +52,8 @@ const_debug unsigned int sysctl_sched_features =
#undef SCHED_FEAT
#endif
+#define PIN_ON_CPU_CMD_BITMASK (PIN_ON_CPU_CMD_SET | PIN_ON_CPU_CMD_CLEAR)
+
/*
* Number of tasks to iterate in a single balance run.
* Limited because this is done with IRQs disabled.
@@ -1457,8 +1459,13 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
*/
static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
{
- if (!cpumask_test_cpu(cpu, p->cpus_ptr))
- return false;
+ if (is_pinned_task(p)) {
+ if (!allowed_pinned_cpu(p, cpu))
+ return false;
+ } else {
+ if (!cpumask_test_cpu(cpu, p->cpus_ptr))
+ return false;
+ }
if (is_per_cpu_kthread(p))
return cpu_online(cpu);
@@ -1662,6 +1669,12 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
goto out;
}
+ /* Prevent removing the currently pinned CPU from the allowed cpu mask. */
+ if (is_pinned_task(p) && !cpumask_test_cpu(p->pinned_cpu, new_mask)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
do_set_cpus_allowed(p, new_mask);
if (p->flags & PF_KTHREAD) {
@@ -1674,6 +1687,10 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
p->nr_cpus_allowed != 1);
}
+ /* Task pinned to a CPU overrides allowed cpu mask. */
+ if (is_pinned_task(p))
+ goto out;
+
/* Can the task run on the task's current CPU? If so, we're done */
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
@@ -1813,11 +1830,20 @@ static int migrate_swap_stop(void *data)
if (task_cpu(arg->src_task) != arg->src_cpu)
goto unlock;
- if (!cpumask_test_cpu(arg->dst_cpu, arg->src_task->cpus_ptr))
- goto unlock;
-
- if (!cpumask_test_cpu(arg->src_cpu, arg->dst_task->cpus_ptr))
- goto unlock;
+ if (is_pinned_task(arg->src_task)) {
+ if (!allowed_pinned_cpu(arg->src_task, arg->dst_cpu))
+ goto unlock;
+ } else {
+ if (!cpumask_test_cpu(arg->dst_cpu, arg->src_task->cpus_ptr))
+ goto unlock;
+ }
+ if (is_pinned_task(arg->dst_task)) {
+ if (!allowed_pinned_cpu(arg->dst_task, arg->src_cpu))
+ goto unlock;
+ } else {
+ if (!cpumask_test_cpu(arg->src_cpu, arg->dst_task->cpus_ptr))
+ goto unlock;
+ }
__migrate_swap_task(arg->src_task, arg->dst_cpu);
__migrate_swap_task(arg->dst_task, arg->src_cpu);
@@ -1858,11 +1884,21 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p,
if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
goto out;
- if (!cpumask_test_cpu(arg.dst_cpu, arg.src_task->cpus_ptr))
- goto out;
+ if (is_pinned_task(arg.src_task)) {
+ if (!allowed_pinned_cpu(arg.src_task, arg.dst_cpu))
+ goto out;
+ } else {
+ if (!cpumask_test_cpu(arg.dst_cpu, arg.src_task->cpus_ptr))
+ goto out;
+ }
- if (!cpumask_test_cpu(arg.src_cpu, arg.dst_task->cpus_ptr))
- goto out;
+ if (is_pinned_task(arg.dst_task)) {
+ if (!allowed_pinned_cpu(arg.dst_task, arg.src_cpu))
+ goto out;
+ } else {
+ if (!cpumask_test_cpu(arg.src_cpu, arg.dst_task->cpus_ptr))
+ goto out;
+ }
trace_sched_swap_numa(cur, arg.src_cpu, p, arg.dst_cpu);
ret = stop_two_cpus(arg.dst_cpu, arg.src_cpu, migrate_swap_stop, &arg);
@@ -2034,6 +2070,18 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
enum { cpuset, possible, fail } state = cpuset;
int dest_cpu;
+ /*
+ * If the task is pinned to a CPU which is online, pick that pinned CPU
+ * number.
+ * If the task is pinned to a CPU which is offline, pick a CPU which is
+ * guaranteed to be the same for all tasks pinned to that offlined CPU.
+ */
+ if (is_pinned_task(p)) {
+ if (cpu_online(p->pinned_cpu))
+ return p->pinned_cpu;
+ else
+ return pinned_cpu_offline_offload(p);
+ }
/*
* If the node that the CPU is on has been offlined, cpu_to_node()
* will return -1. There is no CPU on the node, and we should
@@ -2104,10 +2152,15 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
{
lockdep_assert_held(&p->pi_lock);
- if (p->nr_cpus_allowed > 1)
- cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);
- else
- cpu = cpumask_any(p->cpus_ptr);
+ if (is_pinned_task(p))
+ cpu = p->pinned_cpu;
+ else {
+ if (p->nr_cpus_allowed > 1)
+ cpu = p->sched_class->select_task_rq(p, cpu, sd_flags,
+ wake_flags);
+ else
+ cpu = cpumask_any(p->cpus_ptr);
+ }
/*
* In order not to call set_task_cpu() on a blocking task we need
@@ -6130,8 +6183,13 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
if (curr_cpu == target_cpu)
return 0;
- if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
- return -EINVAL;
+ if (is_pinned_task(p)) {
+ if (!allowed_pinned_cpu(p, target_cpu))
+ return -EINVAL;
+ } else {
+ if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
+ return -EINVAL;
+ }
/* TODO: This is not properly updating schedstats */
@@ -6300,6 +6358,7 @@ static void migrate_tasks(struct rq *dead_rq, struct rq_flags *rf)
rq->stop = stop;
}
+
#endif /* CONFIG_HOTPLUG_CPU */
void set_rq_online(struct rq *rq)
@@ -6380,11 +6439,100 @@ static int cpuset_cpu_inactive(unsigned int cpu)
return 0;
}
+static bool skip_pinned_task(int pinned_cpu, int cpu,
+ bool first_online)
+{
+ if (pinned_cpu < 0)
+ return true;
+ if (first_online) {
+ if (cpu_online(pinned_cpu) && pinned_cpu != cpu)
+ return true;
+ } else {
+ if (pinned_cpu != cpu)
+ return true;
+ }
+ return false;
+}
+
+static void sched_cpu_migrate_pinned_tasks(unsigned int cpu)
+{
+ struct rq *rq = cpu_rq(cpu);
+ struct task_struct *p, *t;
+ bool first_online = false;
+
+ if (cpu == cpumask_first(cpu_online_mask))
+ first_online = true;
+
+ /*
+ * This state transition (online && !active) when going online
+	 * only allows bound kthreads to be scheduled.
+ * At this point, the CPU is completely online and running,
+ * but no userspace tasks are scheduled yet.
+ */
+ read_lock(&tasklist_lock);
+ for_each_process_thread(p, t) {
+ struct rq *target_rq;
+ struct rq_flags rf;
+ int pinned_cpu;
+
+ /*
+ * Migrate t to cpu if pinned to this cpu.
+ *
+ * Migrate t to cpu if its pinned cpu is offline
+ * and cpu becomes the new first online cpu.
+ *
+ * Transition of t->pinned_cpu to cpu can only
+ * happen if the thread is scheduled on cpu, which
+ * is impossible at this point because the cpu is
+ * not active.
+ *
+ * Transition of t->pinned_cpu from cpu to -1 or some
+ * other cpu number may happen concurrently. Therefore,
+ * skip early (without rq lock), and check again with
+ * the rq lock held to eliminate concurrent transitions
+ * from cpu to -1 or some other cpu number.
+ */
+ pinned_cpu = READ_ONCE(t->pinned_cpu);
+ if (skip_pinned_task(pinned_cpu, cpu, first_online))
+ continue;
+ if (pinned_cpu == cpu)
+ printk("pin_on_cpu migrate to owner: online cpu %d\n",
+ cpu);
+ if (first_online && !cpu_online(pinned_cpu))
+ printk("pin_on_cpu migrate to new offload cpu %d\n",
+ cpu);
+ target_rq = task_rq_lock(t, &rf);
+ pinned_cpu = t->pinned_cpu;
+ if (skip_pinned_task(pinned_cpu, cpu, first_online))
+ goto unlock;
+ WARN_ON_ONCE(target_rq == rq);
+ update_rq_clock(target_rq);
+ if (task_running(target_rq, t) || t->state == TASK_WAKING) {
+ struct migration_arg arg = { t, cpu };
+ /* Need help from migration thread: drop lock and wait. */
+ task_rq_unlock(target_rq, t, &rf);
+ stop_one_cpu(cpu_of(target_rq), migration_cpu_stop, &arg);
+ continue;
+ } else if (task_on_rq_queued(t)) {
+ /*
+ * OK, since we're going to drop the lock immediately
+ * afterwards anyway.
+ */
+ rq = move_queued_task(target_rq, &rf, t, cpu);
+ }
+ unlock:
+ task_rq_unlock(target_rq, t, &rf);
+ }
+ read_unlock(&tasklist_lock);
+}
+
int sched_cpu_activate(unsigned int cpu)
{
struct rq *rq = cpu_rq(cpu);
struct rq_flags rf;
+ sched_cpu_migrate_pinned_tasks(cpu);
+
#ifdef CONFIG_SCHED_SMT
/*
* When going up, increment the number of cores with SMT present.
@@ -7899,6 +8047,145 @@ struct cgroup_subsys cpu_cgrp_subsys = {
#endif /* CONFIG_CGROUP_SCHED */
+static void do_set_pinned_cpu(struct task_struct *p, int cpu)
+{
+ struct rq *rq = task_rq(p);
+ bool queued, running;
+
+ lockdep_assert_held(&p->pi_lock);
+
+ queued = task_on_rq_queued(p);
+ running = task_current(rq, p);
+
+ if (queued) {
+ /*
+ * Because __kthread_bind() calls this on blocked tasks without
+ * holding rq->lock.
+ */
+ lockdep_assert_held(&rq->lock);
+ dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
+ }
+ if (running)
+ put_prev_task(rq, p);
+
+ WRITE_ONCE(p->pinned_cpu, cpu);
+
+ if (queued)
+ enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
+ if (running)
+ set_next_task(rq, p);
+}
+
+static int __do_pin_on_cpu(int cpu)
+{
+ struct task_struct *p = current;
+ struct rq_flags rf;
+ struct rq *rq;
+ int ret = 0, dest_cpu;
+ struct migration_arg arg = { p };
+
+ cpus_read_lock();
+ rq = task_rq_lock(p, &rf);
+ update_rq_clock(rq);
+ if (cpu >= 0 && !cpumask_test_cpu(cpu, current->cpus_ptr)) {
+ ret = -EINVAL;
+ goto out;
+ }
+#ifdef CONFIG_SMP
+ do_set_pinned_cpu(p, cpu);
+ if (cpu >= 0) {
+ if (cpu_online(cpu))
+ dest_cpu = cpu;
+ else
+ dest_cpu = pinned_cpu_offline_offload(p);
+ if (task_cpu(p) == dest_cpu) {
+ /*
+ * If the task already runs on the pinned cpu, we're
+ * done.
+ */
+ goto out;
+ }
+ } else {
+ /*
+ * When clearing the pinned cpu, we may need to migrate the
+ * current task if it is currently sitting on a runqueue which
+ * does not belong to the allowed mask.
+ */
+ dest_cpu = cpumask_any(p->cpus_ptr);
+ }
+ arg.dest_cpu = dest_cpu;
+ /* Need help from migration thread: drop lock and wait. */
+ task_rq_unlock(rq, p, &rf);
+ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+
+ /* Preempt disable prevents hotplug on current cpu. */
+ preempt_disable();
+ WARN_ON_ONCE(cpu >= 0 && cpu_online(cpu) &&
+ smp_processor_id() != cpu);
+ preempt_enable();
+ cpus_read_unlock();
+ return 0;
+#endif
+out:
+ task_rq_unlock(rq, p, &rf);
+ cpus_read_unlock();
+ return ret;
+}
+
+static int pin_on_cpu_set(int cpu)
+{
+ if (cpu < 0 || !cpu_possible(cpu)) {
+ return -EINVAL;
+ }
+ return __do_pin_on_cpu(cpu);
+}
+
+static int pin_on_cpu_clear(void)
+{
+ return __do_pin_on_cpu(-1);
+}
+
+/*
+ * sys_pin_on_cpu - pin current task to a specific cpu.
+ * @cmd: command to issue (enum pin_on_cpu_cmd)
+ * @flags: system call flags
+ * @cpu: cpu where the task should run.
+ *
+ * Returns -EINVAL if cmd is unknown.
+ * Returns -EINVAL if flags are unknown.
+ * Returns -EINVAL if the CPU is not part of the possible CPUs.
+ * Returns -EINVAL if the CPU is not part of the allowed cpu mask
+ * for the current task.
+ *
+ * PIN_ON_CPU_CMD_QUERY: Return the mask of supported commands.
+ * PIN_ON_CPU_CMD_SET: Pin the current task to a specific CPU.
+ * PIN_ON_CPU_CMD_CLEAR: Clear cpu pinning for current task.
+ *
+ * If the pinned CPU is online, the current task will run on that CPU.
+ *
+ * If the pinned CPU is offline, the scheduler guarantees that
+ * all tasks pinned to that CPU number are moved to the same
+ * runqueue.
+ *
+ * Removing the pinned CPU from the task's allowed cpu mask is
+ * forbidden.
+ */
+SYSCALL_DEFINE3(pin_on_cpu, int, cmd, int, flags, int, cpu)
+{
+ if (unlikely(flags))
+ return -EINVAL;
+ switch (cmd) {
+ case PIN_ON_CPU_CMD_QUERY:
+ return PIN_ON_CPU_CMD_BITMASK;
+ case PIN_ON_CPU_CMD_SET:
+ return pin_on_cpu_set(cpu);
+ case PIN_ON_CPU_CMD_CLEAR:
+ return pin_on_cpu_clear();
+ default:
+ return -EINVAL;
+ }
+}
+
void dump_cpu_task(int cpu)
{
pr_info("Task dump for CPU %d:\n", cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a8a08030a8f7..8a1581e8509e 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -535,24 +535,31 @@ static struct rq *dl_task_offline_migration(struct rq *rq, struct task_struct *p
if (!later_rq) {
int cpu;
- /*
- * If we cannot preempt any rq, fall back to pick any
- * online CPU:
- */
- cpu = cpumask_any_and(cpu_active_mask, p->cpus_ptr);
- if (cpu >= nr_cpu_ids) {
- /*
- * Failed to find any suitable CPU.
- * The task will never come back!
- */
- BUG_ON(dl_bandwidth_enabled());
-
+ if (is_pinned_task(p)) {
+ if (cpu_online(p->pinned_cpu))
+ cpu = p->pinned_cpu;
+ else
+ cpu = pinned_cpu_offline_offload(p);
+ } else {
/*
- * If admission control is disabled we
- * try a little harder to let the task
- * run.
+ * If we cannot preempt any rq, fall back to pick any
+ * online CPU:
*/
- cpu = cpumask_any(cpu_active_mask);
+ cpu = cpumask_any_and(cpu_active_mask, p->cpus_ptr);
+ if (cpu >= nr_cpu_ids) {
+ /*
+ * Failed to find any suitable CPU.
+ * The task will never come back!
+ */
+ BUG_ON(dl_bandwidth_enabled());
+
+ /*
+ * If admission control is disabled we
+ * try a little harder to let the task
+ * run.
+ */
+ cpu = cpumask_any(cpu_active_mask);
+ }
}
later_rq = cpu_rq(cpu);
double_lock_balance(rq, later_rq);
@@ -1836,9 +1843,15 @@ static void task_fork_dl(struct task_struct *p)
static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
{
- if (!task_running(rq, p) &&
- cpumask_test_cpu(cpu, p->cpus_ptr))
- return 1;
+ if (!task_running(rq, p)) {
+ if (is_pinned_task(p)) {
+ if (allowed_pinned_cpu(p, cpu))
+ return 1;
+ } else {
+ if (cpumask_test_cpu(cpu, p->cpus_ptr))
+ return 1;
+ }
+ }
return 0;
}
@@ -1987,7 +2000,8 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
/* Retry if something changed. */
if (double_lock_balance(rq, later_rq)) {
if (unlikely(task_rq(task) != rq ||
- !cpumask_test_cpu(later_rq->cpu, task->cpus_ptr) ||
+ (is_pinned_task(task) && !allowed_pinned_cpu(task, later_rq->cpu)) ||
+ (!is_pinned_task(task) && !cpumask_test_cpu(later_rq->cpu, task->cpus_ptr)) ||
task_running(rq, task) ||
!dl_task(task) ||
!task_on_rq_queued(task))) {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69a81a5709ff..e96ae1ce9829 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7223,6 +7223,25 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
lockdep_assert_held(&env->src_rq->lock);
+ if (is_pinned_task(p)) {
+ if (task_running(env->src_rq, p)) {
+ schedstat_inc(p->se.statistics.nr_failed_migrations_running);
+ return 0;
+ }
+ if (cpu_online(p->pinned_cpu)) {
+ if (env->dst_cpu == p->pinned_cpu)
+ return 1;
+ else
+ return 0;
+ } else {
+ if (env->dst_cpu ==
+ pinned_cpu_offline_offload(p))
+ return 1;
+ else
+ return 0;
+ }
+ }
+
/*
* We do not migrate tasks that are:
* 1) throttled_lb_pair, or
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 9b8adc01be3d..2774311e5750 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1600,9 +1600,15 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
{
- if (!task_running(rq, p) &&
- cpumask_test_cpu(cpu, p->cpus_ptr))
- return 1;
+ if (!task_running(rq, p)) {
+ if (is_pinned_task(p)) {
+ if (allowed_pinned_cpu(p, cpu))
+ return 1;
+ } else {
+ if (cpumask_test_cpu(cpu, p->cpus_ptr))
+ return 1;
+ }
+ }
return 0;
}
@@ -1738,7 +1744,8 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
* Also make sure that it wasn't scheduled on its rq.
*/
if (unlikely(task_rq(task) != rq ||
- !cpumask_test_cpu(lowest_rq->cpu, task->cpus_ptr) ||
+ (is_pinned_task(task) && !allowed_pinned_cpu(task, lowest_rq->cpu)) ||
+ (!is_pinned_task(task) && !cpumask_test_cpu(lowest_rq->cpu, task->cpus_ptr)) ||
task_running(rq, task) ||
!rt_task(task) ||
!task_on_rq_queued(task))) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 49ed949f850c..922bc618cc87 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -187,6 +187,34 @@ static inline int task_has_dl_policy(struct task_struct *p)
return dl_policy(p->policy);
}
+/*
+ * All tasks which require to be pinned on offlined CPUs are sent
+ * to runqueue of the first online CPU.
+ */
+static inline bool is_pinned_task(struct task_struct *p)
+{
+ return p->pinned_cpu >= 0;
+}
+
+static inline int pinned_cpu_offline_offload(struct task_struct *p)
+{
+ return cpumask_first(cpu_online_mask);
+}
+
+static inline bool allowed_pinned_cpu(struct task_struct *p, int cpu)
+{
+ if (!cpu_possible(cpu))
+ return false;
+ if (cpu_online(p->pinned_cpu)) {
+ if (p->pinned_cpu == cpu)
+ return true;
+ } else {
+ if (cpu == pinned_cpu_offline_offload(p))
+ return true;
+ }
+ return false;
+}
+
#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
/*
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 34b76895b81e..7e5192cd8d9d 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -449,3 +449,4 @@ COND_SYSCALL(setuid16);
/* restartable sequence */
COND_SYSCALL(rseq);
+COND_SYSCALL(pin_on_cpu);
--
2.17.1
From: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
[ Upstream commit 6b64a650f0b2ae3940698f401732988699eecf7a ]
It was observed[1] on arm64 that __builtin_strlen led to an infinite
loop in the get_size selftest. This is because __builtin_strlen (and
other builtins) may sometimes result in a call to the C library
function. The C library implementation of strlen uses an IFUNC
resolver to load the most efficient strlen implementation for the
underlying machine and hence has a PLT indirection even for static
binaries. Because this binary avoids the C library startup routines,
the PLT initialization never happens and hence the program gets stuck
in an infinite loop.
On x86_64, __builtin_strlen just happens to expand inline and avoid
the call, but that is not always guaranteed.
Further, while testing on x86_64 (Fedora 31), it was observed that the
test also failed with a segfault inside write() because the generated
code for the write function in glibc seems to access TLS before the
syscall (probably due to the cancellation point check) and fails
because TLS is not initialised.
To mitigate these problems, this patch reduces the interface with the
C library to just the syscall function. The syscall function still
sets errno on failure, which is undesirable but for now it only
affects cases where syscalls fail.
[1] https://bugs.linaro.org/show_bug.cgi?id=5479
Signed-off-by: Siddhesh Poyarekar <siddhesh(a)gotplt.org>
Reported-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu(a)linaro.org>
Reviewed-by: Tim Bird <tim.bird(a)sony.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/size/get_size.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
index d4b59ab979a09..f55943b6d1e2a 100644
--- a/tools/testing/selftests/size/get_size.c
+++ b/tools/testing/selftests/size/get_size.c
@@ -12,23 +12,35 @@
* own execution. It also attempts to have as few dependencies
* on kernel features as possible.
*
- * It should be statically linked, with startup libs avoided.
- * It uses no library calls, and only the following 3 syscalls:
+ * It should be statically linked, with startup libs avoided. It uses
+ * no library calls except the syscall() function for the following 3
+ * syscalls:
* sysinfo(), write(), and _exit()
*
* For output, it avoids printf (which in some C libraries
 * has large external dependencies) by implementing its own
* number output and print routines, and using __builtin_strlen()
+ *
+ * The test may crash if any of the above syscalls fails because in some
+ * libc implementations (e.g. the GNU C Library) errno is saved in
+ * thread-local storage, which does not get initialized due to avoiding
+ * startup libs.
*/
#include <sys/sysinfo.h>
#include <unistd.h>
+#include <sys/syscall.h>
#define STDOUT_FILENO 1
static int print(const char *s)
{
- return write(STDOUT_FILENO, s, __builtin_strlen(s));
+ size_t len = 0;
+
+ while (s[len] != '\0')
+ len++;
+
+ return syscall(SYS_write, STDOUT_FILENO, s, len);
}
static inline char *num_to_str(unsigned long num, char *buf, int len)
@@ -80,12 +92,12 @@ void _start(void)
print("TAP version 13\n");
print("# Testing system size.\n");
- ccode = sysinfo(&info);
+ ccode = syscall(SYS_sysinfo, &info);
if (ccode < 0) {
print("not ok 1");
print(test_name);
print(" ---\n reason: \"could not get sysinfo\"\n ...\n");
- _exit(ccode);
+ syscall(SYS_exit, ccode);
}
print("ok 1");
print(test_name);
@@ -101,5 +113,5 @@ void _start(void)
print(" ...\n");
print("1..1\n");
- _exit(0);
+ syscall(SYS_exit, 0);
}
--
2.20.1
From: Lorenz Bauer <lmb(a)cloudflare.com>
[ Upstream commit 51bad0f05616c43d6d34b0a19bcc9bdab8e8fb39 ]
Currently, there are a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.
Zero both after individual test runs, which fixes the mentioned false
positives.
Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Lorenz Bauer <lmb(a)cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Acked-by: John Fastabend <john.fastabend(a)gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-5-lmb@cloudflare.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/test_select_reuseport.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_select_reuseport.c b/tools/testing/selftests/bpf/test_select_reuseport.c
index 75646d9b34aaa..cdbbdab2725fc 100644
--- a/tools/testing/selftests/bpf/test_select_reuseport.c
+++ b/tools/testing/selftests/bpf/test_select_reuseport.c
@@ -30,7 +30,7 @@
#define REUSEPORT_ARRAY_SIZE 32
static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
-static enum result expected_results[NR_RESULTS];
+static __u32 expected_results[NR_RESULTS];
static int sk_fds[REUSEPORT_ARRAY_SIZE];
static int reuseport_array, outer_map;
static int select_by_skb_data_prog;
@@ -610,7 +610,19 @@ static void setup_per_test(int type, unsigned short family, bool inany)
static void cleanup_per_test(void)
{
- int i, err;
+ int i, err, zero = 0;
+
+ memset(expected_results, 0, sizeof(expected_results));
+
+ for (i = 0; i < NR_RESULTS; i++) {
+ err = bpf_map_update_elem(result_map, &i, &zero, BPF_ANY);
+ RET_IF(err, "reset elem in result_map",
+ "i:%u err:%d errno:%d\n", i, err, errno);
+ }
+
+ err = bpf_map_update_elem(linum_map, &zero, &zero, BPF_ANY);
+ RET_IF(err, "reset line number in linum_map", "err:%d errno:%d\n",
+ err, errno);
for (i = 0; i < REUSEPORT_ARRAY_SIZE; i++)
close(sk_fds[i]);
--
2.20.1