Here is a first batch of fixes for v6.5 and older.
The fixes are not linked to each other.
Patch 1 ensures subflows are unhashed before cleaning the backlog to
avoid races. This fixes another recent fix from v6.4.
Patch 2 does not rely on implicit state check in mptcp_listen() to avoid
races when receiving an MP_FASTCLOSE. A regression from v5.17.
The rest fixes issues in the selftests.
Patch 3 makes sure errors when setting up the environment are no longer
ignored. For v5.17+.
Patch 4 uses 'iptables-legacy' if available to be able to run on older
kernels. A fix for v5.13 and newer.
Patch 5 catches errors when issues are detected with packet marks. Also
for v5.13+.
Patch 6 uses the correct variable instead of an undefined one. Even if
there was no visible impact, it can help to find regressions later. An
issue visible in v5.19+.
Patch 7 makes sure errors with some sub-tests are reported to have the
selftest marked as failed as expected. Also for v5.19+.
Patch 8 adds a kernel config that is required to execute MPTCP
selftests. It is valid for v5.9+.
Patch 9 fixes issues when validating the userspace path-manager with
32-bit arch, an issue affecting v5.19+.
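The fallback described for patch 4 can be pictured with a minimal sketch. This is only an illustration in Python, not the shell code from mptcp_sockopt.sh; the helper name pick_iptables and its injectable `which` parameter are assumptions made for this example.

```python
import shutil
from typing import Callable, Optional


def pick_iptables(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Prefer 'iptables-legacy' when it is installed, else fall back to 'iptables'.

    Hypothetical helper: `which` is injectable so the choice can be
    exercised without touching the real PATH.
    """
    for candidate in ("iptables-legacy", "iptables"):
        if which(candidate) is not None:
            return candidate
    return None  # neither binary found; a test would typically be skipped
```

Returning None rather than raising leaves the skip-or-fail decision to the caller.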
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (7):
selftests: mptcp: connect: fail if nft supposed to work
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: pm_nl_ctl: fix 32-bit support
Paolo Abeni (2):
mptcp: ensure subflow is unhashed before cleaning the backlog
mptcp: do not rely on implicit state check in mptcp_listen()
net/mptcp/protocol.c | 7 +++++-
tools/testing/selftests/net/mptcp/config | 1 +
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 3 +++
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 29 ++++++++++++----------
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 10 ++++----
tools/testing/selftests/net/mptcp/userspace_pm.sh | 4 ++-
6 files changed, 34 insertions(+), 20 deletions(-)
---
base-commit: 14bb236b29922c4f57d8c05bfdbcb82677f917c9
change-id: 20230704-upstream-net-20230704-misc-fixes-6-5-rc1-c52608649559
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
From: Jeff Xu <jeffxu(a)google.com>
When sysctl vm.memfd_noexec is 2 (MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED),
memfd_create(.., MFD_EXEC) should fail.
This complies with how MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED is
defined - "memfd_create() without MFD_NOEXEC_SEAL will be rejected"
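As a rough model of the rule stated above, and not the code in mm/memfd.c, the check can be sketched in Python. The flag and sysctl values are taken from the uapi header as far as I know; the function name is hypothetical, and only the MFD_EXEC case discussed here is modelled.

```python
# Flag values as found in include/uapi/linux/memfd.h.
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010

# Values of the vm.memfd_noexec sysctl.
MEMFD_NOEXEC_SCOPE_EXEC = 0
MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL = 1
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED = 2


def rejects_mfd_exec(sysctl_value: int, flags: int) -> bool:
    """Hypothetical model: True when memfd_create(.., MFD_EXEC) should fail.

    Only the case from this cover letter is covered; calls that pass
    neither flag are deliberately left out of the model.
    """
    enforced = sysctl_value == MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
    return enforced and bool(flags & MFD_EXEC)
```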
Thanks to Dominique Martinet <asmadeus(a)codewreck.org> who reported the bug.
See [1] for context.
[1] https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU6WEy5…
History:
V2: fix build error when CONFIG_SYSCTL is not defined.
V1: initial version
https://lore.kernel.org/linux-mm/20230630031721.623955-3-jeffxu@google.com/…
Jeff Xu (2):
mm/memfd: sysctl: fix MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
selftests/memfd: sysctl: fix MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
mm/memfd.c | 57 +++++++++++++---------
tools/testing/selftests/memfd/memfd_test.c | 5 ++
2 files changed, 38 insertions(+), 24 deletions(-)
--
2.41.0.255.g8b1d071c50-goog
BPF applications, e.g., a TCP congestion control, might benefit from
precise packet timestamps. These timestamps are already available in
__sk_buff and bpf_sock_ops, but could not be requested: A BPF program
was not allowed to set SO_TIMESTAMPING* on a socket. This change enables
BPF programs to actively request the generation of timestamps from a
stream socket.
To reuse the setget_sockopt BPF prog test for
bpf_{get,set}sockopt(SO_TIMESTAMPING_NEW), also implement the missing
getsockopt(SO_TIMESTAMPING_NEW) in the network stack.
I reckon the way I added getsockopt(SO_TIMESTAMPING_NEW) causes an API
change: For existing users that set SO_TIMESTAMPING_NEW but queried
SO_TIMESTAMPING_OLD afterwards, it would now look as if no timestamping
flags have been set. Is this an acceptable change? If not, I’m happy to
change getsockopt() to only be strict about the newly-implemented
getsockopt(SO_TIMESTAMPING_NEW), or not distinguish between
SO_TIMESTAMPING_NEW and SO_TIMESTAMPING_OLD at all.
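To make the API question concrete, here is a small Python model of the strict behaviour and of the "do not distinguish" alternative. It is a sketch, not the net/core/sock.c code: the class and method names are hypothetical, and the option numbers are assumed to be the asm-generic values.

```python
SO_TIMESTAMPING_OLD = 37  # assumed asm-generic value
SO_TIMESTAMPING_NEW = 65  # assumed asm-generic value


class SockModel:
    """Toy socket that remembers the flags and which variant set them."""

    def __init__(self) -> None:
        self.flags = 0
        self.set_with_new = False

    def setsockopt(self, optname: int, flags: int) -> None:
        self.flags = flags
        self.set_with_new = optname == SO_TIMESTAMPING_NEW

    def getsockopt_strict(self, optname: int) -> int:
        """Report the flags only when queried with the variant used to set them."""
        asked_new = optname == SO_TIMESTAMPING_NEW
        return self.flags if asked_new == self.set_with_new else 0

    def getsockopt_lenient(self, optname: int) -> int:
        """Alternative: do not distinguish between _OLD and _NEW at all."""
        return self.flags
```

With the strict variant, a user who sets SO_TIMESTAMPING_NEW and then queries SO_TIMESTAMPING_OLD reads back 0, which is the visible change asked about above.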
Jörn-Thorben Hinz (2):
net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)
bpf: Allow setting SO_TIMESTAMPING* with bpf_setsockopt()
include/uapi/linux/bpf.h | 3 ++-
net/core/filter.c | 2 ++
net/core/sock.c | 9 +++++++--
tools/include/uapi/linux/bpf.h | 3 ++-
tools/testing/selftests/bpf/progs/bpf_tracing_net.h | 2 ++
tools/testing/selftests/bpf/progs/setget_sockopt.c | 4 ++++
6 files changed, 19 insertions(+), 4 deletions(-)
--
2.39.2
Hi Jon, Shuah & others,
I'd like to discuss test documentation with you.
I had some preliminary discussions with people interested in improving
tests during EOSS last week in Prague, as we're working to improve media
test coverage as well. During those discussions, I talked with developers
from several companies that have been collaborating with and/or using Kernel
CI. I also talked with Nikolai from Red Hat, who gave a presentation about
Kernel CI, which pointed out that one of the areas to be improved there is
documentation.
So, it seems it is worth having some discussions about how to improve
Kernel test documentation.
While kernel_doc does a pretty decent job documenting functions and data
structures, for tests, the most important things to be documented are:
a. what the tests do;
b. what functionalities they are testing.
This is a lot more important than documenting functions - and the data
structures used in tests are typically the ones that are part of the
driver's kAPI or uAPI, so they should be documented somewhere else.
Usually, (b) is not so simple, as, at least for complex hardware,
the tested features are grouped in a hierarchical way, like:
1. hardware
1.1 DMA engine
1.2 output ports
...
2. firmware
2.1 firmware load
2.2 firmware DMA actions
...
3. kernel features
3.1 memory allocation
3.2 mmap
3.3 bind/unbind
...
CI engines running the test sets usually want to produce a report that
provides pass rates for the tested features and functionalities that
are available in the drivers and their respective hardware and firmware.
I've been doing some work on the tool we use to test DRM code [1] in order to
have decent documentation of the tests we have hosted there, focusing
mostly on tests for i915 and Xe Intel drivers, also covering documentation
for DRM core tests - while providing support for other vendors to also
improve their test documentation for IGT - IGT GPU tools and tests.
The documentation tool I developed is generic enough to be used for other
test sets and I believe it could be useful as well to document Kselftest
and KUnit.
The core of the tool (in test_list.py) is a Python class, with some callers
(igt_doc.py, xls_to_doc.py, doc_to_xls.py), and it is extensible enough to
also have other callers to integrate with external tools. We are
developing one internally to integrate with our internal Grafana reports
to report the pass rate per documented feature, in a hierarchical way.
Something similar to:
1. hardware pass rate: 98% (98 tests passed of 100)
1.1 DMA engine pass rate: 80% (8 tests passed of 10)
1.2 output ports pass rate: 100% (10 tests passed of 10)
...
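A per-feature report like the one above could be computed by rolling leaf results up to each parent. The sketch below is only an illustration and is not taken from test_list.py or from the Grafana integration; the function names and the tuple-path input format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

Path = Tuple[str, ...]


def roll_up(leaf_results: Dict[Path, Tuple[int, int]]) -> Dict[Path, Tuple[int, int]]:
    """Sum (passed, total) counts into every ancestor of each leaf path.

    leaf_results maps a path such as ("hardware", "DMA engine") to the
    (passed, total) counts of the tests documented under that feature.
    """
    totals = defaultdict(lambda: [0, 0])
    for path, (passed, total) in leaf_results.items():
        for depth in range(1, len(path) + 1):
            node = totals[path[:depth]]
            node[0] += passed
            node[1] += total
    return {path: (counts[0], counts[1]) for path, counts in totals.items()}


def format_rate(name: str, passed: int, total: int) -> str:
    """Render one report line in the style shown above."""
    percent = 100 * passed // total if total else 0
    return f"{name} pass rate: {percent}% ({passed} tests passed of {total})"
```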
It is based on the concept that test documentation should be placed as
close as possible to the actual code implementing the test sets. It was
also developed in a way that the documentation grouping is flexible.
The code was written from scratch in Python and was implemented
inside a class that can also be re-used to do other nice things,
like importing/exporting test documentation to spreadsheets and
integration with other tools (like Grafana).
The actual documentation tags look like this:
/**
* TEST: Check if new IGT test documentation logic functionality is working
* Category: Software build block
* Sub-category: documentation
* Functionality: test documentation
* Issue: none
* Description: Complete description of this test
*
* SUBTEST: foo
* Description: do foo things
* description continuing on another line
*
* SUBTEST: bar
* Description: do bar things
* description continuing on another line
* Functionality: bar test doc
*/
And it has support for wildcards.
There, "TEST" is associated with the contents of the file, while "SUBTEST"
refers to each specific subtest inside it. The valid fields are imported
from JSON config files, and can be arranged in a hierarchical way, in
order to produce hierarchical documentation. Fields defined at the
"TEST" level are inherited by "SUBTEST", but can be overridden.
The JSON config file looks like this:
https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/158feaa20fa2b9424ee…
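The TEST/SUBTEST inheritance described above can be sketched with a few lines of Python. This is not the test_list.py implementation: it ignores the JSON config and wildcards, the function name is hypothetical, and it assumes continuation lines contain no colon.

```python
import re
from typing import Dict

# "Name: value" pairs; a line without a colon continues the previous field.
FIELD_RE = re.compile(r"^([A-Za-z][\w -]*?):\s*(.*)$")


def parse_test_doc(comment: str) -> Dict[str, Dict[str, str]]:
    """Return {subtest: fields}; TEST-level fields are inherited unless overridden."""
    test_fields: Dict[str, str] = {}
    subtests: Dict[str, Dict[str, str]] = {}
    current = test_fields
    last_key = None
    for raw in comment.splitlines():
        line = raw.strip()
        if line in ("/**", "*/"):
            continue
        line = line.lstrip("*").strip()
        if not line:
            last_key = None  # a blank comment line ends any continuation
            continue
        match = FIELD_RE.match(line)
        if match:
            key, value = match.group(1), match.group(2)
            if key == "SUBTEST":
                current = subtests.setdefault(value, {})
                last_key = None
            else:
                current[key] = value
                last_key = key
        elif last_key is not None:
            current[last_key] += " " + line
    inherited = {k: v for k, v in test_fields.items() if k != "TEST"}
    return {name: {**inherited, **fields} for name, fields in subtests.items()}
```

Fed the example comment above, "foo" keeps the file-level Functionality ("test documentation") while "bar" overrides it with "bar test doc".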
The output is in ReST, which can be generated in hierarchical or per-file
way. The hierarchical output looks like this:
$ ./scripts/igt_doc.py --config tests/xe/xe_test_config.json --file fubar_tests.c
===============================
Implemented Tests for Xe Driver
===============================
Category: Software build block
==============================
Sub-category: documentation
---------------------------
Functionality: bar test doc
^^^^^^^^^^^^^^^^^^^^^^^^^^^
``igt@fubar_tests@bar``
:Description: do bar things description continuing on another line
:Issue: none
Functionality: test documentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``igt@fubar_tests@foo``
:Description: do foo things description continuing on another line
:Issue: none
(if --file is not used, it will use all C files specified at the
configuration)
The tool already skips tags like the ones used by kernel-doc[1], so one
could have both function documentation and per-test documentation on
the same file, if needed.
While this tool was conceived to be part of IGT, it doesn't have anything
specific to it [2], and I do believe it would be a great contribution to
the Kernel to have it upstreamed and integrated as a Sphinx
extension.
If we decide to go ahead and add it, I can work on a patchset to apply
it to the Kernel, modifying the scripts to better fit the Kernel's
needs and starting with some documentation examples for i915,
DRM core and upcoming Xe KUnit tests.
Comments?
Regards,
Mauro
[1] It should be trivial to patch kernel-doc for it to skip TEST and
SUBTEST tags if we decide to integrate it to the kernel.
[2] except that tests there are named after IGT, as
igt@<test>@<subtest>@<dynamic_subtest>, but a change to a
Kernel-specific namespace would be trivial
Hi Noah,
On Thu, May 25, 2023 at 8:04 PM tip-bot2 for Noah Goldstein
<tip-bot2(a)linutronix.de> wrote:
> The following commit has been merged into the x86/misc branch of tip:
>
> Commit-ID: 688eb8191b475db5acfd48634600b04fd3dda9ad
> Gitweb: https://git.kernel.org/tip/688eb8191b475db5acfd48634600b04fd3dda9ad
> Author: Noah Goldstein <goldstein.w.n(a)gmail.com>
> AuthorDate: Wed, 10 May 2023 20:10:02 -05:00
> Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
> CommitterDate: Thu, 25 May 2023 10:55:18 -07:00
>
> x86/csum: Improve performance of `csum_partial`
>
> 1) Add special case for len == 40 as that is the hottest value. This
> nets a ~8-9% latency improvement and a ~30% throughput improvement
> in the len == 40 case.
>
> 2) Use multiple accumulators in the 64-byte loop. This dramatically
> improves ILP and results in up to a 40% latency/throughput
> improvement (better for more iterations).
>
> Results from benchmarking on Icelake. Times measured with rdtsc()
>  len  lat_new  lat_old      r  tput_new  tput_old      r
>    8     3.58     3.47  1.032      3.58      3.51  1.021
>   16     4.14     4.02  1.028      3.96      3.78  1.046
>   24     4.99     5.03  0.992      4.23      4.03  1.050
>   32     5.09     5.08  1.001      4.68      4.47  1.048
>   40     5.57     6.08  0.916      3.05      4.43  0.690
>   48     6.65     6.63  1.003      4.97      4.69  1.059
>   56     7.74     7.72  1.003      5.22      4.95  1.055
>   64     6.65     7.22  0.921      6.38      6.42  0.994
>   96     9.43     9.96  0.946      7.46      7.54  0.990
>  128     9.39    12.15  0.773      8.90      8.79  1.012
>  200    12.65    18.08  0.699     11.63     11.60  1.002
>  272    15.82    23.37  0.677     14.43     14.35  1.005
>  440    24.12    36.43  0.662     21.57     22.69  0.951
>  952    46.20    74.01  0.624     42.98     53.12  0.809
> 1024    47.12    78.24  0.602     46.36     58.83  0.788
> 1552    72.01   117.30  0.614     71.92     96.78  0.743
> 2048    93.07   153.25  0.607     93.28    137.20  0.680
> 2600   114.73   194.30  0.590    114.28    179.32  0.637
> 3608   156.34   268.41  0.582    154.97    254.02  0.610
> 4096   175.01   304.03  0.576    175.89    292.08  0.602
>
> There is no such thing as a free lunch, however, and the special case
> for len == 40 does add overhead to the len != 40 cases. This seems to
> amount to be ~5% throughput and slightly less in terms of latency.
>
> Testing:
> Part of this change is a new kunit test. The tests check all
> alignment X length pairs in [0, 64) X [0, 512).
> There are three cases.
> 1) Precomputed random inputs/seed. The expected results were
> generated using the generic implementation (which is assumed to be
> non-buggy).
> 2) An input of all 1s. The goal of this test is to catch any case
> where a carry is missing.
> 3) An input that never carries. The goal of this test is to catch
> any case of incorrectly carrying.
>
> More exhaustive tests that test all alignment X length pairs in
> [0, 8192) X [0, 8192] on random data are also available here:
> https://github.com/goldsteinn/csum-reproduction
>
> The repository also has the code for reproducing the above benchmark
> numbers.
>
> Signed-off-by: Noah Goldstein <goldstein.w.n(a)gmail.com>
> Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Thanks for your patch, which is now commit 688eb8191b475db5 ("x86/csum:
Improve performance of `csum_partial`") in linus/master stable/master
> Link: https://lore.kernel.org/all/20230511011002.935690-1-goldstein.w.n%40gmail.c…
This does not seem to be a message sent to a public mailing list
archived at lore (yet).
On m68k (ARAnyM):
KTAP version 1
# Subtest: checksum
1..3
# test_csum_fixed_random_inputs: ASSERTION FAILED at
lib/checksum_kunit.c:243
Expected result == expec, but
result == 54991 (0xd6cf)
expec == 33316 (0x8224)
not ok 1 test_csum_fixed_random_inputs
# test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:267
Expected result == expec, but
result == 255 (0xff)
expec == 65280 (0xff00)
Endianness issue in the test?
not ok 2 test_csum_all_carry_inputs
# test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:306
Expected result == expec, but
result == 64515 (0xfc03)
expec == 0 (0x0)
not ok 3 test_csum_no_carry_inputs
# checksum: pass:0 fail:3 skip:0 total:3
# Totals: pass:0 fail:3 skip:0 total:3
not ok 1 checksum
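The 0xff versus 0xff00 pair in the second failure is at least what a byte swap would produce. As an illustration only (this is not the kunit test code, and it does not show that byte order is the actual cause on m68k), the Python sketch below shows the known property of the 16-bit ones' complement sum: summing the same bytes as little-endian words gives the byte-swapped result of summing them as big-endian words. The function names are hypothetical.

```python
def ones_complement_sum16(data: bytes, byteorder: str) -> int:
    """Sum 16-bit words with end-around carry, reading words in `byteorder`."""
    if len(data) % 2:
        raise ValueError("this sketch expects an even number of bytes")
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], byteorder)
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def swab16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | (value >> 8)
```

A test that stores expected values computed on a little-endian host would therefore need a conversion before comparing on a big-endian one.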
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
KVM_GET_REG_LIST will dump all register IDs that are available to
KVM_GET/SET_ONE_REG, and it is very useful for identifying platform
regression issues during VM migration.
Patches 1-7 restructure the get-reg-list test in aarch64 to turn some
of the code into a common test framework that can be shared by riscv.
Patch 8 moves the reject_set check logic to a function so as to check for
different errnos for different registers.
Patch 9 moves finalize_vcpu back to run_test so that riscv can implement
its specific operation.
Patch 10 changes the test to do the get/set operations only on the
present-blessed list.
Patch 11 adds the skip_set facility so that riscv can skip the set operation
on some registers.
Patch 12 enables the KVM_GET_REG_LIST API in riscv.
Patch 13 adds the corresponding kselftest for checking possible
register regressions.
The get-reg-list kvm selftest was ported from aarch64 and tested with
Linux v6.4 on a Qemu riscv64 virt machine.
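The regression check itself boils down to comparing the list reported by the kernel against a known-good ("blessed") list. The Python sketch below only illustrates that idea; it is not the selftest code, and the function name and register IDs are made up.

```python
from typing import Iterable, List, Tuple


def compare_reg_lists(current: Iterable[int], blessed: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (new, missing) register IDs relative to the blessed list.

    'new' registers appear in the current list only; 'missing' registers
    were blessed but are no longer reported, which hints at a regression.
    """
    current_set, blessed_set = set(current), set(blessed)
    new = sorted(current_set - blessed_set)
    missing = sorted(blessed_set - current_set)
    return new, missing
```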
---
Changed since v4:
* Rebase to v6.4
* Address Andrew's suggestions and comments:
Added skip_set concept
Updated errno check logic
Modified finalize_vcpu as weak function
Andrew Jones (7):
KVM: arm64: selftests: Replace str_with_index with strdup_printf
KVM: arm64: selftests: Drop SVE cap check in print_reg
KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config
KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h
KVM: arm64: selftests: Delete core_reg_fixup
KVM: arm64: selftests: Split get-reg-list test code
KVM: arm64: selftests: Finish generalizing get-reg-list
Haibo Xu (6):
KVM: arm64: selftests: Move reject_set check logic to a function
KVM: arm64: selftests: Move finalize_vcpu back to run_test
KVM: selftests: Only do get/set tests on present blessed list
KVM: selftests: Add skip_set facility to get_reg_list test
KVM: riscv: Add KVM_GET_REG_LIST API support
KVM: riscv: selftests: Add get-reg-list test
Documentation/virt/kvm/api.rst | 2 +-
arch/riscv/kvm/vcpu.c | 375 +++++++++
tools/testing/selftests/kvm/Makefile | 11 +-
.../selftests/kvm/aarch64/get-reg-list.c | 544 ++----------
tools/testing/selftests/kvm/get-reg-list.c | 395 +++++++++
.../selftests/kvm/include/kvm_util_base.h | 21 +
.../selftests/kvm/include/riscv/processor.h | 3 +
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 15 +
.../selftests/kvm/riscv/get-reg-list.c | 780 ++++++++++++++++++
10 files changed, 1655 insertions(+), 493 deletions(-)
create mode 100644 tools/testing/selftests/kvm/get-reg-list.c
create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
--
2.34.1
Writing `subprocess.Popen[str]` requires python 3.9+.
kunit.py has an assertion that the python version is 3.7+, so we should
try to stay backwards compatible.
This conflicts a bit with commit 1da2e6220e11 ("kunit: tool: fix
pre-existing `mypy --strict` errors and update run_checks.py"), since
mypy complains like so
> kunit_kernel.py:95: error: Missing type parameters for generic type "Popen" [type-arg]
Note: `mypy --strict --python-version 3.7` does not work.
We could annotate each file with comments like
`# mypy: disable-error-code="type-arg"`
but then we might still get nudged to break back-compat in other files.
This patch adds a `mypy.ini` file since it seems like the only way to
disable specific error codes for all our files.
Note: run_checks.py doesn't need to specify `--config-file mypy.ini`,
but I think being explicit is better, particularly since most kernel
devs won't be familiar with how mypy works.
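For illustration, a standalone function annotated the same way as the patched start() methods looks like the sketch below. The function name and the command it runs are only examples and are not part of kunit_kernel.py.

```python
import subprocess
import sys
from typing import List


def start_process(cmd: List[str]) -> subprocess.Popen:
    """Spawn cmd and return the Popen object.

    The return annotation is the unsubscripted class, so the 'def' line
    can be evaluated on Python 3.7 and 3.8, where Popen[str] cannot.
    """
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)


if __name__ == "__main__":
    proc = start_process([sys.executable, "-c", "print('ok')"])
    out, _ = proc.communicate()
    print(out.strip())
```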
Fixes: 695e26030858 ("kunit: tool: add subscripts for type annotations where appropriate")
Reported-by: SeongJae Park <sj(a)kernel.org>
Link: https://lore.kernel.org/linux-kselftest/20230501171520.138753-1-sj@kernel.o…
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit_kernel.py | 6 +++---
tools/testing/kunit/mypy.ini | 6 ++++++
tools/testing/kunit/run_checks.py | 2 +-
3 files changed, 10 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/kunit/mypy.ini
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index f01f94106129..7f648802caf6 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -92,7 +92,7 @@ class LinuxSourceTreeOperations:
if stderr: # likely only due to build warnings
print(stderr.decode())
- def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+ def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
raise RuntimeError('not implemented!')
@@ -113,7 +113,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
kconfig.merge_in_entries(base_kunitconfig)
return kconfig
- def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+ def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
kernel_path = os.path.join(build_dir, self._kernel_path)
qemu_command = ['qemu-system-' + self._qemu_arch,
'-nodefaults',
@@ -142,7 +142,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
kconfig.merge_in_entries(base_kunitconfig)
return kconfig
- def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+ def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
"""Runs the Linux UML binary. Must be named 'linux'."""
linux_bin = os.path.join(build_dir, 'linux')
params.extend(['mem=1G', 'console=tty', 'kunit_shutdown=halt'])
diff --git a/tools/testing/kunit/mypy.ini b/tools/testing/kunit/mypy.ini
new file mode 100644
index 000000000000..ddd288309efa
--- /dev/null
+++ b/tools/testing/kunit/mypy.ini
@@ -0,0 +1,6 @@
+[mypy]
+strict = True
+
+# E.g. we can't write subprocess.Popen[str] until Python 3.9+.
+# But kunit.py tries to support Python 3.7+, so let's disable it.
+disable_error_code = type-arg
diff --git a/tools/testing/kunit/run_checks.py b/tools/testing/kunit/run_checks.py
index 8208c3b3135e..c6d494ea3373 100755
--- a/tools/testing/kunit/run_checks.py
+++ b/tools/testing/kunit/run_checks.py
@@ -23,7 +23,7 @@ commands: Dict[str, Sequence[str]] = {
'kunit_tool_test.py': ['./kunit_tool_test.py'],
'kunit smoke test': ['./kunit.py', 'run', '--kunitconfig=lib/kunit', '--build_dir=kunit_run_checks'],
'pytype': ['/bin/sh', '-c', 'pytype *.py'],
- 'mypy': ['mypy', '--strict', '--exclude', '_test.py$', '--exclude', 'qemu_configs/', '.'],
+ 'mypy': ['mypy', '--config-file', 'mypy.ini', '--exclude', '_test.py$', '--exclude', 'qemu_configs/', '.'],
}
# The user might not have mypy or pytype installed, skip them if so.
base-commit: a42077b787680cbc365a96446b30f32399fa3f6f
--
2.40.1.495.gc816e09b53d-goog
The events tracing infrastructure contains a lot of files and directories
(internally, inodes and dentries), which end up consuming several MB of
memory. There can be multiple instances of events tracing, each of which
requires further memory.
Instead of creating inodes/dentries up front, eventfs can keep only the
meta-data and skip their creation. eventfs then creates the
inodes/dentries on demand, only for the files/directories that are
actually required.
eventfs also deletes the inodes/dentries once they are no longer required,
but preserves the meta-data.
Tracing events took ~9MB; with this approach they take ~4.5MB
for ~10K files/dirs.
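The idea of keeping only meta-data and creating the heavy objects on demand can be modelled in a few lines of Python. This is a conceptual sketch, not the eventfs code: the class, its methods and the dict standing in for an inode/dentry pair are all hypothetical.

```python
from typing import Dict, Iterable


class LazyDir:
    """Keep cheap meta-data for every entry; build the heavy object on lookup."""

    def __init__(self, names: Iterable[str]) -> None:
        # Meta-data is always kept (small), one record per file/directory.
        self._meta: Dict[str, dict] = {name: {"mode": 0o444} for name in names}
        # Stand-ins for inodes/dentries, created only when needed.
        self._live: Dict[str, dict] = {}

    def lookup(self, name: str) -> dict:
        if name not in self._meta:
            raise FileNotFoundError(name)
        if name not in self._live:
            self._live[name] = {"name": name, **self._meta[name]}
        return self._live[name]

    def release(self, name: str) -> None:
        """Drop the heavy object once unused; the meta-data is preserved."""
        self._live.pop(name, None)

    def live_count(self) -> int:
        return len(self._live)
```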
v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 757 ++++++++++++++++++
fs/tracefs/inode.c | 124 ++-
fs/tracefs/internal.h | 25 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 49 ++
kernel/trace/trace.h | 3 +-
kernel/trace/trace_events.c | 78 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 4 +-
.../test.d/kprobe/kprobe_args_string.tc | 4 +-
10 files changed, 994 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.40.0
Hi, Willy
Here is the v2 of our old patchset about test report [1].
The trailing '\r' fixup has been merged, so this only resends the remaining
parts, with an additional patch to restore printing of the failed tests.
This patchset is rebased on the dev.2023.06.14a branch of linux-rcu [2].
Tests have passed for 'x86 run':
138 test(s) passed, 0 skipped, 0 failed.
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.out
Also did 'run-user' for x86, mips and arm64.
Changes from v1 -> v2:
1. selftests/nolibc: add a standalone test report macro
As Willy pointed out, the old method with an additional test-report
target does not work with 'make -j'.
A new macro is added to share the same report logic among the
run-user, run and rerun targets, the path to test log file is
2. selftests/nolibc: always print the path to test log file
Always print the path to test log file, but move it to a new line to
avoid annoying people when the tests pass without any failures.
3. selftests/nolibc: restore the failed tests print
Restore printing of the failed tests to avoid manually opening
the test log file when there really are failures.
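The shared report logic can be pictured with the Python sketch below. It is not the Makefile macro: the "[OK]", "[SKIPPED]" and "[FAIL]" line endings are assumptions about the log format, the function name is hypothetical, and only the summary wording is copied from the output shown above.

```python
from typing import List, Tuple


def report(log_text: str) -> Tuple[str, List[str]]:
    """Count results in a test log and collect the failed lines.

    Assumes each result line ends with [OK], [SKIPPED] or [FAIL],
    possibly followed by a trailing carriage return.
    """
    passed = skipped = 0
    failed: List[str] = []
    for line in log_text.split("\n"):
        line = line.rstrip("\r")
        if line.endswith("[OK]"):
            passed += 1
        elif line.endswith("[SKIPPED]"):
            skipped += 1
        elif line.endswith("[FAIL]"):
            failed.append(line)
    summary = f"{passed} test(s) passed, {skipped} skipped, {len(failed)} failed."
    return summary, failed
```

Returning the failed lines alongside the summary mirrors the third patch: failures can be printed directly instead of being looked up in the log file.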
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1685936428.git.falcon@tinylab.org/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/
Zhangjin Wu (3):
selftests/nolibc: add a standalone test report macro
selftests/nolibc: always print the path to test log file
selftests/nolibc: restore the failed tests print
tools/testing/selftests/nolibc/Makefile | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
--
2.25.1
Hi, Willy
This is a revision of v4 part2 of the rv32 support [1]. It
further splits the generic KARCH code out of the old rv32 compile patch
and also adds the kernel specific KARCH and nolibc specific NARCH to
tools/include/nolibc/Makefile.
This is rebased on the dev.2023.06.14a branch of linux-rcu repo [2] with
basic run-user and run tests.
Changes from v4 -> v5:
* selftests/nolibc: allow customize kernel specific ARCH variable
The KARCH customization support is split out of the old rv32 compile
patch, and the one passed to tools/include/nolibc/Makefile is removed.
* tools/nolibc: add kernel and nolibc specific ARCH variables
Pass original ARCH to tools/include/nolibc/Makefile, add KARCH and
NARCH for kernel and nolibc respectively.
* selftests/nolibc: riscv: customize makefile for rv32
Now, it is rv32 specific, no generic code.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/linux-riscv/cover.1686128703.git.falcon@tinylab.org/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/
Zhangjin Wu (5):
tools/nolibc: fix up #error compile failures with -ENOSYS
tools/nolibc: fix up undeclared syscall macros with #ifdef and -ENOSYS
selftests/nolibc: allow customize kernel specific ARCH variable
tools/nolibc: add kernel and nolibc specific ARCH variables
selftests/nolibc: riscv: customize makefile for rv32
tools/include/nolibc/Makefile | 18 +++++++++---
tools/include/nolibc/sys.h | 38 ++++++++++++++++---------
tools/testing/selftests/nolibc/Makefile | 18 ++++++++++--
3 files changed, 55 insertions(+), 19 deletions(-)
--
2.25.1
Hi,
This patchset further improves porting of nolibc to new architectures.
It is based on our previous v5 sysret helper series [1].
It mainly shrinks the assembly _start by moving most of its operations
to a C _start_c() function. It also removes the old sys_stat() support
in favor of sys_statx() and therefore removes all of the arch specific
sys_stat_struct definitions.
Tested 'run' on all of the supported architectures:
arch/board            | result
----------------------|------------
arm/vexpress-a9       | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/arm-vexpress-a9-nolibc-test.log
arm/virt              | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/arm-virt-nolibc-test.log
aarch64/virt          | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/aarch64-virt-nolibc-test.log
ppc/g3beige           | not supported
ppc/ppce500           | not supported
i386/pc               | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/i386-pc-nolibc-test.log
x86_64/pc             | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/x86_64-pc-nolibc-test.log
mipsel/malta          | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/mipsel-malta-nolibc-test.log
loongarch64/virt      | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/loongarch64-virt-nolibc-test.log
riscv64/virt          | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/riscv64-virt-nolibc-test.log
riscv32/virt          | 119 test(s) passed, 1 skipped, 22 failed. See all results in /labs/linux-lab/logging/nolibc/riscv32-virt-nolibc-test.log
s390x/s390-ccw-virtio | 141 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/s390x-s390-ccw-virtio-nolibc-test.log
Notes:
- ppc support is ready locally and will be sent out later.
- full riscv32/virt support is ready locally and will be sent out later.
Changes:
* tools/nolibc: remove old arch specific stat support
Just like the __NR_statx we used in nolibc-test.c, let's keep only
sys_statx() and use it to implement the stat() function.
Remove the old sys_stat() and sys_stat_struct completely.
* tools/nolibc: add new crt.h with _start_c
A new C version of _start_c() is added to only require a 'sp' argument
and find the others (argc, argv, envp/environ, auxv) for us in C.
* tools/nolibc: include crt.h before arch.h
Include crt.h before arch.h so that _start() in arch-<ARCH>.h is able
to call the newly added _start_c().
* tools/nolibc: arm: shrink _start with _start_c
tools/nolibc: aarch64: shrink _start with _start_c
tools/nolibc: i386: shrink _start with _start_c
tools/nolibc: x86_64: shrink _start with _start_c
tools/nolibc: mips: shrink _start with _start_c
tools/nolibc: loongarch: shrink _start with _start_c
tools/nolibc: riscv: shrink _start with _start_c
tools/nolibc: s390: shrink _start with _start_c
Move most of the operations from the assembly _start() to the C
_start_c(); only minimal operations are now required in the assembly
_start.
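What _start_c() has to recover from 'sp' can be modelled in Python, assuming the usual Linux process-entry layout (argc, the argv pointers, NULL, the envp pointers, NULL, then auxv pairs ending with a zero type). This is a conceptual sketch, not the crt.h code: the stack is modelled as a list of words where strings stand in for pointers and 0 stands in for NULL, and the function name is hypothetical.

```python
from typing import List, Tuple


def parse_initial_stack(words: list) -> Tuple[int, list, list, List[Tuple[int, int]]]:
    """Recover (argc, argv, envp, auxv) from the words found at 'sp'."""
    argc = words[0]
    argv = words[1:1 + argc]
    env_start = 1 + argc + 1              # skip argv's NULL terminator
    env_end = words.index(0, env_start)   # envp is NULL terminated too
    envp = words[env_start:env_end]
    auxv = []
    i = env_end + 1
    while words[i] != 0:                  # a zero type ends the vector
        auxv.append((words[i], words[i + 1]))
        i += 2
    return argc, argv, envp, auxv
```

Since everything is derived from one pointer, the per-architecture assembly only has to hand 'sp' over to C, which is what lets the _start bodies shrink.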
With this patchset, porting nolibc to a new architecture becomes easier;
the powerpc port will be added later.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1687957589.git.falcon@tinylab.org/
Zhangjin Wu (11):
tools/nolibc: remove old arch specific stat support
tools/nolibc: add new crt.h with _start_c
tools/nolibc: include crt.h before arch.h
tools/nolibc: arm: shrink _start with _start_c
tools/nolibc: aarch64: shrink _start with _start_c
tools/nolibc: i386: shrink _start with _start_c
tools/nolibc: x86_64: shrink _start with _start_c
tools/nolibc: mips: shrink _start with _start_c
tools/nolibc: loongarch: shrink _start with _start_c
tools/nolibc: riscv: shrink _start with _start_c
tools/nolibc: s390: shrink _start with _start_c
tools/include/nolibc/Makefile | 1 +
tools/include/nolibc/arch-aarch64.h | 53 ++----------------
tools/include/nolibc/arch-arm.h | 79 ++-------------------------
tools/include/nolibc/arch-i386.h | 58 +++-----------------
tools/include/nolibc/arch-loongarch.h | 42 ++------------
tools/include/nolibc/arch-mips.h | 73 +++----------------------
tools/include/nolibc/arch-riscv.h | 65 ++--------------------
tools/include/nolibc/arch-s390.h | 60 ++------------------
tools/include/nolibc/arch-x86_64.h | 54 ++----------------
tools/include/nolibc/crt.h | 57 +++++++++++++++++++
tools/include/nolibc/nolibc.h | 1 +
tools/include/nolibc/signal.h | 1 +
tools/include/nolibc/stdio.h | 1 +
tools/include/nolibc/stdlib.h | 1 +
tools/include/nolibc/sys.h | 64 ++++------------------
tools/include/nolibc/time.h | 1 +
tools/include/nolibc/types.h | 4 +-
tools/include/nolibc/unistd.h | 1 +
18 files changed, 122 insertions(+), 494 deletions(-)
create mode 100644 tools/include/nolibc/crt.h
--
2.25.1
The kernel cmdline option panic_on_warn expects an integer; it is not a
plain option as documented. A number of users in the tree have already
figured this out and use panic_on_warn=1 for their purpose.
Adjust a comment which otherwise may mislead people in the future.
Fixes: 9e3961a097 ("kernel: add panic_on_warn")
Signed-off-by: Olaf Hering <olaf(a)aepfle.de>
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
tools/testing/selftests/rcutorture/bin/kvm.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..15196f84df49 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4049,7 +4049,7 @@
extra details on the taint flags that users can pick
to compose the bitmask to assign to panic_on_taint.
- panic_on_warn panic() instead of WARN(). Useful to cause kdump
+ panic_on_warn=1 panic() instead of WARN(). Useful to cause kdump
on a WARN().
parkbd.port= [HW] Parallel port number the keyboard adapter is
diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh
index 62f3b0f56e4d..d3cdc2d33d4b 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm.sh
@@ -655,4 +655,4 @@ fi
# Control buffer size: --bootargs trace_buf_size=3k
# Get trace-buffer dumps on all oopses: --bootargs ftrace_dump_on_oops
# Ditto, but dump only the oopsing CPU: --bootargs ftrace_dump_on_oops=orig_cpu
-# Heavy-handed way to also dump on warnings: --bootargs panic_on_warn
+# Heavy-handed way to also dump on warnings: --bootargs panic_on_warn=1
Hi, Thomas, David, Willy
Thanks very much for your kind review.
This is the revision of v3 "tools/nolibc: add a new syscall helper" [1].
It mainly applies the suggestion from David in this reply [2] and
rebases everything on the dev.2023.06.14a branch of linux-rcu [3].
The old __sysret() doesn't support syscalls that return a pointer;
this revision now supports such syscalls. The remaining mmap() syscall
is converted to use this new __sysret(), with additional test cases.
Changes from v3 -> v4:
* tools/nolibc: sys.h: add a syscall return helper
tools/nolibc: unistd.h: apply __sysret() helper
tools/nolibc: sys.h: apply __sysret() helper
The original v3 series, no code change, except the Reviewed-by lines
from Thomas.
* tools/nolibc: unistd.h: reorder the syscall macros
Reorder the syscall macros in order of use and align most of them.
* tools/nolibc: add missing my_syscall6() for mips
Required by the mmap() syscall; this is the last missing my_syscall6().
* tools/nolibc: __sysret: support syscalls who return a pointer
Apply suggestion from David.
Let __sysret() also support syscalls that return a pointer: the return
value is converted to unsigned long, and the < 0 comparison is replaced
by a check against the range [(unsigned long)-MAX_ERRNO, (unsigned long)-1].
This also allows returning a huge non-pointer value with the highest bit set.
This patch can be merged into the first one if necessary.
* tools/nolibc: clean up mmap() support
Apply new __sysret(), clean up #ifdef and some macros.
* selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER
selftests/nolibc: add sbrk_0 to test current brk getting
selftests/nolibc: add mmap and munmap test cases
Add some mmap & munmap test cases and the corresponding helpers. To
verify one of the new helpers, a sbrk_0 test case is also added.
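As an illustration of the range check described for __sysret() above, a minimal userspace sketch could look like this. The name sysret_sketch() and the MAX_ERRNO value of 4095 are assumptions made for this example; it is not the helper from the series.

```c
#include <assert.h>
#include <errno.h>

/* Assumed limit: error returns are encoded as -1 .. -4095. */
#define MAX_ERRNO 4095

/*
 * Sketch of a return helper usable for integer and pointer results alike:
 * a raw value inside [(unsigned long)-MAX_ERRNO, (unsigned long)-1] is an
 * error, anything else (including values with the top bit set) is valid.
 */
static unsigned long sysret_sketch(unsigned long raw)
{
	if (raw >= (unsigned long)-MAX_ERRNO) {
		errno = (int)-(long)raw;
		return (unsigned long)-1;
	}
	return raw;
}
```

A signed "< 0" test would wrongly reject a valid mapping address in the upper half of the address space; the unsigned range check only rejects the last MAX_ERRNO values.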
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/linux-riscv/87e7a391-b97b-4001-b12a-76d20790563e@t-…
[2]: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.…
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/
Zhangjin Wu (10):
tools/nolibc: sys.h: add a syscall return helper
tools/nolibc: unistd.h: apply __sysret() helper
tools/nolibc: sys.h: apply __sysret() helper
tools/nolibc: unistd.h: reorder the syscall macros
tools/nolibc: add missing my_syscall6() for mips
tools/nolibc: __sysret: support syscalls who return a pointer
tools/nolibc: clean up mmap() support
selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER
selftests/nolibc: add sbrk_0 to test current brk getting
selftests/nolibc: add mmap and munmap test cases
tools/include/nolibc/arch-mips.h | 26 ++
tools/include/nolibc/nolibc.h | 9 +-
tools/include/nolibc/sys.h | 391 +++----------------
tools/include/nolibc/types.h | 11 +
tools/include/nolibc/unistd.h | 13 +-
tools/testing/selftests/nolibc/nolibc-test.c | 90 +++++
6 files changed, 191 insertions(+), 349 deletions(-)
--
2.25.1
When running Kselftests with the current selftests/net/config
the following problem can be seen with the net:xfrm_policy.sh
selftest:
# selftests: net: xfrm_policy.sh
[ 41.076721] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 41.094787] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 41.107635] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
# modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
# iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
# Perhaps iptables or your kernel needs to be upgraded.
# modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
# iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
# Perhaps iptables or your kernel needs to be upgraded.
# SKIP: Could not insert iptables rule
ok 1 selftests: net: xfrm_policy.sh # SKIP
This is because IPsec "policy" match support is not available
to the kernel.
This patch adds CONFIG_NETFILTER_XT_MATCH_POLICY as a module
to the selftests/net/config file, so that `make
kselftest-merge` can take this into consideration.
Signed-off-by: Daniel Díaz <daniel.diaz(a)linaro.org>
---
tools/testing/selftests/net/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index d1d421ec10a3..cd3cc52c59b4 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -50,3 +50,4 @@ CONFIG_CRYPTO_SM4_GENERIC=y
CONFIG_AMT=m
CONFIG_VXLAN=m
CONFIG_IP_SCTP=m
+CONFIG_NETFILTER_XT_MATCH_POLICY=m
--
2.34.1
From: Björn Töpel <bjorn(a)rivosinc.com>
When you're cross-building kselftest, in this case RISC-V:
| make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- O=/tmp/kselftest \
| HOSTCC=gcc FORMAT= SKIP_TARGETS="arm64 ia64 powerpc sparc64 x86 \
| sgx" -C tools/testing/selftests gen_tar
the components (paths) that fail to build are skipped. In this case,
openat2 failed due to missing library support, and proc due to an
x86-64 only test.
This tiny series addresses the problems above.
Björn
Björn Töpel (2):
selftests/openat2: Run-time check for -fsanitize=undefined
selftests/proc: Do not build x86-64 tests on non-x86-64 builds
tools/testing/selftests/openat2/Makefile | 9 ++++++++-
tools/testing/selftests/proc/Makefile | 4 ++++
2 files changed, 12 insertions(+), 1 deletion(-)
base-commit: 3a8a670eeeaa40d87bd38a587438952741980c18
--
2.39.2
Hi, all
Thanks very much for your review suggestions of the v1 series [1], we
just sent out the generic part1 [2], and here is the part2 of the whole
v2 revision.
Changes from v1 -> v2:
* Don't emulate the return values in the new syscalls path; fix up or
support the new syscalls on the side of the related test cases (1-3)
selftests/nolibc: remove gettimeofday_bad1/2 completely
selftests/nolibc: support two errnos with EXPECT_SYSER2()
selftests/nolibc: waitpid_min: add waitid syscall support
(Review suggestions from Willy and Thomas)
* Fix up new failure of the stat_timestamps test case (4, new)
tools/nolibc: add missing nanoseconds support for __NR_statx
(Fixes for the commit a89c937d781a ("tools/nolibc: support nanoseconds in stat()"))
* Add new waitstatus macros as a standalone patch for the waitid support (5)
tools/nolibc: add more wait status related types
(Split and Cleanup for the waitid syscall based sys_wait4)
* Pure 64bit lseek and time64 select/poll/gettimeofday support (6-11)
tools/nolibc: add pure 64bit off_t, time_t and blkcnt_t
tools/nolibc: sys_lseek: add pure 64bit lseek
tools/nolibc: add pure 64bit time structs
tools/nolibc: sys_select: add pure 64bit select
tools/nolibc: sys_poll: add pure 64bit poll
tools/nolibc: sys_gettimeofday: add pure 64bit gettimeofday
(Review suggestions from Arnd, Thomas and Willy, time32 variants have
been removed completely and some fixups)
* waitid syscall support cleanup (12)
tools/nolibc: sys_wait4: add waitid syscall support
(Sync with the waitstatus macros update and Removal of emulated code)
* rv32 nolibc-test support, commit message update (13)
selftests/nolibc: riscv: customize makefile for rv32
(Review suggestions from Thomas, explain more about the change logic in commit message)
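For the "pure 64bit lseek" item above, the sketch below shows how a 64-bit offset can be split into two 32-bit halves, assuming the 64-bit path relies on an llseek-style syscall that takes the offset that way. struct off64_parts, split_off64() and join_off64() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of halves as an llseek-style syscall would take them. */
struct off64_parts {
	uint32_t high;
	uint32_t low;
};

/* Split a signed 64-bit file offset into its upper and lower 32 bits. */
static struct off64_parts split_off64(int64_t offset)
{
	struct off64_parts parts;
	uint64_t bits = (uint64_t)offset;

	parts.high = (uint32_t)(bits >> 32);
	parts.low = (uint32_t)(bits & 0xffffffffu);
	return parts;
}

/* Inverse operation, e.g. to rebuild the resulting position. */
static int64_t join_off64(struct off64_parts parts)
{
	return (int64_t)(((uint64_t)parts.high << 32) | parts.low);
}
```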
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/linux-riscv/20230529113143.GB2762@1wt.eu/T/#t
[2]: https://lore.kernel.org/linux-riscv/cover.1685362482.git.falcon@tinylab.org/
Zhangjin Wu (13):
selftests/nolibc: remove gettimeofday_bad1/2 completely
selftests/nolibc: support two errnos with EXPECT_SYSER2()
selftests/nolibc: waitpid_min: add waitid syscall support
tools/nolibc: add missing nanoseconds support for __NR_statx
tools/nolibc: add more wait status related types
tools/nolibc: add pure 64bit off_t, time_t and blkcnt_t
tools/nolibc: sys_lseek: add pure 64bit lseek
tools/nolibc: add pure 64bit time structs
tools/nolibc: sys_select: add pure 64bit select
tools/nolibc: sys_poll: add pure 64bit poll
tools/nolibc: sys_gettimeofday: add pure 64bit gettimeofday
tools/nolibc: sys_wait4: add waitid syscall support
selftests/nolibc: riscv: customize makefile for rv32
tools/include/nolibc/arch-aarch64.h | 3 -
tools/include/nolibc/arch-loongarch.h | 3 -
tools/include/nolibc/arch-riscv.h | 3 -
tools/include/nolibc/std.h | 28 ++--
tools/include/nolibc/sys.h | 134 +++++++++++++++----
tools/include/nolibc/types.h | 58 +++++++-
tools/testing/selftests/nolibc/Makefile | 11 +-
tools/testing/selftests/nolibc/nolibc-test.c | 20 +--
8 files changed, 202 insertions(+), 58 deletions(-)
--
2.25.1
This extension allows F_UNLCK to be used on query, which currently returns
EINVAL. Instead it can be used to query the locks on a particular fd -
something that is not currently possible. The basic idea is that on
F_OFD_GETLK, F_UNLCK would "conflict" with (or query) any type of
lock on the same fd, and ignore any locks on other fds.
Use-cases:
1. A CRIU-like scenario where you want to read the locking info from an
fd for later reconstruction. This can now be done by setting
l_start and l_len to 0 to cover the entire file range and calling F_OFD_GETLK.
In a loop you need to advance l_start past the returned lock ranges,
to eventually collect all locked ranges.
2. Implementing the lock checking/enforcing policy.
Say you want to implement an "auditor" module in your program,
that checks that the I/O is done only after the proper locking is
applied on a file region. In this case you need to know if the
particular region is locked on that fd, and if so - with what type
of lock. If you do that currently (without this extension)
then you can only check for write locks, and for that you need to
probe the lock on your fd and then open the same file via another fd and
probe there. That way you can identify the write lock on a particular
fd, but such a trick is non-atomic and complex. As for finding out the
read lock on a particular fd - impossible.
This extension allows such queries without any extra effort.
3. Implementing the mandatory locking policy.
Suppose you want to make a policy where the write lock inhibits any
unlocked readers and writers. Currently you need to check if the
write lock is present on some other fd, and if it is not there - allow
the I/O operation. But because the write lock can appear at any moment,
you need to do that under some global lock, which can be released only
when the I/O operation is finished.
With the proposed extension you can instead just check the write lock
on your own fd first, and if it is there - allow the I/O operation on
that fd without using any global lock. Only if there is no write lock
on this fd, then you need to take global lock and check for a write
lock on other fds.
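The loop from use-case 1 could be sketched roughly as below. This assumes a kernel that carries the proposed extension and that "no more locks" is reported with l_type == F_UNLCK, as for a normal F_OFD_GETLK query; next_query_start() and dump_ofd_locks() are hypothetical helper names, and the fallback value 36 for F_OFD_GETLK is the generic Linux one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes F_OFD_GETLK in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef F_OFD_GETLK
#define F_OFD_GETLK 36		/* generic Linux value, if the headers hide it */
#endif

/*
 * Given a lock range returned by the query, compute where the next query
 * should start. Returns -1 when the lock extends to EOF (l_len == 0),
 * since nothing can follow it.
 */
static off_t next_query_start(const struct flock *found)
{
	assert(found != NULL);
	if (found->l_len == 0)
		return -1;
	return found->l_start + found->l_len;
}

/*
 * Print every range locked through fd. Kernels without the extension
 * fail the first query with EINVAL, which is returned as -EINVAL.
 */
static int dump_ofd_locks(int fd)
{
	off_t start = 0;

	while (start >= 0) {
		struct flock fl = {
			.l_type = F_UNLCK,	/* "query" instead of a lock type */
			.l_whence = SEEK_SET,
			.l_start = start,
			.l_len = 0,		/* from start to end of file */
		};

		if (fcntl(fd, F_OFD_GETLK, &fl) < 0)
			return -errno;
		if (fl.l_type == F_UNLCK)	/* no further lock on this fd */
			break;
		printf("%s lock: start=%lld len=%lld\n",
		       fl.l_type == F_WRLCK ? "write" : "read",
		       (long long)fl.l_start, (long long)fl.l_len);
		start = next_query_start(&fl);
	}
	return 0;
}
```

Keeping the range arithmetic in its own helper makes the termination condition (a lock reaching EOF) explicit and easy to check without a patched kernel.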
The second patch adds a test-case for OFD locks.
It tests both the generic things and the proposed extension.
The third patch is a proposed man page update for fcntl(2)
(not for the linux source tree)
Changes in v3:
- Move selftest to selftests/filelock
Changes in v2:
- Dropped the l_pid extension patch and updated test-case accordingly.
Stas Sergeev (2):
fs/locks: F_UNLCK extension for F_OFD_GETLK
selftests: add OFD lock tests
fs/locks.c | 23 +++-
tools/testing/selftests/filelock/Makefile | 5 +
tools/testing/selftests/filelock/ofdlocks.c | 132 ++++++++++++++++++++
3 files changed, 157 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/filelock/Makefile
create mode 100644 tools/testing/selftests/filelock/ofdlocks.c
CC: Jeff Layton <jlayton(a)kernel.org>
CC: Chuck Lever <chuck.lever(a)oracle.com>
CC: Alexander Viro <viro(a)zeniv.linux.org.uk>
CC: Christian Brauner <brauner(a)kernel.org>
CC: linux-fsdevel(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
CC: linux-api(a)vger.kernel.org
--
2.39.2
Willy, Thomas
This is v3 to allow running with a minimal kernel config, see v2 [1].
It applies further suggestions from Thomas and is based on our previous v5
sysret helper series [2] and Thomas' chmod_net removal patchset [3].
Now, a test report on arm/vexpress-a9 without procfs, shmem, tmpfs, net
and memfd_create looks like:
LOG: testing report for arm/vexpress-a9:
14 chmod_self [SKIPPED]
16 chown_self [SKIPPED]
40 link_cross [SKIPPED]
0 -fstackprotector not supported [SKIPPED]
139 test(s) passed, 4 skipped, 0 failed.
See all results in /labs/linux-lab/logging/nolibc/arm-vexpress-a9-nolibc-test.log
LOG: testing summary:
arch/board | result
------------|------------
arm/vexpress-a9 | 139 test(s) passed, 4 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/arm-vexpress-a9-nolibc-test.log
Changes from v2 --> v3:
* Added Reviewed-by from Thomas for the whole series, Many Thanks
* selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
selftests/nolibc: fix up int_fast16/32_t test cases for musl
selftests/nolibc: fix up kernel parameters support
selftests/nolibc: stat_timestamps: remove procfs dependency
selftests/nolibc: link_cross: use /proc/self/cmdline
tools/nolibc: add rmdir() support
selftests/nolibc: add a new rmdir() test case
selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
No code changes except some commit message cleanups.
* selftests/nolibc: prepare /tmp for tmpfs or ramfs
As suggested by Thomas, simply calling mkdir() and mount() to
prepare /tmp can save a stat() call.
* selftests/nolibc: chroot_exe: remove procfs dependency
As suggested by Thomas, remove the 'weird' get_tmpfile() and use
the '/init' for !procfs as we did for stat_timestamps.
For the worst-case scenario, when '/init' is not there, add ENOENT to
the error check list.
Now, it is a one-line code change.
* selftests/nolibc: add chmod_tmpdir test
Without get_tmpfile(), directly mkdir() a temp directory for the
chmod_tmpdir test; it functions as a substitute for the removed
chmod_net.
Now, it is a one-line code change.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1688078604.git.falcon@tinylab.org/
Zhangjin Wu (14):
selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
selftests/nolibc: fix up int_fast16/32_t test cases for musl
selftests/nolibc: fix up kernel parameters support
selftests/nolibc: stat_timestamps: remove procfs dependency
selftests/nolibc: chroot_exe: remove procfs dependency
selftests/nolibc: link_cross: use /proc/self/cmdline
tools/nolibc: add rmdir() support
selftests/nolibc: add a new rmdir() test case
selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
selftests/nolibc: prepare /tmp for tmpfs or ramfs
selftests/nolibc: add chmod_tmpdir test
selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
tools/include/nolibc/sys.h | 22 ++++++
tools/testing/selftests/nolibc/nolibc-test.c | 83 +++++++++++++++-----
2 files changed, 87 insertions(+), 18 deletions(-)
--
2.25.1
This is the initial KUnit integration for running Rust documentation
tests within the kernel.
Thank you to the KUnit team for all the input and feedback on this
over the months, as well as the Intel LKP 0-Day team!
This may be merged through either the KUnit or the Rust trees. If
the KUnit team wants to merge it, then that would be great.
Please see the message in the main commit for the details.
Miguel Ojeda (6):
rust: init: make doctests compilable/testable
rust: str: make doctests compilable/testable
rust: sync: make doctests compilable/testable
rust: types: make doctests compilable/testable
rust: support running Rust documentation tests as KUnit ones
MAINTAINERS: add Rust KUnit files to the KUnit entry
MAINTAINERS | 2 +
lib/Kconfig.debug | 13 +++
rust/.gitignore | 2 +
rust/Makefile | 29 ++++++
rust/bindings/bindings_helper.h | 1 +
rust/helpers.c | 7 ++
rust/kernel/init.rs | 25 +++--
rust/kernel/kunit.rs | 156 ++++++++++++++++++++++++++++
rust/kernel/lib.rs | 2 +
rust/kernel/str.rs | 4 +-
rust/kernel/sync/arc.rs | 9 +-
rust/kernel/sync/lock/mutex.rs | 1 +
rust/kernel/sync/lock/spinlock.rs | 1 +
rust/kernel/types.rs | 6 +-
scripts/.gitignore | 2 +
scripts/Makefile | 4 +
scripts/rustdoc_test_builder.rs | 73 ++++++++++++++
scripts/rustdoc_test_gen.rs | 162 ++++++++++++++++++++++++++++++
18 files changed, 484 insertions(+), 15 deletions(-)
create mode 100644 rust/kernel/kunit.rs
create mode 100644 scripts/rustdoc_test_builder.rs
create mode 100644 scripts/rustdoc_test_gen.rs
base-commit: d2e3115d717197cb2bc020dd1f06b06538474ac3
--
2.41.0
TCP SYN/ACK packets of connections from processes/sockets outside a
cgroup on the same host are not received by the cgroup's installed
cgroup_skb filters.
There were two BPF cgroup_skb programs attached to a cgroup named
"my_cgroup".
SEC("cgroup_skb/ingress")
int ingress(struct __sk_buff *skb)
{
/* .... process skb ... */
return 1;
}
SEC("cgroup_skb/egress")
int egress(struct __sk_buff *skb)
{
/* .... process skb ... */
return 1;
}
We discovered that when running the command "nc -6 -l 8000" in
"my_cgroup" and connecting to it from outside of "my_cgroup" with the
command "nc -6 localhost 8000", the egress filter did not detect the
SYN/ACK packet. However, we did observe the SYN/ACK packet at the
ingress when connecting from a socket in "my_cgroup" to a socket
outside of it.
We came across BPF_CGROUP_RUN_PROG_INET_EGRESS(). This macro is
responsible for calling BPF programs that are attached to the egress
hook of a cgroup and it skips programs if the sending socket is not the
owner of the skb. Specifically, in our situation, the SYN/ACK
skb is owned by a struct request_sock instance, but the sending
socket is the listener socket we use to receive incoming
connections. The request_sock is created to manage an incoming
connection.
It has been determined that checking the owner of a skb against
the sending socket is not required. Removing this check will allow the
filters to receive SYN/ACK packets.
To ensure that cgroup_skb filters can receive all signaling packets,
including SYN, SYN/ACK, ACK, FIN, and FIN/ACK, a new self-test has
been added as well.
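The ownership problem can be modelled in plain userspace C. The sketch below uses simplified stand-in structures (struct mock_sock, struct mock_skb) and hypothetical function names; it is not the kernel macro, only an illustration of why comparing "full" sockets, as the first patch title puts it, lets the listener's egress hook see a SYN/ACK owned by a request sock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel objects; not the real definitions. */
struct mock_sock {
	bool is_request_sock;		/* mini socket for a pending connection */
	struct mock_sock *listener;	/* listener that created the request sock */
};

struct mock_skb {
	struct mock_sock *owner;
};

/* Map a request sock to its listener; full sockets map to themselves. */
static struct mock_sock *to_full_sock(struct mock_sock *sk)
{
	if (sk && sk->is_request_sock)
		return sk->listener;
	return sk;
}

/* Behaviour described above: the skb owner is compared directly. */
static bool egress_runs_old(struct mock_sock *sending, struct mock_skb *skb)
{
	assert(skb != NULL);
	return sending != NULL && sending == skb->owner;
}

/* Model of the fix: both sides are resolved to full sockets first. */
static bool egress_runs_new(struct mock_sock *sending, struct mock_skb *skb)
{
	assert(skb != NULL);
	return sending != NULL &&
	       to_full_sock(sending) == to_full_sock(skb->owner);
}
```

In this model a SYN/ACK owned by a request sock is skipped by the old check and accepted by the new one, while ordinary packets owned by the sending socket pass both.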
Changes from v3:
- Check SKB ownership against full socket instead of just removing the
check.
- Address the issue raised by Yonghong.
- Put more details down in the commit message.
Changes from v2:
- Remove redundant blank lines.
Changes from v1:
- Check the number of observed packets instead of just sleeping.
- Use ASSERT_XXX() instead of CHECK().
[v1] https://lore.kernel.org/all/20230612191641.441774-1-kuifeng@meta.com/
[v2] https://lore.kernel.org/all/20230617052756.640916-2-kuifeng@meta.com/
[v3] https://lore.kernel.org/all/20230620171409.166001-1-kuifeng@meta.com/
Kui-Feng Lee (2):
net: bpf: Check SKB ownership against full socket.
selftests/bpf: Verify that the cgroup_skb filters receive expected
packets.
include/linux/bpf-cgroup.h | 4 +-
tools/testing/selftests/bpf/cgroup_helpers.c | 12 +
tools/testing/selftests/bpf/cgroup_helpers.h | 1 +
tools/testing/selftests/bpf/cgroup_tcp_skb.h | 35 ++
.../selftests/bpf/prog_tests/cgroup_tcp_skb.c | 402 ++++++++++++++++++
.../selftests/bpf/progs/cgroup_tcp_skb.c | 382 +++++++++++++++++
6 files changed, 834 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/cgroup_tcp_skb.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_tcp_skb.c
create mode 100644 tools/testing/selftests/bpf/progs/cgroup_tcp_skb.c
--
2.34.1
Willy, Thomas
This is v2 to allow running with a minimal kernel config, see v1 [1].
It mainly applied the suggestions from Thomas. It is based on our
previous v5 sysret helper series [2] and Thomas' chmod_net removal
patchset [3].
Now, a test report on arm/vexpress-a9 without procfs, shmem, tmpfs, net
and memfd_create looks like:
LOG: testing report for arm/vexpress-a9:
14 chmod_net [SKIPPED]
15 chmod_self [SKIPPED]
17 chown_self [SKIPPED]
41 link_cross [SKIPPED]
0 -fstackprotector not supported [SKIPPED]
139 test(s) passed, 5 skipped, 0 failed.
See all results in /labs/linux-lab/logging/nolibc/arm-vexpress-a9-nolibc-test.log
LOG: testing summary:
arch/board | result
------------|------------
arm/vexpress-a9 | 139 test(s) passed, 5 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/arm-vexpress-a9-nolibc-test.log
Changes from v1 --> v2:
* selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
selftests/nolibc: add a new rmdir() test case
selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
The same as v1, with only a few commit message changes.
* selftests/nolibc: fix up int_fast16/32_t test cases for musl
Applied the method suggested by Thomas, two new macros are added to
get SINT_MAX_OF_TYPE(type) and SINT_MIN_OF_TYPE(type).
* selftests/nolibc: fix up kernel parameters support
After discussion with Thomas and more testing, both argv[1] and the
NOLIBC_TEST environment variable should be checked, to support
kernel parameters such as:
NOLIBC_TEST=syscall
noapic NOLIBC_TEST=syscall
noapic
* selftests/nolibc: stat_timestamps: remove procfs dependency
Add '/init' and '/' for !procfs, don't skip it.
* selftests/nolibc: link_cross: use /proc/self/cmdline
Use /proc/self/cmdline instead of /proc/self/net, the ramfs based
/tmp/file doesn't work as expected (not really crossdev).
* tools/nolibc: add rmdir() support
Now, rebased on __sysret() from sysret helper patchset [2].
* selftests/nolibc: prepare /tmp for tmpfs or ramfs
Removed the hugetlbfs prepare part, not really required.
Don't remove /tmp and reserve it to use ramfs as tmpfs.
* selftests/nolibc: add common get_tmpfile()
selftests/nolibc: rename chroot_exe to chroot_tmpfile
Some cleanups.
* selftests/nolibc: add chmod_tmpfile test
To avoid conflict with Thomas' chmod_net removal patch [3], a new
chmod_tmpfile is added (in v1, there is a rename patch from
chmod_net to chmod_good)
Still to avoid conflict, these two are removed in this series:
- selftests/nolibc: rename proc variable to has_proc
- selftests/nolibc: rename euid0 variable to is_root
* selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
Many checks are removed; only the direct tmpfs access
version is kept.
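For the SINT_MAX_OF_TYPE(type) and SINT_MIN_OF_TYPE(type) macros mentioned in the list above, one possible definition is sketched below. This is an assumption about how such macros can be written without overflowing the signed type, not necessarily the definition used in nolibc-test.c; check_fast_types() is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * The maximum is built from two halves (2^(N-2) - 1 + 2^(N-2)) so that no
 * intermediate result exceeds the range of the signed type.
 */
#define SINT_MAX_OF_TYPE(type) \
	(((type)1 << (sizeof(type) * CHAR_BIT - 2)) - (type)1 + \
	 ((type)1 << (sizeof(type) * CHAR_BIT - 2)))
#define SINT_MIN_OF_TYPE(type) (-SINT_MAX_OF_TYPE(type) - 1)

/* The limits follow the real width of the type, whatever the libc chose. */
static void check_fast_types(void)
{
	assert(SINT_MAX_OF_TYPE(int_fast16_t) == INT_FAST16_MAX);
	assert(SINT_MIN_OF_TYPE(int_fast16_t) == INT_FAST16_MIN);
	assert(SINT_MAX_OF_TYPE(int_fast32_t) == INT_FAST32_MAX);
	assert(SINT_MIN_OF_TYPE(int_fast32_t) == INT_FAST32_MIN);
}
```

Deriving the limits from sizeof(type) avoids hard-coding a width for int_fast16_t and int_fast32_t, which may differ between C libraries.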
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1687344643.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/cover.1687976753.git.falcon@tinylab.org/
[3]: https://lore.kernel.org/lkml/20230624-proc-net-setattr-v1-0-73176812adee@we…
Zhangjin Wu (15):
selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
selftests/nolibc: fix up int_fast16/32_t test cases for musl
selftests/nolibc: fix up kernel parameters support
selftests/nolibc: stat_timestamps: remove procfs dependency
selftests/nolibc: link_cross: use /proc/self/cmdline
tools/nolibc: add rmdir() support
selftests/nolibc: add a new rmdir() test case
selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
selftests/nolibc: prepare /tmp for tmpfs or ramfs
selftests/nolibc: add common get_tmpfile()
selftests/nolibc: rename chroot_exe to chroot_tmpfile
selftests/nolibc: add chmod_tmpfile test
selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
tools/include/nolibc/sys.h | 22 ++++
tools/testing/selftests/nolibc/nolibc-test.c | 102 +++++++++++++++----
2 files changed, 106 insertions(+), 18 deletions(-)
--
2.25.1
Hi,
This patch series introduces two tests to further enhance and
verify the functionality of the KVM subsystem. These tests focus
on MSR_IA32_DS_AREA and MSR_IA32_PERF_CAPABILITIES.
The first patch adds tests to verify the correct behavior when
trying to set MSR_IA32_DS_AREA with a non-canonical address. It
checks that KVM correctly faults on these non-canonical addresses,
ensuring the accuracy and stability of the KVM subsystem.
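Assuming the addresses in question are what x86-64 calls non-canonical ones, the property being tested can be sketched as follows. is_canonical() is a hypothetical userspace helper, and the 48-bit virtual address width used in the usage note is an assumption (CPUs with 5-level paging use 57 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * An address is canonical when bits [63:vaddr_bits-1] are all equal,
 * i.e. sign-extending the low vaddr_bits bits gives the address back.
 * Relies on arithmetic right shift of signed values, as GCC and Clang do.
 */
static bool is_canonical(uint64_t addr, unsigned int vaddr_bits)
{
	unsigned int shift;

	assert(vaddr_bits >= 1 && vaddr_bits <= 64);
	shift = 64 - vaddr_bits;
	return (uint64_t)((int64_t)(addr << shift) >> shift) == addr;
}
```

For example, with 48 address bits 0x00007fffffffffff and 0xffff800000000000 are canonical, while 0x0000800000000000 is not and would be expected to fault when written to the MSR.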
The second patch includes a comprehensive PEBS test that checks all
possible combinations of PEBS-related bits in MSR_IA32_PERF_CAPABILITIES.
This helps to ensure the accuracy of PEBS functionality.
Feedback and suggestions are welcomed and appreciated.
Sincerely,
Jinrong Liang
Jinrong Liang (2):
KVM: selftests: Test consistency of setting MSR_IA32_DS_AREA
KVM: selftests: Add PEBS test for MSR_IA32_PERF_CAPABILITIES
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 171 ++++++++++++++++++
1 file changed, 171 insertions(+)
base-commit: 31b4fc3bc64aadd660c5bfa5178c86a7ba61e0f7
--
2.31.1
From: Jeff Xu <jeffxu(a)google.com>
When sysctl vm.memfd_noexec is 2 (MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED),
memfd_create(.., MFD_EXEC) should fail.
This complies with how MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED is
defined - "memfd_create() without MFD_NOEXEC_SEAL will be rejected"
Thanks to Dominique Martinet <asmadeus(a)codewreck.org> who reported the bug.
See [1] for context.
[1] https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU6WEy5…
Jeff Xu (2):
mm/memfd: sysctl: fix MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
selftests/memfd: sysctl: fix MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
mm/memfd.c | 48 +++++++++++-----------
tools/testing/selftests/memfd/memfd_test.c | 5 +++
2 files changed, 30 insertions(+), 23 deletions(-)
--
2.41.0.255.g8b1d071c50-goog
From: Jeff Xu <jeffxu(a)google.com>
Add documentation for sysctl vm.memfd_noexec
Link:https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU…
Reported-by: Dominique Martinet <asmadeus(a)codewreck.org>
Signed-off-by: Jeff Xu <jeffxu(a)google.com>
---
Documentation/admin-guide/sysctl/vm.rst | 30 +++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 45ba1f4dc004..621588041a9e 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -424,6 +424,36 @@ e.g., up to one or two maps per allocation.
The default value is 65530.
+memfd_noexec:
+=============
+This pid namespaced sysctl controls memfd_create().
+
+The new MFD_NOEXEC_SEAL and MFD_EXEC flags of memfd_create() allows
+application to set executable bit at creation time.
+
+When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
+(mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
+be executable (mode: 0777) after creation.
+
+when MFD_EXEC flag is set, memfd is created with executable bit
+(mode:0777), this is the same as the old behavior of memfd_create.
+
+The new pid namespaced sysctl vm.memfd_noexec has 3 values:
+0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+ MFD_EXEC was set.
+1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+ MFD_NOEXEC_SEAL was set.
+2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
+
+The default value is 0.
+
+Once set, it can't be downgraded at runtime, i.e. 2=>1, 1=>0
+are denied.
+
+This is pid namespaced sysctl, child processes inherit the parent
+process's memfd_noexec at the time of fork. Changes to the parent
+process after fork are not automatically propagated to the child
+process.
memory_failure_early_kill:
==========================
--
2.41.0.255.g8b1d071c50-goog
Hi,
This patch series aims to improve the PMU event filter settings with a cleaner
and more organized structure and adds several test cases related to PMU event
filters.
The first patch of this series introduces a custom "__kvm_pmu_event_filter"
structure that simplifies the event filter setup and improves overall code
readability and maintainability.
The second patch adds test cases to check that unsupported input values in the
PMU event filters are rejected, covering unsupported "action" values,
unsupported "flags" values, and unsupported "nevents" values, as well as the
setting of non-existent fixed counters in the fixed bitmap.
The third patch includes tests for the PMU event filter's behavior when applied
to fixed performance counters, ensuring the correct operation in cases where no
fixed counters exist (e.g., Intel guest PMU version=1 or AMD guest).
Finally, the fourth patch adds a test to verify that setting both generic and
fixed performance event filters does not impact the consistency of the fixed
performance filter behavior.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases. These patches have been tested and verified to
function properly.
Any feedback or suggestions are greatly appreciated.
Please note that the following patches should be applied before this patch series:
https://lore.kernel.org/kvm/20230530134248.23998-2-cloudliang@tencent.com
https://lore.kernel.org/kvm/20230530134248.23998-3-cloudliang@tencent.com
This will ensure that macro definitions such as X86_INTEL_MAX_FIXED_CTR_NUM,
INTEL_PMC_IDX_FIXED, etc. can be used.
Sincerely,
Jinrong Liang
Changelog:
v3:
- Rebased to 31b4fc3bc64a(tag: kvm-x86-next-2023.06.02).
- Dropped the patch "KVM: selftests: Replace int with uint32_t for nevents". (Sean)
- Dropped the patch "KVM: selftests: Test pmu event filter with incompatible
kvm_pmu_event_filter". (Sean)
- Introduce __kvm_pmu_event_filter to replace the original method of creating
PMU event filters. (Sean)
- Use the macro definition of kvm_cpu_property to find the number of supported
fixed counters instead of calculating it via the vcpu's cpuid. (Sean)
- Remove the wrappers that are single line passthroughs. (Sean)
- Optimize function names and variable names. (Sean)
- Optimize comments to make them more rigorous. (Sean)
v2:
- Wrap the code from the documentation in a block of code. (Bagas Sanjaya)
v1:
https://lore.kernel.org/kvm/20230414110056.19665-1-cloudliang@tencent.com
Jinrong Liang (4):
KVM: selftests: Introduce __kvm_pmu_event_filter to improved event
filter settings
KVM: selftests: Test unavailable event filters are rejected
KVM: selftests: Check if event filter meets expectations on fixed
counters
KVM: selftests: Test gp event filters don't affect fixed event filters
.../kvm/x86_64/pmu_event_filter_test.c | 341 +++++++++++++-----
1 file changed, 246 insertions(+), 95 deletions(-)
base-commit: 31b4fc3bc64aadd660c5bfa5178c86a7ba61e0f7
prerequisite-patch-id: 909d42f185f596d6e5c5b48b33231c89fa5236e4
prerequisite-patch-id: ba0dd0f97d8db0fb6cdf2c7f1e3a60c206fc9784
--
2.31.1
Hi, Willy
This patchset mainly speeds up the nolibc tests by letting them run with
a minimal kernel config.
As nolibc supports more and more architectures, the 'run' test with
DEFCONFIG may take several hours, which is unfriendly to development
testing and even to release testing. Smaller kernel configs may therefore
be required, and as a first step nolibc-test should work with fewer
kernel config options. That is the goal of this patchset.
This patchset mainly removes the dependencies on procfs, tmpfs, net and
memfd_create, and fixes up many failures along the way.
When CONFIG_TMPFS and CONFIG_SHMEM are disabled, the kernel provides a
ramfs-based tmpfs (mm/shmem.c). It is used as an option to fix up some
failures and to skip fewer tests.
Besides, it also adds musl support, improves glibc support and fixes up
a kernel cmdline passing use case.
This is based on the dev.2023.06.14a branch of linux-rcu [1]. All of the
supported architectures are tested (with local minimal configs; the one
for i386 is pasted in [5]) without failures:
arch/board            | result
----------------------|------------
arm/vexpress-a9       | 138 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/arm-vexpress-a9-nolibc-test.log
aarch64/virt          | 138 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/aarch64-virt-nolibc-test.log
ppc/g3beige           | not supported
i386/pc               | 136 test(s) passed, 3 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/i386-pc-nolibc-test.log
x86_64/pc             | 138 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/x86_64-pc-nolibc-test.log
mipsel/malta          | 138 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/mipsel-malta-nolibc-test.log
loongarch64/virt      | 138 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/loongarch64-virt-nolibc-test.log
riscv64/virt          | 136 test(s) passed, 3 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/riscv64-virt-nolibc-test.log
riscv32/virt          | no test log found
s390x/s390-ccw-virtio | 138 test(s) passed, 1 skipped, 0 failed. See all results in /labs/linux-lab/logging/nolibc/s390x-s390-ccw-virtio-nolibc-test.log
Notes:
* The skipped ones are -fstackprotector, chmod_self and chown_self.
The -fstackprotector skip is due to the gcc version.
The chmod_self and chown_self skips are due to procfs not being enabled.
* ppc/g3beige support is added locally, but not in this patchset. ppc
support will be sent as a new patchset; it depends on the v2 test
report patchset [3] and the v5 rv32 support, and requires changes to
the Makefile.
* riscv32/virt support is still in review, see the v5 rv32 support [4].
This patchset doesn't depend on any of my other nolibc patch series,
but the new rmdir() routine added in this patchset may be required to
apply the __sysret() helper from our v4 syscall helper series [2] once
that series is merged. Currently, we use the old method to let it
compile without any dependency.
Here is an explanation of all of the patches:
* selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
The above 3 patches add musl compile support and improve glibc support.
It is now possible to build and run nolibc-test with musl libc, but there
are some failures/skips due to musl's own issues/requirements:
$ sudo ./nolibc-test | grep -E 'FAIL|SKIP'
8 sbrk = 1 ENOMEM [FAIL]
9 brk = -1 ENOMEM [FAIL]
46 limit_int_fast16_min = -2147483648 [FAIL]
47 limit_int_fast16_max = 2147483647 [FAIL]
49 limit_int_fast32_min = -2147483648 [FAIL]
50 limit_int_fast32_max = 2147483647 [FAIL]
0 -fstackprotector not supported [SKIPPED]
musl disables sbrk and brk because of conflicts with its malloc, and it
defines the fast versions of the int types as 32-bit, which differs from
nolibc and glibc. musl still reserves sbrk(0) to allow getting the current
brk; we added a test for this in the v4 __sysret() helper series [2].
* selftests/nolibc: fix up kernel parameters support
The kernel cmdline allows passing two types of parameters: those without
'=' and those with '='. The first kind is passed as init arguments, the
second kind is passed as init environment variables.
Our nolibc-test prefers arguments to environment variables, which does
not work when users add such parameters to the kernel cmdline:
noapic NOLIBC_TEST=syscall
So, this patch verifies the setting from the arguments first and, if it
is not valid, tries the environment variables instead.
* selftests/nolibc: stat_timestamps: remove procfs dependency
Use '/' instead of /proc/self. We could add a 'has_proc' condition
for this test case instead, but it is not really necessary to skip the
whole stat_timestamps test for a subtest bound to /proc/self.
Welcome suggestion from Thomas.
* tools/nolibc: add rmdir() support
selftests/nolibc: add a new rmdir() test case
The rmdir() routine and test case are added for the upcoming requirement.
Note: if the __sysret() patchset [2] is applied before this one, this patch
should be rebased on it and use the __sysret() helper.
* selftests/nolibc: fix up failures when there is no procfs
Call rmdir() to remove /proc completely and rework the checking of
/proc. Before, the existence of /proc did not mean that procfs was
really mounted.
* selftests/nolibc: rename proc variable to has_proc
selftests/nolibc: rename euid0 variable to is_root
align with the has_gettid, has_xxx variables.
* selftests/nolibc: prepare tmpfs and hugetlbfs
selftests/nolibc: rename chmod_net to chmod_good
selftests/nolibc: link_cross: support tmpfs
selftests/nolibc: rename chroot_exe to chroot_file
Use a file from /tmp instead of a file from /proc when there is no procfs.
This avoids skipping the chmod_net, link_cross and chroot_exe tests.
* selftests/nolibc: vfprintf: silence memfd_create() warning
selftests/nolibc: vfprintf: skip if neither tmpfs nor hugetlbfs
selftests/nolibc: vfprintf: support tmpfs and hugetlbfs
memfd_create() from kernel >= v6.2 forcibly warns on a missing
MFD_NOEXEC_SEAL flag. The first patch silences the warning by passing
that flag; for older kernels, it uses flag 0 as before.
Since memfd_create() depends on TMPFS or HUGETLBFS, the second patch
skips the whole vfprintf test instead of simply failing when
memfd_create() does not work.
The third patch further tries the ramfs-based tmpfs even when
memfd_create() does not work.
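The memfd_create() flag handling described for the vfprintf patches can be sketched as follows. This is only an illustration in Python, not the C code of the patch: it assumes that a kernel which does not know MFD_NOEXEC_SEAL rejects it with EINVAL, and the "try the new flag, then fall back to 0" strategy is an assumption about one possible approach. The function name and the `create` parameter are hypothetical; the flag value is taken from include/uapi/linux/memfd.h.

```python
import errno
import os

# Flag value as defined in include/uapi/linux/memfd.h.
MFD_NOEXEC_SEAL = 0x0008


def create_memfd(name, create=None):
    """Create a memfd, preferring MFD_NOEXEC_SEAL and falling back to flags=0.

    `create` is a hypothetical injection point (defaults to os.memfd_create)
    so that the fallback logic can be exercised without a real kernel.
    """
    if create is None:
        create = os.memfd_create  # Linux only, Python 3.8+
    try:
        return create(name, MFD_NOEXEC_SEAL)
    except OSError as err:
        # Assumption: a kernel that does not know the flag returns EINVAL.
        if err.errno != errno.EINVAL:
            raise
    # Older kernel: use flags=0 as before.
    return create(name, 0)
```

On a real system, `create_memfd("vfprintf")` would return a file descriptor; any error other than EINVAL (for example when neither TMPFS nor HUGETLBFS is available) is passed on to the caller, which could then skip the test.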
Finally, let's briefly discuss the configs. I have prepared minimal
configs for all of the nolibc supported architectures, but I am not sure
where we should put them. What about tools/testing/selftests/nolibc/configs?
Thanks!
Best regards,
Zhangjin
---
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/
[2]: https://lore.kernel.org/linux-riscv/cover.1687187451.git.falcon@tinylab.org/
[3]: https://lore.kernel.org/lkml/cover.1687156559.git.falcon@tinylab.org/
[4]: https://lore.kernel.org/linux-riscv/cover.1687176996.git.falcon@tinylab.org/
[5]: https://pastebin.com/5jq0Vxbz
Zhangjin Wu (17):
selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
selftests/nolibc: fix up kernel parameters support
selftests/nolibc: stat_timestamps: remove procfs dependency
tools/nolibc: add rmdir() support
selftests/nolibc: add a new rmdir() test case
selftests/nolibc: fix up failures when there is no procfs
selftests/nolibc: rename proc variable to has_proc
selftests/nolibc: rename euid0 variable to is_root
selftests/nolibc: prepare tmpfs and hugetlbfs
selftests/nolibc: rename chmod_net to chmod_good
selftests/nolibc: link_cross: support tmpfs
selftests/nolibc: rename chroot_exe to chroot_file
selftests/nolibc: vfprintf: silence memfd_create() warning
selftests/nolibc: vfprintf: skip if neither tmpfs nor hugetlbfs
selftests/nolibc: vfprintf: support tmpfs and hugetlbfs
tools/include/nolibc/sys.h | 28 ++++
tools/testing/selftests/nolibc/nolibc-test.c | 132 +++++++++++++++----
2 files changed, 138 insertions(+), 22 deletions(-)
--
2.25.1
From: Jeff Xu <jeffxu(a)google.com>
Since Linux introduced the memfd feature, memfds have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.
However, in a secure-by-default system, such as ChromeOS (where all
executables should come from the rootfs, which is protected by Verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables a “confused deputy attack”. E.g., in VRP bug [1], the cros_vm
process created a memfd to share its content with an external process;
however, the memfd was overwritten and used for executing arbitrary code
and root escalation. [2] lists more VRP bugs of this kind.
On the other hand, executable memfds have legitimate uses: runc uses
memfd’s seal and executable features to copy the contents of the binary
and then execute them. For such systems, we need a solution to
differentiate runc's use of executable memfds from an attacker's [3].
To address the above, this set of patches adds the following:
1> Let memfd_create() set the X bit at creation time.
2> Let memfd be sealed against modifying the X bit.
3> A new pid namespace sysctl, vm.memfd_noexec, to control the behavior of
the X bit. For example, if a container has vm.memfd_noexec=2, then
memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This makes it possible for a new
LSM to reject or allow executable memfds based on its security policy.
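The interaction between the creation flags and the sysctl in the list above can be modelled with a small Python sketch. Only the vm.memfd_noexec=2 case is stated in this letter. The behavior shown for the values 0 and 1 (the sysctl choosing the default when neither flag is passed) and the rejection of both flags together are assumptions based on how the merged feature is documented, and the exception class is a hypothetical stand-in for the real error code. The flag values are those of include/uapi/linux/memfd.h.

```python
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010


class MemfdRejected(Exception):
    """Hypothetical stand-in for the error memfd_create() would return."""


def effective_flags(flags, memfd_noexec):
    """Model how vm.memfd_noexec could resolve the X-bit flags of a request."""
    if (flags & MFD_EXEC) and (flags & MFD_NOEXEC_SEAL):
        # Assumption: the two flags are mutually exclusive.
        raise MemfdRejected("MFD_EXEC and MFD_NOEXEC_SEAL both set")
    if memfd_noexec == 2 and not (flags & MFD_NOEXEC_SEAL):
        # Stated in the letter: without MFD_NOEXEC_SEAL the call is rejected.
        raise MemfdRejected("vm.memfd_noexec=2 requires MFD_NOEXEC_SEAL")
    if not (flags & (MFD_EXEC | MFD_NOEXEC_SEAL)):
        # Assumption: with neither flag given, the sysctl picks the default
        # (0 keeps the legacy executable behavior, 1 defaults to noexec+seal).
        flags |= MFD_EXEC if memfd_noexec == 0 else MFD_NOEXEC_SEAL
    return flags
```

For instance, a legacy caller that passes no X-bit flag would keep an executable memfd under `memfd_noexec == 0`, get a sealed non-executable one under `1`, and be refused under `2`.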
Change history:
v8:
- Update ref bug in cover letter.
- Add Reviewed-by field.
- Remove security hook (security_memfd_create) patch, which will have
its own patch set in future.
v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of
global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount(security.c).
v6:https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h
v5:https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c
v3:https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() to set X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit.
- A new security hook in memfd_create().
v2:https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- address comments in V1.
- add sysctl (vm.mfd_noexec) to set the default file permissions of
memfd_create to be non-executable.
v1:https://lwn.net/Articles/890096/
[1] https://crbug.com/1305267
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20me…
[3] https://lwn.net/Articles/781013/
Daniel Verkamp (2):
mm/memfd: add F_SEAL_EXEC
selftests/memfd: add tests for F_SEAL_EXEC
Jeff Xu (3):
mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
include/linux/pid_namespace.h | 19 ++
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 4 +
kernel/pid_namespace.c | 5 +
kernel/pid_sysctl.h | 59 ++++
mm/memfd.c | 56 +++-
mm/shmem.c | 6 +
tools/testing/selftests/memfd/fuse_test.c | 1 +
tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
9 files changed, 489 insertions(+), 3 deletions(-)
create mode 100644 kernel/pid_sysctl.h
base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
--
2.39.0.rc1.256.g54fd8350bd-goog
From: sunliming <sunliming(a)kylinos.cn>
[ Upstream commit ba470eebc2f6c2f704872955a715b9555328e7d0 ]
User processes register name_args for events. If an event with the same
name but different args is registered, the trace output of the second
event is printed as if it were the first event. This is incorrect.
Return EADDRINUSE back to the user process if an event with the same name
but different args has already been registered.
Link: https://lore.kernel.org/linux-trace-kernel/20230529032100.286534-1-sunlimin…
Signed-off-by: sunliming <sunliming(a)kylinos.cn>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Acked-by: Beau Belgrave <beaub(a)linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/trace/trace_events_user.c | 36 +++++++++++++++----
.../selftests/user_events/ftrace_test.c | 6 ++++
2 files changed, 36 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 625cab4b9d945..774d146c2c2ca 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -1274,6 +1274,8 @@ static int user_event_parse(struct user_event_group *group, char *name,
int index;
u32 key;
struct user_event *user;
+ int argc = 0;
+ char **argv;
/* Prevent dyn_event from racing */
mutex_lock(&event_mutex);
@@ -1281,13 +1283,35 @@ static int user_event_parse(struct user_event_group *group, char *name,
mutex_unlock(&event_mutex);
if (user) {
- *newuser = user;
- /*
- * Name is allocated by caller, free it since it already exists.
- * Caller only worries about failure cases for freeing.
- */
- kfree(name);
+ if (args) {
+ argv = argv_split(GFP_KERNEL, args, &argc);
+ if (!argv) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ ret = user_fields_match(user, argc, (const char **)argv);
+ argv_free(argv);
+
+ } else
+ ret = list_empty(&user->fields);
+
+ if (ret) {
+ *newuser = user;
+ /*
+ * Name is allocated by caller, free it since it already exists.
+ * Caller only worries about failure cases for freeing.
+ */
+ kfree(name);
+ } else {
+ ret = -EADDRINUSE;
+ goto error;
+ }
+
return 0;
+error:
+ refcount_dec(&user->refcnt);
+ return ret;
}
index = find_first_zero_bit(group->page_bitmap, MAX_EVENTS);
diff --git a/tools/testing/selftests/user_events/ftrace_test.c b/tools/testing/selftests/user_events/ftrace_test.c
index 1bc26e6476fc3..df0e776c2cc1b 100644
--- a/tools/testing/selftests/user_events/ftrace_test.c
+++ b/tools/testing/selftests/user_events/ftrace_test.c
@@ -209,6 +209,12 @@ TEST_F(user, register_events) {
ASSERT_EQ(0, reg.write_index);
ASSERT_NE(0, reg.status_bit);
+ /* Multiple registers to same name but different args should fail */
+ reg.enable_bit = 29;
+ reg.name_args = (__u64)"__test_event u32 field1;";
+ ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
+ ASSERT_EQ(EADDRINUSE, errno);
+
/* Ensure disabled */
self->enable_fd = open(enable_file, O_RDWR);
ASSERT_NE(-1, self->enable_fd);
--
2.39.2
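The rule enforced by the patch above can be illustrated with a toy Python model. It is not the kernel data structure: the class and method names are hypothetical, and the field comparison is a plain list comparison rather than user_fields_match(). It only shows the intended outcome: re-registering the same name with identical args succeeds, while the same name with different (or missing) args yields -EADDRINUSE.

```python
import errno


class UserEventRegistry:
    """Toy model of name -> field-list registration (hypothetical names)."""

    def __init__(self):
        self._events = {}

    @staticmethod
    def _parse(name_args):
        # "name type field; type field" -> ("name", ["type field", ...])
        name, _, args = name_args.strip().partition(" ")
        fields = [" ".join(f.split()) for f in args.split(";") if f.strip()]
        return name, fields

    def register(self, name_args):
        """Return 0 on success or a negative errno, like the ioctl path."""
        name, fields = self._parse(name_args)
        known = self._events.get(name)
        if known is None:
            self._events[name] = fields
            return 0
        # Same name: only accept the registration if the args match.
        return 0 if known == fields else -errno.EADDRINUSE
```

A second `register()` call with the name `__test_event` and a shorter field list would therefore fail, which mirrors the new assertion added to ftrace_test.c.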
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
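To make the problem concrete, here is a small Python sketch that classifies an IPv4 header as an unfragmented packet, a first fragment, or a later fragment, using the "more fragments" flag and the fragment offset stored in bytes 6-7 of the header. The function names are illustrative only. Later fragments start in the middle of the payload, so the L4 ports needed for the 5-tuple are not available in them.

```python
import struct

IP_MF = 0x2000       # "more fragments" flag
IP_OFFMASK = 0x1FFF  # fragment offset, in units of 8 bytes


def classify_ipv4(header):
    """Return 'whole', 'first' or 'later' for a raw IPv4 header (>= 20 bytes)."""
    (flags_frag,) = struct.unpack_from("!H", header, 6)
    offset = flags_frag & IP_OFFMASK
    more = bool(flags_frag & IP_MF)
    if offset == 0 and not more:
        return "whole"
    return "first" if offset == 0 else "later"


def has_l4_header(header):
    """Only unfragmented packets and first fragments carry the L4 ports."""
    return classify_ipv4(header) != "later"
```

A stateless policy can only evaluate the full 5-tuple when `has_l4_header()` is true, which is why the two stateless options above are both unsatisfying.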
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Patchset details ===
There was an earlier attempt at providing defrag via kfuncs [1]. The
feedback was that we could end up doing too much stuff in prog execution
context (like sending ICMP error replies). However, I think there is
still some outstanding discussion w.r.t. performance when it comes to
netfilter vs the previous approach. I'll schedule some time during
office hours for this.
Patches 1 & 2 are stolen from Florian. Hopefully he doesn't mind. There
were some outstanding comments on the v2 [2] but it doesn't look like a
v3 was ever submitted. I've addressed the comments and put them in this
patchset cuz I needed them.
Finally, the new selftest seems to be a little flaky. I'm not quite
sure why the server will fail to `recvfrom()` occasionally. I'm fairly
sure it's a timing related issue with creating veths. I'll keep
debugging but I didn't want that to hold up discussion on this patchset.
[0]: https://datatracker.ietf.org/doc/html/rfc8900
[1]: https://lore.kernel.org/bpf/cover.1677526810.git.dxu@dxuuu.xyz/
[2]: https://lore.kernel.org/bpf/20230525110100.8212-1-fw@strlen.de/
Daniel Xu (7):
tools: libbpf: add netfilter link attach helper
selftests/bpf: Add bpf_program__attach_netfilter helper test
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 12 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 8 +
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 10 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 108 ++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/lib/bpf/bpf.c | 8 +
tools/lib/bpf/bpf.h | 6 +
tools/lib/bpf/libbpf.c | 47 +++
tools/lib/bpf/libbpf.h | 15 +
tools/lib/bpf/libbpf.map | 1 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 282 ++++++++++++++++++
.../bpf/prog_tests/netfilter_basic.c | 78 +++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
.../bpf/progs/test_netfilter_link_attach.c | 14 +
21 files changed, 868 insertions(+), 21 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/netfilter_basic.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/test_netfilter_link_attach.c
--
2.40.1
Good morning,
I have looked through your offer and am pleased to say that it catches the eye and invites further discussion.
I thought I might contribute to your growth and help this offer reach a wider audience. I do search engine positioning for websites, which helps them generate excellent web traffic.
Could we talk in the near future?
Best regards,
Adam Charachuta
Nested translation is a hardware feature that is supported by many modern
IOMMUs. It uses two stages of address translation (stage-1, stage-2)
to reach the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel.
Changes to the stage-1 translation table should be followed by an IOTLB
invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O
page table. As the diagram below shows, the guest I/O page table pointer
in GPA (guest physical address) is passed to the host and used to perform
the stage-1 address translation. Along with it, modifications to present
mappings in the guest I/O page table should be followed by an IOTLB
invalidation.
   .-------------.  .---------------------------.
   |   vIOMMU    |  | Guest I/O page table      |
   |             |  '---------------------------'
   .----------------/
   | PASID Entry |--- PASID cache flush --+
   '-------------'                        |
   |             |                        V
   |             |          I/O page table pointer in GPA
   '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
   .-------------.  .------------------------.
   |   pIOMMU    |  |  FS for GIOVA->GPA     |
   |             |  '------------------------'
   .----------------/  |
   | PASID Entry |     V (Nested xlate)
   '----------------\.----------------------------------.
   |             |   | SS for GPA->HPA, unmanaged domain|
   |             |   '----------------------------------'
   '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>
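The reason a stage-1 change must be followed by an IOTLB invalidation can be shown with a toy Python model of the two-stage walk. This is purely illustrative: the class, its methods and the page-number dictionaries are hypothetical and have nothing to do with the real IOMMUFD data structures. The model caches the combined GIOVA->HPA result, so an edit to the guest-owned stage-1 table stays invisible until the cached entry is dropped.

```python
class NestedIommu:
    """Toy two-stage translation with an IOTLB caching combined results."""

    def __init__(self, stage1, stage2):
        self.stage1 = stage1  # GIOVA page -> GPA page, owned by the guest
        self.stage2 = stage2  # GPA page -> HPA page, owned by the host kernel
        self.iotlb = {}       # cached GIOVA page -> HPA page

    def translate(self, giova_page):
        """Walk stage-1 then stage-2, filling the IOTLB on a miss."""
        if giova_page not in self.iotlb:
            gpa_page = self.stage1[giova_page]
            self.iotlb[giova_page] = self.stage2[gpa_page]
        return self.iotlb[giova_page]

    def invalidate(self, giova_page):
        """Drop a cached entry, as requested after a stage-1 modification."""
        self.iotlb.pop(giova_page, None)
```

In this model, changing `stage1[0x10]` after a first `translate(0x10)` keeps returning the old host page until `invalidate(0x10)` is called, which is the situation the invalidation interface in this series is meant to handle.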
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from the iommu driver. So in this
series hw_pagetable and iommu_domain mean the same thing unless noted
otherwise.
IOMMUFD already supports allocating a hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating a
hw_pagetable with driver-specific parameters and an interface to sync the
stage-1 IOTLB, as the user owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first
introduces a new iommu op for allocating domains with user data and an op
for syncing the stage-1 IOTLB, and then extends the IOMMUFD internal
infrastructure to accept user_data and a parent hwpt, then relays the data
to the iommu core to allocate an iommu_domain. After that, it extends the
ioctl IOMMU_HWPT_ALLOC to accept user data and a stage-2 hwpt ID to allocate
a hwpt. Along with it, the ioctl IOMMU_HWPT_INVALIDATE is added to invalidate
the stage-1 IOTLB. This is needed for user-managed hwpts. A selftest is added
as well to cover the new ioctls.
Complete code can be found in [2]; the QEMU code can be found in [3].
Finally, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks
to them for the help. ^_^ Looking forward to your feedback.
base-commit: cf905391237ded2331388e75adb5afbabeddc852
[1] https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.c…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%…
Change log:
v2:
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add new iommu op to create domains owned by userspace
iommu: Add nested domain support
Nicolin Chen (5):
iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
iommufd/selftest: Add domain_alloc_user() support in iommu mock
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (4):
iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
iommufd: Pass parent hwpt and user_data to
iommufd_hw_pagetable_alloc()
iommufd: IOMMU_HWPT_ALLOC allocation with user data
iommufd: Add IOMMU_HWPT_INVALIDATE
drivers/iommu/iommufd/device.c | 2 +-
drivers/iommu/iommufd/hw_pagetable.c | 191 +++++++++++++++++-
drivers/iommu/iommufd/iommufd_private.h | 16 +-
drivers/iommu/iommufd/iommufd_test.h | 30 +++
drivers/iommu/iommufd/main.c | 5 +-
drivers/iommu/iommufd/selftest.c | 119 ++++++++++-
include/linux/iommu.h | 36 ++++
include/uapi/linux/iommufd.h | 58 +++++-
tools/testing/selftests/iommu/iommufd.c | 126 +++++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 70 +++++++
10 files changed, 629 insertions(+), 24 deletions(-)
--
2.34.1
Make sv39 the default address space for mmap as some applications
currently depend on this assumption. The RISC-V specification enforces
that bits outside of the virtual address range are not used, so
restricting the size of the default address space as such should be
temporary. A hint address passed to mmap will cause the largest address
space that fits entirely into the hint to be used. If the hint is less
than or equal to 1<<38, a 39-bit address will be used. After an address
space is completely full, the next smallest address space will be used.
Documentation is also added to the RISC-V virtual memory section to explain
these changes.
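The hint rule described above can be read as the following small Python model. It is an interpretation, not the kernel code: it assumes that the user portion of each address space is its lower half (1<<38 for sv39, 1<<47 for sv48, 1<<56 for sv57) and that "fits entirely into the hint" means this size is less than or equal to the hint. The function name and the `supported` parameter are hypothetical, and the fallback to a smaller address space when one is full is not modelled.

```python
def mmap_address_bits(hint, supported=(39, 48, 57)):
    """Pick the virtual address width used for an mmap() hint address."""
    best = min(supported)  # sv39 is the default address space
    for bits in sorted(supported):
        user_size = 1 << (bits - 1)  # assumed: user space is the lower half
        if hint >= user_size:
            best = bits  # a larger address space still fits into the hint
    return best
```

With this reading, no hint or any hint below 1<<47 stays in the 39-bit space, and an application has to pass a hint of at least 1<<47 (or 1<<56) to opt in to sv48 (or sv57) addresses.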
Charlie Jenkins (2):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Update documentation and include test
Documentation/riscv/vm-layout.rst | 20 ++++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 21 ++++++--
arch/riscv/include/asm/processor.h | 41 +++++++++++++---
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/Makefile | 22 +++++++++
.../selftests/riscv/mm/testcases/mmap.c | 49 +++++++++++++++++++
7 files changed, 144 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
base-commit: eef509789cecdce895020682192d32e8bac790e8
--
2.34.1
Hi folks,
This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. The use case is nested
translation, where modern IOMMU hardware supports two-stage translation
tables. The second-stage translation table is managed by the host VMM
while the first-stage translation table is owned by the user space.
Hence, any IO page fault that occurs on the first-stage page table
should be delivered to the user space and handled there. The user space
should respond with the page fault handling result to the device top-down
through the IOMMUFD response uAPI.
User space indicates its capability of handling IO page faults by setting
a user HWPT allocation flag IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD
will then set up its infrastructure for page fault delivery. Together
with the iopf-capable flag, user space should also provide an eventfd
on which it will listen for any down-top page fault messages.
On a successful return of the allocation of iopf-capable HWPT, a fault
fd will be returned. User space can open and read fault messages from it
once the eventfd is signaled.
Besides the overall design, I'd like to hear comments about the designs
below:
- The IOMMUFD fault message format. It is very similar to that in
uapi/linux/iommu which has been discussed before and partially used by
the IOMMU SVA implementation. I'd like to get more comments on the
format when it comes to IOMMUFD.
- The timeout value for the pending page fault messages. Ideally we
should determine the timeout value from the device configuration, but
I failed to find any statement in the PCI specification (version 6.x).
A default of 100 milliseconds is selected in the implementation, but it
leaves room to extend the code with a per-device setting.
This series is for review comments only. I used the IOMMUFD selftest
to verify the hwpt allocation, attach/detach and replace. But I didn't
get a chance to run it on real hardware yet. I will do more testing in
the subsequent versions when I am confident that I am heading in the
right direction.
This series is based on the latest implementation of the nested
translation under discussion. The whole series and related patches are
available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-…
Best regards,
baolu
Lu Baolu (17):
iommu: Move iommu fault data to linux/iommu.h
iommu: Support asynchronous I/O page fault response
iommu: Add helper to set iopf handler for domain
iommu: Pass device parameter to iopf handler
iommu: Split IO page fault handling from SVA
iommu: Add iommu page fault cookie helpers
iommufd: Add iommu page fault data
iommufd: IO page fault delivery initialization and release
iommufd: Add iommufd hwpt iopf handler
iommufd: Add IOMMU_HWPT_ALLOC_FLAGS_USER_PASID_TABLE for hwpt_alloc
iommufd: Deliver fault messages to user space
iommufd: Add io page fault response support
iommufd: Add a timer for each iommufd fault data
iommufd: Drain all pending faults when destroying hwpt
iommufd: Allow new hwpt_alloc flags
iommufd/selftest: Add IOPF feature for mock devices
iommufd/selftest: Cover iopf-capable nested hwpt
include/linux/iommu.h | 175 +++++++++-
drivers/iommu/{iommu-sva.h => io-pgfault.h} | 25 +-
drivers/iommu/iommu-priv.h | 3 +
drivers/iommu/iommufd/iommufd_private.h | 32 ++
include/uapi/linux/iommu.h | 161 ---------
include/uapi/linux/iommufd.h | 73 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 20 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
drivers/iommu/intel/iommu.c | 2 +-
drivers/iommu/intel/svm.c | 2 +-
drivers/iommu/io-pgfault.c | 7 +-
drivers/iommu/iommu-sva.c | 4 +-
drivers/iommu/iommu.c | 50 ++-
drivers/iommu/iommufd/device.c | 64 +++-
drivers/iommu/iommufd/hw_pagetable.c | 318 +++++++++++++++++-
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 71 ++++
tools/testing/selftests/iommu/iommufd.c | 17 +-
MAINTAINERS | 1 -
drivers/iommu/Kconfig | 4 +
drivers/iommu/Makefile | 3 +-
drivers/iommu/intel/Kconfig | 1 +
23 files changed, 837 insertions(+), 203 deletions(-)
rename drivers/iommu/{iommu-sva.h => io-pgfault.h} (71%)
delete mode 100644 include/uapi/linux/iommu.h
--
2.34.1
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.
For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.
After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.
Reported-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
.../selftests/arm64/signal/test_signals_utils.h | 28 +++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..db28409fd44b 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,28 @@ static __always_inline bool get_current_context(struct tdescr *td,
size_t dest_sz)
{
static volatile bool seen_already;
+ int i;
+ char *uc = (char *)dest_uc;
assert(td && dest_uc);
/* it's a genuine invocation..reinit */
seen_already = 0;
td->live_uc_valid = 0;
td->live_sz = dest_sz;
- memset(dest_uc, 0x00, td->live_sz);
+
+ /*
+ * This is a memset() but we don't want the compiler to
+ * optimise it into either instructions or a library call
+ * which might be incompatible with streaming mode.
+ */
+ for (i = 0; i < td->live_sz; i++) {
+ asm volatile("nop"
+ : "+m" (*dest_uc)
+ :
+ : "memory");
+ uc[i] = 0;
+ }
+
td->live_uc = dest_uc;
/*
* Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +118,17 @@ static __always_inline bool get_current_context(struct tdescr *td,
:
: "memory");
+ /*
+ * If we were grabbing a streaming mode context then we may
+ * have entered streaming mode behind the system's back and
+ * libc or compiler generated code might decide to do
+ * something invalid in streaming mode, or potentially even
+ * the state of ZA. Issue a SMSTOP to exit both now we have
+ * grabbed the state.
+ */
+ if (td->feats_supported & FEAT_SME)
+ asm volatile("msr S0_3_C4_C6_3, xzr");
+
/*
* If we get here with seen_already==1 it implies the td->live_uc
* context has been used to get back here....this probably means
---
base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi Mark,
While debugging the SME issue reported in CI, I noticed that the
streaming SVE tests are failing on the fastmodel because of an
unexpected SIGILL. For example:
will:arm64/signal$ ./ssve_za_regs
# Streaming SVE registers :: Check that we get the right Streaming SVE registers reported
Registered handlers for all signals.
Detected MINSTKSIGSZ:4720
Required Features: [ SME ] supported
Incompatible Features: [] absent
Testcase initialized.
Testing VL 64
-- RX UNEXPECTED SIGNAL: 4
==>> completed. FAIL(0)
The signal is injected because we get an SME trap due to an fpsimd, sve
or sve2 instruction being used in streaming mode (ESR is 0x76000001).
I did a bit of digging and it looks like this is my libc using a vector
DUP instruction in memset:
#0 __memset_generic () at ../sysdeps/aarch64/memset.S:37
#1 0x0000aaaaaaaa1170 in get_current_context (dest_sz=131072,
dest_uc=0xaaaaaeab6ba0 <context>, td=0xaaaaaaab50f0 <tde>)
at ./test_signals_utils.h:69
#2 do_one_sme_vl (si=<optimized out>, uc=<optimized out>, vl=64,
td=0xaaaaaaab50f0 <tde>) at testcases/ssve_za_regs.c:90
#3 sme_regs (td=0xaaaaaaab50f0 <tde>, si=<optimized out>, uc=<optimized out>)
at testcases/ssve_za_regs.c:145
#4 0x0000aaaaaaaa0ed0 in main (argc=<optimized out>, argv=<optimized out>)
at test_signals.c:21
Dump of assembler code for function __memset_generic:
=> 0x0000fffff7edfb00 <+0>: dup v0.16b, w1
The easy option would be to require FA64 for these tests, but I guess it
would be better to exit streaming mode.
Please can you have a look?
Thanks,
Will
Awk is already called for /sys/block/zram#/mm_stat parsing, so use it
to also perform the floating point capacity vs consumption ratio
calculations. The test output is unchanged.
This allows bc to be dropped as a dependency for the zram selftests.
Signed-off-by: David Disseldorp <ddiss(a)suse.de>
---
tools/testing/selftests/zram/zram01.sh | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/zram/zram01.sh b/tools/testing/selftests/zram/zram01.sh
index 8f4affe34f3e4..df1b1d4158989 100755
--- a/tools/testing/selftests/zram/zram01.sh
+++ b/tools/testing/selftests/zram/zram01.sh
@@ -33,7 +33,7 @@ zram_algs="lzo"
zram_fill_fs()
{
- for i in $(seq $dev_start $dev_end); do
+ for ((i = $dev_start; i <= $dev_end && !ERR_CODE; i++)); do
echo "fill zram$i..."
local b=0
while [ true ]; do
@@ -44,15 +44,13 @@ zram_fill_fs()
done
echo "zram$i can be filled with '$b' KB"
- local mem_used_total=`awk '{print $3}' "/sys/block/zram$i/mm_stat"`
- local v=$((100 * 1024 * $b / $mem_used_total))
- if [ "$v" -lt 100 ]; then
- echo "FAIL compression ratio: 0.$v:1"
- ERR_CODE=-1
- return
- fi
-
- echo "zram compression ratio: $(echo "scale=2; $v / 100 " | bc):1: OK"
+ awk -v b="$b" '{ v = (100 * 1024 * b / $3) } END {
+ if (v < 100) {
+ printf "FAIL compression ratio: 0.%u:1\n", v
+ exit 1
+ }
+ printf "zram compression ratio: %.2f:1: OK\n", v / 100
+ }' "/sys/block/zram$i/mm_stat" || ERR_CODE=-1
done
}
--
2.35.3
KVM_GET_REG_LIST dumps all register IDs that are available to
KVM_GET/SET_ONE_REG, and it is very useful for identifying platform
regressions during VM migration.
Patches 1-7 restructure the get-reg-list test in aarch64 so that some
of the code becomes a common test framework that can be shared by riscv.
Patch 8 moves the reject_set check logic to a function so as to check for
different errno values for different registers.
Patch 9 changes the test to do the get/set operations only on the
present-blessed list.
Patch 10 enables the KVM_GET_REG_LIST API in riscv.
Patches 11-12 add the corresponding kselftest for checking possible
register regressions.
The get-reg-list kvm selftest was ported from aarch64 and tested with
Linux 6.4-rc6 on a Qemu riscv64 virt machine.
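To illustrate what "present-blessed list" is taken to mean here, the sketch below computes the registers that are both reported by KVM_GET_REG_LIST (present) and listed in the test's known-good list (blessed). This is a standalone model, not the selftest code. The function names are hypothetical, and reading the list as a plain intersection is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if reg appears in list[0..n), else 0. */
static int reg_in_list(const uint64_t *list, size_t n, uint64_t reg)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == reg)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: copy into out[] every register ID from present[]
 * that is also in blessed[]. out must have room for n_present entries.
 * Returns the number of entries written.
 */
static size_t present_blessed(const uint64_t *present, size_t n_present,
			      const uint64_t *blessed, size_t n_blessed,
			      uint64_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < n_present; i++)
		if (reg_in_list(blessed, n_blessed, present[i]))
			out[n++] = present[i];
	return n;
}
```

Restricting get/set to this intersection means that newly added registers, whose behaviour the test does not know yet, are reported but not exercised.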
---
Changed since v3:
* Rebase to Linux 6.4-rc6
* Address Andrew's suggestions and comments:
* Move reject_set check logic to a function
* Only do get/set tests on present blessed list
* Only enable ISA extension for the specified config
* For disable-not-allowed registers, move them to the filter-reg-list
Andrew Jones (7):
KVM: arm64: selftests: Replace str_with_index with strdup_printf
KVM: arm64: selftests: Drop SVE cap check in print_reg
KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config
KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h
KVM: arm64: selftests: Delete core_reg_fixup
KVM: arm64: selftests: Split get-reg-list test code
KVM: arm64: selftests: Finish generalizing get-reg-list
Haibo Xu (5):
KVM: arm64: selftests: Move reject_set check logic to a function
KVM: selftests: Only do get/set tests on present blessed list
KVM: riscv: Add KVM_GET_REG_LIST API support
KVM: riscv: selftests: Add finalize_vcpu check in run_test
KVM: riscv: selftests: Add get-reg-list test
Documentation/virt/kvm/api.rst | 2 +-
arch/riscv/kvm/vcpu.c | 375 +++++++++
tools/testing/selftests/kvm/Makefile | 11 +-
.../selftests/kvm/aarch64/get-reg-list.c | 538 ++-----------
tools/testing/selftests/kvm/get-reg-list.c | 439 ++++++++++
.../selftests/kvm/include/kvm_util_base.h | 16 +
.../selftests/kvm/include/riscv/processor.h | 3 +
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 15 +
.../selftests/kvm/riscv/get-reg-list.c | 752 ++++++++++++++++++
10 files changed, 1658 insertions(+), 495 deletions(-)
create mode 100644 tools/testing/selftests/kvm/get-reg-list.c
create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
--
2.34.1
This patch introduces two tests for the EVIOCSABS ioctl. The first one
checks that the ioctl fails when the EV_ABS bit was not set, and the
second one just checks that the normal workflow for this ioctl
succeeds.
Signed-off-by: Dana Elfassy <dangel101(a)gmail.com>
---
This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML.
[1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933…
tools/testing/selftests/input/evioc-test.c | 23 ++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c
index 4c0c8ebed378..7afd537f0b24 100644
--- a/tools/testing/selftests/input/evioc-test.c
+++ b/tools/testing/selftests/input/evioc-test.c
@@ -279,4 +279,27 @@ TEST(eviocgrep_get_repeat_settings)
selftest_uinput_destroy(uidev);
}
+TEST(eviocsabs_set_abs_value_limits)
+{
+ struct selftest_uinput *uidev;
+ struct input_absinfo absinfo;
+ int rc;
+
+ // fail test on dev->absinfo
+ rc = selftest_uinput_create_device(&uidev, -1);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+ rc = ioctl(uidev->evdev_fd, EVIOCSABS(0), &absinfo);
+ ASSERT_EQ(-1, rc);
+ selftest_uinput_destroy(uidev);
+
+ // ioctl normal flow
+ rc = selftest_uinput_create_device(&uidev, EV_ABS, -1);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+ rc = ioctl(uidev->evdev_fd, EVIOCSABS(0), &absinfo);
+ ASSERT_EQ(0, rc);
+ selftest_uinput_destroy(uidev);
+}
+
TEST_HARNESS_MAIN
--
2.41.0
*Changes in v21*
- Abort walk instead of returning error if WP is to be performed on
partial hugetlb
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games. It is currently
emulated in a rather slow manner in userspace. Our purpose is to
enhance the kernel so that it can be emulated efficiently.
Currently some out-of-tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty
feature could be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But when using the soft-dirty flag, we sometimes get extra
pages which weren't even written. They had become soft-dirty because of VMA
merging and the VM_SOFTDIRTY flag. This breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect etc. messes up the soft-dirty
flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got
introduced. We discussed whether we could revert these patches, but we could
not reach any conclusion. So at this point, I made a couple of attempts to
solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty
implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to cause
regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The reply was not to increase the size of the VMA by 8 bytes.
At this point, we gave up on soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd WP feature, where the
kernel resolves the faults itself when the WP_ASYNC feature is used. It was
straightforward to add the WP_ASYNC feature to userfaultfd. Now only those
pages which have really been written are reported as dirty or written-to. (PS:
another userfaultfd feature, WP_UNPOPULATED, is required to avoid
pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of CRIU devs to make the
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature which we have given up on using,
we use the written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/written-to status and clear operation
present in the kernel.
- The pages which have been written-to cannot be found in an accurate way.
(Kernel's soft-dirty PTE bit + soft-dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to change behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information about whether a page is file mapped, present or
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return mask is also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where the user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages has been found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
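The compaction described above can be sketched as follows. This is only a userspace model under stated assumptions: struct region is a hypothetical stand-in for the series' struct page_region, the page size is fixed at 4096 bytes, a non-zero flags value means the page matched the masks, and max_pages == 0 is taken to mean "no limit". The real layout and semantics are defined by the patches.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for this model */

/* Hypothetical stand-in for struct page_region. */
struct region {
	uint64_t start;	/* address of the first page in the region */
	uint64_t len;	/* number of pages in the region */
	uint64_t flags;	/* category bits shared by all pages in the region */
};

/*
 * Walk n_pages consecutive pages starting at base. Adjacent matching
 * pages with equal flags are merged into one region. The walk stops when
 * vec is full or when max_pages pages have been reported.
 * Returns the number of regions used in vec.
 */
static size_t compact_pages(uint64_t base, const uint64_t *flags,
			    size_t n_pages, struct region *vec,
			    size_t vec_size, size_t max_pages)
{
	size_t used = 0, reported = 0;

	for (size_t i = 0; i < n_pages; i++) {
		uint64_t addr = base + i * SKETCH_PAGE_SIZE;

		if (!flags[i])
			continue;	/* page did not match the masks */
		if (max_pages && reported == max_pages)
			break;

		if (used > 0) {
			struct region *last = &vec[used - 1];

			/* Extend the previous region if contiguous and same flags. */
			if (last->flags == flags[i] &&
			    last->start + last->len * SKETCH_PAGE_SIZE == addr) {
				last->len++;
				reported++;
				continue;
			}
		}
		if (used == vec_size)
			break;		/* no room for a new region */

		vec[used++] = (struct region){ .start = addr, .len = 1,
					       .flags = flags[i] };
		reported++;
	}
	return used;
}
```

In this model, when every other page matches, each region holds exactly one page and nothing can be merged, which is the worst case the restriction on max_pages and vec_size refers to.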
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 560 +++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 54 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 54 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1464 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2329 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Hi Linus,
Please pull the following Kselftest update for Linux 6.5-rc1.
This kselftest update for Linux 6.5-rc1 consists of:
- change to allow runners to override the timeout
This change is made to avoid future increases of long
timeouts
- several other spelling fixes and cleanups
- a new subtest to video_device_test
- enhancements to test coverage in clone3 test
- other fixes to ftrace and cpufreq tests
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 858fd168a95c5b9669aac8db6c14a9aeab446375:
Linux 6.4-rc6 (2023-06-11 14:35:30 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-next-6.5-rc1
for you to fetch changes up to 8cd0d8633e2de4e6dd9ddae7980432e726220fdb:
selftests/ftace: Fix KTAP output ordering (2023-06-12 16:40:22 -0600)
----------------------------------------------------------------
linux-kselftest-next-6.5-rc1
----------------------------------------------------------------
Akanksha J N (1):
selftests/ftrace: Add new test case which checks for optimized probes
Colin Ian King (2):
selftests: prctl: Fix spelling mistake "anonynous" -> "anonymous"
kselftest: vDSO: Fix accumulation of uninitialized ret when CLOCK_REALTIME is undefined
Ivan Orlov (1):
selftests: media_tests: Add new subtest to video_device_test
Luis Chamberlain (1):
selftests: allow runners to override the timeout
Mark Brown (2):
selftests/cpufreq: Don't enable generic lock debugging options
selftests/ftace: Fix KTAP output ordering
Rishabh Bhatnagar (1):
kselftests: Sort the collections list to avoid duplicate tests
Tobias Klauser (1):
selftests/clone3: test clone3 with exit signal in flags
Ziqi Zhao (1):
selftest: pidfd: Omit long and repeating outputs
Documentation/dev-tools/kselftest.rst | 22 ++++
tools/testing/selftests/clone3/clone3.c | 5 +-
tools/testing/selftests/cpufreq/config | 8 --
tools/testing/selftests/ftrace/ftracetest | 2 +-
.../ftrace/test.d/kprobe/kprobe_opt_types.tc | 34 +++++++
tools/testing/selftests/kselftest/runner.sh | 11 +-
.../selftests/media_tests/video_device_test.c | 111 +++++++++++++++------
tools/testing/selftests/pidfd/pidfd.h | 1 -
tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 +
tools/testing/selftests/pidfd/pidfd_test.c | 3 +-
.../selftests/prctl/set-anon-vma-name-test.c | 2 +-
tools/testing/selftests/run_kselftest.sh | 7 +-
.../selftests/vDSO/vdso_test_clock_getres.c | 4 +-
13 files changed, 166 insertions(+), 45 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
----------------------------------------------------------------
Make sv39 the default address space for mmap as some applications
currently depend on this assumption. The RISC-V specification enforces
that bits outside of the virtual address range are not used, so
restricting the size of the default address space as such should be
temporary. A hint address passed to mmap will cause the largest address
space that fits entirely into the hint to be used. If the hint is less
than or equal to 1<<38, a 39-bit address space will be used. After an address
space is completely full, the next smallest address space will be used.
Documentation is also added to the RISC-V virtual memory section to explain
these changes.
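The hint rule above can be sketched as a small selection function. Only the 1<<38 threshold for sv39 comes from the text. The 1<<47 threshold for sv48 is an assumption extrapolated from it, the function name is hypothetical, and the sketch ignores both hardware support and the fallback to the next smallest address space once one is full.

```c
#include <stdint.h>

/*
 * Hypothetical model: return the number of virtual address bits
 * (39, 48 or 57) selected for a given mmap hint address.
 * A hint of 0 (no hint) falls into the default sv39 case.
 */
static int va_bits_for_hint(uint64_t hint)
{
	if (hint <= (1ULL << 38))
		return 39;	/* threshold stated in the cover letter */
	if (hint <= (1ULL << 47))
		return 48;	/* assumed by analogy, not stated in the text */
	return 57;
}
```

For example, under these assumptions an application that wants sv48 addresses would pass a hint just above 1<<38.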
Charlie Jenkins (2):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Update documentation and include test
Documentation/riscv/vm-layout.rst | 20 ++++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 21 ++++++--
arch/riscv/include/asm/processor.h | 41 +++++++++++++---
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/Makefile | 22 +++++++++
.../selftests/riscv/mm/testcases/mmap.c | 49 +++++++++++++++++++
7 files changed, 144 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
base-commit: eef509789cecdce895020682192d32e8bac790e8
--
2.34.1
Hello!
Here is v4 of the mremap start address optimization / fix for exec warning. It
took me a while to write a test that catches the issue Linus and I discussed in
the last version, and I verified that the kernel crashes without the check. See
below.
The main change in this series is:
Care is taken to move purely within a VMA, in other words this check
in call_align_down():
if (vma->vm_start != addr_masked)
return false;
As an example of why this is needed:
Consider the following range which is 2MB aligned and is
a part of a larger 10MB range which is not shown. Each
character is 256KB below making the source and destination
2MB each. The lower case letters are moved (s to d) and the
upper case letters are not moved.
|DDDDddddSSSSssss|
If we align down 'ssss' to start from the 'SSSS', we will end up destroying
SSSS. The above if statement prevents that and I verified it.
I also added a test for this in the last patch.
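A minimal userspace model of that check, applied to the example above, might look like this. It assumes a 2MB PMD size as in the example. The function and macro names are hypothetical, and only the quoted condition is modeled, not the rest of the kernel function.

```c
#include <stdbool.h>

#define SKETCH_PMD_SIZE	(2UL << 20)		/* 2MB, as in the example */
#define SKETCH_PMD_MASK	(~(SKETCH_PMD_SIZE - 1))

/*
 * Hypothetical model of the quoted check: aligning addr down to a PMD
 * boundary is only allowed when the VMA begins exactly at that boundary.
 */
static bool may_align_down(unsigned long vma_start, unsigned long addr)
{
	unsigned long addr_masked = addr & SKETCH_PMD_MASK;

	return vma_start == addr_masked;
}
```

In the |DDDDddddSSSSssss| example, 'ssss' aligns down to the start of 'SSSS', but the VMA begins earlier in the larger 10MB range, so the check fails and the move is not widened over 'SSSS'.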
History of patches
==================
v3->v4:
1. Make sure to check address to align is beginning of VMA
2. Add test to check this (test fails with a kernel crash if we don't do this).
v2->v3:
1. Masked address was stored in int, fixed it to unsigned long to avoid truncation.
2. We now handle moves happening purely within a VMA, a new test is added to handle this.
3. More code comments.
v1->v2:
1. Trigger the optimization for mremaps smaller than a PMD. I tested by tracing
that it works correctly.
2. Fix issue with bogus return value found by Linus if we broke out of the
above loop for the first PMD itself.
v1: Initial RFC.
Description of patches
======================
These patches optimize the start addresses in move_page_tables() and test the
changes. It addresses a warning [1] that occurs due to a downward, overlapping
move on a mutually-aligned offset within a PMD during exec. By initiating the
copy process at the PMD level when such alignment is present, we can prevent
this warning and speed up the copying process at the same time. Linus Torvalds
suggested this idea.
Please check the individual patches for more details.
thanks,
- Joel
[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/
Joel Fernandes (Google) (7):
mm/mremap: Optimize the start addresses in move_page_tables()
mm/mremap: Allow moves within the same VMA for stack
selftests: mm: Fix failure case when new remap region was not found
selftests: mm: Add a test for mutually aligned moves > PMD size
selftests: mm: Add a test for remapping to area immediately after
existing mapping
selftests: mm: Add a test for remapping within a range
selftests: mm: Add a test for moving from an offset from start of
mapping
fs/exec.c | 2 +-
include/linux/mm.h | 2 +-
mm/mremap.c | 63 ++++-
tools/testing/selftests/mm/mremap_test.c | 301 +++++++++++++++++++----
4 files changed, 319 insertions(+), 49 deletions(-)
--
2.41.0.rc2.161.g9c6817b8e7-goog
Hi Linus,
Please pull the following KUnit next update for Linux 6.5-rc1.
This KUnit update for Linux 6.5-rc1 consists of:
- kunit_add_action() API to defer a call until test exit.
- Update document to add kunit_add_action() usage notes.
- Changes to always run cleanup from a test kthread.
- Documentation updates to clarify cleanup usage
- assertions should not be used in cleanup
- Documentation update to clearly indicate that exit
functions should run even if init fails
- Several fixes and enhancements to existing tests.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit ac9a78681b921877518763ba0e89202254349d1b:
Linux 6.4-rc1 (2023-05-07 13:34:35 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-6.5-rc1
for you to fetch changes up to 2e66833579ed759d7b7da1a8f07eb727ec6e80db:
MAINTAINERS: Add source tree entry for kunit (2023-06-15 09:16:01 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-6.5-rc1
----------------------------------------------------------------
Daniel Latypov (1):
kunit: tool: undo type subscripts for subprocess.Popen
David Gow (11):
kunit: Always run cleanup from a test kthread
Documentation: kunit: Note that assertions should not be used in cleanup
Documentation: kunit: Warn that exit functions run even if init fails
kunit: example: Provide example exit functions
kunit: Add kunit_add_action() to defer a call until test exit
kunit: executor_test: Use kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
Documentation: kunit: Add usage notes for kunit_add_action()
kunit: Fix obsolete name in documentation headers (func->action)
kunit: Move kunit_abort() call out of kunit_do_failed_assertion()
Documentation: kunit: Rename references to kunit_abort()
Geert Uytterhoeven (1):
Documentation: kunit: Modular tests should not depend on KUNIT=y
Michal Wajdeczko (3):
kunit/test: Add example test showing parameterized testing
kunit: Fix reporting of the skipped parameterized tests
kunit: Update kunit_print_ok_not_ok function
SeongJae Park (1):
MAINTAINERS: Add source tree entry for kunit
Takashi Sakamoto (1):
Documentation: Kunit: add MODULE_LICENSE to sample code
Documentation/dev-tools/kunit/architecture.rst | 4 +-
Documentation/dev-tools/kunit/start.rst | 7 +-
Documentation/dev-tools/kunit/usage.rst | 69 ++++++++++-
MAINTAINERS | 2 +
include/kunit/resource.h | 92 +++++++++++++++
include/kunit/test.h | 34 ++++--
lib/kunit/executor_test.c | 11 +-
lib/kunit/kunit-example-test.c | 56 +++++++++
lib/kunit/kunit-test.c | 88 +++++++++++++-
lib/kunit/resource.c | 99 ++++++++++++++++
lib/kunit/test.c | 157 ++++++++++++++-----------
tools/testing/kunit/kunit_kernel.py | 6 +-
tools/testing/kunit/mypy.ini | 6 +
tools/testing/kunit/run_checks.py | 2 +-
14 files changed, 538 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/kunit/mypy.ini
----------------------------------------------------------------
Hi Shuah,
This series contains updates to the rseq selftests.
* A typo in the Makefile prevents basic_percpu_ops_mm_cid_test from using
the mm_cid field.
* Fix load-acquire/store-release macros which were buggy on arm64.
(this depends on commit "Implement rseq_unqual_scalar_typeof").
* The change "Use rseq_unqual_scalar_typeof in macros" is not a fix
per se, but improves the generated assembly.
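For context on what load-acquire/store-release ordering is meant to guarantee, here is a small portable sketch using C11 atomics. It only illustrates the semantics; it is not the arm64 assembly from the series, and struct mailbox, mailbox_publish() and mailbox_try_read() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

struct mailbox {
	int payload;       /* plain data, published through 'ready' */
	atomic_int ready;  /* 0 = empty, 1 = payload is valid */
};

/* Writer: store the payload first, then publish it with a store-release. */
static void mailbox_publish(struct mailbox *m, int value)
{
	m->payload = value;
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/*
 * Reader: load-acquire the flag. If it is set, the payload written
 * before the matching store-release is guaranteed to be visible.
 * Returns 1 and fills *out when a value was published, else 0.
 */
static int mailbox_try_read(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return 0;
	*out = m->payload;
	return 1;
}
```

A macro that implements either half incorrectly can let the reader observe the flag without the payload, which is the kind of bug that only shows up on weakly ordered architectures.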
Can you please pick these up in the selftests tree?
Thanks,
Mathieu
Mathieu Desnoyers (4):
selftests/rseq: Fix CID_ID typo in Makefile
selftests/rseq: Implement rseq_unqual_scalar_typeof
selftests/rseq: Fix arm64 buggy load-acquire/store-release macros
selftests/rseq: Use rseq_unqual_scalar_typeof in macros
tools/testing/selftests/rseq/Makefile | 2 +-
tools/testing/selftests/rseq/compiler.h | 26 ++++++++++
tools/testing/selftests/rseq/rseq-arm.h | 4 +-
tools/testing/selftests/rseq/rseq-arm64.h | 58 ++++++++++++-----------
tools/testing/selftests/rseq/rseq-mips.h | 4 +-
tools/testing/selftests/rseq/rseq-ppc.h | 4 +-
tools/testing/selftests/rseq/rseq-riscv.h | 6 +--
tools/testing/selftests/rseq/rseq-s390.h | 4 +-
tools/testing/selftests/rseq/rseq-x86.h | 4 +-
9 files changed, 70 insertions(+), 42 deletions(-)
--
2.25.1
We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assign.
I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use INDIRECT_CALL in the reuseport helpers. To show
that this approach is not just beneficial to TC sk_assign I removed
duplicate code for bpf_sk_lookup as well.
Changes from v1:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)
Joint work with Daniel Borkmann.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v3:
- Fix warning re udp_ehashfn and udp6_ehashfn (Simon)
- Return higher scoring connected UDP reuseport sockets (Kuniyuki)
- Fix ipv6 module builds
- Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent…
---
Daniel Borkmann (1):
selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
Lorenz Bauer (6):
udp: re-score reuseport groups when connected sockets are present
net: export inet_lookup_reuseport and inet6_lookup_reuseport
net: document inet[6]_lookup_reuseport sk_state requirements
net: remove duplicate reuseport_lookup functions
net: remove duplicate sk_lookup helpers
bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
include/net/inet6_hashtables.h | 84 ++++++++-
include/net/inet_hashtables.h | 77 +++++++-
include/net/sock.h | 7 +-
include/net/udp.h | 8 +
include/uapi/linux/bpf.h | 3 -
net/core/filter.c | 2 -
net/ipv4/inet_hashtables.c | 70 +++++---
net/ipv4/udp.c | 88 ++++-----
net/ipv6/inet6_hashtables.c | 73 +++++---
net/ipv6/udp.c | 98 ++++------
tools/include/uapi/linux/bpf.h | 3 -
tools/testing/selftests/bpf/network_helpers.c | 3 +
.../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++
.../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++
14 files changed, 676 insertions(+), 179 deletions(-)
---
base-commit: 970308a7b544fa1c7ee98a2721faba3765be8dd8
change-id: 20230613-so-reuseport-e92c526173ee
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
v3:
- [v2] https://lore.kernel.org/lkml/20230531163405.2200292-1-longman@redhat.com/
- Change the new control file from root-only "cpuset.cpus.reserve" to
non-root "cpuset.cpus.exclusive" which lists the set of exclusive
CPUs distributed down the hierarchy.
- Add a patch to restrict boot-time isolated CPUs to isolated
partitions only.
- Update the test_cpuset_prs.sh test script and documentation
accordingly.
v2:
- [v1] https://lore.kernel.org/lkml/20230412153758.3088111-1-longman@redhat.com/
- Dropped the special "isolcpus" partition in v1
- Add the root only "cpuset.cpus.reserve" control file for reserving
CPUs used for remote isolated partitions.
- Update the test_cpuset_prs.sh test script and documentation
accordingly.
This patch series introduces a new cpuset control file
"cpuset.cpus.exclusive" which must be a subset of "cpuset.cpus"
and the parent's "cpuset.cpus.exclusive". This control file lists
the exclusive CPUs to be distributed down the hierarchy. Any one
of the exclusive CPUs can only be distributed to at most one child
cpuset. Unlike "cpuset.cpus", invalid input to "cpuset.cpus.exclusive"
will be rejected with an error. This new control file has no effect on
the behavior of the cpuset until it turns into a partition root. At that
point, its effective CPUs will be set to its exclusive CPUs unless some
of them are offline.
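The constraints described above can be modelled with plain bitmasks. The sketch below is only an illustration of those rules, not code from kernel/cgroup/cpuset.c; the cpumask typedef, is_subset() and check_exclusive() are hypothetical names, and limiting the model to 64 CPUs is an assumption made for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; bit n set means CPU n is in the set. */
typedef uint64_t cpumask;

/* Returns 1 if 'sub' contains no CPU outside 'super'. */
static int is_subset(cpumask sub, cpumask super)
{
	return (sub & ~super) == 0;
}

/*
 * Model of the checks for writing 'requested' to a child's
 * "cpuset.cpus.exclusive":
 *   cpus          - the child's own "cpuset.cpus"
 *   parent_excl   - the parent's "cpuset.cpus.exclusive"
 *   siblings_excl - union of exclusive CPUs already given to siblings
 * Returns 0 when the value is acceptable, -1 when it would be rejected.
 */
static int check_exclusive(cpumask requested, cpumask cpus,
			   cpumask parent_excl, cpumask siblings_excl)
{
	if (!is_subset(requested, cpus))
		return -1;
	if (!is_subset(requested, parent_excl))
		return -1;
	/* An exclusive CPU may go to at most one child. */
	if (requested & siblings_excl)
		return -1;
	return 0;
}
```

For example, with "cpuset.cpus" covering CPUs 0-3 and the parent's exclusive set covering CPUs 2-5, requesting CPUs 2-3 passes as long as no sibling already holds either of them.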
This patch series also introduces a new category of cpuset partition
called remote partitions. The existing partition category where the
partition roots have to be clustered around the root cgroup in a
hierarchical way is now referred to as local partitions.
A remote partition can be formed far from the root cgroup
with no partition root parent. Local partitions can be
created without touching "cpuset.cpus.exclusive", as it can be set
automatically if a cpuset becomes a local partition root. Creating
a remote partition, however, requires properly set
"cpuset.cpus.exclusive" values down the hierarchy.
Both scheduling and isolated partitions can be formed in a remote
partition. A local partition can be created under a remote partition.
A remote partition, however, cannot be formed under a local partition
for now.
Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers, and they rely on other
middleware like systemd to help manage it. If a container needs to
use isolated CPUs, it is hard to get those with local partitions,
as that requires the administrative parent cgroup to be a partition
root too, which tools like systemd may not be ready to manage.
With this patch series, we allow the creation of remote partitions
far from the root. The container management tool can manage the
"cpuset.cpus.exclusive" file without impacting the other cpuset
files that are managed by other middleware. Of course, invalid
"cpuset.cpus.exclusive" values will be rejected and changes to
"cpuset.cpus" can affect the value of "cpuset.cpus.exclusive" due to
the requirement that it has to be a subset of the former control file.
Waiman Long (9):
cgroup/cpuset: Inherit parent's load balance state in v2
cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE
handling
cgroup/cpuset: Improve temporary cpumasks handling
cgroup/cpuset: Allow suppression of sched domain rebuild in
update_cpumasks_hier()
cgroup/cpuset: Add cpuset.cpus.exclusive for v2
cgroup/cpuset: Introduce remote partition
cgroup/cpuset: Check partition conflict with housekeeping setup
cgroup/cpuset: Documentation update for partition
cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition
Documentation/admin-guide/cgroup-v2.rst | 100 +-
kernel/cgroup/cpuset.c | 1352 ++++++++++++-----
.../selftests/cgroup/test_cpuset_prs.sh | 398 +++--
3 files changed, 1297 insertions(+), 553 deletions(-)
--
2.31.1
Currently the write operation returns the count of writes regardless of whether
events are enabled or disabled. Fix this by returning -EBADF when events
are disabled.
v3 -> v4:
- Change the return value from zero to -EBADF
v2 -> v3:
- Change the return value from -ENOENT to zero
v1 -> v2:
- Change the return value from -EFAULT to -ENOENT
sunliming (3):
tracing/user_events: Fix incorrect return value for writing operation
when events are disabled
selftests/user_events: Enable the event before write_fault test in
ftrace self-test
selftests/user_events: Add test cases when event is disabled
kernel/trace/trace_events_user.c | 3 ++-
tools/testing/selftests/user_events/ftrace_test.c | 8 ++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
--
2.25.1
On systems where netdevsim is built-in or loaded before the test
starts, kci_test_ipsec_offload doesn't remove the netdevsim device it
created during the test.
Fixes: e05b2d141fef ("netdevsim: move netdev creation/destruction to dev probe")
Signed-off-by: Sabrina Dubroca <sd(a)queasysnail.net>
---
tools/testing/selftests/net/rtnetlink.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 383ac6fc037d..ba286d680fd9 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -860,6 +860,7 @@ EOF
fi
# clean up any leftovers
+ echo 0 > /sys/bus/netdevsim/del_device
$probed && rmmod netdevsim
if [ $ret -ne 0 ]; then
--
2.40.1
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. It is
currently emulated in a pretty slow manner in userspace. Our purpose is to
enhance the kernel so that it can be emulated efficiently.
Currently some out-of-tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches,
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But using the soft-dirty flag, sometimes we get extra pages
which weren't even written. They had become soft-dirty because of VMA merging
and the VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We
were able to bypass this shortcoming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc. messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed whether we could revert these patches, but we could not reach any
conclusion. So at this point, I made a couple of attempts to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to cause
regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges. The
reply I got was not to increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty behind, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where
the kernel resolves the faults itself when the WP_ASYNC feature is used. It was
straightforward to add the WP_ASYNC feature to userfaultfd. Now only those
pages which have really been written are reported as dirty or written-to. (PS:
another userfaultfd feature, WP_UNPOPULATED, is required
to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of the CRIU devs to make the
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
the kernel already has a soft-dirty feature which we have given up on
using, we use the written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get-and-clear of the soft-dirty/written-to status present
in the kernel.
- The pages which have been written-to cannot be found in an accurate way.
(The kernel's soft-dirty PTE bit + soft-dirty VMA bit show more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
an asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
the pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
Information on whether a page is file mapped, present or swapped is
required for the CRIU project [5][6]. The required mask, any mask,
excluded mask and return mask are also required
for the CRIU project [5].
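One way to read the four masks is sketched below. The exact semantics are an assumption of this example (all required bits set, at least one "any" bit set when that mask is non-zero, no excluded bit set, and the return mask selecting what is reported); the flag values, struct scan_masks, page_matches() and reported_flags() are hypothetical and are not taken from the UAPI header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define F_WRITTEN  (1u << 0)
#define F_FILE     (1u << 1)
#define F_PRESENT  (1u << 2)
#define F_SWAPPED  (1u << 3)

struct scan_masks {
	uint32_t required;  /* every bit here must be set on the page */
	uint32_t anyof;     /* if non-zero, at least one bit must be set */
	uint32_t excluded;  /* no bit here may be set on the page */
	uint32_t ret;       /* which flags to report for a matching page */
};

/* Returns 1 when a page with 'flags' passes the three selection masks. */
static int page_matches(uint32_t flags, const struct scan_masks *m)
{
	if ((flags & m->required) != m->required)
		return 0;
	if (m->anyof && !(flags & m->anyof))
		return 0;
	if (flags & m->excluded)
		return 0;
	return 1;
}

/* Flags reported to the caller for a matching page. */
static uint32_t reported_flags(uint32_t flags, const struct scan_masks *m)
{
	return flags & m->ret;
}
```

Under this reading, "written, and either present or swapped, but not file mapped" is expressed as required = F_WRITTEN, anyof = F_PRESENT | F_SWAPPED, excluded = F_FILE.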
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where the user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages is found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
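The compaction and the max_pages cut-off can be pictured with the userspace model below. It is a sketch of the idea only: struct region and compact_pages() are hypothetical, the layout is not the real struct page_region, and the model does not enforce the max_pages >= vec_size restriction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output record: a run of consecutive pages with equal flags. */
struct region {
	size_t first_page;  /* index of the first page in the run */
	size_t npages;      /* number of pages in the run */
	uint32_t flags;     /* flags common to every page in the run */
};

/*
 * Walk flags[0..npages) and merge consecutive pages with identical
 * non-zero flags into regions. Pages whose flags are 0 are skipped.
 * Stops when a new region is needed but 'vec_size' regions are in use,
 * or, if max_pages is non-zero, once max_pages pages were reported.
 * Returns the number of regions written to vec.
 */
static size_t compact_pages(const uint32_t *flags, size_t npages,
			    struct region *vec, size_t vec_size,
			    size_t max_pages)
{
	size_t used = 0, reported = 0;

	for (size_t i = 0; i < npages; i++) {
		if (!flags[i])
			continue;
		if (max_pages && reported == max_pages)
			break;
		if (used && vec[used - 1].flags == flags[i] &&
		    vec[used - 1].first_page + vec[used - 1].npages == i) {
			vec[used - 1].npages++;  /* extend the current run */
		} else {
			if (used == vec_size)
				break;           /* no room for a new region */
			vec[used].first_page = i;
			vec[used].npages = 1;
			vec[used].flags = flags[i];
			used++;
		}
		reported++;
	}
	return used;
}
```

The worst case mentioned above is visible here: if no two neighbouring pages share flags, every reported page consumes one region, so the vector fills up at one page per entry.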
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 560 +++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 54 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 54 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1464 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2329 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Erdem Aktas wrote:
> On Mon, Jun 12, 2023 at 12:03 PM Dan Williams <dan.j.williams(a)intel.com>
> wrote:
>
> > [ add David, Brijesh, and Atish]
> >
> > Kuppuswamy Sathyanarayanan wrote:
> > > In TDX guest, the second stage of the attestation process is Quote
> > > generation. This process is required to convert the locally generated
> > > TDREPORT into a remotely verifiable Quote. It involves sending the
> > > TDREPORT data to a Quoting Enclave (QE) which will verify the
> > > integrity of the TDREPORT and sign it with an attestation key.
> > >
> > > Intel's TDX attestation driver exposes TDX_CMD_GET_QUOTE IOCTL to
> > > allow the user agent to get the TD Quote.
> > >
> > > Add a kernel selftest module to verify the Quote generation feature.
> > >
> > > TD Quote generation involves following steps:
> > >
> > > * Get the TDREPORT data using TDX_CMD_GET_REPORT IOCTL.
> > > * Embed the TDREPORT data in quote buffer and request for quote
> > > generation via TDX_CMD_GET_QUOTE IOCTL request.
> > > * Upon completion of the GetQuote request, check for non zero value
> > > in the status field of Quote header to make sure the generated
> > > quote is valid.
> >
> > What this cover letter does not say is that this is adding another
> > instance of the similar pattern as SNP_GET_REPORT.
> >
> > Linux is best served when multiple vendors trying to do similar
> > operations are brought together behind a common ABI. We see this in the
> > history of wrangling SCSI vendors behind common interfaces.
>
> Compared to the number of SCSI vendors, I think the number of CPU vendors
> for confidential computing seems manageable to me. Is this really a good
> comparison?
Fair enough, and prompted by this I talk a bit more about the
motivations and benefits of a Keys abstraction for attestation here:
https://lore.kernel.org/all/64961c3baf8ce_142af829436@dwillia2-xfh.jf.intel…
> > Now multiple
> > confidential computing vendors trying to develop similar flows with
> > differentiated formats where that differentiation need not leak over the
> > ABI boundary.
> >
>
> <Just my personal opinion below>
> I agree with this statement in the high level but it is also somehow
> surprising for me after all the discussion happened around this topic.
> Honestly, I feel like there are multiple versions of "Intel" working in
> different directions.
This proposal was sent while firmly wearing my Linux community hat. I
agree, the timing here is unfortunate.
> If we want multiple vendors trying to do the similar things behind a common
> ABI, it should start with the spec. Since this comment is coming from
> Intel, I wonder if there is any plan to combine the GHCB and GHCI
> interfaces under common ABI in the future or why it did not even happen in
> the first place.
Per above comment about firmly wearing my Linux hat I am coming at this
purely from the perspective of what do we do now as a community that
continues to see these implementations proliferate and grow more
features. Common specs are great, but I agree with you, it is too late
for that, but I hope that as Linux asserts "this is what it should look
like" it starts to influence future IP innovation, and attestation
service providers, to accommodate the kernel's ABI momentum.
> What I see is that Intel has GETQUOTE TDVMCALL interface in its spec and
> again Intel does not really want to provide support for it in linux. It
> feels really frustrating.
I am aware of how frustrating late feedback can be. I am also encouraged
by some of the conversations and investigations that have already
happened around how Keys fits what these attestation solutions need.
> > My observation of SNP_GET_REPORT and TDX_CMD_GET_REPORT is that they are
> > both passing blobs across the user/kernel and platform/kernel boundary
> > for the purposes of unlocking other resources. To me that is a flow that
> > the Keys subsystem has infrastructure to handle. It has the concept of
> > upcalls and asynchronous population of blobs by handles and mechanisms
> > to protect and cache those communications. Linux / the Keys subsystem
> > could benefit from the enhancements it would need to cover these 2
> > cases. Specifically, the benefit that when ARM and RISC-V arrive with
> > similar communications with platform TSMs (Trusted Security Module) they
> > can build upon the same infrastructure.
> >
> > David, am I reaching with that association? My strawman mapping of
> > TDX_CMD_GET_QUOTE to request_key() is something like:
> >
> > request_key(coco_quote, "description", "<uuencoded tdreport>")
> >
> > Where this is a common key_type for all vendors, but the description and
> > arguments have room for vendor differentiation when doing the upcall to
> > the platform TSM, but userspace never needs to contend with the
> > different vendor formats, that is all handled internally to the kernel.
> >
> I think the problem definition here is not accurate. With AMD SNP, guests
> need to do a hypercall to KVM and KVM needs to issue
> a SNP_GUEST_REQUEST(MSG_REPORT_REQ) to the SP firmware. In TDX, guests
> need to do a TDCALL to TDXMODULE to get the TDREPORT and then it needs to
> get that report delivered to the host userspace to get the TDQUOTE
> generated by the SGX quoting enclave. Also TDQUOTE is designed to work
> async while the SNP_GUEST_REQUESTS are blocking vmcalls.
>
> Those are completely different flows. Are you suggesting that intel should
> also come down to a single call to get the TDQUOTE like AMD SNP?
The Keys subsystem supports async instantiation of key material with
usermode upcalls if necessary. So I do not see a problem supporting
these flows behind a common key type.
> The TDCALL interface asking for the TDREPORT is already there. AMD does not
> need to ask the report and the quote separately.
>
> Here, the problem was that Intel (upstream) did not want to implement
> hypercall for TDQUOTE which would be handled by the user space VMM. The
> alternative implementation (using vsock) does not work for many use cases
> including ours. I do not see how your suggestion addresses the problem that
> this patch was trying to solve.
Perhaps the strawman mockup makes it more clear:
https://lore.kernel.org/all/64961c3baf8ce_142af829436@dwillia2-xfh.jf.intel…
> So while I like the suggested direction, I am not sure how much it is
> possible to come up with a common ABI even with just only for 2 vendors
> (AMD and Intel) without doing spec changes which is a multi year effort
> imho.
I agree, hardware spec changes are out of scope for this effort, but
Keys might require some additional flows to be built up in the kernel
that could be previously handled in userspace. I.e. the "bottom half"
that I reference in the mockup.
This is something we went through with using "encrypted-keys" for
nvdimm. Instead of an ioctl to inject a secret key over the user/kernel
boundary, a key server needs to store a serialized version of the
encrypted key blob and pass that into the kernel.
The restoring of TPIDR2 signal context has been broken since it was
merged. Fix this and add a test case covering it. This is a result of
TPIDR2 context management following a different flow to any of the other
state that we provide and the fact that we don't expose TPIDR (which
follows the same pattern) to signals.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Added a feature check for SME to the new test.
- Link to v1: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v1-0-b6d…
---
Mark Brown (2):
arm64/signal: Restore TPIDR2 register rather than memory state
kselftest/arm64: Add a test case for TPIDR2 restore
arch/arm64/kernel/signal.c | 2 +-
tools/testing/selftests/arm64/signal/.gitignore | 2 +-
.../arm64/signal/testcases/tpidr2_restore.c | 86 ++++++++++++++++++++++
3 files changed, 88 insertions(+), 2 deletions(-)
---
base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375
change-id: 20230621-arm64-fix-tpidr2-signal-restore-713d93798f99
Best regards,
--
Mark Brown <broonie(a)kernel.org>
TCP SYN/ACK packets of connections from processes/sockets outside a
cgroup on the same host are not received by the cgroup's installed
cgroup_skb filters.
There were two BPF cgroup_skb programs attached to a cgroup named
"my_cgroup".
SEC("cgroup_skb/ingress")
int ingress(struct __sk_buff *skb)
{
    /* .... process skb ... */
    return 1;
}

SEC("cgroup_skb/egress")
int egress(struct __sk_buff *skb)
{
    /* .... process skb ... */
    return 1;
}
We discovered that when running the command "nc -6 -l 8000" in
"my_cgroup" and connecting to it from outside of "my_cgroup" with the
command "nc -6 localhost 8000", the egress filter did not detect the
SYN/ACK packet. However, we did observe the SYN/ACK packet at the
ingress when connecting from a socket in "my_cgroup" to a socket
outside of it.
We came across BPF_CGROUP_RUN_PROG_INET_EGRESS(). This macro is
responsible for calling BPF programs that are attached to the egress
hook of a cgroup and it skips programs if the sending socket is not the
owner of the skb. Specifically, in our situation, the SYN/ACK
skb is owned by a struct request_sock instance, but the sending
socket is the listener socket we use to receive incoming
connections. The request_sock is created to manage an incoming
connection.
It has been determined that checking the owner of a skb against
the sending socket is not required. Removing this check will allow the
filters to receive SYN/ACK packets.
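The effect of the ownership check can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
model_sock, struct model_skb and both functions are hypothetical names
invented for this illustration, not kernel code. The real macro also
checks whether cgroup BPF is enabled and whether the socket is a full
socket, which the model leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock and struct sk_buff. */
struct model_sock {
	const char *name;
};

struct model_skb {
	const struct model_sock *owner;	/* plays the role of skb->sk */
};

/*
 * Behaviour described above before the change: egress programs run
 * only when the sending socket also owns the skb.
 */
bool model_egress_runs_old(const struct model_sock *sender,
			   const struct model_skb *skb)
{
	return sender != NULL && sender == skb->owner;
}

/* Behaviour after the change: the ownership comparison is dropped. */
bool model_egress_runs_new(const struct model_sock *sender,
			   const struct model_skb *skb)
{
	(void)skb;
	return sender != NULL;
}
```

In this model, a SYN/ACK owned by a request socket but sent through the
listener is skipped by the old check and seen by the new one, while an
skb owned by its sender is handled the same way by both.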
A new selftest has also been added to ensure that cgroup_skb filters
receive all signaling packets, including SYN, SYN/ACK, ACK, FIN, and
FIN/ACK.
Changes from v2:
- Remove redundant blank lines.
Changes from v1:
- Check the number of observed packets instead of just sleeping.
- Use ASSERT_XXX() instead of CHECK().
[v1] https://lore.kernel.org/all/20230612191641.441774-1-kuifeng@meta.com/
[v2] https://lore.kernel.org/all/20230617052756.640916-2-kuifeng@meta.com/
Kui-Feng Lee (2):
net: bpf: Always call BPF cgroup filters for egress.
selftests/bpf: Verify that the cgroup_skb filters receive expected
packets.
include/linux/bpf-cgroup.h | 2 +-
tools/testing/selftests/bpf/cgroup_helpers.c | 12 +
tools/testing/selftests/bpf/cgroup_helpers.h | 1 +
tools/testing/selftests/bpf/cgroup_tcp_skb.h | 35 ++
.../selftests/bpf/prog_tests/cgroup_tcp_skb.c | 399 ++++++++++++++++++
.../selftests/bpf/progs/cgroup_tcp_skb.c | 382 +++++++++++++++++
6 files changed, 830 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/cgroup_tcp_skb.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_tcp_skb.c
create mode 100644 tools/testing/selftests/bpf/progs/cgroup_tcp_skb.c
--
2.34.1
Patch 1-3/9 track and expose some aggregated data counters at the MPTCP
level: the number of retransmissions and the bytes that have been
transferred. The first patch prepares the work by moving where snd_una
is updated for fallback sockets while the last patch adds some tests to
cover the new code.
Patch 4-6/9 introduce a new getsockopt for SOL_MPTCP: MPTCP_FULL_INFO.
This new socket option combines the info from the MPTCP_INFO,
MPTCP_TCPINFO and MPTCP_SUBFLOW_ADDRS socket options into one. Having
all the info in one call can be needed because the path-manager can
close and re-create subflows between two getsockopt() calls, fooling
the accounting. The first patch introduces a unique subflow ID to easily
detect when subflows are being re-created with the same 5-tuple while
the last patch adds some tests to cover the new code.
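As a rough illustration of why a unique subflow ID helps the accounting,
here is a minimal userspace sketch. The struct layout, the field names
and both helpers are hypothetical and only model the idea; they are not
the uAPI added by this series.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-subflow snapshot taken by a monitoring tool. */
struct model_subflow {
	uint32_t id;		/* stands in for the unique subflow ID */
	char tuple[64];		/* printable 5-tuple, for illustration only */
	uint64_t bytes_sent;	/* hypothetical per-subflow counter */
};

/* Same 5-tuple AND same ID: this really is the same subflow. */
bool model_same_subflow(const struct model_subflow *before,
			const struct model_subflow *now)
{
	return before->id == now->id &&
	       strcmp(before->tuple, now->tuple) == 0;
}

/*
 * Bytes sent since the previous snapshot. Without the ID, a re-created
 * subflow reusing the 5-tuple would be mistaken for the old one and the
 * subtraction could wrap around.
 */
uint64_t model_bytes_delta(const struct model_subflow *before,
			   const struct model_subflow *now)
{
	if (model_same_subflow(before, now))
		return now->bytes_sent - before->bytes_sent;
	/* New subflow: its counter started from zero. */
	return now->bytes_sent;
}
```

Comparing on the 5-tuple alone would treat a re-created subflow as a
continuation of the old one, which is the accounting error described
above.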
Please note that patch 5/9 ("mptcp: introduce MPTCP_FULL_INFO getsockopt")
can reveal a bug that has been there for some time, see [1]. A fix has
recently been sent to netdev for the -net tree: "mptcp: ensure listener
is unhashed before updating the sk status", see [2]. There are no
conflicts between the two patches but it might be better to apply this
series after the one for -net and after having merged "net" into
"net-next".
Patch 7/9 is similar to commit 47867f0a7e83 ("selftests: mptcp: join:
skip check if MIB counter not supported") recently applied in the -net
tree but here it adapts the new code that is only in net-next (and it
fixes a merge conflict resolution which didn't have any impact).
Patches 8 and 9/9 are two simple refactorings. One consolidates the
transition to TCP_CLOSE in mptcp_do_fastclose() and avoids duplicated
code. The other one reduces the scope of an argument passed to the
mptcp_pm_alloc_anno_list() function.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/407 [1]
Link: https://lore.kernel.org/netdev/20230620-upstream-net-20230620-misc-fixes-fo… [2]
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Geliang Tang (1):
mptcp: pass addr to mptcp_pm_alloc_anno_list
Matthieu Baerts (1):
selftests: mptcp: join: skip check if MIB counter not supported (part 2)
Paolo Abeni (7):
mptcp: move snd_una update earlier for fallback socket
mptcp: track some aggregate data counters
selftests: mptcp: explicitly tests aggregate counters
mptcp: add subflow unique id
mptcp: introduce MPTCP_FULL_INFO getsockopt
selftests: mptcp: add MPTCP_FULL_INFO testcase
mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()
include/uapi/linux/mptcp.h | 29 +++++
net/mptcp/options.c | 14 +-
net/mptcp/pm_netlink.c | 8 +-
net/mptcp/pm_userspace.c | 2 +-
net/mptcp/protocol.c | 31 +++--
net/mptcp/protocol.h | 11 +-
net/mptcp/sockopt.c | 152 +++++++++++++++++++++-
net/mptcp/subflow.c | 2 +
tools/testing/selftests/net/mptcp/mptcp_join.sh | 33 ++---
tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 120 ++++++++++++++++-
10 files changed, 356 insertions(+), 46 deletions(-)
---
base-commit: 712557f210723101717570844c95ac0913af74d7
change-id: 20230620-upstream-net-next-20230620-mptcp-expose-more-info-and-misc-6b4a3a415ec5
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. It is
currently emulated in a pretty slow manner in userspace. Our purpose is to
enhance the kernel such that it can be emulated efficiently.
Currently some out-of-tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read and/or reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to bypass this shortcoming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed whether we could revert these patches, but we could not reach any
conclusion. So at this point, I made a couple of attempts to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. The
reply was not to increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty behind, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straightforward to add WP_ASYNC feature in userfaultfd. Now only those
pages which have really been written are reported as written-to. (PS
Another userfaultfd feature, WP_UNPOPULATED, is also required in order
to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to make the
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature which we have given up on
using, we use the written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
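The first userspace approach listed above can be sketched as follows.
This is a Linux-specific illustration, not code from the series: the
region size, every track_* name and the per-page byte array are
assumptions made for the example. Calling mprotect() from a signal
handler works on Linux in practice but is not formally async-signal-safe.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TRACK_PAGES 8	/* arbitrary size of the tracked region */

char *track_base;
size_t track_page_size;
volatile unsigned char track_written[TRACK_PAGES];

/* Record the faulting page as written, then let the write proceed. */
static void track_on_segv(int sig, siginfo_t *si, void *uc)
{
	uintptr_t addr = (uintptr_t)si->si_addr;
	uintptr_t base = (uintptr_t)track_base;
	size_t idx;

	(void)sig;
	(void)uc;
	if (addr < base || addr >= base + TRACK_PAGES * track_page_size)
		_exit(1);	/* genuine fault outside the region */
	idx = (addr - base) / track_page_size;
	track_written[idx] = 1;
	if (mprotect(track_base + idx * track_page_size, track_page_size,
		     PROT_READ | PROT_WRITE))
		_exit(1);
}

/* Clear the bookkeeping and write-protect the whole region again. */
int track_reset(void)
{
	size_t i;

	for (i = 0; i < TRACK_PAGES; i++)
		track_written[i] = 0;
	return mprotect(track_base, TRACK_PAGES * track_page_size, PROT_READ);
}

/* Map the region, install the handler and start tracking writes. */
int track_start(void)
{
	struct sigaction sa;

	track_page_size = (size_t)sysconf(_SC_PAGESIZE);
	track_base = mmap(NULL, TRACK_PAGES * track_page_size,
			  PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (track_base == MAP_FAILED)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = track_on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL))
		return -1;
	return track_reset();
}
```

Every first write to a page costs a signal delivery plus an mprotect()
call, and clearing the state is a separate step from reading it, which
is the kind of overhead and non-atomicity the list above refers to.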
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/written-to status and clear operation in
the kernel.
- The pages which have been written-to cannot be found in an accurate way.
(Kernel's soft-dirty PTE bit + soft-dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is an
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information about whether a page is file mapped, present or
swapped is required for the CRIU project [5][6]. The required mask,
any mask, excluded mask and return mask are also required
for the CRIU project [5].
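One plausible reading of how the four masks combine is sketched below.
The semantics shown (all required bits set, at least one "any" bit set
when that mask is non-zero, no excluded bit set, return mask applied to
the reported bits) are an assumption based on the mask names. The
MODEL_* bit values and function names are hypothetical and do not come
from the uAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define MODEL_PAGE_IS_WRITTEN	(1ULL << 0)
#define MODEL_PAGE_IS_FILE	(1ULL << 1)
#define MODEL_PAGE_IS_PRESENT	(1ULL << 2)
#define MODEL_PAGE_IS_SWAPPED	(1ULL << 3)

/* Decide whether a page with the given bits is selected by the masks. */
bool model_page_matches(uint64_t page_bits, uint64_t required_mask,
			uint64_t anyof_mask, uint64_t excluded_mask)
{
	/* Every required bit must be set. */
	if ((page_bits & required_mask) != required_mask)
		return false;
	/* If an "any" mask is given, at least one of its bits must be set. */
	if (anyof_mask && !(page_bits & anyof_mask))
		return false;
	/* No excluded bit may be set. */
	if (page_bits & excluded_mask)
		return false;
	return true;
}

/* Only the bits selected by the return mask are reported back. */
uint64_t model_reported_bits(uint64_t page_bits, uint64_t return_mask)
{
	return page_bits & return_mask;
}
```

For example, a caller interested in written anonymous pages could pass
the written bit as the required mask and the file bit as the excluded
mask.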
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
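The compaction of matching pages into regions, and the role of the
max_pages limit, can be modelled in userspace roughly as follows. This
is only a sketch: struct model_page_region and model_compact() are
hypothetical, and the real struct page_region layout may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one reported region. */
struct model_page_region {
	uint64_t start;	/* address of the first page */
	uint64_t len;	/* number of pages in the region */
	uint64_t bits;	/* flags shared by all pages of the region */
};

/*
 * flags[i] holds the reported bits of page i; 0 means "not of interest".
 * Adjacent pages with identical bits are merged into one region.
 * Stops once vec_len regions are used or max_pages pages have been
 * reported (max_pages == 0 means no limit). Returns the regions used.
 */
size_t model_compact(uint64_t base, uint64_t page_size,
		     const uint64_t *flags, size_t npages,
		     struct model_page_region *vec, size_t vec_len,
		     size_t max_pages)
{
	size_t used = 0, reported = 0, i;

	for (i = 0; i < npages; i++) {
		uint64_t addr = base + i * page_size;

		if (!flags[i])
			continue;
		if (max_pages && reported == max_pages)
			break;
		if (used && vec[used - 1].bits == flags[i] &&
		    vec[used - 1].start + vec[used - 1].len * page_size == addr) {
			vec[used - 1].len++;	/* extend the previous region */
		} else {
			if (used == vec_len)
				break;		/* output vector is full */
			vec[used].start = addr;
			vec[used].len = 1;
			vec[used].bits = flags[i];
			used++;
		}
		reported++;
	}
	return used;
}
```

In the worst case no two neighbouring pages can be merged, so each
region describes a single page; that is the case the restriction on
max_pages and the vector size is meant to cover.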
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 526 +++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1458 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2287 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
This patch introduces a specific test case for the EVIOCGLED ioctl.
The test covers the case where len > maxlen in the
EVIOCGLED(sizeof(all_leds)), all_leds) ioctl.
Signed-off-by: Dana Elfassy <dangel101(a)gmail.com>
---
Changes in v2:
- Changed variable leds from an array to an int
This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML.
[1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933…
tools/testing/selftests/input/evioc-test.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c
index ad7b93fe39cf..378db2b4dd56 100644
--- a/tools/testing/selftests/input/evioc-test.c
+++ b/tools/testing/selftests/input/evioc-test.c
@@ -234,4 +234,21 @@ TEST(eviocsrep_set_repeat_settings)
selftest_uinput_destroy(uidev);
}
+TEST(eviocgled_get_all_leds)
+{
+ struct selftest_uinput *uidev;
+ int leds = 0;
+ int rc;
+
+ rc = selftest_uinput_create_device(&uidev, -1);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+
+ /* ioctl to set the maxlen = 0 */
+ rc = ioctl(uidev->evdev_fd, EVIOCGLED(0), leds);
+ ASSERT_EQ(0, rc);
+
+ selftest_uinput_destroy(uidev);
+}
+
TEST_HARNESS_MAIN
--
2.41.0
This patch introduces a specific test case for the EVIOCGKEY ioctl.
The test covers the case where len > maxlen in the
EVIOCGKEY(sizeof(keystate)), keystate) ioctl.
Signed-off-by: Dana Elfassy <dangel101(a)gmail.com>
---
Changes in v3:
- Edited commit's subject and description
- Renamed variable rep_values to keystate
- Added argument to selftest_uinput_create_device()
- Removed memset
Changes in v2:
- Added following note about the patch's dependency
This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML.
[1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933…
tools/testing/selftests/input/evioc-test.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c
index ad7b93fe39cf..e0f69459f504 100644
--- a/tools/testing/selftests/input/evioc-test.c
+++ b/tools/testing/selftests/input/evioc-test.c
@@ -234,4 +234,21 @@ TEST(eviocsrep_set_repeat_settings)
selftest_uinput_destroy(uidev);
}
+TEST(eviocgkey_get_global_key_state)
+{
+ struct selftest_uinput *uidev;
+ int keystate = 0;
+ int rc;
+
+ rc = selftest_uinput_create_device(&uidev, -1);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+
+ /* ioctl to create the scenario where len > maxlen in bits_to_user() */
+ rc = ioctl(uidev->evdev_fd, EVIOCGKEY(0), keystate);
+ ASSERT_EQ(0, rc);
+
+ selftest_uinput_destroy(uidev);
+}
+
TEST_HARNESS_MAIN
--
2.41.0
This patch introduces a specific test case for the EVIOCGLED ioctl.
The test covers the case where len > maxlen in the
EVIOCGLED(sizeof(all_leds)), all_leds) ioctl.
Signed-off-by: Dana Elfassy <dangel101(a)gmail.com>
---
This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML.
[1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933…
tools/testing/selftests/input/evioc-test.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c
index ad7b93fe39cf..2bf1b32ae01a 100644
--- a/tools/testing/selftests/input/evioc-test.c
+++ b/tools/testing/selftests/input/evioc-test.c
@@ -234,4 +234,21 @@ TEST(eviocsrep_set_repeat_settings)
selftest_uinput_destroy(uidev);
}
+TEST(eviocgled_get_all_leds)
+{
+ struct selftest_uinput *uidev;
+ int leds[2];
+ int rc;
+
+ rc = selftest_uinput_create_device(&uidev, -1);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+
+ /* ioctl to set the maxlen = 0 */
+ rc = ioctl(uidev->evdev_fd, EVIOCGLED(0), leds);
+ ASSERT_EQ(0, rc);
+
+ selftest_uinput_destroy(uidev);
+}
+
TEST_HARNESS_MAIN
--
2.41.0
The restoring of TPIDR2 signal context has been broken since it was
merged, fix this and add a test case covering it. This is a result of
TPIDR2 context management following a different flow to any of the other
state that we provide and the fact that we don't expose TPIDR (which
follows the same pattern) to signals.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
arm64/signal: Restore TPIDR2 register rather than memory state
kselftest/arm64: Add a test case for TPIDR2 restore
arch/arm64/kernel/signal.c | 2 +-
tools/testing/selftests/arm64/signal/.gitignore | 2 +-
.../arm64/signal/testcases/tpidr2_restore.c | 85 ++++++++++++++++++++++
3 files changed, 87 insertions(+), 2 deletions(-)
---
base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375
change-id: 20230621-arm64-fix-tpidr2-signal-restore-713d93798f99
Best regards,
--
Mark Brown <broonie(a)kernel.org>
To cover this case, the test sets 'maxlen = 0', for the following
reasons:
EVIOCGKEY is executed from evdev_do_ioctl(), which is called from
evdev_ioctl_handler().
evdev_ioctl_handler() is called from 2 functions, of which, according to
code coverage, only the first one is in use.
‘compat’ is given the value ‘0’ [1].
Thus, the condition [2] is always false.
This means ‘len’ always equals a positive number [3].
‘maxlen’ in evdev_handle_get_val() [4] is defined locally in
evdev_do_ioctl() [5], and is passed in the variable 'size' [6].
[1] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1281
[2] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L705
[3] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L707
[4] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L886
[5] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1155
[6] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1141
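The length clamping that the explanation above relies on can be modelled
in userspace as below. This is a simplified sketch of the non-compat
path only; the MODEL_* macros and model_bits_to_user_len() are
hypothetical names, and the copy to userspace is left out.

```c
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG	(sizeof(long) * CHAR_BIT)
#define MODEL_BITS_TO_LONGS(nr) \
	(((nr) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/*
 * Number of bytes that would be copied back to the caller: the size of
 * the bitmap in whole longs, clamped to the size encoded in the ioctl
 * request (maxlen).
 */
size_t model_bits_to_user_len(unsigned int maxbit, size_t maxlen)
{
	size_t len = MODEL_BITS_TO_LONGS(maxbit) * sizeof(long);

	if (len > maxlen)
		len = maxlen;	/* the branch this test wants to reach */
	return len;
}
```

With EVIOCGKEY(0) the size encoded in the request is 0, so in this
model the positive 'len' is clamped to 0, nothing is copied, and a
return value of 0 is what the test asserts.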
Signed-off-by: Dana Elfassy <dangel101(a)gmail.com>
---
Changes in v2:
- Added following note about the patch's dependency
This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML.
[1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933…
tools/testing/selftests/input/evioc-test.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c
index ad7b93fe39cf..b94de2ee5596 100644
--- a/tools/testing/selftests/input/evioc-test.c
+++ b/tools/testing/selftests/input/evioc-test.c
@@ -234,4 +234,23 @@ TEST(eviocsrep_set_repeat_settings)
selftest_uinput_destroy(uidev);
}
+TEST(eviocgkey_get_global_key_state)
+{
+ struct selftest_uinput *uidev;
+ int rep_values[2];
+ int rc;
+
+ memset(rep_values, 0, sizeof(rep_values));
+
+ rc = selftest_uinput_create_device(&uidev);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+
+ /* ioctl to create the scenario where len > maxlen in bits_to_user() */
+ rc = ioctl(uidev->evdev_fd, EVIOCGKEY(0), rep_values);
+ ASSERT_EQ(0, rc);
+
+ selftest_uinput_destroy(uidev);
+}
+
TEST_HARNESS_MAIN
--
2.41.0
From: Danielle Ratson <danieller(a)nvidia.com>
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of
tests, so that it is always valid.
Fixes: 35c31d5c323f ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d")
Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Danielle Ratson <danieller(a)nvidia.com>
Reviewed-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
.../testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh | 4 ++++
.../testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh
index c5095da7f6bf..aec752a22e9e 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh
@@ -93,12 +93,16 @@ cleanup()
test_gretap()
{
+ ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \
+ nud permanent dev br2
full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
}
test_ip6gretap()
{
+ ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \
+ nud permanent dev br2
full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap"
}
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh
index 9ff22f28032d..0cf4c47a46f9 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh
@@ -90,12 +90,16 @@ cleanup()
test_gretap()
{
+ ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \
+ nud permanent dev br1
full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
}
test_ip6gretap()
{
+ ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \
+ nud permanent dev br1
full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap"
}
--
2.40.1
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which factor out the tc helper's
logic which was shared with cg/sk_skb (which operate correctly).
This refactoring is needed in order to avoid affecting the cgroup/sk_skb
flows as there does not seem to be a strict criterion for discerning which
flow the helper is called from based on the net device or packet
information.
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v6: - Remove redundant IS_ENABLED as suggested by Daniel Borkmann
- Declare net_device variable and use it as suggested by Daniel Borkmann
v5: Use reverse xmas tree indentation
v4: - Move dev_sdif() to include/linux/netdevice.h as suggested by Stanislav Fomichev
- Remove SYS and SYS_NOFAIL duplicate definitions
v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev
- Added xdp tests as suggested by Daniel Borkmann
- Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add vrf_socket_lookup tests
include/linux/netdevice.h | 9 +
net/core/filter.c | 141 ++++++--
.../bpf/prog_tests/vrf_socket_lookup.c | 312 ++++++++++++++++++
.../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++
4 files changed, 526 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c
--
2.34.1
The mlxsw driver currently makes the assumption that the user applies
configuration in a bottom-up manner. Thus netdevices need to be added to
the bridge before IP addresses are configured on that bridge or SVI added
on top of it. Enslaving a netdevice to another netdevice that already has
uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
it is rather easy to get into situations where the offloaded configuration
is just plain wrong.
Over the course of the following several patchsets, mlxsw code is going to
be adjusted to diminish the space of wrongly offloaded configurations.
Ideally the offload state will reflect the actual state, regardless of the
sequence of operation used to construct that state.
Several selftests build configurations that will not be offloadable in the
future on some systems. The reason is that what will get offloaded is the
actual configuration, not the configuration steps.
For example, when a port is added to a bridge that has an IP address, that
bridge will get a RIF, which it would not have with the current code. But
on Nvidia Spectrum-1 machines, MAC addresses of all RIFs need to have the
same prefix, which the bridge will violate. The RIF thus couldn't be
created, and the enslavement is therefore canceled, because it would lead
to an unoffloadable configuration. This breaks some selftests.
In this patchset, adjust selftests to avoid the configurations that mlxsw
would be incapable of offloading, while maintaining relevance with regards
to the feature that is being tested. There are generally two cases of
fixes:
- Disabling IPv6 autogen on bridges that do not participate in routing,
either because of the abovementioned requirement to keep the same MAC
prefix on all in-HW router interfaces, or, on 802.1ad bridges, because
in-HW router interfaces are not supported at all.
- Setting the bridge MAC address to what it will become after the first
member port is attached, so that the in-HW router interface is created
with a supported MAC address.
The patchset is then split thus:
- Patches #1-#7 adjust generic selftests
- Patches #8-#16 adjust mlxsw-specific selftests
Petr Machata (16):
selftests: forwarding: q_in_vni: Disable IPv6 autogen on bridges
selftests: forwarding: dual_vxlan_bridge: Disable IPv6 autogen on
bridges
selftests: forwarding: skbedit_priority: Disable IPv6 autogen on a
bridge
selftests: forwarding: pedit_dsfield: Disable IPv6 autogen on a bridge
selftests: forwarding: mirror_gre_*: Disable IPv6 autogen on bridges
selftests: forwarding: mirror_gre_*: Use port MAC for bridge address
selftests: forwarding: router_bridge: Use port MAC for bridge address
selftests: mlxsw: q_in_q_veto: Disable IPv6 autogen on bridges
selftests: mlxsw: extack: Disable IPv6 autogen on bridges
selftests: mlxsw: mirror_gre_scale: Disable IPv6 autogen on a bridge
selftests: mlxsw: qos_dscp_bridge: Disable IPv6 autogen on a bridge
selftests: mlxsw: qos_ets_strict: Disable IPv6 autogen on bridges
selftests: mlxsw: qos_mc_aware: Disable IPv6 autogen on bridges
selftests: mlxsw: spectrum: q_in_vni_veto: Disable IPv6 autogen on a
bridge
selftests: mlxsw: vxlan: Disable IPv6 autogen on bridges
selftests: mlxsw: one_armed_router: Use port MAC for bridge address
.../selftests/drivers/net/mlxsw/extack.sh | 24 ++++++++---
.../drivers/net/mlxsw/mirror_gre_scale.sh | 1 +
.../drivers/net/mlxsw/one_armed_router.sh | 3 +-
.../drivers/net/mlxsw/q_in_q_veto.sh | 8 ++++
.../drivers/net/mlxsw/qos_dscp_bridge.sh | 1 +
.../drivers/net/mlxsw/qos_ets_strict.sh | 8 +++-
.../drivers/net/mlxsw/qos_mc_aware.sh | 2 +
.../net/mlxsw/spectrum/q_in_vni_veto.sh | 1 +
.../selftests/drivers/net/mlxsw/vxlan.sh | 41 ++++++++++++++-----
.../net/forwarding/dual_vxlan_bridge.sh | 1 +
.../net/forwarding/mirror_gre_bound.sh | 1 +
.../net/forwarding/mirror_gre_bridge_1d.sh | 3 +-
.../forwarding/mirror_gre_bridge_1d_vlan.sh | 3 +-
.../forwarding/mirror_gre_bridge_1q_lag.sh | 3 +-
.../net/forwarding/mirror_topo_lib.sh | 1 +
.../selftests/net/forwarding/pedit_dsfield.sh | 4 +-
.../selftests/net/forwarding/q_in_vni.sh | 1 +
.../selftests/net/forwarding/router_bridge.sh | 3 +-
.../net/forwarding/skbedit_priority.sh | 4 +-
19 files changed, 88 insertions(+), 25 deletions(-)
--
2.40.1
If we get an unexpected signal during a signal test, log a bit more data to
aid diagnostics.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/signal/test_signals_utils.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 40be8443949d..0dc948db3a4a 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -249,7 +249,8 @@ static void default_handler(int signum, siginfo_t *si, void *uc)
fprintf(stderr, "-- Timeout !\n");
} else {
fprintf(stderr,
- "-- RX UNEXPECTED SIGNAL: %d\n", signum);
+ "-- RX UNEXPECTED SIGNAL: %d code %d address %p\n",
+ signum, si->si_code, si->si_addr);
}
default_result(current, 1);
}
---
base-commit: 44c026a73be8038f03dbdeef028b642880cf1511
change-id: 20230620-arm64-selftest-log-wrong-signal-cd8c34ae5e4f
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This series adds 2 zswap related selftests that verify known and fixed
issues. A new dedicated test program (test_zswap) is proposed since
the test cases are specific to zswap and it hosts zswap-specific helpers.
The first patch adds the (empty) test program, while the other 2 add an
actual test function each.
Domenico Cerasuolo (3):
selftests: cgroup: add test_zswap program
selftests: cgroup: add test_zswap with no kmem bypass test
selftests: cgroup: add zswap-memcg unwanted writeback test
tools/testing/selftests/cgroup/.gitignore | 1 +
tools/testing/selftests/cgroup/Makefile | 2 +
tools/testing/selftests/cgroup/test_zswap.c | 286 ++++++++++++++++++++
3 files changed, 289 insertions(+)
create mode 100644 tools/testing/selftests/cgroup/test_zswap.c
--
2.34.1
We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assign. See patch 5 for details.
I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use INDIRECT_CALL in the reuseport helpers. To show
that this approach is not just beneficial to TC sk_assign I removed
duplicate code for bpf_sk_lookup as well.
Changes from v1:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)
Joint work with Daniel Borkmann.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Daniel Borkmann (1):
selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
Lorenz Bauer (5):
net: export inet_lookup_reuseport and inet6_lookup_reuseport
net: document inet[6]_lookup_reuseport sk_state requirements
net: remove duplicate reuseport_lookup functions
net: remove duplicate sk_lookup helpers
bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
include/net/inet6_hashtables.h | 84 ++++++++-
include/net/inet_hashtables.h | 77 +++++++-
include/net/sock.h | 7 +-
include/uapi/linux/bpf.h | 3 -
net/core/filter.c | 2 -
net/ipv4/inet_hashtables.c | 69 +++++---
net/ipv4/udp.c | 73 +++-----
net/ipv6/inet6_hashtables.c | 71 +++++---
net/ipv6/udp.c | 85 +++------
tools/include/uapi/linux/bpf.h | 3 -
tools/testing/selftests/bpf/network_helpers.c | 3 +
.../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++
.../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++
13 files changed, 637 insertions(+), 179 deletions(-)
---
base-commit: 25085b4e9251c77758964a8e8651338972353642
change-id: 20230613-so-reuseport-e92c526173ee
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give up on using uffd_wp_range() and write new helpers, flush TLB only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. It is currently
emulated in a pretty slow manner in userspace. Our purpose is to
enhance the kernel so that it can be emulated efficiently.
Currently some out-of-tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches,
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read and/or reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to bypass this shortcoming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc. messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed whether we could revert these patches, but we could not reach any
conclusion. So at this point, I made a couple of attempts to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to cause
regressions, so we left it behind.
* [8] Keep a list of soft-dirty parts of a VMA across splits and merges. The
reply I got was not to increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where
the kernel resolves the faults itself when the WP_ASYNC feature is used. It was
straightforward to add the WP_ASYNC feature to userfaultfd. Now we get only
those pages reported as dirty or written-to which have really been written. (PS:
another userfaultfd feature, WP_UNPOPULATED, is required to avoid
pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of CRIU devs to make
the interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature, which we have given up on
using, we use the written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/written-to status and clear present in
the kernel.
- The pages which have been written-to cannot be found in an accurate way.
(Kernel's soft-dirty PTE bit + soft-dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
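The "!PM_UFFD_WP" check above can be sketched in C as follows. This is a minimal illustration, not code from the series: the helper names are hypothetical, and it assumes the page lies in a range registered with the async write-protect mode and write-protected earlier. Bit 57 is the uffd-wp bit documented for /proc/<pid>/pagemap entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 57 of a /proc/<pid>/pagemap entry: PTE is uffd-wp write-protected. */
#define PM_UFFD_WP (1ULL << 57)

/*
 * Byte offset of the 64-bit pagemap entry that describes vaddr.
 * (Hypothetical helper: one 8-byte entry per virtual page.)
 */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/*
 * Assumption: the page was write-protected with async uffd-wp before.
 * A write clears the protection, so a cleared PM_UFFD_WP bit means
 * "written-to since the page was last write-protected".
 */
static bool page_was_written(uint64_t pagemap_entry)
{
	return !(pagemap_entry & PM_UFFD_WP);
}
```

A caller would pread() eight bytes from the pagemap file at pagemap_offset(addr, page_size) and pass the value to page_was_written().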
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case, when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
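One plausible reading of how the required, any, excluded and return masks combine is sketched below. This is an assumption for illustration, not the series' implementation: the struct, the helper names and the flag bit values are hypothetical; only the flag names come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names follow the cover letter; the bit values are illustrative only. */
#define PAGE_IS_WRITTEN (1u << 0)
#define PAGE_IS_FILE    (1u << 1)
#define PAGE_IS_PRESENT (1u << 2)
#define PAGE_IS_SWAPPED (1u << 3)

/* Hypothetical container for the four masks described in the text. */
struct scan_masks {
	uint32_t required; /* all of these flags must be set */
	uint32_t anyof;    /* if non-zero, at least one must be set */
	uint32_t excluded; /* none of these flags may be set */
	uint32_t ret;      /* flags reported back for matching pages */
};

/* Assumed matching rule: required AND (any, if given) AND NOT excluded. */
static bool page_matches(uint32_t flags, const struct scan_masks *m)
{
	assert(m != NULL);
	if ((flags & m->required) != m->required)
		return false;
	if (m->anyof && !(flags & m->anyof))
		return false;
	if (flags & m->excluded)
		return false;
	return true;
}

/* Only the flags selected by the return mask are reported. */
static uint32_t reported_flags(uint32_t flags, const struct scan_masks *m)
{
	assert(m != NULL);
	return flags & m->ret;
}
```

Under this reading, a GetWriteWatch()-style query would set only the required mask to PAGE_IS_WRITTEN, while a CRIU-style query could additionally exclude file-mapped pages.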
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 513 ++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1459 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2275 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
This patchset is based on the next branch of shuah/linux-kselftest.git
Tiezhu Yang (2):
selftests/vDSO: Add support for LoongArch
selftests/vDSO: Get version and name for all archs
tools/testing/selftests/vDSO/vdso_config.h | 6 ++++-
tools/testing/selftests/vDSO/vdso_test_getcpu.c | 16 +++++--------
.../selftests/vDSO/vdso_test_gettimeofday.c | 26 ++++++----------------
3 files changed, 18 insertions(+), 30 deletions(-)
--
2.1.0
When executing the following command to test clone3 on LoongArch:
# cd tools/testing/selftests/clone3 && make && ./clone3
we can see the following error info:
# [5719] Trying clone3() with flags 0x80 (size 0)
# Invalid argument - Failed to create new process
# [5719] clone3() with flags says: -22 expected 0
not ok 18 [5719] Result (-22) is different than expected (0)
This is because if CONFIG_TIME_NS is not set, but the flag
CLONE_NEWTIME (0x80) is used to clone a time namespace, it
will return -EINVAL in copy_time_ns().
If the kernel does not support CONFIG_TIME_NS, /proc/self/ns/time
will not exist, and then we should skip the clone3() test with
CLONE_NEWTIME.
With this patch under !CONFIG_TIME_NS:
# cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
# Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0
Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
---
v5:
-- Rebase on the next branch of shuah/linux-kselftest.git
to avoid potential merge conflicts due to changes in the link:
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/c…
-- Update the commit message and send it as a single patch
Here is the v4 patch:
https://lore.kernel.org/loongarch/1685968410-5412-2-git-send-email-yangtiez…
tools/testing/selftests/clone3/clone3.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index e60cf4d..1c61e3c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -196,7 +196,12 @@ int main(int argc, char *argv[])
CLONE3_ARGS_NO_TEST);
/* Do a clone3() in a new time namespace */
- test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ if (access("/proc/self/ns/time", F_OK) == 0) {
+ test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ } else {
+ ksft_print_msg("Time namespaces are not supported\n");
+ ksft_test_result_skip("Skipping clone3() with CLONE_NEWTIME\n");
+ }
/* Do a clone3() with exit signal (SIGCHLD) in flags */
test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
--
2.1.0
Hello,
This patchset builds upon a soon-to-be-published WIP patchset that Sean
published at https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo, mentioned
at [1].
The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1
In this patchset, hugetlb support for KVM's guest_mem (aka gmem) is introduced,
allowing VM private memory (for confidential computing) to be backed by hugetlb
pages.
guest_mem provides userspace with a handle, with which userspace can allocate
and deallocate memory for confidential VMs without mapping the memory into
userspace.
Why use hugetlb instead of introducing a new allocator, like gmem does for 4K
and transparent hugepages?
+ hugetlb provides the following useful functionality, which would otherwise
have to be reimplemented:
+ Allocation of hugetlb pages at boot time, including
+ Parsing of kernel boot parameters to configure hugetlb
+ Tracking of usage in hstate
+ gmem will share the same system-wide pool of hugetlb pages, so users
don't have to have separate pools for hugetlb and gmem
+ Page accounting with subpools
+ hugetlb pages are tracked in subpools, which gmem uses to reserve
pages from the global hstate
+ Memory charging
+ hugetlb provides code that charges memory to cgroups
+ Reporting: hugetlb usage and availability are available at /proc/meminfo,
etc
The first 11 patches in this patchset are a series of refactorings to decouple
hugetlb and hugetlbfs.
The central thread binding the refactoring is that some functions (like
inode_resv_map(), inode_subpool(), inode_hstate(), etc) rely on a hugetlbfs
concept, that the resv_map, subpool, hstate, are in a specific field in a
hugetlb inode.
Refactoring to parametrize functions by hstate, subpool, resv_map will allow
hugetlb to be used by gmem and in other places where these data structures
aren't necessarily stored in the same positions in the inode.
The refactoring proposed here is just the minimum required to get a
proof-of-concept working with gmem. I would like to get opinions on this
approach before doing further refactoring. (See TODOs)
TODOs:
+ hugetlb/hugetlbfs refactoring
+ remove_inode_hugepages() no longer needs to be exposed, it is hugetlbfs
specific and used only in inode.c
+ remove_mapping_hugepages(), remove_inode_single_folio(),
hugetlb_unreserve_pages() shouldn't need to take inode as a parameter
+ Updating inode->i_blocks can be refactored to a separate function and
called from hugetlbfs and gmem
+ alloc_hugetlb_folio_from_subpool() shouldn't need to be parametrized by
vma
+ hugetlb_reserve_pages() should be refactored to be symmetric with
hugetlb_unreserve_pages()
+ It should be parametrized by resv_map
+ alloc_hugetlb_folio_from_subpool() could perhaps use
hugetlb_reserve_pages()?
+ gmem
+ Figure out if resv_map should be used by gmem at all
+ Probably needs more refactoring to decouple resv_map from hugetlb
functions
Questions for the community:
1. In this patchset, every gmem file backed with hugetlb is given a new
subpool. Is that desirable?
+ In hugetlbfs, a subpool always belongs to a mount, and hugetlbfs has one
mount per hugetlb size (2M, 1G, etc)
+ memfd_create(MFD_HUGETLB) effectively returns a full hugetlbfs file, so it
(rightfully) uses the hugetlbfs kernel mounts and their subpools
+ I gave each file a subpool mostly to speed up implementation and still be
able to reserve hugetlb pages from the global hstate based on the gmem
file size.
+ gmem, unlike hugetlbfs, isn't meant to be a full filesystem, so
+ Should there be multiple mounts, one for each hugetlb size?
+ Will the mounts be initialized on boot or on first gmem file creation?
+ Or is one subpool per gmem file fine?
2. Should resv_map be used for gmem at all, since gmem doesn't allow userspace
reservations?
[1] https://lore.kernel.org/lkml/ZEM5Zq8oo+xnApW9@google.com/
---
Ackerley Tng (19):
mm: hugetlb: Expose get_hstate_idx()
mm: hugetlb: Move and expose hugetlbfs_zero_partial_page
mm: hugetlb: Expose remove_inode_hugepages
mm: hugetlb: Decouple hstate, subpool from inode
mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool
and hstate
mm: hugetlb: Provide hugetlb_filemap_add_folio()
mm: hugetlb: Refactor vma_*_reservation functions
mm: hugetlb: Refactor restore_reserve_on_error
mm: hugetlb: Use restore_reserve_on_error directly in filesystems
mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by
resv_map
mm: hugetlb: Parametrize hugetlb functions by resv_map
mm: truncate: Expose preparation steps for truncate_inode_pages_final
KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers
KVM: guest_mem: Refactor cleanup to separate inode and file cleanup
KVM: guest_mem: hugetlb: initialization and cleanup
KVM: guest_mem: hugetlb: allocate and truncate from hugetlb
KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem
KVM: selftests: Support various types of backing sources for private
memory
KVM: selftests: Update test for various private memory backing source
types
fs/hugetlbfs/inode.c | 102 ++--
include/linux/hugetlb.h | 86 ++-
include/linux/mm.h | 1 +
include/uapi/linux/kvm.h | 25 +
mm/hugetlb.c | 324 +++++++-----
mm/truncate.c | 24 +-
.../testing/selftests/kvm/guest_memfd_test.c | 33 +-
.../testing/selftests/kvm/include/test_util.h | 14 +
tools/testing/selftests/kvm/lib/test_util.c | 74 +++
.../kvm/x86_64/private_mem_conversions_test.c | 38 +-
virt/kvm/guest_mem.c | 488 ++++++++++++++----
11 files changed, 882 insertions(+), 327 deletions(-)
--
2.41.0.rc0.172.g3f132b7071-goog
KVM_GET_REG_LIST will dump all register IDs that are available to
KVM_GET/SET_ONE_REG, and it is very useful for identifying platform
regression issues during VM migration.
Patches 1-7 restructure the get-reg-list test in aarch64 to turn some
of the code into a common test framework that can be shared by riscv.
Patch 8 enables the KVM_GET_REG_LIST API in riscv and patches 9-10 add
the corresponding kselftest for checking possible register regressions.
The get-reg-list kvm selftest was ported from aarch64 and tested with
Linux 6.4-rc5 on a Qemu riscv64 virt machine.
---
Changed since v2:
* Rebase to Linux 6.4-rc5
* Filter out ZICBO* config and ISA_EXT registers report if the
extensions were not supported in host
* Enable AIA CSR test
* Move vCPU extension check_supported() to finalize_vcpu() per
Andrew's suggestion
* Switch to use KVM_REG_SIZE_ULONG for most registers' definition
---
Changed since v1:
* rebase to Andrew's changes
* fix coding style
Andrew Jones (7):
KVM: arm64: selftests: Replace str_with_index with strdup_printf
KVM: arm64: selftests: Drop SVE cap check in print_reg
KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config
KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h
KVM: arm64: selftests: Delete core_reg_fixup
KVM: arm64: selftests: Split get-reg-list test code
KVM: arm64: selftests: Finish generalizing get-reg-list
Haibo Xu (3):
KVM: riscv: Add KVM_GET_REG_LIST API support
KVM: riscv: selftests: Skip some registers set operation
KVM: riscv: selftests: Add get-reg-list test
Documentation/virt/kvm/api.rst | 2 +-
arch/riscv/kvm/vcpu.c | 378 +++++++++++
tools/testing/selftests/kvm/Makefile | 11 +-
.../selftests/kvm/aarch64/get-reg-list.c | 540 ++--------------
tools/testing/selftests/kvm/get-reg-list.c | 421 ++++++++++++
.../selftests/kvm/include/kvm_util_base.h | 16 +
.../selftests/kvm/include/riscv/processor.h | 3 +
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 15 +
.../selftests/kvm/riscv/get-reg-list.c | 611 ++++++++++++++++++
10 files changed, 1499 insertions(+), 500 deletions(-)
create mode 100644 tools/testing/selftests/kvm/get-reg-list.c
create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
--
2.34.1
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which factor out the tc helper's
logic which was shared with cg/sk_skb (which operate correctly).
This refactoring is needed in order to avoid affecting the cgroup/sk_skb
flows as there does not seem to be a strict criterion for discerning which
flow the helper is called from based on the net device or packet
information.
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v5: Use reverse xmas tree indentation
v4: - Move dev_sdif() to include/linux/netdevice.h as suggested by Stanislav Fomichev
- Remove SYS and SYS_NOFAIL duplicate definitions
v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev
- Added xdp tests as suggested by Daniel Borkmann
- Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add vrf_socket_lookup tests
include/linux/netdevice.h | 9 +
net/core/filter.c | 123 +++++--
.../bpf/prog_tests/vrf_socket_lookup.c | 312 ++++++++++++++++++
.../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++
4 files changed, 511 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c
--
2.34.1
PTP_SYS_OFFSET_EXTENDED was added in November 2018 in
361800876f80 ("ptp: add PTP_SYS_OFFSET_EXTENDED ioctl")
and PTP_SYS_OFFSET_PRECISE was added in February 2016 in
719f1aa4a671 ("ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping")
The PTP selftest code is lacking support for these two IOCTLs.
This short series of patches adds support for them.
Alex Maftei (2):
selftests/ptp: Add -x option for testing PTP_SYS_OFFSET_EXTENDED
selftests/ptp: Add -X option for testing PTP_SYS_OFFSET_PRECISE
tools/testing/selftests/ptp/testptp.c | 71 ++++++++++++++++++++++++++-
1 file changed, 69 insertions(+), 2 deletions(-)
--
2.28.0
Currently the write operation returns the write count whether events are
enabled or disabled. Fix this by returning -ENOENT when events are disabled.
v1 -> v2:
- Change the return value from -EFAULT to -ENOENT
sunliming (3):
tracing/user_events: Fix incorrect return value for writing operation
when events are disabled
selftests/user_events: Enable the event before write_fault test in
ftrace self-test
selftests/user_events: Add test cases when event is disabled
kernel/trace/trace_events_user.c | 3 ++-
tools/testing/selftests/user_events/ftrace_test.c | 8 ++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
--
2.25.1
This patch-set implements 2 small extensions to the current F_OFD_GETLK,
allowing it to gather more information than it currently returns.
The first extension allows using F_UNLCK on query, which currently returns
EINVAL. Instead it can be used to query the locks on a particular fd,
something that is not currently possible. The basic idea is that on
F_OFD_GETLK, F_UNLCK would "conflict" with (or query) any type of
lock on the same fd, and ignore any locks on other fds.
Use-cases:
1. CRIU-like scenario where you want to read the locking info from an
fd for later reconstruction. This can now be done by setting
l_start and l_len to 0 to cover the entire file range, and doing F_OFD_GETLK.
In a loop you need to advance l_start past the returned lock ranges,
to eventually collect all locked ranges.
2. Implementing the lock checking/enforcing policy.
Say you want to implement an "auditor" module in your program,
that checks that the I/O is done only after the proper locking is
applied on a file region. In this case you need to know if the
particular region is locked on that fd, and if so - with what type
of the lock. If you did that currently (without this extension)
then you could only check for the write locks, and for that you need to
probe the lock on your fd and then open the same file via another fd and
probe there. That way you can identify the write lock on a particular
fd, but such a trick is non-atomic and complex. As for finding out the
read lock on a particular fd, it is impossible.
This extension allows such queries without any extra effort.
3. Implementing the mandatory locking policy.
Suppose you want to make a policy where the write lock inhibits any
unlocked readers and writers. Currently you need to check if the
write lock is present on some other fd, and if it is not there - allow
the I/O operation. But because the write lock can appear at any moment,
you need to do that under some global lock, which can be released only
when the I/O operation is finished.
With the proposed extension you can instead just check for the write lock
on your own fd first, and if it is there - allow the I/O operation on
that fd without using any global lock. Only if there is no write lock
on this fd do you need to take the global lock and check for a write
lock on other fds.
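The query loop from use-case 1 can be sketched roughly as follows. This is
only an illustration under the assumptions of this cover letter: the helper
names next_query_start() and dump_ofd_locks() are hypothetical, and on
kernels without the proposed extension the fcntl() call is expected to fail
with EINVAL.

```c
#define _GNU_SOURCE		/* for F_OFD_GETLK */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: given a lock range returned by F_OFD_GETLK, compute
 * where the next query should start. Returns -1 when the range extends to
 * the end of the file (l_len == 0), since nothing can follow it.
 */
off_t next_query_start(const struct flock *fl)
{
	assert(fl != NULL);
	return fl->l_len == 0 ? (off_t)-1 : fl->l_start + fl->l_len;
}

/*
 * Hypothetical helper: print every lock held through the open file
 * description behind fd. Returns the number of locks found, or -1 if
 * fcntl() fails (for example with EINVAL on a kernel without the
 * proposed F_UNLCK query semantics).
 */
int dump_ofd_locks(int fd)
{
	off_t start = 0;
	int count = 0;

	while (start >= 0) {
		struct flock fl = {
			.l_type = F_UNLCK,	/* proposed: query locks on this fd */
			.l_whence = SEEK_SET,
			.l_start = start,
			.l_len = 0,		/* from start to end of file */
			.l_pid = 0,		/* must be 0 for OFD commands */
		};

		if (fcntl(fd, F_OFD_GETLK, &fl) == -1)
			return -1;
		if (fl.l_type == F_UNLCK)	/* no further locks found */
			break;

		printf("%s lock: start=%lld len=%lld\n",
		       fl.l_type == F_WRLCK ? "write" : "read",
		       (long long)fl.l_start, (long long)fl.l_len);
		count++;
		start = next_query_start(&fl);
	}
	return count;
}
```

A caller would open the file, take some OFD locks with F_OFD_SETLK, and then
call dump_ofd_locks(fd) to list them.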
The second patch implements another extension.
Currently F_OFD_GETLK returns -1 in the l_pid member.
This patch removes the code that writes -1 there, so that the proper
pid is returned. I am not sure why it was decided to deliberately hide
the owner's pid. It may be needed in case you want to send some
signal to the offending locker, e.g. SIGKILL.
The third patch adds a test-case for OFD locks.
It tests both the generic things and the proposed extensions.
Stas Sergeev (3):
fs/locks: F_UNLCK extension for F_OFD_GETLK
fd/locks: allow get the lock owner by F_OFD_GETLK
selftests: add OFD lock tests
fs/locks.c | 25 +++-
tools/testing/selftests/locking/Makefile | 2 +
tools/testing/selftests/locking/ofdlocks.c | 135 +++++++++++++++++++++
3 files changed, 157 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/locking/ofdlocks.c
CC: Jeff Layton <jlayton(a)kernel.org>
CC: Chuck Lever <chuck.lever(a)oracle.com>
CC: Alexander Viro <viro(a)zeniv.linux.org.uk>
CC: Christian Brauner <brauner(a)kernel.org>
CC: linux-fsdevel(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
--
2.39.2
This series adds Intel VT-d nested translation based on the IOMMUFD nesting
infrastructure. As described in the iommufd nesting infrastructure series [1],
the iommu core supports new ops to report iommu hardware information, allocate
domains with user data and sync the stage-1 IOTLB. The data required in
the three paths is vendor-specific, so
1) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_device_info_vtd are
defined to report iommu hardware information for Intel VT-d.
2) IOMMU_HWPT_DATA_VTD_S1 is defined for the Intel VT-d stage-1 page
table; it is used in the stage-1 domain allocation and IOTLB
syncing paths. struct iommu_hwpt_intel_vtd is defined to pass user_data
for the Intel VT-d stage-1 domain allocation.
struct iommu_hwpt_invalidate_intel_vtd is defined to pass the data for
the Intel VT-d stage-1 IOTLB invalidation.
With the above IOMMUFD extensions, the intel iommu driver implements the three
paths to support nested translation.
The first Intel platform supporting nested translation is Sapphire
Rapids which, unfortunately, has a hardware erratum [2] requiring special
treatment. This erratum occurs when a stage-1 page table page (at either
level) is located in a stage-2 read-only region. In that case the IOMMU
hardware may ignore the stage-2 RO permission and still set the A/D bit
in stage-1 page table entries during page table walking.
A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
this erratum to userspace. With that restriction the user should either
disable nested translation to favor RO stage-2 mappings, or ensure there
are no RO stage-2 mappings in order to enable nested translation.
Patch 10 of this series adds the necessary checks to the intel-iommu
driver to prevent such a mix.
Qemu currently does add RO mappings though. The vfio agent in Qemu
simply maps all valid regions in the GPA address space, which certainly
includes RO regions, e.g. vbios.
In reality we don't know of any usage relying on DMA reads from the BIOS
region. Hence finding a way to let the user opt out of RO mappings in
Qemu might be an acceptable tradeoff. But how to achieve it cleanly
needs more discussion in the Qemu community. For now we just hacked Qemu
to test.
Complete code can be found in [3]; QEMU code can be found in [4].
base-commit: ce9b593b1f74ccd090edc5d2ad397da84baa9946
[1] https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
[2] https://www.intel.com/content/www/us/en/content-details/772415/content-deta…
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%…
Change log:
v3:
- Further split the patches into an order of adding helpers for nested
domain, iotlb flush, nested domain attachment and nested domain allocation
callback, then report the hw_info to userspace.
- Add batch support in cache invalidation from userspace
- Disallow nested translation usage if RO mappings exist in stage-2 domain
due to errata on readonly mappings on Sapphire Rapids platform.
v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.…
- The iommufd infrastructure is split to be separate series.
v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Lu Baolu (5):
iommu/vt-d: Extend dmar_domain to support nested domain
iommu/vt-d: Add helper for nested domain allocation
iommu/vt-d: Add helper to setup pasid nested translation
iommu/vt-d: Add nested domain allocation
iommu/vt-d: Disallow nesting on domains with read-only mappings
Yi Liu (5):
iommufd: Add data structure for Intel VT-d stage-1 domain allocation
iommu/vt-d: Make domain attach helpers to be extern
iommu/vt-d: Set the nested domain to a device
iommu/vt-d: Add iotlb flush for nested domain
iommu/vt-d: Implement hw_info for iommu capability query
drivers/iommu/intel/Makefile | 2 +-
drivers/iommu/intel/iommu.c | 78 ++++++++++++---
drivers/iommu/intel/iommu.h | 55 +++++++++--
drivers/iommu/intel/nested.c | 181 +++++++++++++++++++++++++++++++++++
drivers/iommu/intel/pasid.c | 151 +++++++++++++++++++++++++++++
drivers/iommu/intel/pasid.h | 2 +
drivers/iommu/iommufd/main.c | 6 ++
include/linux/iommu.h | 1 +
include/uapi/linux/iommufd.h | 149 ++++++++++++++++++++++++++++
9 files changed, 603 insertions(+), 22 deletions(-)
create mode 100644 drivers/iommu/intel/nested.c
--
2.34.1
To cover this case, the test sets 'maxlen = 0', for the following reasons:
EVIOCGKEY is handled in evdev_do_ioctl(), which is called from
evdev_ioctl_handler().
evdev_ioctl_handler() is called from two functions; according to code
coverage, only the first one is in use.
‘compat’ is given the value ‘0’ [1].
Thus, the condition [2] is always false.
This means ‘len’ is always a positive number [3].
‘maxlen’ in evdev_handle_get_val() [4] is defined locally in
evdev_do_ioctl() [5], and is passed in the variable 'size' [6].
[1] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1281
[2] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L705
[3] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L707
[4] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L886
[5] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1155
[6] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1141
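The length clamp described above can be modelled in userspace as a small
sketch. This is not the kernel code: bitmap_bytes() and clamped_copy_len()
are hypothetical names, and only the non-compat path is modelled. It shows
why a zero-sized buffer (as produced by EVIOCGKEY(0)) yields a copy length
of 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bytes needed to hold maxbit bits, rounded up to whole longs. */
size_t bitmap_bytes(unsigned int maxbit)
{
	size_t bits_per_long = sizeof(long) * CHAR_BIT;

	return (maxbit + bits_per_long - 1) / bits_per_long * sizeof(long);
}

/*
 * Model of the clamp: never copy more bytes than the caller's buffer
 * (maxlen) can hold, even if the bitmap itself is larger.
 */
size_t clamped_copy_len(unsigned int maxbit, size_t maxlen)
{
	size_t len = bitmap_bytes(maxbit);

	/* Sanity check: the unclamped length is a whole number of longs. */
	assert(len % sizeof(long) == 0);

	if (len > maxlen)	/* the 'len > maxlen' branch the test targets */
		len = maxlen;
	return len;
}
```

With an illustrative 768-bit bitmap, clamped_copy_len(768, 0) takes the
'len > maxlen' branch and returns 0, which is consistent with the test
expecting the ioctl to return 0.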
Signed-off-by: Dana Elfassy <dangel101(a)gmail.com>
---
tools/testing/selftests/input/evioc-test.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c
index ad7b93fe39cf..b94de2ee5596 100644
--- a/tools/testing/selftests/input/evioc-test.c
+++ b/tools/testing/selftests/input/evioc-test.c
@@ -234,4 +234,23 @@ TEST(eviocsrep_set_repeat_settings)
selftest_uinput_destroy(uidev);
}
+TEST(eviocgkey_get_global_key_state)
+{
+ struct selftest_uinput *uidev;
+ int rep_values[2];
+ int rc;
+
+ memset(rep_values, 0, sizeof(rep_values));
+
+ rc = selftest_uinput_create_device(&uidev);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+
+ /* ioctl to create the scenario where len > maxlen in bits_to_user() */
+ rc = ioctl(uidev->evdev_fd, EVIOCGKEY(0), rep_values);
+ ASSERT_EQ(0, rc);
+
+ selftest_uinput_destroy(uidev);
+}
+
TEST_HARNESS_MAIN
--
2.41.0
From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereby causing a race
condition.
The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.
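The locked-wrapper / unlocked-primitive pattern described above can be
sketched in userspace with a pthread mutex. This is only an illustration:
cfg_mutex, update_u8_nolock(), update_u8() and set_num_requests() are
hypothetical names and are not the driver's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t cfg_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned char num_requests;
static int requests_active;

/* Unlocked primitive: the caller must already hold cfg_mutex. */
int update_u8_nolock(const char *buf, unsigned char *cfg)
{
	char *end;
	unsigned long val;

	assert(buf != NULL && cfg != NULL);
	val = strtoul(buf, &end, 10);
	if (end == buf || val > 255)
		return -1;
	*cfg = (unsigned char)val;
	return 0;
}

/* Locked wrapper: for callers that do not hold cfg_mutex. */
int update_u8(const char *buf, unsigned char *cfg)
{
	int ret;

	pthread_mutex_lock(&cfg_mutex);
	ret = update_u8_nolock(buf, cfg);
	pthread_mutex_unlock(&cfg_mutex);
	return ret;
}

/*
 * The check and the update happen in one critical section, so no other
 * thread can change requests_active between the two steps.
 */
int set_num_requests(const char *buf)
{
	int ret;

	pthread_mutex_lock(&cfg_mutex);
	if (requests_active)
		ret = -1;
	else
		ret = update_u8_nolock(buf, &num_requests);
	pthread_mutex_unlock(&cfg_mutex);
	return ret;
}
```

Dropping the mutex between the check and the update, as the old code did,
would reopen the window in which requests_active can change.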
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index b99cf0a50a698..4884057eb53f0 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -321,16 +321,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (kstrtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -341,7 +351,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+ const char *buf,
size_t size,
size_t *cfg)
{
@@ -352,9 +363,7 @@ static int test_dev_config_update_size_t(const char *buf,
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(size_t *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
@@ -370,7 +379,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
@@ -379,14 +388,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 val)
{
return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -413,10 +431,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -460,10 +478,10 @@ static ssize_t config_buf_size_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->buf_size);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->buf_size);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -490,10 +508,10 @@ static ssize_t config_file_offset_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->file_offset);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->file_offset);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.39.2
From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereby causing a race
condition.
The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 0b4e3de3f1748..4ad01dbe7e729 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -321,16 +321,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (kstrtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -341,7 +351,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+ const char *buf,
size_t size,
size_t *cfg)
{
@@ -352,9 +363,7 @@ static int test_dev_config_update_size_t(const char *buf,
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(size_t *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
@@ -370,7 +379,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
@@ -379,14 +388,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 val)
{
return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -413,10 +431,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -460,10 +478,10 @@ static ssize_t config_buf_size_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->buf_size);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->buf_size);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -490,10 +508,10 @@ static ssize_t config_file_offset_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->file_offset);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->file_offset);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.39.2
From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereby causing a race
condition.
The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 6ef3e6926da8a..13d3fa6aa972c 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -360,16 +360,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (kstrtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -380,7 +390,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+ const char *buf,
size_t size,
size_t *cfg)
{
@@ -391,9 +402,7 @@ static int test_dev_config_update_size_t(const char *buf,
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(size_t *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
@@ -409,7 +418,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
@@ -418,14 +427,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 val)
{
return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -478,10 +496,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -525,10 +543,10 @@ static ssize_t config_buf_size_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->buf_size);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->buf_size);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -555,10 +573,10 @@ static ssize_t config_file_offset_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->file_offset);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->file_offset);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.39.2
This is part of the effort to remove the empty element of the ctl_table
structures (used to calculate size) and replace it with an ARRAY_SIZE call. By
replacing the child element in struct ctl_table with a flags element we make
sure that there are no forward recursions on child nodes and therefore set
ourselves up for just using an ARRAY_SIZE. We also added some self tests to
make sure that we do not break anything.
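The difference between sizing a table by a trailing empty element and by
ARRAY_SIZE can be sketched as follows. The struct demo_entry type and the
two tables are hypothetical stand-ins, not struct ctl_table itself.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, much simplified stand-in for a table entry. */
struct demo_entry {
	const char *name;
	int mode;
};

/* Old style: an empty element marks the end of the table. */
static const struct demo_entry with_sentinel[] = {
	{ "alpha", 0644 },
	{ "beta", 0444 },
	{ NULL, 0 }		/* sentinel, only there to find the size */
};

/* New style: no sentinel, the size comes from the array type. */
static const struct demo_entry sized[] = {
	{ "alpha", 0644 },
	{ "beta", 0444 },
};

/* Walk the table until the empty element is found. */
size_t count_until_sentinel(const struct demo_entry *table)
{
	size_t n = 0;

	assert(table != NULL);
	while (table[n].name)
		n++;
	return n;
}
```

Here count_until_sentinel(with_sentinel) and ARRAY_SIZE(sized) both give 2,
while the sentinel table occupies one more element of memory.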
The patchset is split into 4 parts: parport fixes, selftests fixes, selftests
additions and replacement of the child element. Everything was tested with the
sysctl self tests and seems "ok".
1. parport fixes: This is related to my previous series and it plugs a sysctl
table leak in the parport driver. @mcgrof: I'm just leaving this here so we
don't have to retest the parport stuff
2. Selftests fixes: Remove the leading zeros when passing an awk field to the
awk print command because it was causing $0009 to be interpreted as $0.
Replaced continue with return in sysctl.sh(test_case) so the test actually
gets skipped. The skip decision is now in sysctl.sh(skip_test).
3. Selftest additions: New test to confirm that unregister actually removes
targets. New test to confirm that permanently empty targets are indeed
created and that no other targets can be created "on top".
4. Replaced the child pointer in struct ctl_table with an enum which is used to
differentiate between permanently empty targets and non-empty ones.
V2: Replaced the u8 flag with an enumeration.
Comments/feedback greatly appreciated
Best
Joel
Joel Granados (8):
parport: plug a sysctl register leak
test_sysctl: Fix test metadata getters
test_sysctl: Group node sysctl test under one func
test_sysctl: Add an unregister sysctl test
test_sysctl: Add an option to prevent test skip
test_sysclt: Test for registering a mount point
sysctl: Remove debugging dump_stack
sysctl: replace child with an enumeration
drivers/parport/procfs.c | 23 ++---
fs/proc/proc_sysctl.c | 82 ++++------------
include/linux/sysctl.h | 14 ++-
lib/test_sysctl.c | 91 ++++++++++++++++--
tools/testing/selftests/sysctl/sysctl.sh | 115 +++++++++++++++++------
5 files changed, 214 insertions(+), 111 deletions(-)
--
2.30.2
The Events Tracing infrastructure contains a lot of files and directories
(internally, inodes and dentries), and ends up consuming
megabytes of memory. There can be multiple instances of Events Tracing,
each of which requires more memory.
Instead of creating inodes/dentries up front, eventfs can keep meta-data and
skip the creation of inodes/dentries. As and when required, eventfs will
create the inodes/dentries only for the required files/directories.
eventfs also deletes the inodes/dentries once they are no longer required,
but preserves the meta-data.
Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 761 ++++++++++++++++++
fs/tracefs/inode.c | 124 ++-
fs/tracefs/internal.h | 25 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 49 ++
kernel/trace/trace.h | 3 +-
kernel/trace/trace_events.c | 66 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 4 +-
.../test.d/kprobe/kprobe_args_string.tc | 4 +-
10 files changed, 992 insertions(+), 46 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
Some test cases from net/tls, net/fcnal-test and net/vrf-xfrm-tests
rely on cryptographic functions and use algorithms that are not
FIPS-compliant, so they fail in FIPS mode.
In order to allow these tests to pass in a wider set of kernels,
- for net/tls, skip the test variants that use the ChaCha20-Poly1305
and SM4 algorithms, when FIPS mode is enabled;
- for net/fcnal-test, skip the MD5 tests, when FIPS mode is enabled;
- for net/vrf-xfrm-tests, replace the algorithms that are not
FIPS-compliant with compliant ones.
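The skip logic for the shell-based tests can be sketched as follows. This is
an illustration under assumptions, not the code from the series: the
FIPS_FILE override and the run_unless_fips wrapper are hypothetical names
added so the check can be exercised against an ordinary file. The procfs path
is the one mentioned in the v2 changelog, and 4 is the kselftest skip code.

```shell
# Path named in the changelog; overridable (an assumption of this
# sketch) so the helper can be tried against a regular file.
FIPS_FILE=${FIPS_FILE:-/proc/sys/crypto/fips_enabled}

# Print 1 when the file exists and contains "1", otherwise 0.
fips_enabled() {
	if [ -r "$FIPS_FILE" ] && [ "$(cat "$FIPS_FILE")" = "1" ]; then
		echo 1
	else
		echo 0
	fi
}

# Hypothetical wrapper: $1 is a description, the rest is the test
# command. The command only runs when FIPS mode is off.
run_unless_fips() {
	if [ "$(fips_enabled)" = "1" ]; then
		echo "SKIP: $1 (not FIPS-compliant)"
		return 4	# kselftest skip code (KSFT_SKIP)
	fi
	shift
	"$@"
}
```

A caller would wrap only the non-compliant cases, for example
run_unless_fips "MD5 tests" some_md5_test_function, and leave the compliant
ones untouched.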
Changes in v4:
- Remove extra newline.
- Add R-b tag.
Changes in v3:
- Add new commit to allow skipping test directly from test setup.
- No need to initialize static variable to zero.
- Skip tests during test setup only.
- Use the constructor attribute to set fips_enabled before entering
main().
Changes in v2:
- Add R-b tags.
- Put fips_non_compliant into the variants.
- Turn fips_enabled into a static global variable.
- Read /proc/sys/crypto/fips_enabled only once at main().
v1: https://lore.kernel.org/netdev/20230607174302.19542-1-magali.lemes@canonica…
v2: https://lore.kernel.org/netdev/20230609164324.497813-1-magali.lemes@canonic…
v3: https://lore.kernel.org/netdev/20230612125107.73795-1-magali.lemes@canonica…
Magali Lemes (4):
selftests/harness: allow tests to be skipped during setup
selftests: net: tls: check if FIPS mode is enabled
selftests: net: vrf-xfrm-tests: change authentication and encryption
algos
selftests: net: fcnal-test: check if FIPS mode is enabled
tools/testing/selftests/kselftest_harness.h | 6 ++--
tools/testing/selftests/net/fcnal-test.sh | 27 +++++++++++-----
tools/testing/selftests/net/tls.c | 24 +++++++++++++-
tools/testing/selftests/net/vrf-xfrm-tests.sh | 32 +++++++++----------
4 files changed, 61 insertions(+), 28 deletions(-)
--
2.34.1
From: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
According to Mirsad the gpio-sim.sh test appears to FAIL in a wrong way
due to missing initialisation of shell variables:
4.2. Bias settings work correctly
cat: /sys/devices/platform/gpio-sim.0/gpiochip18/sim_gpio0/value: No such file or directory
./gpio-sim.sh: line 393: test: =: unary operator expected
bias setting does not work
GPIO gpio-sim test FAIL
After this change the test passed:
4.2. Bias settings work correctly
GPIO gpio-sim test PASS
His testing environment is AlmaLinux 8.7 on Lenovo desktop box with
the latest Linux kernel based on v6.2:
Linux 6.2.0-mglru-kmlk-andy-09238-gd2980d8d8265 x86_64
Suggested-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
tools/testing/selftests/gpio/gpio-sim.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/gpio/gpio-sim.sh b/tools/testing/selftests/gpio/gpio-sim.sh
index 9f539d454ee4..fa2ce2b9dd5f 100755
--- a/tools/testing/selftests/gpio/gpio-sim.sh
+++ b/tools/testing/selftests/gpio/gpio-sim.sh
@@ -389,6 +389,9 @@ create_chip chip
create_bank chip bank
set_num_lines chip bank 8
enable_chip chip
+DEVNAME=`configfs_dev_name chip`
+CHIPNAME=`configfs_chip_name chip bank`
+SYSFS_PATH="/sys/devices/platform/$DEVNAME/$CHIPNAME/sim_gpio0/value"
$BASE_DIR/gpio-mockup-cdev -b pull-up /dev/`configfs_chip_name chip bank` 0
test `cat $SYSFS_PATH` = "1" || fail "bias setting does not work"
remove_chip chip
--
2.40.0.1.gaa8946217a0b
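The "unary operator expected" message in the log above comes from the
unquoted command substitution: when cat fails and prints nothing, test is
left with only the arguments = and "1". A defensive variant of the check
might look like the sketch below. The helper name bias_is_high and its
return codes are assumptions for illustration and are not part of
gpio-sim.sh.

```shell
# Hypothetical helper: succeed only if the sysfs value file reads "1".
# Returns 2 when the file cannot be read, so that a broken setup is
# reported separately from a wrong bias value (return code 1).
bias_is_high() {
	val=$(cat "$1" 2>/dev/null) || return 2
	# Quoting "$val" keeps the comparison well-formed even when
	# the value is empty.
	[ "$val" = "1" ]
}
```

With this shape the caller can tell a stale or missing path apart from a
real bias failure instead of getting a shell syntax complaint.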
The default timeout for kselftests is 45 seconds, but pcm-test can take
longer than that to run depending on the number of PCMs present on a
device.
As a data point, running pcm-test on mt8192-asurada-spherion takes about
1m15s.
Set the timeout to 10 minutes, which should give enough slack to run the
test even on devices with many PCMs.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
tools/testing/selftests/alsa/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/alsa/settings
diff --git a/tools/testing/selftests/alsa/settings b/tools/testing/selftests/alsa/settings
new file mode 100644
index 000000000000..a62d2fa1275c
--- /dev/null
+++ b/tools/testing/selftests/alsa/settings
@@ -0,0 +1 @@
+timeout=600
--
2.39.0
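How a per-directory settings file such as the one added above overrides the
45 second default can be sketched like this. It is a simplified illustration,
not the kselftest runner's actual code; the function name get_timeout is
hypothetical.

```shell
# Simplified sketch: print the timeout from a kselftest "settings"
# file, or the 45 second default when the file or key is absent.
get_timeout() {
	timeout=45
	if [ -r "$1" ]; then
		# Each line is expected to look like key=value.
		while IFS='=' read -r key value; do
			case "$key" in
			timeout) timeout=$value ;;
			esac
		done < "$1"
	fi
	echo "$timeout"
}
```

For the file in this patch, get_timeout tools/testing/selftests/alsa/settings
would print 600.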
Here is a series with some fixes and cleanups to the resctrl selftests and a
rewrite of the CAT test into something that really tests whether CAT works
or not.
v2:
- Rebased on top of next to solve the conflicts
- Added 2 patches related to resctrl FS mount/umount (fix + cleanup)
- Consistently use "alloc" in cache_alloc_size()
- CAT test error handling tweaked
- Remove a spurious newline change from the CAT patch
- Small improvements to changelogs
Ilpo Järvinen (24):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Check also too low values for CBM bits
selftests/resctrl: Move resctrl FS mount/umount to higher level
selftests/resctrl: Remove mum_resctrlfs
selftests/resctrl: Make span unsigned long everywhere
selftests/resctrl: Express span in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Split run_fill_buf() to alloc, work, and dealloc
helpers
selftests/resctrl: Remove start_buf local variable from buffer alloc
func
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Add flush_buffer() to fill_buf
selftests/resctrl: Remove test type checks from cat_val()
selftests/resctrl: Refactor get_cbm_mask()
selftests/resctrl: Create cache_alloc_size() helper
selftests/resctrl: Replace count_bits with count_consecutive_bits()
selftests/resctrl: Exclude shareable bits from schemata in CAT test
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Move CAT/CMT test global vars to func they are used
selftests/resctrl: Read in less obvious order to defeat prefetch
optimizations
selftests/resctrl: Split measure_cache_vals() function
selftests/resctrl: Split show_cache_info() to test specific and
generic parts
selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 154 ++++++------
tools/testing/selftests/resctrl/cat_test.c | 235 ++++++++----------
tools/testing/selftests/resctrl/cmt_test.c | 65 +++--
tools/testing/selftests/resctrl/fill_buf.c | 105 ++++----
tools/testing/selftests/resctrl/mba_test.c | 9 +-
tools/testing/selftests/resctrl/mbm_test.c | 17 +-
tools/testing/selftests/resctrl/resctrl.h | 32 +--
.../testing/selftests/resctrl/resctrl_tests.c | 82 ++++--
tools/testing/selftests/resctrl/resctrl_val.c | 9 +-
tools/testing/selftests/resctrl/resctrlfs.c | 187 ++++++++++----
11 files changed, 499 insertions(+), 398 deletions(-)
--
2.30.2
Fix the following coccicheck warning:
tools/testing/selftests/nolibc/nolibc-test.c:646:5-8: Unneeded variable:
"ret". Return "0"
Signed-off-by: Yonggang Wu <wuyonggang001(a)208suo.com>
---
tools/testing/selftests/nolibc/nolibc-test.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 486334981e60..2b723354e085 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -546,7 +546,6 @@ int run_syscall(int min, int max)
int proc;
int test;
int tmp;
- int ret = 0;
void *p1, *p2;
/* <proc> indicates whether or not /proc is mounted */
@@ -632,18 +631,17 @@ int run_syscall(int min, int max)
CASE_TEST(syscall_noargs); EXPECT_SYSEQ(1, syscall(__NR_getpid), getpid()); break;
CASE_TEST(syscall_args); EXPECT_SYSER(1, syscall(__NR_statx, 0, NULL, 0, 0, NULL), -1, EFAULT); break;
case __LINE__:
- return ret; /* must be last */
+ return 0; /* must be last */
/* note: do not set any defaults so as to permit holes above */
}
}
- return ret;
+ return 0;
}
int run_stdlib(int min, int max)
{
int test;
int tmp;
- int ret = 0;
void *p1, *p2;
for (test = min; test >= 0 && test <= max; test++) {
@@ -726,11 +724,11 @@ int run_stdlib(int min, int max)
# warning "__SIZEOF_LONG__ is undefined"
#endif /* __SIZEOF_LONG__ */
case __LINE__:
- return ret; /* must be last */
+ return 0; /* must be last */
/* note: do not set any defaults so as to permit holes above */
}
}
- return ret;
+ return 0;
}
#define EXPECT_VFPRINTF(c, expected, fmt, ...) \
@@ -790,7 +788,6 @@ static int run_vfprintf(int min, int max)
{
int test;
int tmp;
- int ret = 0;
void *p1, *p2;
for (test = min; test >= 0 && test <= max; test++) {
@@ -810,11 +807,11 @@ static int run_vfprintf(int min, int max)
CASE_TEST(hex); EXPECT_VFPRINTF(1, "f", "%x", 0xf); break;
CASE_TEST(pointer); EXPECT_VFPRINTF(3, "0x1", "%p", (void *) 0x1); break;
case __LINE__:
- return ret; /* must be last */
+ return 0; /* must be last */
/* note: do not set any defaults so as to permit holes above */
}
}
- return ret;
+ return 0;
}
static int smash_stack(void)
Currently the MM selftests attempt to work out the target architecture by
using CROSS_COMPILE or otherwise querying the host machine, storing the
target architecture in a variable called MACHINE rather than the usual ARCH
though as far as I can tell (including for x86_64) the value is the same as
we would use for architecture.
When cross compiling with LLVM we don't need a CROSS_COMPILE as LLVM can
support many target architectures in a single build so this logic does not
work, CROSS_COMPILE is not set and we end up selecting tests for the host
rather than target architecture. Fix this by using the more standard ARCH
to describe the architecture, taking it from the environment if specified.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/Makefile | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 23af4633f0f4..4f0c50c33ba7 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -5,12 +5,15 @@ LOCAL_HDRS += $(selfdir)/mm/local_config.h $(top_srcdir)/mm/gup_test.h
include local_config.mk
+ifeq ($(ARCH),)
+
ifeq ($(CROSS_COMPILE),)
uname_M := $(shell uname -m 2>/dev/null || echo not)
else
uname_M := $(shell echo $(CROSS_COMPILE) | grep -o '^[a-z0-9]\+')
endif
-MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/')
+ARCH ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/')
+endif
# Without this, failed build products remain, with up-to-date timestamps,
# thus tricking Make (and you!) into believing that All Is Well, in subsequent
@@ -65,7 +68,7 @@ TEST_GEN_PROGS += ksm_tests
TEST_GEN_PROGS += ksm_functional_tests
TEST_GEN_PROGS += mdwe_test
-ifeq ($(MACHINE),x86_64)
+ifeq ($(ARCH),x86_64)
CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c)
CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie)
@@ -87,13 +90,13 @@ TEST_GEN_PROGS += $(BINARIES_64)
endif
else
-ifneq (,$(findstring $(MACHINE),ppc64))
+ifneq (,$(findstring $(ARCH),ppc64))
TEST_GEN_PROGS += protection_keys
endif
endif
-ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sparc64 x86_64))
+ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sparc64 x86_64))
TEST_GEN_PROGS += va_high_addr_switch
TEST_GEN_PROGS += virtual_address_range
TEST_GEN_PROGS += write_to_hugetlbfs
@@ -112,7 +115,7 @@ $(TEST_GEN_PROGS): vm_util.c
$(OUTPUT)/uffd-stress: uffd-common.c
$(OUTPUT)/uffd-unit-tests: uffd-common.c
-ifeq ($(MACHINE),x86_64)
+ifeq ($(ARCH),x86_64)
BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
---
base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375
change-id: 20230614-kselftest-mm-llvm-a25a7daffa6f
Best regards,
--
Mark Brown <broonie(a)kernel.org>
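The selection order that the Makefile change above implements can be
sketched in shell as follows. The function name detect_arch and its
two-argument calling convention are assumptions made for illustration; the
grep and sed expressions are copied from the Makefile.

```shell
# detect_arch ARCH CROSS_COMPILE
# An explicit ARCH wins, then the CROSS_COMPILE prefix, then the
# host's `uname -m`.
detect_arch() {
	if [ -n "$1" ]; then
		echo "$1"
		return
	fi
	if [ -z "$2" ]; then
		m=$(uname -m 2>/dev/null || echo not)
	else
		m=$(echo "$2" | grep -o '^[a-z0-9]\+')
	fi
	echo "$m" | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/'
}

# GCC-style cross build: the architecture comes from the prefix.
detect_arch "" aarch64-linux-gnu-
# LLVM-style cross build: no prefix, so ARCH must be given explicitly.
detect_arch arm64 ""
```

With an empty ARCH and an empty CROSS_COMPILE the function falls back to the
host architecture, which is exactly the case the patch description says
selects the wrong tests for an LLVM cross build.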
Hi,
The very recent 6.4-rc3 kernel build with AlmaLinux 8.7 on LENOVO 10TX000VCR
desktop box fails one test:
[root@host net]# ./fcnal-test.sh
[...]
TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
TEST: ping in - ns-A IPv6 [ OK ]
[...]
Tests passed: 887
Tests failed: 1
[root@host net]#
Please find the config, + dmesg and lshw output here:
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-fcnal-test/config-…
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-fcnal-test/dmesg.l…
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-fcnal-test/lshw.txt
I believe that I have all required configs merged for the selftest/net tests.
Maybe we have a regression?
My knowledge of fcnal-test.sh isn't sufficient to build a smaller reproducer.
Guillaume said in January he could help with the net/fcnal-test.sh, but I was doing
the other things in the meantime. Tempus fugit :-/
Best regards,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
"What’s this thing suddenly coming towards me very fast? Very very fast.
... I wonder if it will be friends with me?"
Hi,
Static analysis with cppcheck has found an issue in the following commit:
commit 047e6575aec71d75b765c22111820c4776cd1c43
Author: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Date: Tue Sep 24 09:22:53 2019 +0530
powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9
The issue in tools/testing/selftests/powerpc/mm/tlbie_test.c in
end_verification_log() is as follows:
static inline void end_verification_log(unsigned int tid, unsigned
nr_anamolies)
{
FILE *f = fp[tid];
char logfile[30];
char path[LOGDIR_NAME_SIZE + 30];
char separator[] = "/";
fclose(f);
if (nr_anamolies == 0) {
remove(path);
return;
}
.... etc
In the case where nr_anamolies is zero, the remove(path) call uses an
uninitialized path, which could contain arbitrary garbage from the
stack (and, if one is unlucky enough, it may be a valid filename
that one does not want removed).
Not sure what the original intention was, but this code looks incorrect
to me.
Colin
Since commit ("selftests: error out if kernel header files are not yet
built") got merged, the kselftest build no longer works correctly because
KBUILD_OUTPUT isn't set when building out-of-tree and specifying 'O='.
This is the error message that pops up:
make --silent --keep-going --jobs=32 O=/home/anders/.cache/tuxmake/builds/1482/build INSTALL_PATH=/home/anders/.cache/tuxmake/builds/1482/build/kselftest_install ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- V=1 CROSS_COMPILE_COMPAT=arm-linux-gnueabihf- kselftest-install
make[3]: Entering directory '/home/anders/src/kernel/next/tools/testing/selftests/alsa'
-e error: missing kernel header files.
Please run this and try again:
cd /home/anders/src/kernel/next/tools/testing/selftests/../../..
make headers
make[3]: Leaving directory '/home/anders/src/kernel/next/tools/testing/selftests/alsa'
make[3]: *** [../lib.mk:77: kernel_header_files] Error 1
Fix the issue by assigning KBUILD_OUTPUT the same way it is done in
kselftest's Makefile: add 'KBUILD_OUTPUT := $(O)' if '$(origin O)'
is 'command line'. This sets the BUILD dir to
KBUILD_OUTPUT/kselftest when doing out-of-tree builds, which puts them
in their own separate output directory.
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index b8ea03b9a015..d17854285f2b 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -44,6 +44,10 @@ endif
selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST))))
top_srcdir = $(selfdir)/../../..
+ifeq ("$(origin O)", "command line")
+ KBUILD_OUTPUT := $(O)
+endif
+
ifneq ($(KBUILD_OUTPUT),)
# Make's built-in functions such as $(abspath ...), $(realpath ...) cannot
# expand a shell special character '~'. We use a somewhat tedious way here.
--
2.39.2
tls:no_pad exits the test when TLS is not available. It should skip the
test like all the others do.
Signed-off-by: Kuba Pawlak <kuba.pawlak(a)canonical.com>
---
tools/testing/selftests/net/tls.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index e699548d4247dd57555a72ec1627566962128f73..ea3ec8463df993d80f0b70c4632b2a1e3c57b424 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -1727,7 +1727,7 @@ TEST(no_pad) {
ulp_sock_pair(_metadata, &fd, &cfd, &notls);
if (notls)
- exit(KSFT_SKIP);
+ SKIP(return, "no TLS support");
ret = setsockopt(fd, SOL_TLS, TLS_TX, &tls12, sizeof(tls12));
EXPECT_EQ(ret, 0);
--
2.37.2
Hi,
Enclosed are a pair of patches for an oops that can occur if an exception is
generated while a bpf subprogram is running. One of the bpf_prog_aux entries
for the subprograms is missing an extable. This can lead to an exception that
would otherwise be handled turning into a NULL pointer bug.
These changes were tested via the verifier and progs selftests and no
regressions were observed.
Changes from v4:
- Ensure that num_exentries is copied to prog->aux from func[0] (Feedback from
Ilya Leoshkevich)
Changes from v3:
- Selftest style fixups (Feedback from Yonghong Song)
- Selftest needs to assert that test bpf program executed (Feedback from
Yonghong Song)
- Selftest should combine open and load using open_and_load (Feedback from
Yonghong Song)
Changes from v2:
- Insert only the main program's kallsyms (Feedback from Yonghong Song and
Alexei Starovoitov)
- Selftest should use ASSERT instead of CHECK (Feedback from Yonghong Song)
- Selftest needs some cleanup (Feedback from Yonghong Song)
- Switch patch order (Feedback from Alexei Starovoitov)
Changes from v1:
- Add a selftest (Feedback From Alexei Starovoitov)
- Move to a 1-line verifier change instead of searching multiple extables
Krister Johansen (2):
bpf: ensure main program has an extable
selftests/bpf: add a test for subprogram extables
kernel/bpf/verifier.c | 7 ++-
.../bpf/prog_tests/subprogs_extable.c | 29 +++++++++++
.../bpf/progs/test_subprogs_extable.c | 51 +++++++++++++++++++
3 files changed, 85 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/subprogs_extable.c
create mode 100644 tools/testing/selftests/bpf/progs/test_subprogs_extable.c
--
2.25.1
Hello everyone,
This is an RFC patch series to propose the addition of a test attributes
framework to KUnit.
There has been interest in filtering out "slow" KUnit tests. Most notably,
a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude
particularly slow memcpy tests
(https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).
This proposed attributes framework would be used to save and access test
associated data, including whether a test is slow. These attributes would
be reportable (via KTAP and command line output) and some will be
filterable.
This framework is designed to allow for the addition of other attributes in
the future. These attributes could include whether the test is flaky,
associated test files, etc.
Note that this could intersect with the discussions on how to format
test-associated data in KTAP v2 that I am also involved in
(https://lore.kernel.org/all/20230420205734.1288498-1-rmoar@google.com/).
If the overall idea seems good, I'll make sure to add tests/documentation,
and more patches marking existing tests as slow to the patch series.
Thanks!
Rae
Rae Moar (6):
kunit: Add test attributes API structure
kunit: Add speed attribute
kunit: Add ability to filter attributes
kunit: tool: Add command line interface to filter and report
attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: time: Mark test as slow using test attributes
include/kunit/attributes.h | 41 ++++
include/kunit/test.h | 62 ++++++
kernel/time/time_test.c | 2 +-
lib/kunit/Makefile | 3 +-
lib/kunit/attributes.c | 280 +++++++++++++++++++++++++
lib/kunit/executor.c | 89 ++++++--
lib/kunit/executor_test.c | 8 +-
lib/kunit/kunit-example-test.c | 9 +
lib/kunit/test.c | 17 +-
lib/memcpy_kunit.c | 8 +-
tools/testing/kunit/kunit.py | 34 ++-
tools/testing/kunit/kunit_kernel.py | 6 +-
tools/testing/kunit/kunit_tool_test.py | 41 ++--
13 files changed, 536 insertions(+), 64 deletions(-)
create mode 100644 include/kunit/attributes.h
create mode 100644 lib/kunit/attributes.c
base-commit: fefdb43943c1a0d87e1b43ae4d03e5f9a1d058f4
--
2.41.0.162.gfafddb0af9-goog
On 6/13/23 1:50 AM, baomingtong001(a)208suo.com wrote:
> Fix the following coccicheck warning:
>
> tools/testing/selftests/bpf/progs/tailcall_bpf2bpf6.c:28:14-17: Unneeded
> variable: "ret".
>
> Return "1".
>
> Signed-off-by: Mingtong Bao <baomingtong001(a)208suo.com>
> ---
> tools/testing/selftests/bpf/progs/tailcall_bpf2bpf6.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf6.c
> b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf6.c
> index 4a9f63bea66c..7f0146682577 100644
> --- a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf6.c
> +++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf6.c
> @@ -25,10 +25,9 @@ static __noinline
> int subprog_tail(struct __sk_buff *skb)
> {
> /* Don't propagate the constant to the caller */
> - volatile int ret = 1;
>
> bpf_tail_call_static(skb, &jmp_table, 0);
> - return ret;
> + return 1;
Please pay attention to the comment:
/* Don't propagate the constant to the caller */
which clearly says 'constant' is not preferred.
The patch that introduced this code is:
5e0b0a4c52d30 selftests/bpf: Test tail call counting with bpf2bpf
and data on stack
The test intentionally wants to cover the case:
'Specifically when the size
of data allocated on BPF stack is not a multiple on 8.'
Note that the generated code differs with and without volatile,
which results in a different verification path.
cc Jakub for further clarification.
> }
>
> SEC("tc")
On Thu, Jun 08, 2023 at 07:52:54PM +0200, Michal Sekletar wrote:
> On Thu, Jun 8, 2023 at 1:51 PM Greg KH <gregkh(a)linuxfoundation.org> wrote:
>
> > So how are you protecting this from being an information leak like we
> > have had in the past where you could monitor how many characters were
> > being sent to the tty through a proc file? Seems like now you can just
> > monitor any tty node in the system and get the same information, while
> > today you can only do it for the tty devices you have permissions for,
> > right?
>
> Hi Greg,
>
> I am not protecting against it in any way, but proposed changes are only
> about timestamp updates which still happen in at least 8 seconds intervals
> so exact timing of read/writes to tty can't be inferred. Frankly, I may
> have misunderstood something. It would be great if you could mention a bit
> more details about CVE you had in mind.
Ah, I missed that this is in 8 second increments, nevermind then!
thanks,
greg k-h
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games. It is currently
emulated in userspace in a rather slow manner. Our purpose is to
enhance the kernel so that it can be emulated efficiently.
Currently some out-of-tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches,
so that gaming on Linux as a whole can benefit. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty
feature could be used under the hood with some additions, such as:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But with the soft-dirty flag, we sometimes get extra pages
which weren't even written. They had become soft-dirty because of VMA merging
and the VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We
were able to bypass this shortcoming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc. messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed whether we could revert these patches, but we could not reach any
conclusion. So at this point, I made a couple of attempts to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to cause
regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges. The
reply I got was not to increase the size of the VMA by 8 bytes.
At this point, we abandoned soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where
the kernel resolves the faults itself when the WP_ASYNC feature is used. It was
straightforward to add the WP_ASYNC feature to userfaultfd. Now only those
pages which really have been written to are reported as dirty or written-to.
(PS: another userfaultfd feature, WP_UNPOPULATED, is required
to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of the CRIU devs to make the
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- The kernel has no atomic operation to get the soft-dirty/written-to status
and clear it.
- The pages which have been written-to cannot be found accurately.
(The kernel's soft-dirty PTE bit + soft_dirty VMA bit show more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to change its behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is an
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information about whether a page is file mapped, present or
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return mask is also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where the user only
wants to get a specific number of pages, so there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages is found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case, when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
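To make the compaction idea concrete, here is a toy shell model of how
written-to pages could be folded into regions. It is only a sketch under
assumptions: compact_pages is a hypothetical helper, pages are given as
sorted page indices rather than addresses, regions are printed as
"start:len", and a max_pages of 0 stands for "not specified". It does not
reproduce the kernel implementation or the real struct page_region layout.

```shell
# Toy model only: group sorted page indices into "start:len" regions and
# stop once max_pages pages have been reported (0 means no limit).
compact_pages() {
	local max_pages="$1"
	shift
	local start="" len=0 total=0 page

	for page in "$@"; do
		if [ "$max_pages" -gt 0 ] && [ "$total" -ge "$max_pages" ]; then
			break
		fi
		if [ -n "$start" ] && [ "$page" -eq $((start + len)) ]; then
			# Adjacent to the current region: extend it.
			len=$((len + 1))
		else
			# Not adjacent: emit the current region and open a new one.
			if [ -n "$start" ]; then
				echo "$start:$len"
			fi
			start="$page"
			len=1
		fi
		total=$((total + 1))
	done

	if [ -n "$start" ]; then
		echo "$start:$len"
	fi
	return 0
}
```

Calling "compact_pages 0 1 2 3 7 9 10" prints three regions, while
"compact_pages 4 1 2 3 7 9 10" stops after four pages. When no two pages
are adjacent, every region holds a single page, which is the worst case
mentioned above.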
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 426 ++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 29 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 4 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1301 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2034 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Currently, the write operation returns the write count regardless of whether
events are enabled or disabled. Fix this by returning -EFAULT when events are
disabled.
sunliming (3):
tracing/user_events: Fix incorrect return value for writing operation
when events are disabled
selftests/user_events: Enable the event before write_fault test in
ftrace self-test
selftests/user_events: Add test cases when event is disabled
kernel/trace/trace_events_user.c | 3 ++-
tools/testing/selftests/user_events/ftrace_test.c | 7 +++++++
2 files changed, 9 insertions(+), 1 deletion(-)
--
2.25.1
Some test cases from net/tls, net/fcnal-test and net/vrf-xfrm-tests
that rely on cryptographic functions and use algorithms that are not
FIPS-compliant fail in FIPS mode.
In order to allow these tests to pass in a wider set of kernels,
- for net/tls, skip the test variants that use the ChaCha20-Poly1305
and SM4 algorithms, when FIPS mode is enabled;
- for net/fcnal-test, skip the MD5 tests, when FIPS mode is enabled;
- for net/vrf-xfrm-tests, replace the algorithms that are not
FIPS-compliant with compliant ones.
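As a rough illustration of the "skip when FIPS mode is enabled" approach
for the shell-based tests, the sketch below reads the same procfs file the
changelog mentions. The helper names fips_enabled and run_unless_fips, and
the FIPS_FILE override, are hypothetical and only exist to keep the example
self-contained; the actual patches may be structured differently.

```shell
# Hypothetical helper: succeeds when the given file reports FIPS mode.
# The optional argument exists only to make the sketch testable.
fips_enabled() {
	local file="${1:-/proc/sys/crypto/fips_enabled}"

	[ -r "$file" ] && [ "$(cat "$file")" = "1" ]
}

# Hypothetical wrapper: run a test command unless FIPS mode is enabled.
# FIPS_FILE is an assumed override, not a variable used by the selftests.
run_unless_fips() {
	local name="$1"
	shift

	if fips_enabled "${FIPS_FILE:-/proc/sys/crypto/fips_enabled}"; then
		echo "SKIP: $name (not FIPS-compliant)"
		return 0
	fi
	"$@"
}
```

A test using MD5 could then be wrapped as "run_unless_fips md5 <command>",
so that it is reported as skipped instead of failing in FIPS mode.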
Changes in v3:
- Add new commit to allow skipping test directly from test setup.
- No need to initialize static variable to zero.
- Skip tests during test setup only.
- Use the constructor attribute to set fips_enabled before entering
main().
Changes in v2:
- Add R-b tags.
- Put fips_non_compliant into the variants.
- Turn fips_enabled into a static global variable.
- Read /proc/sys/crypto/fips_enabled only once at main().
v1: https://lore.kernel.org/netdev/20230607174302.19542-1-magali.lemes@canonica…
v2: https://lore.kernel.org/netdev/20230609164324.497813-1-magali.lemes@canonic…
Magali Lemes (4):
selftests/harness: allow tests to be skipped during setup
selftests: net: tls: check if FIPS mode is enabled
selftests: net: vrf-xfrm-tests: change authentication and encryption
algos
selftests: net: fcnal-test: check if FIPS mode is enabled
tools/testing/selftests/kselftest_harness.h | 6 ++--
tools/testing/selftests/net/fcnal-test.sh | 27 +++++++++++-----
tools/testing/selftests/net/tls.c | 25 ++++++++++++++-
tools/testing/selftests/net/vrf-xfrm-tests.sh | 32 +++++++++----------
4 files changed, 62 insertions(+), 28 deletions(-)
--
2.34.1
Patches for kunit are managed in the linux-kselftest tree before being merged
into the mainline, but the MAINTAINERS section for kunit doesn't have an
entry for the tree. Add it.
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index ce5f343c1443..8a217438956b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11327,6 +11327,7 @@ L: linux-kselftest(a)vger.kernel.org
L: kunit-dev(a)googlegroups.com
S: Maintained
W: https://google.github.io/kunit-docs/third_party/kernel/docs/
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
F: Documentation/dev-tools/kunit/
F: include/kunit/
F: lib/kunit/
--
2.25.1
After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] the latest version of the selftests is supposed to run on old
kernels without issues.
Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace.
In addition to that, the current MPTCP selftests run a lot of different
sub-tests but the TAP13 protocol used in the selftests doesn't support
sub-tests: one failure in sub-tests implies that the whole selftest is
seen as failed at the end because sub-tests are not tracked. It is then
important to skip sub-tests not supported by old kernels.
To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftest or just some sub-tests before starting them. This cannot be
applied in all cases.
Similar to the second part, this third one focuses on marking different
sub-tests as skipped if some MPTCP features are not supported. This
time, only the "mptcp_join.sh" selftest, the remaining one, is modified.
Several techniques are used here to achieve this task:
- Before starting some tests:
- Check if a file (sysctl knob) is present: that's what patch 12/17 is
doing for the userspace PM feature.
- Check if a required kernel symbol is present in /proc/kallsyms:
patches 9, 10, 14 and 15/17 are using this technique.
- Check if it is possible to set up a particular network environment
requiring Netfilter or TC: if the preparation step fails, the linked
sub-test is marked as skipped. Patch 5/17 is doing that.
- Check if a MIB counter is available: patches 7 and 13/17 do that.
- Check if the kernel version is newer than a specific one: patch 1/17
adds some helpers in mptcp_lib.sh to ease its use. That's not ideal
and it is only used as a last resort but, as mentioned above, it is
important to skip unsupported tests so that the whole selftest is not
always marked as failed on old kernels. Patches
11 and 17/17 are checking the kernel version. An alternative would
be to ignore the results for some sub-tests but that's not ideal
either. Note that the SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be
set to 1 not to skip these tests if the running kernel doesn't have
a supported version.
- After having launched the tests:
- Adapt the expectations depending on the presence of a kernel symbol
(patch 6/17) or a kernel version (patch 8/17).
- Check if a MIB counter is available and skip the verification if
not. Patch 4/17 is using this technique.
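Two of the checks listed above, looking for a kernel symbol and comparing
kernel versions, can be sketched as follows. This is an illustration under
assumptions, not the code from mptcp_lib.sh: kallsyms_has and kversion_ge
are hypothetical names, the optional second arguments only exist so the
sketch can be exercised without a specific kernel, and the version helper
assumes a "major.minor" minimum version.

```shell
# Hypothetical helper: is the symbol listed in a kallsyms-style file?
# -w requires a whole-word match, so "mptcp_pm" does not match
# "mptcp_pm_subflow_check_next".
kallsyms_has() {
	local sym="$1"
	local file="${2:-/proc/kallsyms}"

	grep -qwF -- "$sym" "$file"
}

# Hypothetical helper: is the kernel release at least "major.minor"?
kversion_ge() {
	local want="$1"
	local have="${2:-$(uname -r)}"
	local w_maj w_min h_maj h_min

	w_maj="${want%%.*}"
	w_min="${want#*.}"
	w_min="${w_min%%[!0-9]*}"
	h_maj="${have%%.*}"
	h_min="${have#*.}"
	h_min="${h_min%%[!0-9]*}"

	[ "$h_maj" -gt "$w_maj" ] ||
		{ [ "$h_maj" -eq "$w_maj" ] && [ "$h_min" -ge "$w_min" ]; }
}
```

A sub-test could then be guarded with something like
"kallsyms_has some_symbol || skip", falling back to "kversion_ge 5.18"
only when no better external sign is available.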
Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
value is checked: if it is set to 1, the test is marked as "failed"
instead of "skipped". MPTCP public CI expects to have all features
supported and it sets this env var to 1 to catch regressions in these
new checks.
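The "skipped" versus "failed" decision described above could look roughly
like this. Only the SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES variable comes
from the series; the skip_or_fail helper and its messages are hypothetical
and only illustrate the idea.

```shell
# Hypothetical helper: report a missing feature as "skip", or as "fail"
# when every feature is expected to be present (e.g. on a CI).
skip_or_fail() {
	local feature="$1"

	if [ "${SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES:-}" = "1" ]; then
		echo "[fail] $feature not supported"
		return 1
	fi

	echo "[skip] $feature not supported"
	return 0
}
```

With the variable unset, a missing feature only produces a "[skip]" line;
with it set to 1, the same situation returns an error, which is how a CI
expecting all features would catch a regression in the detection logic.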
Patch 2/17 uses 'iptables-legacy' if available because it might be
needed when using an older kernel not supporting iptables-nft.
Patch 3/17 adds some helpers used in the other patches mentioned to
easily mark sub-tests as skipped.
Patch 16/17 makes the MPTCP Join "listener" tests uniform: the code was
imported from userspace_pm.sh without adopting the "code style" and the ways
of using tools and printing messages of the MPTCP Join selftest.
Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrb… [1]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Note that it is supposed to be the last series on this subject for -net.
Also, this will conflict with commit 0639fa230a21 ("selftests: mptcp:
add explicit check for new mibs") that is currently in net-next but not
in -net. Here is the resolution. It is a bit long but you will see, it
is simple: take the version from -net with get_counter() and for the
last one, move the new call to chk_rm_tx_nr() inside the 'if' statement:
------------------- 8< -------------------
diff --cc tools/testing/selftests/net/mptcp/mptcp_join.sh
index 0ae8cafde439,85474e029784..bd47cdc2bd15
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@@ -1360,27 -1265,23 +1355,25 @@@ chk_fclose_nr(
fi
printf "%-${nr_blank}s %s" " " "ctx"
- count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}')
- [ -z "$count" ] && count=0
- [ "$count" != "$fclose_tx" ] && extra_msg="$extra_msg,tx=$count"
- if [ "$count" != "$fclose_tx" ]; then
+ count=$(get_counter ${ns_tx} "MPTcpExtMPFastcloseTx")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" != "$fclose_tx" ]; then
+ extra_msg="$extra_msg,tx=$count"
echo "[fail] got $count MP_FASTCLOSE[s] TX expected $fclose_tx"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
echo -n " - fclzrx"
- count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPFastcloseRx | awk '{print $2}')
- [ -z "$count" ] && count=0
- [ "$count" != "$fclose_rx" ] && extra_msg="$extra_msg,rx=$count"
- if [ "$count" != "$fclose_rx" ]; then
+ count=$(get_counter ${ns_rx} "MPTcpExtMPFastcloseRx")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" != "$fclose_rx" ]; then
+ extra_msg="$extra_msg,rx=$count"
echo "[fail] got $count MP_FASTCLOSE[s] RX expected $fclose_rx"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
@@@ -1408,25 -1306,21 +1398,23 @@@ chk_rst_nr(
fi
printf "%-${nr_blank}s %s" " " "rtx"
- count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPRstTx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ $count -lt $rst_tx ]; then
+ count=$(get_counter ${ns_tx} "MPTcpExtMPRstTx")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ $count -lt $rst_tx ]; then
echo "[fail] got $count MP_RST[s] TX expected $rst_tx"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
echo -n " - rstrx "
- count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPRstRx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" -lt "$rst_rx" ]; then
+ count=$(get_counter ${ns_rx} "MPTcpExtMPRstRx")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" -lt "$rst_rx" ]; then
echo "[fail] got $count MP_RST[s] RX expected $rst_rx"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
@@@ -1441,28 -1333,23 +1427,25 @@@ chk_infi_nr(
local infi_tx=$1
local infi_rx=$2
local count
- local dump_stats
printf "%-${nr_blank}s %s" " " "itx"
- count=$(ip netns exec $ns2 nstat -as | grep InfiniteMapTx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$infi_tx" ]; then
+ count=$(get_counter ${ns2} "MPTcpExtInfiniteMapTx")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" != "$infi_tx" ]; then
echo "[fail] got $count infinite map[s] TX expected $infi_tx"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
echo -n " - infirx"
- count=$(ip netns exec $ns1 nstat -as | grep InfiniteMapRx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$infi_rx" ]; then
+ count=$(get_counter ${ns1} "MPTcpExtInfiniteMapRx")
+ if [ -z "$count" ]; then
+ echo "[skip]"
+ elif [ "$count" != "$infi_rx" ]; then
echo "[fail] got $count infinite map[s] RX expected $infi_rx"
fail_test
- dump_stats=1
else
echo "[ ok ]"
fi
@@@ -1491,13 -1375,11 +1471,12 @@@ chk_join_nr(
fi
printf "%03u %-36s %s" "${TEST_COUNT}" "${title}" "syn"
- count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinSynRx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$syn_nr" ]; then
+ count=$(get_counter ${ns1} "MPTcpExtMPJoinSynRx")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" != "$syn_nr" ]; then
echo "[fail] got $count JOIN[s] syn expected $syn_nr"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
@@@ -1523,13 -1403,11 +1501,12 @@@
fi
echo -n " - ack"
- count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinAckRx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$ack_nr" ]; then
+ count=$(get_counter ${ns1} "MPTcpExtMPJoinAckRx")
+ if [ -z "$count" ]; then
+ echo "[skip]"
+ elif [ "$count" != "$ack_nr" ]; then
echo "[fail] got $count JOIN[s] ack expected $ack_nr"
fail_test
- dump_stats=1
else
echo "[ ok ]"
fi
@@@ -1599,40 -1475,35 +1574,37 @@@ chk_add_nr(
timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
printf "%-${nr_blank}s %s" " " "add"
- count=$(ip netns exec $ns2 nstat -as MPTcpExtAddAddr | grep MPTcpExtAddAddr | awk '{print $2}')
- [ -z "$count" ] && count=0
-
+ count=$(get_counter ${ns2} "MPTcpExtAddAddr")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
# if the test configured a short timeout tolerate greater then expected
# add addrs options, due to retransmissions
- if [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
+ elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
echo "[fail] got $count ADD_ADDR[s] expected $add_nr"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
echo -n " - echo "
- count=$(ip netns exec $ns1 nstat -as MPTcpExtEchoAdd | grep MPTcpExtEchoAdd | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$echo_nr" ]; then
+ count=$(get_counter ${ns1} "MPTcpExtEchoAdd")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" != "$echo_nr" ]; then
echo "[fail] got $count ADD_ADDR echo[s] expected $echo_nr"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
if [ $port_nr -gt 0 ]; then
echo -n " - pt "
- count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtPortAdd | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$port_nr" ]; then
+ count=$(get_counter ${ns2} "MPTcpExtPortAdd")
+ if [ -z "$count" ]; then
+ echo "[skip]"
+ elif [ "$count" != "$port_nr" ]; then
echo "[fail] got $count ADD_ADDR[s] with a port-number expected $port_nr"
fail_test
- dump_stats=1
else
echo "[ ok ]"
fi
@@@ -1737,13 -1633,11 +1734,12 @@@ chk_rm_nr(
fi
printf "%-${nr_blank}s %s" " " "rm "
- count=$(ip netns exec $addr_ns nstat -as MPTcpExtRmAddr | grep MPTcpExtRmAddr | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$rm_addr_nr" ]; then
+ count=$(get_counter ${addr_ns} "MPTcpExtRmAddr")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" != "$rm_addr_nr" ]; then
echo "[fail] got $count RM_ADDR[s] expected $rm_addr_nr"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
@@@ -1767,12 -1661,12 +1763,10 @@@
else
echo "[fail] got $count RM_SUBFLOW[s] expected in range [$rm_subflow_nr:$((rm_subflow_nr*2))]"
fail_test
- dump_stats=1
fi
- return
- fi
- if [ "$count" != "$rm_subflow_nr" ]; then
+ elif [ "$count" != "$rm_subflow_nr" ]; then
echo "[fail] got $count RM_SUBFLOW[s] expected $rm_subflow_nr"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
@@@ -1787,28 -1696,23 +1796,25 @@@ chk_prio_nr(
local mp_prio_nr_tx=$1
local mp_prio_nr_rx=$2
local count
- local dump_stats
printf "%-${nr_blank}s %s" " " "ptx"
- count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPPrioTx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$mp_prio_nr_tx" ]; then
+ count=$(get_counter ${ns1} "MPTcpExtMPPrioTx")
+ if [ -z "$count" ]; then
+ echo -n "[skip]"
+ elif [ "$count" != "$mp_prio_nr_tx" ]; then
echo "[fail] got $count MP_PRIO[s] TX expected $mp_prio_nr_tx"
fail_test
- dump_stats=1
else
echo -n "[ ok ]"
fi
echo -n " - prx "
- count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPPrioRx | awk '{print $2}')
- [ -z "$count" ] && count=0
- if [ "$count" != "$mp_prio_nr_rx" ]; then
+ count=$(get_counter ${ns1} "MPTcpExtMPPrioRx")
+ if [ -z "$count" ]; then
+ echo "[skip]"
+ elif [ "$count" != "$mp_prio_nr_rx" ]; then
echo "[fail] got $count MP_PRIO[s] RX expected $mp_prio_nr_rx"
fail_test
- dump_stats=1
else
echo "[ ok ]"
fi
@@@ -2394,12 -2290,8 +2399,13 @@@ remove_tests(
pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
run_tests $ns1 $ns2 10.0.1.1 0 -8 -8 slow
chk_join_nr 3 3 3
- chk_rm_tx_nr 0
- chk_rm_nr 0 3 simult
+
+ if mptcp_lib_kversion_ge 5.18; then
++ chk_rm_tx_nr 0
+ chk_rm_nr 0 3 simult
+ else
+ chk_rm_nr 3 3
+ fi
fi
# addresses flush
------------------- 8< -------------------
The resolved conflicts are also visible there:
https://github.com/multipath-tcp/mptcp_net-next/blob/t/DO-NOT-MERGE-git-mar…
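The pattern repeated throughout the resolution above (an empty counter
means "skip", a mismatch means "fail") can be sketched in isolation. This
is not the get_counter() from mptcp_join.sh: counter_value and
check_counter are hypothetical helpers that parse "name value" text passed
in by the caller, so the sketch runs without network namespaces or nstat.

```shell
# Hypothetical stand-in for get_counter(): read "name value ..." lines on
# stdin and print the value of one counter, or nothing if it is missing.
counter_value() {
	awk -v name="$1" '$1 == name { print $2; exit }'
}

# Compare a counter with the expected value, tolerating a missing counter.
check_counter() {
	local name="$1"
	local expected="$2"
	local stats="$3"
	local count

	count=$(printf '%s\n' "$stats" | counter_value "$name")
	if [ -z "$count" ]; then
		# Counter unknown to this kernel: do not treat it as zero.
		echo "[skip]"
	elif [ "$count" != "$expected" ]; then
		echo "[fail] got $count expected $expected"
		return 1
	else
		echo "[ ok ]"
	fi
}
```

The difference from the removed lines is that a missing counter is no
longer forced to 0, so it cannot be confused with a counter that exists
but was never incremented.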
---
Matthieu Baerts (17):
selftests: mptcp: lib: skip if not below kernel version
selftests: mptcp: join: use 'iptables-legacy' if available
selftests: mptcp: join: helpers to skip tests
selftests: mptcp: join: skip check if MIB counter not supported
selftests: mptcp: join: skip test if iptables/tc cmds fail
selftests: mptcp: join: support local endpoint being tracked or not
selftests: mptcp: join: skip Fastclose tests if not supported
selftests: mptcp: join: support RM_ADDR for used endpoints or not
selftests: mptcp: join: skip implicit tests if not supported
selftests: mptcp: join: skip backup if set flag on ID not supported
selftests: mptcp: join: skip fullmesh flag tests if not supported
selftests: mptcp: join: skip userspace PM tests if not supported
selftests: mptcp: join: skip fail tests if not supported
selftests: mptcp: join: skip MPC backups tests if not supported
selftests: mptcp: join: skip PM listener tests if not supported
selftests: mptcp: join: uniform listener tests
selftests: mptcp: join: skip mixed tests if not supported
tools/testing/selftests/net/mptcp/mptcp_join.sh | 513 +++++++++++++++---------
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 26 ++
2 files changed, 354 insertions(+), 185 deletions(-)
---
base-commit: 1b8975f30abffc4f74f1ba049f9042e7d8f646cc
change-id: 20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-37aa5185e955
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
The KTAP parser I used to test the KTAP output for ftracetest was overly
robust and did not notice that the test number and pass/fail result were
reversed. Fix this.
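For reference, a KTAP result line starts with the result ("ok" or
"not ok"), followed by the test number and then the description. The small
sketch below only illustrates that ordering; ktap_result_line is a
hypothetical helper, not the ktaptest() function from ftracetest.

```shell
# Hypothetical helper: build a KTAP result line, result first, then number.
ktap_result_line() {
	local result="$1"	# "ok" or "not ok"
	local caseno="$2"
	local name="$3"
	local comment="${4:-}"

	if [ -n "$comment" ]; then
		comment=" # $comment"
	fi
	echo "$result $caseno $name$comment"
}
```

A strict parser would reject a line such as "1 ok basic", which is the
ordering the old code produced.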
Fixes: dbcf76390eb9 ("selftests/ftrace: Improve integration with kselftest runner")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/ftrace/ftracetest | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index 2506621e75df..cb5f18c06593 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -301,7 +301,7 @@ ktaptest() { # result comment
comment="# $comment"
fi
- echo $CASENO $result $INSTANCE$CASENAME $comment
+ echo $result $CASENO $INSTANCE$CASENAME $comment
}
eval_result() { # sigval
---
base-commit: dbcf76390eb9a65d5d0c37b0cd57335218564e37
change-id: 20230609-ftrace-ktap-order-d5b64a74dc79
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Building and running the 'damon' subsuite of kselftest shows the
following issue:
selftests: damon: debugfs_attrs.sh
/sys/kernel/debug/damon not found
Creating a config file that enables the DAMON config fragments in the
selftests/damon/ directory makes the tests pass.
Fixes: b348eb7abd09 ("mm/damon: add user space selftests")
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/damon/config | 7 +++++++
1 file changed, 7 insertions(+)
create mode 100644 tools/testing/selftests/damon/config
diff --git a/tools/testing/selftests/damon/config b/tools/testing/selftests/damon/config
new file mode 100644
index 000000000000..0daf38974eb0
--- /dev/null
+++ b/tools/testing/selftests/damon/config
@@ -0,0 +1,7 @@
+CONFIG_DAMON=y
+CONFIG_DAMON_SYSFS=y
+CONFIG_DAMON_DBGFS=y
+CONFIG_DAMON_PADDR=y
+CONFIG_DAMON_VADDR=y
+CONFIG_DAMON_RECLAIM=y
+CONFIG_DAMON_LRU_SORT=y
--
2.39.2
Hi,
Commit cb2c7d1a1776 ("landlock: Support filesystem access-control")
introduced a new ARCH_EPHEMERAL_INODES configuration, only enabled for
User-Mode Linux. The reason was that UML's hostfs managed inodes in an
ephemeral way: from the kernel point of view, the same inode struct
could be created several times while being used by user space because
the kernel didn't hold references to inodes. Because Landlock (and
probably other subsystems) ties properties (i.e. access rights) to inode
objects, it wasn't possible to create rules that match inodes and then
allow specific accesses.
This patch series fixes the way UML manages inodes according to the
underlying filesystem. They are now properly handled, as for other
filesystems, which makes it possible to support Landlock (and probably
other features).
Changes since v1:
https://lore.kernel.org/r/20230309165455.175131-1-mic@digikod.net
- Remove Cc stable@ (suggested by Richard).
- Add Acked-by: Richard Weinberger to the first patch.
- Split the test patch into two patches: one for the common
pseudo-filesystems, and another patch dedicated to hostfs.
- Remove CONFIG_SECURITY_PATH because it is useless for merge_config.sh
- Move CONFIG_HOSTFS to a new config.um file.
- Fix commit message spelling and test warnings.
- Improve prepare_layout_opt() with remove_path() call to avoid
cascading errors when some tested filesystems are not supported.
- Remove cgroup-v1 tests because this filesystem cannot really be
mounted several times.
- Add test coverage with and without kernel debug code, according to
GCC 12 and GCC 13.
Regards,
Mickaël Salaün (6):
hostfs: Fix ephemeral inodes
selftests/landlock: Don't create useless file layouts
selftests/landlock: Add supports_filesystem() helper
selftests/landlock: Make mounts configurable
selftests/landlock: Add tests for pseudo filesystems
selftests/landlock: Add hostfs tests
arch/Kconfig | 7 -
arch/um/Kconfig | 1 -
fs/hostfs/hostfs.h | 1 +
fs/hostfs/hostfs_kern.c | 213 ++++++------
fs/hostfs/hostfs_user.c | 1 +
security/landlock/Kconfig | 2 +-
tools/testing/selftests/landlock/config | 9 +-
tools/testing/selftests/landlock/config.um | 1 +
tools/testing/selftests/landlock/fs_test.c | 387 +++++++++++++++++++--
9 files changed, 478 insertions(+), 144 deletions(-)
create mode 100644 tools/testing/selftests/landlock/config.um
base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375
--
2.41.0
Hi,
Commit cb2c7d1a1776 ("landlock: Support filesystem access-control")
introduced a new ARCH_EPHEMERAL_INODES configuration, only enabled for
User-Mode Linux. The reason was that UML's hostfs managed inodes in an
ephemeral way: from the kernel point of view, the same inode struct
could be created several times while being used by user space because
the kernel didn't hold references to inodes. Because Landlock (and
probably other subsystems) ties properties (i.e. access rights) to inode
objects, it wasn't possible to create rules that match inodes and then
allow specific accesses.
This patch series fixes the way UML manages inodes according to the
underlying filesystem. They are now properly handled, as for other
filesystems, which makes it possible to support Landlock (and probably
other features).
Backporting these patches requires some selftest harness patches
backports too.
Regards,
Mickaël Salaün (5):
hostfs: Fix ephemeral inodes
selftests/landlock: Don't create useless file layouts
selftests/landlock: Add supports_filesystem() helper
selftests/landlock: Make mounts configurable
selftests/landlock: Add tests for pseudo filesystems
arch/Kconfig | 7 -
arch/um/Kconfig | 1 -
fs/hostfs/hostfs.h | 1 +
fs/hostfs/hostfs_kern.c | 213 ++++++------
fs/hostfs/hostfs_user.c | 1 +
security/landlock/Kconfig | 2 +-
tools/testing/selftests/landlock/config | 8 +-
tools/testing/selftests/landlock/fs_test.c | 381 +++++++++++++++++++--
8 files changed, 472 insertions(+), 142 deletions(-)
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
--
2.39.2
It is wrong to include unprocessed user header files directly. They are
processed to "<source_tree>/usr/include" by running "make headers" and
they are included in selftests by kselftest makefiles automatically with
the help of the KHDR_INCLUDES variable. These headers should always be built
first before building kselftests.
Fixes: 07115fcc15b4 ("selftests/mm: add new selftests for KSM")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 95acb099315e..e6cd60ca9e48 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -29,7 +29,7 @@ MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/p
# LDLIBS.
MAKEFLAGS += --no-builtin-rules
-CFLAGS = -Wall -I $(top_srcdir) -I $(top_srcdir)/tools/include/uapi $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
+CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
LDLIBS = -lrt -lpthread
TEST_GEN_PROGS = cow
--
2.39.2
Here is a series with some fixes and cleanups to resctrl selftests. In
v3, the rewritten CAT test is not included as an issue was discovered
in one of its components, requiring further work before it can be
included in mainline.
v3:
- Don't include rewritten CAT test into this series!
- Tweak wildcard style in Makefile
- Fix many changelog typos, remove some wrong claims, and generally
improve them.
- Add fix to PARENT_EXIT() to unmount resctrl FS
- Add unmounting resctrl FS before starting any tests
- Add fix for buf leak
- Add fix for perf fd closing
- Split mount/remount/umount patches differently
- Use size_t and %zu for span
- Keep MBM print as MB, only internally use span in bytes
- Drop start_buf global from fill_buf
v2 (was sent with CAT test rewrite which is no longer included in v3):
- Rebased on top of next to solve the conflicts
- Added 2 patches related to resctrl FS mount/umount (fix + cleanup)
- Consistently use "alloc" in cache_alloc_size()
- CAT test error handling tweaked
- Remove a spurious newline change from the CAT patch
- Small improvements to changelogs
Ilpo Järvinen (19):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Don't leak buffer in fill_cache()
selftests/resctrl: Unmount resctrl FS if child fails to run benchmark
selftests/resctrl: Close perf value read fd on errors
selftests/resctrl: Unmount resctrl FS before starting the first test
selftests/resctrl: Move resctrl FS mount/umount to higher level
selftests/resctrl: Refactor remount_resctrl(bool mum_resctrlfs) to
mount_resctrl()
selftests/resctrl: Remove mum_resctrlfs from struct resctrl_val_param
selftests/resctrl: Convert span to size_t
selftests/resctrl: Express span internally in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Remove unnecessary startptr global from fill_buf
selftests/resctrl: Improve parameter consistency in fill_buf
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Move CAT/CMT test global vars to function they are
used in
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Remove test type checks from cat_val()
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 64 +++++++-------
tools/testing/selftests/resctrl/cat_test.c | 28 ++----
tools/testing/selftests/resctrl/cmt_test.c | 29 ++-----
tools/testing/selftests/resctrl/fill_buf.c | 87 +++++++------------
tools/testing/selftests/resctrl/mba_test.c | 9 +-
tools/testing/selftests/resctrl/mbm_test.c | 17 ++--
tools/testing/selftests/resctrl/resctrl.h | 17 ++--
.../testing/selftests/resctrl/resctrl_tests.c | 82 +++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 7 +-
tools/testing/selftests/resctrl/resctrlfs.c | 57 ++++++------
11 files changed, 169 insertions(+), 230 deletions(-)
--
2.30.2
Hi,
The test failed with the latest torvalds tree kernel 6.4-rc5-00305-g022ce8862dff
on AMD Ryzen 9 and Ubuntu 22.04 Jammy.
The config is a merge of Ubuntu generic config and selftest config files.
Debug output with `set -x` is [edited]:
root@host:selftests/drivers/net/bonding# ./bond-arp-interval-causes-panic.sh
Cannot find device "link1_1"
root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/drivers/net/bonding# vi !$
vi ./bond-arp-interval-causes-panic.sh
root@host:selftests/drivers/net/bonding# ./bond-arp-interval-causes-panic.sh
+ test 0 -ne 0
+ trap finish EXIT
+ client_ip4=192.168.1.198
+ server_ip4=192.168.1.254
+ echo 180
+ ip link add dev link1_1 type veth peer name link1_2
+ ip netns add server
+ ip link set dev link1_2 netns server up name eth0
+ ip netns exec server ip addr add 192.168.1.254/24 dev eth0
+ ip netns add client
+ ip link set dev link1_1 netns client down name eth0
+ ip netns exec client ip link add dev bond0 down type bond mode 1 miimon 100 all_slaves_active 1
+ ip netns exec client ip link set dev eth0 down master bond0
+ ip netns exec client ip link set dev bond0 up
+ ip netns exec client ip addr add 192.168.1.198/24 dev bond0
+ ip netns exec client ping -c 5 192.168.1.254
+ finish
+ ip netns delete server
+ ip netns delete client
+ ip link del link1_1
Cannot find device "link1_1"
+ true
root@host:testing/selftests/drivers/net/bonding# uname -rms
Linux 6.4.0-rc5-kmlk-netdbg-iwlwifi-00305-g022ce8862dff x86_64
root@host:testing/selftests/drivers/net/bonding#
Some debugging:
I have added some "ip link show" commands in the finish() function:
finish()
{
ip link show
ip netns delete server || true
ip netns delete client || true
ip link show
ip link del link1_1 || true
}
Now the debug output is like this:
root@host:selftests/drivers/net/bonding# ./bond-arp-interval-causes-panic.sh
+ test 0 -ne 0
+ trap finish EXIT
+ client_ip4=192.168.1.198
+ server_ip4=192.168.1.254
+ echo 180
+ ip link add dev link1_1 type veth peer name link1_2
+ ip netns add server
+ ip link set dev link1_2 netns server up name eth0
+ ip netns exec server ip addr add 192.168.1.254/24 dev eth0
+ ip netns add client
+ ip link set dev link1_1 netns client down name eth0
+ ip netns exec client ip link add dev bond0 down type bond mode 1 miimon 100 all_slaves_active 1
+ ip netns exec client ip link set dev eth0 down master bond0
+ ip netns exec client ip link set dev bond0 up
+ ip netns exec client ip addr add 192.168.1.198/24 dev bond0
+ ip netns exec client ping -c 5 192.168.1.254
+ finish
+ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:fc:ca:49:e2:d4 brd ff:ff:ff:ff:ff:ff
3: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre 0.0.0.0 brd 0.0.0.0
5: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
6: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
8: ip6_vti0@NONE: <NOARP> mtu 1332 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 325b:a7df:c8db::
9: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 76d3:be76:4187::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre6 :: brd :: permaddr 569b:65fd:b94b::
12: enp16s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 9c:6b:00:01:fb:80 brd ff:ff:ff:ff:ff:ff
+ ip netns delete server
+ ip netns delete client
+ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:fc:ca:49:e2:d4 brd ff:ff:ff:ff:ff:ff
3: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre 0.0.0.0 brd 0.0.0.0
5: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
6: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
8: ip6_vti0@NONE: <NOARP> mtu 1332 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 325b:a7df:c8db::
9: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 76d3:be76:4187::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre6 :: brd :: permaddr 569b:65fd:b94b::
12: enp16s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 9c:6b:00:01:fb:80 brd ff:ff:ff:ff:ff:ff
+ ip link del link1_1
Cannot find device "link1_1"
+ true
root@host:selftests/drivers/net/bonding#
Adding more `ip link show` calls before and after the operations on link1_1
showed that the `ip link set dev link1_1 netns client down name eth0` command
moves the link out of the current namespace (into "client", renamed to eth0),
so it is no longer visible when `ip link del link1_1` runs, as seen here:
+ ip netns exec server ip addr add 192.168.1.254/24 dev eth0
+ ip netns add client
+ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:fc:ca:49:e2:d4 brd ff:ff:ff:ff:ff:ff
3: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre 0.0.0.0 brd 0.0.0.0
5: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
6: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
8: ip6_vti0@NONE: <NOARP> mtu 1332 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 325b:a7df:c8db::
9: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 76d3:be76:4187::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre6 :: brd :: permaddr 569b:65fd:b94b::
12: enp16s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 9c:6b:00:01:fb:80 brd ff:ff:ff:ff:ff:ff
64: link1_1@if63: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 32:d6:de:9f:5d:e2 brd ff:ff:ff:ff:ff:ff link-netns server
+ ip link set dev link1_1 netns client down name eth0
+ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:fc:ca:49:e2:d4 brd ff:ff:ff:ff:ff:ff
3: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre 0.0.0.0 brd 0.0.0.0
5: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
6: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
8: ip6_vti0@NONE: <NOARP> mtu 1332 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 325b:a7df:c8db::
9: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 76d3:be76:4187::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre6 :: brd :: permaddr 569b:65fd:b94b::
12: enp16s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 9c:6b:00:01:fb:80 brd ff:ff:ff:ff:ff:ff
+ ip netns exec client ip link add dev bond0 down type bond mode 1 miimon 100 all_slaves_active 1
Hope this helps.
I am not sure what the right thing to do with this test is, or whether this is
the expected behaviour of the kernel.
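For illustration only, here is a minimal sketch of a cleanup that stays quiet
when the veth end is already gone. It assumes, as the trace above suggests,
that link1_1 is no longer visible in the root namespace by the time finish()
runs. The helper name del_link_if_present is hypothetical and is not part of
the selftest:

```shell
#!/bin/sh
# Hypothetical helper: delete a link only if it is still visible in the
# current network namespace, so cleanup does not print "Cannot find device".
del_link_if_present()
{
	dev="$1"
	if ip link show dev "$dev" >/dev/null 2>&1; then
		ip link del "$dev"
	else
		echo "skip: $dev not present in this namespace"
	fi
}

# Same cleanup steps as the original finish(), with the guarded delete.
finish()
{
	ip netns delete server 2>/dev/null || true
	ip netns delete client 2>/dev/null || true
	del_link_if_present link1_1 || true
}
```

In the script this would still be hooked up with `trap finish EXIT`, as in the
trace above. Whether the test should silence the message or keep it is a
separate question from whether the kernel behaviour is expected.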
Best regards,
Mirsad
Hi,
On several platforms, and for some time now, I have noticed that the alsa
pcm-test selftest ends with a TIMEOUT.
I tried raising the timeout to 300 seconds in selftests/alsa/settings
(timeout=300) and the test still timed out, so I saw little point in
increasing it further; something appears to be stuck.
The test runs up to the "default.time4.1.8" section, where it hangs for more
than 200 seconds, possibly indefinitely.
The output of the selftest is:
# # default.time3.1.8.0.PLAYBACK - 44.1kHz stereo large periods
# # default.time3.1.8.0.PLAYBACK hw_params.RW_INTERLEAVED.S16_LE.44100.2.22496.202464 sw_params.202464
# ok 61 default.time3.0.3.0.PLAYBACK
# # default.time4.0.3.0.PLAYBACK - 48kHz stereo small periods
# # default.time4.0.3.0.PLAYBACK hw_params.RW_INTERLEAVED.S16_LE.48000.2.512.4096 sw_params.4096
# ok 62 default.time4.0.3.0.PLAYBACK
# # default.time5.0.3.0.PLAYBACK - 48kHz stereo large periods
# # default.time5.0.3.0.PLAYBACK hw_params.RW_INTERLEAVED.S16_LE.48000.2.24000.192000 sw_params.192000
# ok 63 default.time5.0.3.0.PLAYBACK
# # default.time6.0.3.0.PLAYBACK - 48kHz 6 channel large periods
# # default.time6.0.3.0.PLAYBACK hw_params.RW_INTERLEAVED.S16_LE.48000.2.48000.576000 sw_params.576000
# ok 64 default.time6.0.3.0.PLAYBACK
# # default.time7.0.3.0.PLAYBACK - 96kHz stereo large periods
# # default.time7.0.3.0.PLAYBACK hw_params.RW_INTERLEAVED.S16_LE.96000.2.48000.192000 sw_params.192000
# not ok 65 default.time3.1.8.0.PLAYBACK
# # time mismatch: expected 4000ms got 17005
# # default.time4.1.8.#
not ok 2 selftests: alsa: pcm-test # TIMEOUT 300 seconds
The platform is a self-assembled AMD Ryzen 9 box with an ASRock mainboard. The config and lshw output are attached.
CONTINUED:
To test further, I increased the timeout again, to 400 seconds.
Only then did the test finish without a timeout, but with numerous errors, and
this is a Ryzen 9, so I guess it can only be worse on hardware like an i3 or i5.
Because many subtests failed, I am submitting the entire test log (compressed,
due to the mailing list's 100K limit on attachment size).
Best regards,
Mirsad
--------------
diff -u /dev/null tools/testing/selftests/alsa/settings
--- /dev/null 2023-06-11 00:36:30.651447094 +0200
+++ tools/testing/selftests/alsa/settings 2023-06-11 00:37:32.067504069 +0200
@@ -0,0 +1 @@
+timeout=400
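
As an illustration of the override in the diff above, the sketch below reads a
timeout= line from a per-test settings file and falls back to a default when
none is found. The function name read_timeout, the 45-second fallback, and the
parsing are assumptions made for this example; this is not the kselftest
runner's actual code:

```shell
#!/bin/sh
# Hypothetical helper: print the timeout (in seconds) from a settings file,
# or the given default when the file or the key is missing.
read_timeout()
{
	settings="$1"
	default="${2:-45}"	# assumed fallback value for this sketch
	value=""
	if [ -r "$settings" ]; then
		# Take the last "timeout=<digits>" line, ignoring everything else.
		value=$(sed -n 's/^timeout=\([0-9][0-9]*\)$/\1/p' "$settings" | tail -n 1)
	fi
	echo "${value:-$default}"
}
```

With the one-line settings file from the diff, `read_timeout
tools/testing/selftests/alsa/settings` would print the value stored there.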
Willy, Thomas
This is v2 of the syscall helpers series [1], rebased on
20230606-nolibc-rv32+stkp7a of [2]. It doesn't conflict with the -ENOSYS
patchset [3], so it is OK to simply merge both of them.
This revision mainly applies your suggestions on v1; both the syscall
return and call helpers are simplified or cleaned up.
Changes from v1 -> v2:
* tools/nolibc: sys.h: add __syscall() and __sysret() helpers
    * Use inline function instead of macro for the syscall return helper
      (Suggestion from Thomas)
    * Rename syscall return helper from __syscall_ret to __sysret
      (align with __syscall and it is not that long now)
    * Make __sysret() be always inline
      (Suggestion from Willy)
    * Simplify the whole __syscall() macro to one-line code
      (Benefit from the fixed 'long' return type of syscalls)
* tools/nolibc: unistd.h: apply __sysret() helper
    * Convert the whole _syscall() macro to one-line code
* tools/nolibc: sys.h: apply __sysret() helper
    * Further convert both brk() and getpagesize() to one-line code
* tools/nolibc: sys.h: apply __syscall() helper
    * Kept the same as v1, because the __syscall() usage has not changed
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/linux-riscv/cover.1685856497.git.falcon@tinylab.org/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/nolibc.git
[3]: https://lore.kernel.org/linux-riscv/cover.1685780412.git.falcon@tinylab.org/
Zhangjin Wu (4):
tools/nolibc: sys.h: add __syscall() and __sysret() helpers
tools/nolibc: unistd.h: apply __sysret() helper
tools/nolibc: sys.h: apply __sysret() helper
tools/nolibc: sys.h: apply __syscall() helper
tools/include/nolibc/sys.h | 366 ++++++----------------------------
tools/include/nolibc/unistd.h | 11 +-
2 files changed, 57 insertions(+), 320 deletions(-)
--
2.25.1
On Sun, Jun 04, 2023 at 10:41:05PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "sysctl_could_not_get_directory" on:
>
> commit: 1997935e918fa4c07b70be47ef8f37622df427bd ("[PATCH 6/8] test_sysclt: Test for registering a mount point")
> url: https://protect2.fireeye.com/v1/url?k=ee66a422-8f1d0eab-ee672f6d-74fe486000…
> base: https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git sysctl-next
> patch link: https://lore.kernel.org/all/20230602110638.789426-7-j.granados@samsung.com/
> patch subject: [PATCH 6/8] test_sysclt: Test for registering a mount point
>
> in testcase: boot
>
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <oliver.sang(a)intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202306042234.f2d7beff-oliver.sang@intel.com
>
>
> [ 15.271017][ T1] initcall io_uring_init+0x0/0x40 returned 0 after 87 usecs
> [ 15.272122][ T1] calling test_firmware_init+0x0/0x190 @ 1
> [ 15.274422][ T1] test_firmware: interface ready
> [ 15.275240][ T1] initcall test_firmware_init+0x0/0x190 returned 0 after 2200 usecs
> [ 15.276480][ T1] calling test_sysctl_init+0x0/0x630 @ 1
> [ 15.277687][ T1] sysctl could not get directory: /debug/test_sysctl/mnt/mnt_error -30
This is precisely what I'm trying to test. I create a directory on top of a
permanently empty directory, expect the registration to fail, and then check
that the mnt_error directory was not created.
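
For reference, the -30 in the quoted log line is a negated errno value; on
Linux, 30 is EROFS ("Read-only file system"). The final step described above
could look roughly like the sketch below. The function name is hypothetical,
and the /proc/sys prefix is an assumption about where the path from the log
line is exposed:

```shell
#!/bin/sh
# Hypothetical check: succeed only when the directory was NOT created.
dir_was_not_created()
{
	[ ! -e "$1" ]
}

# Assumed location: sysctl entries are exposed under /proc/sys, so the path
# from the log line maps to the directory below.
target="/proc/sys/debug/test_sysctl/mnt/mnt_error"

if dir_was_not_created "$target"; then
	echo "ok: $target is absent"
else
	echo "not ok: $target exists"
fi
```

An absent directory is the expected outcome here, so the error line in dmesg
and a passing check would go together.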
@mcgrof: Can we just ignore this 0-day report as a false positive?
Best
> [ 15.279055][ T1] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.4.0-rc2-00016-g1997935e918f #1
> [ 15.280027][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 15.280027][ T1] Call Trace:
> [ 15.280027][ T1] <TASK>
> [ 15.280027][ T1] dump_stack_lvl (kbuild/src/consumer/lib/dump_stack.c:107)
> [ 15.280027][ T1] __register_sysctl_table (kbuild/src/consumer/fs/proc/proc_sysctl.c:1379)
> [ 15.280027][ T1] test_sysctl_init (kbuild/src/consumer/lib/test_sysctl.c:220 kbuild/src/consumer/lib/test_sysctl.c:235)
> [ 15.280027][ T1] ? test_firmware_init (kbuild/src/consumer/lib/test_sysctl.c:224)
>
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-6.4.0-rc2-00016-g1997935e918f .config
> make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> cd <mod-install-dir>
> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
>
>
> git clone https://protect2.fireeye.com/v1/url?k=739e8a44-12e520cd-739f010b-74fe486000…
> cd lkp-tests
> bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://protect2.fireeye.com/v1/url?k=0a5d5e5e-6b26f4d7-0a5cd511-74fe486000…
>
>
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86_64 6.4.0-rc2 Kernel Configuration
> #
> CONFIG_CC_VERSION_TEXT="gcc-12 (Debian 12.2.0-14) 12.2.0"
> CONFIG_CC_IS_GCC=y
> CONFIG_GCC_VERSION=120200
> CONFIG_CLANG_VERSION=0
> CONFIG_AS_IS_GNU=y
> CONFIG_AS_VERSION=24000
> CONFIG_LD_IS_BFD=y
> CONFIG_LD_VERSION=24000
> CONFIG_LLD_VERSION=0
> CONFIG_CC_CAN_LINK=y
> CONFIG_CC_CAN_LINK_STATIC=y
> CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
> CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT=y
> CONFIG_TOOLS_SUPPORT_RELR=y
> CONFIG_CC_HAS_ASM_INLINE=y
> CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
> CONFIG_PAHOLE_VERSION=125
> CONFIG_CONSTRUCTORS=y
> CONFIG_IRQ_WORK=y
> CONFIG_BUILDTIME_TABLE_SORT=y
> CONFIG_THREAD_INFO_IN_TASK=y
>
> #
> # General setup
> #
> CONFIG_INIT_ENV_ARG_LIMIT=32
> # CONFIG_COMPILE_TEST is not set
> # CONFIG_WERROR is not set
> CONFIG_LOCALVERSION=""
> CONFIG_LOCALVERSION_AUTO=y
> CONFIG_BUILD_SALT=""
> CONFIG_HAVE_KERNEL_GZIP=y
> CONFIG_HAVE_KERNEL_BZIP2=y
> CONFIG_HAVE_KERNEL_LZMA=y
> CONFIG_HAVE_KERNEL_XZ=y
> CONFIG_HAVE_KERNEL_LZO=y
> CONFIG_HAVE_KERNEL_LZ4=y
> CONFIG_HAVE_KERNEL_ZSTD=y
> CONFIG_KERNEL_GZIP=y
> # CONFIG_KERNEL_BZIP2 is not set
> # CONFIG_KERNEL_LZMA is not set
> # CONFIG_KERNEL_XZ is not set
> # CONFIG_KERNEL_LZO is not set
> # CONFIG_KERNEL_LZ4 is not set
> # CONFIG_KERNEL_ZSTD is not set
> CONFIG_DEFAULT_INIT=""
> CONFIG_DEFAULT_HOSTNAME="(none)"
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_SYSVIPC_COMPAT=y
> CONFIG_POSIX_MQUEUE=y
> CONFIG_POSIX_MQUEUE_SYSCTL=y
> # CONFIG_WATCH_QUEUE is not set
> CONFIG_CROSS_MEMORY_ATTACH=y
> # CONFIG_USELIB is not set
> CONFIG_AUDIT=y
> CONFIG_HAVE_ARCH_AUDITSYSCALL=y
> CONFIG_AUDITSYSCALL=y
>
> #
> # IRQ subsystem
> #
> CONFIG_GENERIC_IRQ_PROBE=y
> CONFIG_GENERIC_IRQ_SHOW=y
> CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
> CONFIG_GENERIC_PENDING_IRQ=y
> CONFIG_GENERIC_IRQ_MIGRATION=y
> CONFIG_GENERIC_IRQ_INJECTION=y
> CONFIG_HARDIRQS_SW_RESEND=y
> CONFIG_IRQ_DOMAIN=y
> CONFIG_IRQ_SIM=y
> CONFIG_IRQ_DOMAIN_HIERARCHY=y
> CONFIG_GENERIC_MSI_IRQ=y
> CONFIG_IRQ_MSI_IOMMU=y
> CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
> CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
> CONFIG_IRQ_FORCED_THREADING=y
> CONFIG_SPARSE_IRQ=y
> # CONFIG_GENERIC_IRQ_DEBUGFS is not set
> # end of IRQ subsystem
>
> CONFIG_CLOCKSOURCE_WATCHDOG=y
> CONFIG_ARCH_CLOCKSOURCE_INIT=y
> CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
> CONFIG_GENERIC_TIME_VSYSCALL=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
> CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
> CONFIG_GENERIC_CMOS_UPDATE=y
> CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
> CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y
> CONFIG_CONTEXT_TRACKING=y
> CONFIG_CONTEXT_TRACKING_IDLE=y
>
> #
> # Timers subsystem
> #
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> # CONFIG_NO_HZ_IDLE is not set
> CONFIG_NO_HZ_FULL=y
> CONFIG_CONTEXT_TRACKING_USER=y
> # CONFIG_CONTEXT_TRACKING_USER_FORCE is not set
> CONFIG_NO_HZ=y
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=125
> # end of Timers subsystem
>
> CONFIG_BPF=y
> CONFIG_HAVE_EBPF_JIT=y
> CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
>
> #
> # BPF subsystem
> #
> CONFIG_BPF_SYSCALL=y
> CONFIG_BPF_JIT=y
> CONFIG_BPF_JIT_ALWAYS_ON=y
> CONFIG_BPF_JIT_DEFAULT_ON=y
> CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
> # CONFIG_BPF_PRELOAD is not set
> # CONFIG_BPF_LSM is not set
> # end of BPF subsystem
>
> CONFIG_PREEMPT_BUILD=y
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> # CONFIG_SCHED_CORE is not set
>
> #
> # CPU/Task time and stats accounting
> #
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> CONFIG_IRQ_TIME_ACCOUNTING=y
> CONFIG_HAVE_SCHED_AVG_IRQ=y
> CONFIG_BSD_PROCESS_ACCT=y
> CONFIG_BSD_PROCESS_ACCT_V3=y
> CONFIG_TASKSTATS=y
> CONFIG_TASK_DELAY_ACCT=y
> CONFIG_TASK_XACCT=y
> CONFIG_TASK_IO_ACCOUNTING=y
> # CONFIG_PSI is not set
> # end of CPU/Task time and stats accounting
>
> CONFIG_CPU_ISOLATION=y
>
> #
> # RCU Subsystem
> #
> CONFIG_TREE_RCU=y
> CONFIG_PREEMPT_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU_GENERIC=y
> CONFIG_TASKS_RCU=y
> CONFIG_TASKS_RUDE_RCU=y
> CONFIG_TASKS_TRACE_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> CONFIG_RCU_NOCB_CPU=y
> # CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set
> # CONFIG_RCU_LAZY is not set
> # end of RCU Subsystem
>
> CONFIG_IKCONFIG=y
> CONFIG_IKCONFIG_PROC=y
> # CONFIG_IKHEADERS is not set
> CONFIG_LOG_BUF_SHIFT=20
> CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
> # CONFIG_PRINTK_INDEX is not set
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
>
> #
> # Scheduler features
> #
> # CONFIG_UCLAMP_TASK is not set
> # end of Scheduler features
>
> CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
> CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
> CONFIG_CC_HAS_INT128=y
> CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
> CONFIG_GCC11_NO_ARRAY_BOUNDS=y
> CONFIG_CC_NO_ARRAY_BOUNDS=y
> CONFIG_ARCH_SUPPORTS_INT128=y
> CONFIG_NUMA_BALANCING=y
> CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
> CONFIG_CGROUPS=y
> CONFIG_PAGE_COUNTER=y
> # CONFIG_CGROUP_FAVOR_DYNMODS is not set
> CONFIG_MEMCG=y
> CONFIG_MEMCG_KMEM=y
> CONFIG_BLK_CGROUP=y
> CONFIG_CGROUP_WRITEBACK=y
> CONFIG_CGROUP_SCHED=y
> CONFIG_FAIR_GROUP_SCHED=y
> CONFIG_CFS_BANDWIDTH=y
> CONFIG_RT_GROUP_SCHED=y
> CONFIG_SCHED_MM_CID=y
> CONFIG_CGROUP_PIDS=y
> CONFIG_CGROUP_RDMA=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_HUGETLB=y
> CONFIG_CPUSETS=y
> CONFIG_PROC_PID_CPUSET=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CGROUP_CPUACCT=y
> CONFIG_CGROUP_PERF=y
> CONFIG_CGROUP_BPF=y
> # CONFIG_CGROUP_MISC is not set
> # CONFIG_CGROUP_DEBUG is not set
> CONFIG_SOCK_CGROUP_DATA=y
> CONFIG_NAMESPACES=y
> CONFIG_UTS_NS=y
> CONFIG_TIME_NS=y
> CONFIG_IPC_NS=y
> CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> CONFIG_NET_NS=y
> CONFIG_CHECKPOINT_RESTORE=y
> CONFIG_SCHED_AUTOGROUP=y
> CONFIG_RELAY=y
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_INITRAMFS_SOURCE=""
> CONFIG_RD_GZIP=y
> CONFIG_RD_BZIP2=y
> CONFIG_RD_LZMA=y
> CONFIG_RD_XZ=y
> CONFIG_RD_LZO=y
> CONFIG_RD_LZ4=y
> CONFIG_RD_ZSTD=y
> # CONFIG_BOOT_CONFIG is not set
> CONFIG_INITRAMFS_PRESERVE_MTIME=y
> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
> # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
> CONFIG_LD_ORPHAN_WARN=y
> CONFIG_LD_ORPHAN_WARN_LEVEL="warn"
> CONFIG_SYSCTL=y
> CONFIG_HAVE_UID16=y
> CONFIG_SYSCTL_EXCEPTION_TRACE=y
> CONFIG_HAVE_PCSPKR_PLATFORM=y
> CONFIG_EXPERT=y
> CONFIG_UID16=y
> CONFIG_MULTIUSER=y
> CONFIG_SGETMASK_SYSCALL=y
> CONFIG_SYSFS_SYSCALL=y
> CONFIG_FHANDLE=y
> CONFIG_POSIX_TIMERS=y
> CONFIG_PRINTK=y
> CONFIG_BUG=y
> CONFIG_ELF_CORE=y
> CONFIG_PCSPKR_PLATFORM=y
> CONFIG_BASE_FULL=y
> CONFIG_FUTEX=y
> CONFIG_FUTEX_PI=y
> CONFIG_EPOLL=y
> CONFIG_SIGNALFD=y
> CONFIG_TIMERFD=y
> CONFIG_EVENTFD=y
> CONFIG_SHMEM=y
> CONFIG_AIO=y
> CONFIG_IO_URING=y
> CONFIG_ADVISE_SYSCALLS=y
> CONFIG_MEMBARRIER=y
> CONFIG_KALLSYMS=y
> # CONFIG_KALLSYMS_SELFTEST is not set
> CONFIG_KALLSYMS_ALL=y
> CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
> CONFIG_KALLSYMS_BASE_RELATIVE=y
> CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
> CONFIG_KCMP=y
> CONFIG_RSEQ=y
> # CONFIG_DEBUG_RSEQ is not set
> CONFIG_EMBEDDED=y
> CONFIG_HAVE_PERF_EVENTS=y
> CONFIG_GUEST_PERF_EVENTS=y
> # CONFIG_PC104 is not set
>
> #
> # Kernel Performance Events And Counters
> #
> CONFIG_PERF_EVENTS=y
> # CONFIG_DEBUG_PERF_USE_VMALLOC is not set
> # end of Kernel Performance Events And Counters
>
> CONFIG_SYSTEM_DATA_VERIFICATION=y
> CONFIG_PROFILING=y
> CONFIG_TRACEPOINTS=y
> # end of General setup
>
> CONFIG_64BIT=y
> CONFIG_X86_64=y
> CONFIG_X86=y
> CONFIG_INSTRUCTION_DECODER=y
> CONFIG_OUTPUT_FORMAT="elf64-x86-64"
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_MMU=y
> CONFIG_ARCH_MMAP_RND_BITS_MIN=28
> CONFIG_ARCH_MMAP_RND_BITS_MAX=32
> CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
> CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
> CONFIG_GENERIC_ISA_DMA=y
> CONFIG_GENERIC_CSUM=y
> CONFIG_GENERIC_BUG=y
> CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
> CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> CONFIG_GENERIC_CALIBRATE_DELAY=y
> CONFIG_ARCH_HAS_CPU_RELAX=y
> CONFIG_ARCH_HIBERNATION_POSSIBLE=y
> CONFIG_ARCH_SUSPEND_POSSIBLE=y
> CONFIG_AUDIT_ARCH=y
> CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
> CONFIG_HAVE_INTEL_TXT=y
> CONFIG_X86_64_SMP=y
> CONFIG_ARCH_SUPPORTS_UPROBES=y
> CONFIG_FIX_EARLYCON_MEM=y
> CONFIG_DYNAMIC_PHYSICAL_MASK=y
> CONFIG_PGTABLE_LEVELS=5
> CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
>
> #
> # Processor type and features
> #
> CONFIG_SMP=y
> CONFIG_X86_FEATURE_NAMES=y
> CONFIG_X86_X2APIC=y
> CONFIG_X86_MPPARSE=y
> # CONFIG_GOLDFISH is not set
> CONFIG_X86_CPU_RESCTRL=y
> CONFIG_X86_EXTENDED_PLATFORM=y
> # CONFIG_X86_NUMACHIP is not set
> # CONFIG_X86_VSMP is not set
> CONFIG_X86_UV=y
> # CONFIG_X86_GOLDFISH is not set
> # CONFIG_X86_INTEL_MID is not set
> CONFIG_X86_INTEL_LPSS=y
> # CONFIG_X86_AMD_PLATFORM_DEVICE is not set
> CONFIG_IOSF_MBI=y
> # CONFIG_IOSF_MBI_DEBUG is not set
> CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
> # CONFIG_SCHED_OMIT_FRAME_POINTER is not set
> CONFIG_HYPERVISOR_GUEST=y
> CONFIG_PARAVIRT=y
> # CONFIG_PARAVIRT_DEBUG is not set
> CONFIG_PARAVIRT_SPINLOCKS=y
> CONFIG_X86_HV_CALLBACK_VECTOR=y
> # CONFIG_XEN is not set
> CONFIG_KVM_GUEST=y
> CONFIG_ARCH_CPUIDLE_HALTPOLL=y
> # CONFIG_PVH is not set
> CONFIG_PARAVIRT_TIME_ACCOUNTING=y
> CONFIG_PARAVIRT_CLOCK=y
> # CONFIG_JAILHOUSE_GUEST is not set
> # CONFIG_ACRN_GUEST is not set
> CONFIG_INTEL_TDX_GUEST=y
> # CONFIG_MK8 is not set
> # CONFIG_MPSC is not set
> # CONFIG_MCORE2 is not set
> # CONFIG_MATOM is not set
> CONFIG_GENERIC_CPU=y
> CONFIG_X86_INTERNODE_CACHE_SHIFT=6
> CONFIG_X86_L1_CACHE_SHIFT=6
> CONFIG_X86_TSC=y
> CONFIG_X86_CMPXCHG64=y
> CONFIG_X86_CMOV=y
> CONFIG_X86_MINIMUM_CPU_FAMILY=64
> CONFIG_X86_DEBUGCTLMSR=y
> CONFIG_IA32_FEAT_CTL=y
> CONFIG_X86_VMX_FEATURE_NAMES=y
> CONFIG_PROCESSOR_SELECT=y
> CONFIG_CPU_SUP_INTEL=y
> # CONFIG_CPU_SUP_AMD is not set
> # CONFIG_CPU_SUP_HYGON is not set
> # CONFIG_CPU_SUP_CENTAUR is not set
> # CONFIG_CPU_SUP_ZHAOXIN is not set
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> CONFIG_DMI=y
> CONFIG_BOOT_VESA_SUPPORT=y
> CONFIG_MAXSMP=y
> CONFIG_NR_CPUS_RANGE_BEGIN=8192
> CONFIG_NR_CPUS_RANGE_END=8192
> CONFIG_NR_CPUS_DEFAULT=8192
> CONFIG_NR_CPUS=8192
> CONFIG_SCHED_CLUSTER=y
> CONFIG_SCHED_SMT=y
> CONFIG_SCHED_MC=y
> CONFIG_SCHED_MC_PRIO=y
> CONFIG_X86_LOCAL_APIC=y
> CONFIG_X86_IO_APIC=y
> CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
> CONFIG_X86_MCE=y
> CONFIG_X86_MCELOG_LEGACY=y
> CONFIG_X86_MCE_INTEL=y
> CONFIG_X86_MCE_THRESHOLD=y
> CONFIG_X86_MCE_INJECT=m
>
> #
> # Performance monitoring
> #
> CONFIG_PERF_EVENTS_INTEL_UNCORE=m
> CONFIG_PERF_EVENTS_INTEL_RAPL=m
> CONFIG_PERF_EVENTS_INTEL_CSTATE=m
> # end of Performance monitoring
>
> CONFIG_X86_16BIT=y
> CONFIG_X86_ESPFIX64=y
> CONFIG_X86_VSYSCALL_EMULATION=y
> CONFIG_X86_IOPL_IOPERM=y
> CONFIG_MICROCODE=y
> CONFIG_MICROCODE_INTEL=y
> CONFIG_MICROCODE_LATE_LOADING=y
> CONFIG_X86_MSR=y
> CONFIG_X86_CPUID=y
> CONFIG_X86_5LEVEL=y
> CONFIG_X86_DIRECT_GBPAGES=y
> # CONFIG_X86_CPA_STATISTICS is not set
> CONFIG_X86_MEM_ENCRYPT=y
> CONFIG_NUMA=y
> # CONFIG_AMD_NUMA is not set
> CONFIG_X86_64_ACPI_NUMA=y
> CONFIG_NUMA_EMU=y
> CONFIG_NODES_SHIFT=10
> CONFIG_ARCH_SPARSEMEM_ENABLE=y
> CONFIG_ARCH_SPARSEMEM_DEFAULT=y
> # CONFIG_ARCH_MEMORY_PROBE is not set
> CONFIG_ARCH_PROC_KCORE_TEXT=y
> CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
> CONFIG_X86_PMEM_LEGACY_DEVICE=y
> CONFIG_X86_PMEM_LEGACY=m
> CONFIG_X86_CHECK_BIOS_CORRUPTION=y
> # CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
> CONFIG_MTRR=y
> CONFIG_MTRR_SANITIZER=y
> CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
> CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
> CONFIG_X86_PAT=y
> CONFIG_ARCH_USES_PG_UNCACHED=y
> CONFIG_X86_UMIP=y
> CONFIG_CC_HAS_IBT=y
> CONFIG_X86_KERNEL_IBT=y
> CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
> CONFIG_X86_INTEL_TSX_MODE_OFF=y
> # CONFIG_X86_INTEL_TSX_MODE_ON is not set
> # CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
> CONFIG_X86_SGX=y
> CONFIG_EFI=y
> CONFIG_EFI_STUB=y
> CONFIG_EFI_HANDOVER_PROTOCOL=y
> CONFIG_EFI_MIXED=y
> # CONFIG_EFI_FAKE_MEMMAP is not set
> CONFIG_EFI_RUNTIME_MAP=y
> # CONFIG_HZ_100 is not set
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
> CONFIG_SCHED_HRTICK=y
> CONFIG_KEXEC=y
> CONFIG_KEXEC_FILE=y
> CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
> # CONFIG_KEXEC_SIG is not set
> CONFIG_CRASH_DUMP=y
> CONFIG_KEXEC_JUMP=y
> CONFIG_PHYSICAL_START=0x1000000
> CONFIG_RELOCATABLE=y
> CONFIG_RANDOMIZE_BASE=y
> CONFIG_X86_NEED_RELOCS=y
> CONFIG_PHYSICAL_ALIGN=0x200000
> CONFIG_DYNAMIC_MEMORY_LAYOUT=y
> CONFIG_RANDOMIZE_MEMORY=y
> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
> # CONFIG_ADDRESS_MASKING is not set
> CONFIG_HOTPLUG_CPU=y
> CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
> # CONFIG_DEBUG_HOTPLUG_CPU0 is not set
> # CONFIG_COMPAT_VDSO is not set
> CONFIG_LEGACY_VSYSCALL_XONLY=y
> # CONFIG_LEGACY_VSYSCALL_NONE is not set
> # CONFIG_CMDLINE_BOOL is not set
> CONFIG_MODIFY_LDT_SYSCALL=y
> # CONFIG_STRICT_SIGALTSTACK_SIZE is not set
> CONFIG_HAVE_LIVEPATCH=y
> CONFIG_LIVEPATCH=y
> # end of Processor type and features
>
> CONFIG_CC_HAS_SLS=y
> CONFIG_CC_HAS_RETURN_THUNK=y
> CONFIG_CC_HAS_ENTRY_PADDING=y
> CONFIG_FUNCTION_PADDING_CFI=11
> CONFIG_FUNCTION_PADDING_BYTES=16
> CONFIG_SPECULATION_MITIGATIONS=y
> CONFIG_PAGE_TABLE_ISOLATION=y
> # CONFIG_RETPOLINE is not set
> CONFIG_CPU_IBRS_ENTRY=y
> # CONFIG_SLS is not set
> CONFIG_ARCH_HAS_ADD_PAGES=y
> CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y
>
> #
> # Power management and ACPI options
> #
> CONFIG_ARCH_HIBERNATION_HEADER=y
> CONFIG_SUSPEND=y
> CONFIG_SUSPEND_FREEZER=y
> # CONFIG_SUSPEND_SKIP_SYNC is not set
> CONFIG_HIBERNATE_CALLBACKS=y
> CONFIG_HIBERNATION=y
> CONFIG_HIBERNATION_SNAPSHOT_DEV=y
> CONFIG_PM_STD_PARTITION=""
> CONFIG_PM_SLEEP=y
> CONFIG_PM_SLEEP_SMP=y
> # CONFIG_PM_AUTOSLEEP is not set
> # CONFIG_PM_USERSPACE_AUTOSLEEP is not set
> # CONFIG_PM_WAKELOCKS is not set
> CONFIG_PM=y
> CONFIG_PM_DEBUG=y
> # CONFIG_PM_ADVANCED_DEBUG is not set
> # CONFIG_PM_TEST_SUSPEND is not set
> CONFIG_PM_SLEEP_DEBUG=y
> # CONFIG_DPM_WATCHDOG is not set
> # CONFIG_PM_TRACE_RTC is not set
> CONFIG_PM_CLK=y
> # CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
> # CONFIG_ENERGY_MODEL is not set
> CONFIG_ARCH_SUPPORTS_ACPI=y
> CONFIG_ACPI=y
> CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
> CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
> CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
> # CONFIG_ACPI_DEBUGGER is not set
> CONFIG_ACPI_SPCR_TABLE=y
> # CONFIG_ACPI_FPDT is not set
> CONFIG_ACPI_LPIT=y
> CONFIG_ACPI_SLEEP=y
> CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
> CONFIG_ACPI_EC_DEBUGFS=m
> CONFIG_ACPI_AC=y
> CONFIG_ACPI_BATTERY=y
> CONFIG_ACPI_BUTTON=y
> CONFIG_ACPI_VIDEO=m
> CONFIG_ACPI_FAN=y
> CONFIG_ACPI_TAD=m
> CONFIG_ACPI_DOCK=y
> CONFIG_ACPI_CPU_FREQ_PSS=y
> CONFIG_ACPI_PROCESSOR_CSTATE=y
> CONFIG_ACPI_PROCESSOR_IDLE=y
> CONFIG_ACPI_CPPC_LIB=y
> CONFIG_ACPI_PROCESSOR=y
> CONFIG_ACPI_IPMI=m
> CONFIG_ACPI_HOTPLUG_CPU=y
> CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
> CONFIG_ACPI_THERMAL=y
> CONFIG_ACPI_PLATFORM_PROFILE=m
> CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
> CONFIG_ACPI_TABLE_UPGRADE=y
> # CONFIG_ACPI_DEBUG is not set
> CONFIG_ACPI_PCI_SLOT=y
> CONFIG_ACPI_CONTAINER=y
> CONFIG_ACPI_HOTPLUG_MEMORY=y
> CONFIG_ACPI_HOTPLUG_IOAPIC=y
> CONFIG_ACPI_SBS=m
> CONFIG_ACPI_HED=y
> # CONFIG_ACPI_CUSTOM_METHOD is not set
> CONFIG_ACPI_BGRT=y
> # CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
> CONFIG_ACPI_NFIT=m
> # CONFIG_NFIT_SECURITY_DEBUG is not set
> CONFIG_ACPI_NUMA=y
> CONFIG_ACPI_HMAT=y
> CONFIG_HAVE_ACPI_APEI=y
> CONFIG_HAVE_ACPI_APEI_NMI=y
> CONFIG_ACPI_APEI=y
> CONFIG_ACPI_APEI_GHES=y
> CONFIG_ACPI_APEI_PCIEAER=y
> CONFIG_ACPI_APEI_MEMORY_FAILURE=y
> CONFIG_ACPI_APEI_EINJ=m
> # CONFIG_ACPI_APEI_ERST_DEBUG is not set
> # CONFIG_ACPI_DPTF is not set
> CONFIG_ACPI_WATCHDOG=y
> CONFIG_ACPI_EXTLOG=m
> CONFIG_ACPI_ADXL=y
> # CONFIG_ACPI_CONFIGFS is not set
> # CONFIG_ACPI_PFRUT is not set
> CONFIG_ACPI_PCC=y
> # CONFIG_ACPI_FFH is not set
> # CONFIG_PMIC_OPREGION is not set
> CONFIG_ACPI_PRMT=y
> CONFIG_X86_PM_TIMER=y
>
> #
> # CPU Frequency scaling
> #
> CONFIG_CPU_FREQ=y
> CONFIG_CPU_FREQ_GOV_ATTR_SET=y
> CONFIG_CPU_FREQ_GOV_COMMON=y
> CONFIG_CPU_FREQ_STAT=y
> CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
> # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
> # CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> CONFIG_CPU_FREQ_GOV_USERSPACE=y
> CONFIG_CPU_FREQ_GOV_ONDEMAND=y
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
> CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
>
> #
> # CPU frequency scaling drivers
> #
> CONFIG_X86_INTEL_PSTATE=y
> # CONFIG_X86_PCC_CPUFREQ is not set
> # CONFIG_X86_AMD_PSTATE is not set
> # CONFIG_X86_AMD_PSTATE_UT is not set
> CONFIG_X86_ACPI_CPUFREQ=m
> # CONFIG_X86_POWERNOW_K8 is not set
> # CONFIG_X86_SPEEDSTEP_CENTRINO is not set
> CONFIG_X86_P4_CLOCKMOD=m
>
> #
> # shared options
> #
> CONFIG_X86_SPEEDSTEP_LIB=m
> # end of CPU Frequency scaling
>
> #
> # CPU Idle
> #
> CONFIG_CPU_IDLE=y
> # CONFIG_CPU_IDLE_GOV_LADDER is not set
> CONFIG_CPU_IDLE_GOV_MENU=y
> # CONFIG_CPU_IDLE_GOV_TEO is not set
> CONFIG_CPU_IDLE_GOV_HALTPOLL=y
> CONFIG_HALTPOLL_CPUIDLE=y
> # end of CPU Idle
>
> CONFIG_INTEL_IDLE=y
> # end of Power management and ACPI options
>
> #
> # Bus options (PCI etc.)
> #
> CONFIG_PCI_DIRECT=y
> CONFIG_PCI_MMCONFIG=y
> CONFIG_MMCONF_FAM10H=y
> # CONFIG_PCI_CNB20LE_QUIRK is not set
> # CONFIG_ISA_BUS is not set
> CONFIG_ISA_DMA_API=y
> # end of Bus options (PCI etc.)
>
> #
> # Binary Emulations
> #
> CONFIG_IA32_EMULATION=y
> # CONFIG_X86_X32_ABI is not set
> CONFIG_COMPAT_32=y
> CONFIG_COMPAT=y
> CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
> # end of Binary Emulations
>
> CONFIG_HAVE_KVM=y
> CONFIG_HAVE_KVM_PFNCACHE=y
> CONFIG_HAVE_KVM_IRQCHIP=y
> CONFIG_HAVE_KVM_IRQFD=y
> CONFIG_HAVE_KVM_IRQ_ROUTING=y
> CONFIG_HAVE_KVM_DIRTY_RING=y
> CONFIG_HAVE_KVM_DIRTY_RING_TSO=y
> CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL=y
> CONFIG_HAVE_KVM_EVENTFD=y
> CONFIG_KVM_MMIO=y
> CONFIG_KVM_ASYNC_PF=y
> CONFIG_HAVE_KVM_MSI=y
> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
> CONFIG_KVM_VFIO=y
> CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
> CONFIG_KVM_COMPAT=y
> CONFIG_HAVE_KVM_IRQ_BYPASS=y
> CONFIG_HAVE_KVM_NO_POLL=y
> CONFIG_KVM_XFER_TO_GUEST_WORK=y
> CONFIG_HAVE_KVM_PM_NOTIFIER=y
> CONFIG_KVM_GENERIC_HARDWARE_ENABLING=y
> CONFIG_VIRTUALIZATION=y
> CONFIG_KVM=m
> # CONFIG_KVM_WERROR is not set
> CONFIG_KVM_INTEL=m
> # CONFIG_X86_SGX_KVM is not set
> # CONFIG_KVM_AMD is not set
> CONFIG_KVM_SMM=y
> # CONFIG_KVM_XEN is not set
> CONFIG_AS_AVX512=y
> CONFIG_AS_SHA1_NI=y
> CONFIG_AS_SHA256_NI=y
> CONFIG_AS_TPAUSE=y
> CONFIG_AS_GFNI=y
>
> #
> # General architecture-dependent options
> #
> CONFIG_CRASH_CORE=y
> CONFIG_KEXEC_CORE=y
> CONFIG_HAVE_IMA_KEXEC=y
> CONFIG_HOTPLUG_SMT=y
> CONFIG_GENERIC_ENTRY=y
> CONFIG_KPROBES=y
> CONFIG_JUMP_LABEL=y
> # CONFIG_STATIC_KEYS_SELFTEST is not set
> # CONFIG_STATIC_CALL_SELFTEST is not set
> CONFIG_OPTPROBES=y
> CONFIG_KPROBES_ON_FTRACE=y
> CONFIG_UPROBES=y
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
> CONFIG_ARCH_USE_BUILTIN_BSWAP=y
> CONFIG_KRETPROBES=y
> CONFIG_KRETPROBE_ON_RETHOOK=y
> CONFIG_USER_RETURN_NOTIFIER=y
> CONFIG_HAVE_IOREMAP_PROT=y
> CONFIG_HAVE_KPROBES=y
> CONFIG_HAVE_KRETPROBES=y
> CONFIG_HAVE_OPTPROBES=y
> CONFIG_HAVE_KPROBES_ON_FTRACE=y
> CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
> CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
> CONFIG_HAVE_NMI=y
> CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
> CONFIG_HAVE_ARCH_TRACEHOOK=y
> CONFIG_HAVE_DMA_CONTIGUOUS=y
> CONFIG_GENERIC_SMP_IDLE_THREAD=y
> CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
> CONFIG_ARCH_HAS_SET_MEMORY=y
> CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
> CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
> CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
> CONFIG_ARCH_WANTS_NO_INSTR=y
> CONFIG_HAVE_ASM_MODVERSIONS=y
> CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
> CONFIG_HAVE_RSEQ=y
> CONFIG_HAVE_RUST=y
> CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
> CONFIG_HAVE_HW_BREAKPOINT=y
> CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
> CONFIG_HAVE_USER_RETURN_NOTIFIER=y
> CONFIG_HAVE_PERF_EVENTS_NMI=y
> CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
> CONFIG_HAVE_PERF_REGS=y
> CONFIG_HAVE_PERF_USER_STACK_DUMP=y
> CONFIG_HAVE_ARCH_JUMP_LABEL=y
> CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
> CONFIG_MMU_GATHER_TABLE_FREE=y
> CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
> CONFIG_MMU_GATHER_MERGE_VMAS=y
> CONFIG_MMU_LAZY_TLB_REFCOUNT=y
> CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
> CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS=y
> CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
> CONFIG_HAVE_CMPXCHG_LOCAL=y
> CONFIG_HAVE_CMPXCHG_DOUBLE=y
> CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
> CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
> CONFIG_HAVE_ARCH_SECCOMP=y
> CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
> CONFIG_SECCOMP=y
> CONFIG_SECCOMP_FILTER=y
> # CONFIG_SECCOMP_CACHE_DEBUG is not set
> CONFIG_HAVE_ARCH_STACKLEAK=y
> CONFIG_HAVE_STACKPROTECTOR=y
> CONFIG_STACKPROTECTOR=y
> CONFIG_STACKPROTECTOR_STRONG=y
> CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
> CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
> CONFIG_LTO_NONE=y
> CONFIG_ARCH_SUPPORTS_CFI_CLANG=y
> CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
> CONFIG_HAVE_CONTEXT_TRACKING_USER=y
> CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK=y
> CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
> CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
> CONFIG_HAVE_MOVE_PUD=y
> CONFIG_HAVE_MOVE_PMD=y
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
> CONFIG_HAVE_ARCH_HUGE_VMAP=y
> CONFIG_HAVE_ARCH_HUGE_VMALLOC=y
> CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
> CONFIG_HAVE_ARCH_SOFT_DIRTY=y
> CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
> CONFIG_MODULES_USE_ELF_RELA=y
> CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
> CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
> CONFIG_SOFTIRQ_ON_OWN_STACK=y
> CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
> CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
> CONFIG_HAVE_EXIT_THREAD=y
> CONFIG_ARCH_MMAP_RND_BITS=28
> CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
> CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
> CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
> CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
> CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
> CONFIG_HAVE_OBJTOOL=y
> CONFIG_HAVE_JUMP_LABEL_HACK=y
> CONFIG_HAVE_NOINSTR_HACK=y
> CONFIG_HAVE_NOINSTR_VALIDATION=y
> CONFIG_HAVE_UACCESS_VALIDATION=y
> CONFIG_HAVE_STACK_VALIDATION=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_OLD_SIGSUSPEND3=y
> CONFIG_COMPAT_OLD_SIGACTION=y
> CONFIG_COMPAT_32BIT_TIME=y
> CONFIG_HAVE_ARCH_VMAP_STACK=y
> CONFIG_VMAP_STACK=y
> CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
> CONFIG_RANDOMIZE_KSTACK_OFFSET=y
> CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y
> CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
> CONFIG_STRICT_KERNEL_RWX=y
> CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
> CONFIG_STRICT_MODULE_RWX=y
> CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
> CONFIG_ARCH_USE_MEMREMAP_PROT=y
> # CONFIG_LOCK_EVENT_COUNTS is not set
> CONFIG_ARCH_HAS_MEM_ENCRYPT=y
> CONFIG_ARCH_HAS_CC_PLATFORM=y
> CONFIG_HAVE_STATIC_CALL=y
> CONFIG_HAVE_STATIC_CALL_INLINE=y
> CONFIG_HAVE_PREEMPT_DYNAMIC=y
> CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
> CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
> CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> CONFIG_ARCH_SUPPORTS_PAGE_TABLE_CHECK=y
> CONFIG_ARCH_HAS_ELFCORE_COMPAT=y
> CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
> CONFIG_DYNAMIC_SIGFRAME=y
> CONFIG_HAVE_ARCH_NODE_DEV_GROUP=y
> CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
>
> #
> # GCOV-based kernel profiling
> #
> # CONFIG_GCOV_KERNEL is not set
> CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
> # end of GCOV-based kernel profiling
>
> CONFIG_HAVE_GCC_PLUGINS=y
> CONFIG_GCC_PLUGINS=y
> # CONFIG_GCC_PLUGIN_LATENT_ENTROPY is not set
> CONFIG_FUNCTION_ALIGNMENT_4B=y
> CONFIG_FUNCTION_ALIGNMENT_16B=y
> CONFIG_FUNCTION_ALIGNMENT=16
> # end of General architecture-dependent options
>
> CONFIG_RT_MUTEXES=y
> CONFIG_BASE_SMALL=0
> CONFIG_MODULE_SIG_FORMAT=y
> CONFIG_MODULES=y
> # CONFIG_MODULE_DEBUG is not set
> CONFIG_MODULE_FORCE_LOAD=y
> CONFIG_MODULE_UNLOAD=y
> # CONFIG_MODULE_FORCE_UNLOAD is not set
> # CONFIG_MODULE_UNLOAD_TAINT_TRACKING is not set
> # CONFIG_MODVERSIONS is not set
> # CONFIG_MODULE_SRCVERSION_ALL is not set
> CONFIG_MODULE_SIG=y
> # CONFIG_MODULE_SIG_FORCE is not set
> CONFIG_MODULE_SIG_ALL=y
> # CONFIG_MODULE_SIG_SHA1 is not set
> # CONFIG_MODULE_SIG_SHA224 is not set
> CONFIG_MODULE_SIG_SHA256=y
> # CONFIG_MODULE_SIG_SHA384 is not set
> # CONFIG_MODULE_SIG_SHA512 is not set
> CONFIG_MODULE_SIG_HASH="sha256"
> CONFIG_MODULE_COMPRESS_NONE=y
> # CONFIG_MODULE_COMPRESS_GZIP is not set
> # CONFIG_MODULE_COMPRESS_XZ is not set
> # CONFIG_MODULE_COMPRESS_ZSTD is not set
> # CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
> CONFIG_MODPROBE_PATH="/sbin/modprobe"
> # CONFIG_TRIM_UNUSED_KSYMS is not set
> CONFIG_MODULES_TREE_LOOKUP=y
> CONFIG_BLOCK=y
> CONFIG_BLOCK_LEGACY_AUTOLOAD=y
> CONFIG_BLK_CGROUP_RWSTAT=y
> CONFIG_BLK_CGROUP_PUNT_BIO=y
> CONFIG_BLK_DEV_BSG_COMMON=y
> CONFIG_BLK_ICQ=y
> CONFIG_BLK_DEV_BSGLIB=y
> CONFIG_BLK_DEV_INTEGRITY=y
> CONFIG_BLK_DEV_INTEGRITY_T10=m
> # CONFIG_BLK_DEV_ZONED is not set
> CONFIG_BLK_DEV_THROTTLING=y
> # CONFIG_BLK_DEV_THROTTLING_LOW is not set
> CONFIG_BLK_WBT=y
> CONFIG_BLK_WBT_MQ=y
> # CONFIG_BLK_CGROUP_IOLATENCY is not set
> # CONFIG_BLK_CGROUP_IOCOST is not set
> # CONFIG_BLK_CGROUP_IOPRIO is not set
> CONFIG_BLK_DEBUG_FS=y
> # CONFIG_BLK_SED_OPAL is not set
> # CONFIG_BLK_INLINE_ENCRYPTION is not set
>
> #
> # Partition Types
> #
> # CONFIG_PARTITION_ADVANCED is not set
> CONFIG_MSDOS_PARTITION=y
> CONFIG_EFI_PARTITION=y
> # end of Partition Types
>
> CONFIG_BLK_MQ_PCI=y
> CONFIG_BLK_MQ_VIRTIO=y
> CONFIG_BLK_PM=y
> CONFIG_BLOCK_HOLDER_DEPRECATED=y
> CONFIG_BLK_MQ_STACKING=y
>
> #
> # IO Schedulers
> #
> CONFIG_MQ_IOSCHED_DEADLINE=y
> CONFIG_MQ_IOSCHED_KYBER=y
> CONFIG_IOSCHED_BFQ=y
> CONFIG_BFQ_GROUP_IOSCHED=y
> # CONFIG_BFQ_CGROUP_DEBUG is not set
> # end of IO Schedulers
>
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_PADATA=y
> CONFIG_ASN1=y
> CONFIG_UNINLINE_SPIN_UNLOCK=y
> CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
> CONFIG_MUTEX_SPIN_ON_OWNER=y
> CONFIG_RWSEM_SPIN_ON_OWNER=y
> CONFIG_LOCK_SPIN_ON_OWNER=y
> CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
> CONFIG_QUEUED_SPINLOCKS=y
> CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
> CONFIG_QUEUED_RWLOCKS=y
> CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
> CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
> CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
> CONFIG_FREEZER=y
>
> #
> # Executable file formats
> #
> CONFIG_BINFMT_ELF=y
> CONFIG_COMPAT_BINFMT_ELF=y
> CONFIG_ELFCORE=y
> CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
> CONFIG_BINFMT_SCRIPT=y
> CONFIG_BINFMT_MISC=m
> CONFIG_COREDUMP=y
> # end of Executable file formats
>
> #
> # Memory Management options
> #
> CONFIG_ZPOOL=y
> CONFIG_SWAP=y
> CONFIG_ZSWAP=y
> # CONFIG_ZSWAP_DEFAULT_ON is not set
> # CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
> CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
> # CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
> # CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
> # CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
> # CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
> CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
> CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
> # CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
> # CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
> CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
> CONFIG_ZBUD=y
> # CONFIG_Z3FOLD is not set
> CONFIG_ZSMALLOC=y
> CONFIG_ZSMALLOC_STAT=y
> CONFIG_ZSMALLOC_CHAIN_SIZE=8
>
> #
> # SLAB allocator options
> #
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y
> # CONFIG_SLUB_TINY is not set
> CONFIG_SLAB_MERGE_DEFAULT=y
> CONFIG_SLAB_FREELIST_RANDOM=y
> CONFIG_SLAB_FREELIST_HARDENED=y
> # CONFIG_SLUB_STATS is not set
> CONFIG_SLUB_CPU_PARTIAL=y
> # end of SLAB allocator options
>
> CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> # CONFIG_COMPAT_BRK is not set
> CONFIG_SPARSEMEM=y
> CONFIG_SPARSEMEM_EXTREME=y
> CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> CONFIG_SPARSEMEM_VMEMMAP=y
> CONFIG_ARCH_WANT_OPTIMIZE_VMEMMAP=y
> CONFIG_HAVE_FAST_GUP=y
> CONFIG_NUMA_KEEP_MEMINFO=y
> CONFIG_MEMORY_ISOLATION=y
> CONFIG_EXCLUSIVE_SYSTEM_RAM=y
> CONFIG_HAVE_BOOTMEM_INFO_NODE=y
> CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
> CONFIG_MEMORY_HOTPLUG=y
> # CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
> CONFIG_MEMORY_HOTREMOVE=y
> CONFIG_MHP_MEMMAP_ON_MEMORY=y
> CONFIG_SPLIT_PTLOCK_CPUS=4
> CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
> CONFIG_MEMORY_BALLOON=y
> CONFIG_BALLOON_COMPACTION=y
> CONFIG_COMPACTION=y
> CONFIG_COMPACT_UNEVICTABLE_DEFAULT=1
> CONFIG_PAGE_REPORTING=y
> CONFIG_MIGRATION=y
> CONFIG_DEVICE_MIGRATION=y
> CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
> CONFIG_ARCH_ENABLE_THP_MIGRATION=y
> CONFIG_CONTIG_ALLOC=y
> CONFIG_PHYS_ADDR_T_64BIT=y
> CONFIG_MMU_NOTIFIER=y
> CONFIG_KSM=y
> CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
> CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
> CONFIG_MEMORY_FAILURE=y
> CONFIG_HWPOISON_INJECT=m
> CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
> CONFIG_ARCH_WANTS_THP_SWAP=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
> CONFIG_THP_SWAP=y
> # CONFIG_READ_ONLY_THP_FOR_FS is not set
> CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
> CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
> CONFIG_USE_PERCPU_NUMA_NODE_ID=y
> CONFIG_HAVE_SETUP_PER_CPU_AREA=y
> CONFIG_FRONTSWAP=y
> # CONFIG_CMA is not set
> CONFIG_MEM_SOFT_DIRTY=y
> CONFIG_GENERIC_EARLY_IOREMAP=y
> CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
> CONFIG_PAGE_IDLE_FLAG=y
> CONFIG_IDLE_PAGE_TRACKING=y
> CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
> CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
> CONFIG_ARCH_HAS_PTE_DEVMAP=y
> CONFIG_ARCH_HAS_ZONE_DMA_SET=y
> CONFIG_ZONE_DMA=y
> CONFIG_ZONE_DMA32=y
> CONFIG_ZONE_DEVICE=y
> CONFIG_HMM_MIRROR=y
> CONFIG_GET_FREE_REGION=y
> CONFIG_DEVICE_PRIVATE=y
> CONFIG_VMAP_PFN=y
> CONFIG_ARCH_USES_HIGH_VMA_FLAGS=y
> CONFIG_ARCH_HAS_PKEYS=y
> CONFIG_VM_EVENT_COUNTERS=y
> # CONFIG_PERCPU_STATS is not set
> CONFIG_GUP_TEST=y
> # CONFIG_DMAPOOL_TEST is not set
> CONFIG_ARCH_HAS_PTE_SPECIAL=y
> CONFIG_SECRETMEM=y
> CONFIG_ANON_VMA_NAME=y
> CONFIG_USERFAULTFD=y
> CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
> CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y
> CONFIG_PTE_MARKER_UFFD_WP=y
> # CONFIG_LRU_GEN is not set
> CONFIG_ARCH_SUPPORTS_PER_VMA_LOCK=y
> CONFIG_PER_VMA_LOCK=y
>
> #
> # Data Access Monitoring
> #
> CONFIG_DAMON=y
> CONFIG_DAMON_VADDR=y
> CONFIG_DAMON_PADDR=y
> CONFIG_DAMON_SYSFS=y
> CONFIG_DAMON_DBGFS=y
> # CONFIG_DAMON_RECLAIM is not set
> # CONFIG_DAMON_LRU_SORT is not set
> # end of Data Access Monitoring
> # end of Memory Management options
>
> CONFIG_NET=y
> CONFIG_NET_INGRESS=y
> CONFIG_NET_EGRESS=y
> CONFIG_NET_REDIRECT=y
> CONFIG_SKB_EXTENSIONS=y
>
> #
> # Networking options
> #
> CONFIG_PACKET=y
> CONFIG_PACKET_DIAG=m
> CONFIG_UNIX=y
> CONFIG_UNIX_SCM=y
> CONFIG_AF_UNIX_OOB=y
> CONFIG_UNIX_DIAG=m
> CONFIG_TLS=m
> CONFIG_TLS_DEVICE=y
> # CONFIG_TLS_TOE is not set
> CONFIG_XFRM=y
> CONFIG_XFRM_OFFLOAD=y
> CONFIG_XFRM_ALGO=y
> CONFIG_XFRM_USER=y
> # CONFIG_XFRM_USER_COMPAT is not set
> # CONFIG_XFRM_INTERFACE is not set
> CONFIG_XFRM_SUB_POLICY=y
> CONFIG_XFRM_MIGRATE=y
> CONFIG_XFRM_STATISTICS=y
> CONFIG_XFRM_AH=m
> CONFIG_XFRM_ESP=m
> CONFIG_XFRM_IPCOMP=m
> # CONFIG_NET_KEY is not set
> CONFIG_XDP_SOCKETS=y
> # CONFIG_XDP_SOCKETS_DIAG is not set
> CONFIG_NET_HANDSHAKE=y
> CONFIG_INET=y
> CONFIG_IP_MULTICAST=y
> CONFIG_IP_ADVANCED_ROUTER=y
> CONFIG_IP_FIB_TRIE_STATS=y
> CONFIG_IP_MULTIPLE_TABLES=y
> CONFIG_IP_ROUTE_MULTIPATH=y
> CONFIG_IP_ROUTE_VERBOSE=y
> CONFIG_IP_ROUTE_CLASSID=y
> CONFIG_IP_PNP=y
> CONFIG_IP_PNP_DHCP=y
> # CONFIG_IP_PNP_BOOTP is not set
> # CONFIG_IP_PNP_RARP is not set
> CONFIG_NET_IPIP=m
> CONFIG_NET_IPGRE_DEMUX=m
> CONFIG_NET_IP_TUNNEL=m
> CONFIG_NET_IPGRE=m
> CONFIG_NET_IPGRE_BROADCAST=y
> CONFIG_IP_MROUTE_COMMON=y
> CONFIG_IP_MROUTE=y
> CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
> CONFIG_IP_PIMSM_V1=y
> CONFIG_IP_PIMSM_V2=y
> CONFIG_SYN_COOKIES=y
> CONFIG_NET_IPVTI=m
> CONFIG_NET_UDP_TUNNEL=m
> CONFIG_NET_FOU=m
> CONFIG_NET_FOU_IP_TUNNELS=y
> CONFIG_INET_AH=m
> CONFIG_INET_ESP=m
> CONFIG_INET_ESP_OFFLOAD=m
> # CONFIG_INET_ESPINTCP is not set
> CONFIG_INET_IPCOMP=m
> CONFIG_INET_TABLE_PERTURB_ORDER=16
> CONFIG_INET_XFRM_TUNNEL=m
> CONFIG_INET_TUNNEL=m
> CONFIG_INET_DIAG=m
> CONFIG_INET_TCP_DIAG=m
> CONFIG_INET_UDP_DIAG=m
> CONFIG_INET_RAW_DIAG=m
> # CONFIG_INET_DIAG_DESTROY is not set
> CONFIG_TCP_CONG_ADVANCED=y
> CONFIG_TCP_CONG_BIC=m
> CONFIG_TCP_CONG_CUBIC=y
> CONFIG_TCP_CONG_WESTWOOD=m
> CONFIG_TCP_CONG_HTCP=m
> CONFIG_TCP_CONG_HSTCP=m
> CONFIG_TCP_CONG_HYBLA=m
> CONFIG_TCP_CONG_VEGAS=m
> CONFIG_TCP_CONG_NV=m
> CONFIG_TCP_CONG_SCALABLE=m
> CONFIG_TCP_CONG_LP=m
> CONFIG_TCP_CONG_VENO=m
> CONFIG_TCP_CONG_YEAH=m
> CONFIG_TCP_CONG_ILLINOIS=m
> CONFIG_TCP_CONG_DCTCP=m
> # CONFIG_TCP_CONG_CDG is not set
> CONFIG_TCP_CONG_BBR=m
> CONFIG_DEFAULT_CUBIC=y
> # CONFIG_DEFAULT_RENO is not set
> CONFIG_DEFAULT_TCP_CONG="cubic"
> CONFIG_TCP_MD5SIG=y
> CONFIG_IPV6=y
> CONFIG_IPV6_ROUTER_PREF=y
> CONFIG_IPV6_ROUTE_INFO=y
> CONFIG_IPV6_OPTIMISTIC_DAD=y
> CONFIG_INET6_AH=m
> CONFIG_INET6_ESP=m
> CONFIG_INET6_ESP_OFFLOAD=m
> # CONFIG_INET6_ESPINTCP is not set
> CONFIG_INET6_IPCOMP=m
> CONFIG_IPV6_MIP6=m
> # CONFIG_IPV6_ILA is not set
> CONFIG_INET6_XFRM_TUNNEL=m
> CONFIG_INET6_TUNNEL=m
> CONFIG_IPV6_VTI=m
> CONFIG_IPV6_SIT=m
> CONFIG_IPV6_SIT_6RD=y
> CONFIG_IPV6_NDISC_NODETYPE=y
> CONFIG_IPV6_TUNNEL=m
> CONFIG_IPV6_GRE=m
> CONFIG_IPV6_FOU=m
> CONFIG_IPV6_FOU_TUNNEL=m
> CONFIG_IPV6_MULTIPLE_TABLES=y
> # CONFIG_IPV6_SUBTREES is not set
> CONFIG_IPV6_MROUTE=y
> CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
> CONFIG_IPV6_PIMSM_V2=y
> # CONFIG_IPV6_SEG6_LWTUNNEL is not set
> # CONFIG_IPV6_SEG6_HMAC is not set
> # CONFIG_IPV6_RPL_LWTUNNEL is not set
> CONFIG_IPV6_IOAM6_LWTUNNEL=y
> CONFIG_NETLABEL=y
> CONFIG_MPTCP=y
> CONFIG_INET_MPTCP_DIAG=m
> CONFIG_MPTCP_IPV6=y
> CONFIG_NETWORK_SECMARK=y
> CONFIG_NET_PTP_CLASSIFY=y
> CONFIG_NETWORK_PHY_TIMESTAMPING=y
> CONFIG_NETFILTER=y
> CONFIG_NETFILTER_ADVANCED=y
> CONFIG_BRIDGE_NETFILTER=m
>
> #
> # Core Netfilter Configuration
> #
> CONFIG_NETFILTER_INGRESS=y
> CONFIG_NETFILTER_EGRESS=y
> CONFIG_NETFILTER_SKIP_EGRESS=y
> CONFIG_NETFILTER_NETLINK=m
> CONFIG_NETFILTER_FAMILY_BRIDGE=y
> CONFIG_NETFILTER_FAMILY_ARP=y
> CONFIG_NETFILTER_BPF_LINK=y
> # CONFIG_NETFILTER_NETLINK_HOOK is not set
> # CONFIG_NETFILTER_NETLINK_ACCT is not set
> CONFIG_NETFILTER_NETLINK_QUEUE=m
> CONFIG_NETFILTER_NETLINK_LOG=m
> CONFIG_NETFILTER_NETLINK_OSF=m
> CONFIG_NF_CONNTRACK=m
> CONFIG_NF_LOG_SYSLOG=m
> CONFIG_NETFILTER_CONNCOUNT=m
> CONFIG_NF_CONNTRACK_MARK=y
> CONFIG_NF_CONNTRACK_SECMARK=y
> CONFIG_NF_CONNTRACK_ZONES=y
> CONFIG_NF_CONNTRACK_PROCFS=y
> CONFIG_NF_CONNTRACK_EVENTS=y
> CONFIG_NF_CONNTRACK_TIMEOUT=y
> CONFIG_NF_CONNTRACK_TIMESTAMP=y
> CONFIG_NF_CONNTRACK_LABELS=y
> CONFIG_NF_CONNTRACK_OVS=y
> CONFIG_NF_CT_PROTO_DCCP=y
> CONFIG_NF_CT_PROTO_GRE=y
> CONFIG_NF_CT_PROTO_SCTP=y
> CONFIG_NF_CT_PROTO_UDPLITE=y
> CONFIG_NF_CONNTRACK_AMANDA=m
> CONFIG_NF_CONNTRACK_FTP=m
> CONFIG_NF_CONNTRACK_H323=m
> CONFIG_NF_CONNTRACK_IRC=m
> CONFIG_NF_CONNTRACK_BROADCAST=m
> CONFIG_NF_CONNTRACK_NETBIOS_NS=m
> CONFIG_NF_CONNTRACK_SNMP=m
> CONFIG_NF_CONNTRACK_PPTP=m
> CONFIG_NF_CONNTRACK_SANE=m
> CONFIG_NF_CONNTRACK_SIP=m
> CONFIG_NF_CONNTRACK_TFTP=m
> CONFIG_NF_CT_NETLINK=m
> CONFIG_NF_CT_NETLINK_TIMEOUT=m
> CONFIG_NF_CT_NETLINK_HELPER=m
> CONFIG_NETFILTER_NETLINK_GLUE_CT=y
> CONFIG_NF_NAT=m
> CONFIG_NF_NAT_AMANDA=m
> CONFIG_NF_NAT_FTP=m
> CONFIG_NF_NAT_IRC=m
> CONFIG_NF_NAT_SIP=m
> CONFIG_NF_NAT_TFTP=m
> CONFIG_NF_NAT_REDIRECT=y
> CONFIG_NF_NAT_MASQUERADE=y
> CONFIG_NF_NAT_OVS=y
> CONFIG_NETFILTER_SYNPROXY=m
> CONFIG_NF_TABLES=m
> CONFIG_NF_TABLES_INET=y
> CONFIG_NF_TABLES_NETDEV=y
> CONFIG_NFT_NUMGEN=m
> CONFIG_NFT_CT=m
> CONFIG_NFT_FLOW_OFFLOAD=m
> CONFIG_NFT_CONNLIMIT=m
> CONFIG_NFT_LOG=m
> CONFIG_NFT_LIMIT=m
> CONFIG_NFT_MASQ=m
> CONFIG_NFT_REDIR=m
> CONFIG_NFT_NAT=m
> # CONFIG_NFT_TUNNEL is not set
> CONFIG_NFT_QUEUE=m
> CONFIG_NFT_QUOTA=m
> CONFIG_NFT_REJECT=m
> CONFIG_NFT_REJECT_INET=m
> CONFIG_NFT_COMPAT=m
> CONFIG_NFT_HASH=m
> CONFIG_NFT_FIB=m
> CONFIG_NFT_FIB_INET=m
> # CONFIG_NFT_XFRM is not set
> CONFIG_NFT_SOCKET=m
> # CONFIG_NFT_OSF is not set
> CONFIG_NFT_TPROXY=m
> CONFIG_NFT_SYNPROXY=m
> CONFIG_NF_DUP_NETDEV=m
> CONFIG_NFT_DUP_NETDEV=m
> CONFIG_NFT_FWD_NETDEV=m
> CONFIG_NFT_FIB_NETDEV=m
> # CONFIG_NFT_REJECT_NETDEV is not set
> CONFIG_NF_FLOW_TABLE_INET=m
> CONFIG_NF_FLOW_TABLE=m
> # CONFIG_NF_FLOW_TABLE_PROCFS is not set
> CONFIG_NETFILTER_XTABLES=y
> # CONFIG_NETFILTER_XTABLES_COMPAT is not set
>
> #
> # Xtables combined modules
> #
> CONFIG_NETFILTER_XT_MARK=m
> CONFIG_NETFILTER_XT_CONNMARK=m
> CONFIG_NETFILTER_XT_SET=m
>
> #
> # Xtables targets
> #
> CONFIG_NETFILTER_XT_TARGET_AUDIT=m
> CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
> CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
> CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
> CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
> CONFIG_NETFILTER_XT_TARGET_CT=m
> CONFIG_NETFILTER_XT_TARGET_DSCP=m
> CONFIG_NETFILTER_XT_TARGET_HL=m
> CONFIG_NETFILTER_XT_TARGET_HMARK=m
> CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
> # CONFIG_NETFILTER_XT_TARGET_LED is not set
> CONFIG_NETFILTER_XT_TARGET_LOG=m
> CONFIG_NETFILTER_XT_TARGET_MARK=m
> CONFIG_NETFILTER_XT_NAT=m
> CONFIG_NETFILTER_XT_TARGET_NETMAP=m
> CONFIG_NETFILTER_XT_TARGET_NFLOG=m
> CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
> CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
> CONFIG_NETFILTER_XT_TARGET_RATEEST=m
> CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
> CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
> CONFIG_NETFILTER_XT_TARGET_TEE=m
> CONFIG_NETFILTER_XT_TARGET_TPROXY=m
> CONFIG_NETFILTER_XT_TARGET_TRACE=m
> CONFIG_NETFILTER_XT_TARGET_SECMARK=m
> CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
> CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m
>
> #
> # Xtables matches
> #
> CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
> CONFIG_NETFILTER_XT_MATCH_BPF=m
> CONFIG_NETFILTER_XT_MATCH_CGROUP=m
> CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
> CONFIG_NETFILTER_XT_MATCH_COMMENT=m
> CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
> CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
> CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
> CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
> CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
> CONFIG_NETFILTER_XT_MATCH_CPU=m
> CONFIG_NETFILTER_XT_MATCH_DCCP=m
> CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
> CONFIG_NETFILTER_XT_MATCH_DSCP=m
> CONFIG_NETFILTER_XT_MATCH_ECN=m
> CONFIG_NETFILTER_XT_MATCH_ESP=m
> CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
> CONFIG_NETFILTER_XT_MATCH_HELPER=m
> CONFIG_NETFILTER_XT_MATCH_HL=m
> # CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
> CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
> CONFIG_NETFILTER_XT_MATCH_IPVS=m
> # CONFIG_NETFILTER_XT_MATCH_L2TP is not set
> CONFIG_NETFILTER_XT_MATCH_LENGTH=m
> CONFIG_NETFILTER_XT_MATCH_LIMIT=m
> CONFIG_NETFILTER_XT_MATCH_MAC=m
> CONFIG_NETFILTER_XT_MATCH_MARK=m
> CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
> # CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
> CONFIG_NETFILTER_XT_MATCH_OSF=m
> CONFIG_NETFILTER_XT_MATCH_OWNER=m
> CONFIG_NETFILTER_XT_MATCH_POLICY=m
> CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
> CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
> CONFIG_NETFILTER_XT_MATCH_QUOTA=m
> CONFIG_NETFILTER_XT_MATCH_RATEEST=m
> CONFIG_NETFILTER_XT_MATCH_REALM=m
> CONFIG_NETFILTER_XT_MATCH_RECENT=m
> CONFIG_NETFILTER_XT_MATCH_SCTP=m
> CONFIG_NETFILTER_XT_MATCH_SOCKET=m
> CONFIG_NETFILTER_XT_MATCH_STATE=m
> CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
> CONFIG_NETFILTER_XT_MATCH_STRING=m
> CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
> # CONFIG_NETFILTER_XT_MATCH_TIME is not set
> # CONFIG_NETFILTER_XT_MATCH_U32 is not set
> # end of Core Netfilter Configuration
>
> CONFIG_IP_SET=m
> CONFIG_IP_SET_MAX=256
> CONFIG_IP_SET_BITMAP_IP=m
> CONFIG_IP_SET_BITMAP_IPMAC=m
> CONFIG_IP_SET_BITMAP_PORT=m
> CONFIG_IP_SET_HASH_IP=m
> CONFIG_IP_SET_HASH_IPMARK=m
> CONFIG_IP_SET_HASH_IPPORT=m
> CONFIG_IP_SET_HASH_IPPORTIP=m
> CONFIG_IP_SET_HASH_IPPORTNET=m
> CONFIG_IP_SET_HASH_IPMAC=m
> CONFIG_IP_SET_HASH_MAC=m
> CONFIG_IP_SET_HASH_NETPORTNET=m
> CONFIG_IP_SET_HASH_NET=m
> CONFIG_IP_SET_HASH_NETNET=m
> CONFIG_IP_SET_HASH_NETPORT=m
> CONFIG_IP_SET_HASH_NETIFACE=m
> CONFIG_IP_SET_LIST_SET=m
> CONFIG_IP_VS=m
> CONFIG_IP_VS_IPV6=y
> # CONFIG_IP_VS_DEBUG is not set
> CONFIG_IP_VS_TAB_BITS=12
>
> #
> # IPVS transport protocol load balancing support
> #
> CONFIG_IP_VS_PROTO_TCP=y
> CONFIG_IP_VS_PROTO_UDP=y
> CONFIG_IP_VS_PROTO_AH_ESP=y
> CONFIG_IP_VS_PROTO_ESP=y
> CONFIG_IP_VS_PROTO_AH=y
> CONFIG_IP_VS_PROTO_SCTP=y
>
> #
> # IPVS scheduler
> #
> CONFIG_IP_VS_RR=m
> CONFIG_IP_VS_WRR=m
> CONFIG_IP_VS_LC=m
> CONFIG_IP_VS_WLC=m
> CONFIG_IP_VS_FO=m
> CONFIG_IP_VS_OVF=m
> CONFIG_IP_VS_LBLC=m
> CONFIG_IP_VS_LBLCR=m
> CONFIG_IP_VS_DH=m
> CONFIG_IP_VS_SH=m
> # CONFIG_IP_VS_MH is not set
> CONFIG_IP_VS_SED=m
> CONFIG_IP_VS_NQ=m
> # CONFIG_IP_VS_TWOS is not set
>
> #
> # IPVS SH scheduler
> #
> CONFIG_IP_VS_SH_TAB_BITS=8
>
> #
> # IPVS MH scheduler
> #
> CONFIG_IP_VS_MH_TAB_INDEX=12
>
> #
> # IPVS application helper
> #
> CONFIG_IP_VS_FTP=m
> CONFIG_IP_VS_NFCT=y
> CONFIG_IP_VS_PE_SIP=m
>
> #
> # IP: Netfilter Configuration
> #
> CONFIG_NF_DEFRAG_IPV4=m
> CONFIG_NF_SOCKET_IPV4=m
> CONFIG_NF_TPROXY_IPV4=m
> CONFIG_NF_TABLES_IPV4=y
> CONFIG_NFT_REJECT_IPV4=m
> CONFIG_NFT_DUP_IPV4=m
> CONFIG_NFT_FIB_IPV4=m
> CONFIG_NF_TABLES_ARP=y
> CONFIG_NF_DUP_IPV4=m
> CONFIG_NF_LOG_ARP=m
> CONFIG_NF_LOG_IPV4=m
> CONFIG_NF_REJECT_IPV4=m
> CONFIG_NF_NAT_SNMP_BASIC=m
> CONFIG_NF_NAT_PPTP=m
> CONFIG_NF_NAT_H323=m
> CONFIG_IP_NF_IPTABLES=m
> CONFIG_IP_NF_MATCH_AH=m
> CONFIG_IP_NF_MATCH_ECN=m
> CONFIG_IP_NF_MATCH_RPFILTER=m
> CONFIG_IP_NF_MATCH_TTL=m
> CONFIG_IP_NF_FILTER=m
> CONFIG_IP_NF_TARGET_REJECT=m
> CONFIG_IP_NF_TARGET_SYNPROXY=m
> CONFIG_IP_NF_NAT=m
> CONFIG_IP_NF_TARGET_MASQUERADE=m
> CONFIG_IP_NF_TARGET_NETMAP=m
> CONFIG_IP_NF_TARGET_REDIRECT=m
> CONFIG_IP_NF_MANGLE=m
> CONFIG_IP_NF_TARGET_ECN=m
> CONFIG_IP_NF_TARGET_TTL=m
> CONFIG_IP_NF_RAW=m
> CONFIG_IP_NF_SECURITY=m
> CONFIG_IP_NF_ARPTABLES=m
> CONFIG_IP_NF_ARPFILTER=m
> CONFIG_IP_NF_ARP_MANGLE=m
> # end of IP: Netfilter Configuration
>
> #
> # IPv6: Netfilter Configuration
> #
> CONFIG_NF_SOCKET_IPV6=m
> CONFIG_NF_TPROXY_IPV6=m
> CONFIG_NF_TABLES_IPV6=y
> CONFIG_NFT_REJECT_IPV6=m
> CONFIG_NFT_DUP_IPV6=m
> CONFIG_NFT_FIB_IPV6=m
> CONFIG_NF_DUP_IPV6=m
> CONFIG_NF_REJECT_IPV6=m
> CONFIG_NF_LOG_IPV6=m
> CONFIG_IP6_NF_IPTABLES=m
> CONFIG_IP6_NF_MATCH_AH=m
> CONFIG_IP6_NF_MATCH_EUI64=m
> CONFIG_IP6_NF_MATCH_FRAG=m
> CONFIG_IP6_NF_MATCH_OPTS=m
> CONFIG_IP6_NF_MATCH_HL=m
> CONFIG_IP6_NF_MATCH_IPV6HEADER=m
> CONFIG_IP6_NF_MATCH_MH=m
> CONFIG_IP6_NF_MATCH_RPFILTER=m
> CONFIG_IP6_NF_MATCH_RT=m
> # CONFIG_IP6_NF_MATCH_SRH is not set
> # CONFIG_IP6_NF_TARGET_HL is not set
> CONFIG_IP6_NF_FILTER=m
> CONFIG_IP6_NF_TARGET_REJECT=m
> CONFIG_IP6_NF_TARGET_SYNPROXY=m
> CONFIG_IP6_NF_MANGLE=m
> CONFIG_IP6_NF_RAW=m
> CONFIG_IP6_NF_SECURITY=m
> CONFIG_IP6_NF_NAT=m
> CONFIG_IP6_NF_TARGET_MASQUERADE=m
> CONFIG_IP6_NF_TARGET_NPT=m
> # end of IPv6: Netfilter Configuration
>
> CONFIG_NF_DEFRAG_IPV6=m
> CONFIG_NF_TABLES_BRIDGE=m
> # CONFIG_NFT_BRIDGE_META is not set
> CONFIG_NFT_BRIDGE_REJECT=m
> # CONFIG_NF_CONNTRACK_BRIDGE is not set
> CONFIG_BRIDGE_NF_EBTABLES=m
> CONFIG_BRIDGE_EBT_BROUTE=m
> CONFIG_BRIDGE_EBT_T_FILTER=m
> CONFIG_BRIDGE_EBT_T_NAT=m
> CONFIG_BRIDGE_EBT_802_3=m
> CONFIG_BRIDGE_EBT_AMONG=m
> CONFIG_BRIDGE_EBT_ARP=m
> CONFIG_BRIDGE_EBT_IP=m
> CONFIG_BRIDGE_EBT_IP6=m
> CONFIG_BRIDGE_EBT_LIMIT=m
> CONFIG_BRIDGE_EBT_MARK=m
> CONFIG_BRIDGE_EBT_PKTTYPE=m
> CONFIG_BRIDGE_EBT_STP=m
> CONFIG_BRIDGE_EBT_VLAN=m
> CONFIG_BRIDGE_EBT_ARPREPLY=m
> CONFIG_BRIDGE_EBT_DNAT=m
> CONFIG_BRIDGE_EBT_MARK_T=m
> CONFIG_BRIDGE_EBT_REDIRECT=m
> CONFIG_BRIDGE_EBT_SNAT=m
> CONFIG_BRIDGE_EBT_LOG=m
> CONFIG_BRIDGE_EBT_NFLOG=m
> # CONFIG_BPFILTER is not set
> # CONFIG_IP_DCCP is not set
> CONFIG_IP_SCTP=m
> # CONFIG_SCTP_DBG_OBJCNT is not set
> # CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
> CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
> # CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
> CONFIG_SCTP_COOKIE_HMAC_MD5=y
> CONFIG_SCTP_COOKIE_HMAC_SHA1=y
> CONFIG_INET_SCTP_DIAG=m
> # CONFIG_RDS is not set
> # CONFIG_TIPC is not set
> # CONFIG_ATM is not set
> # CONFIG_L2TP is not set
> CONFIG_STP=y
> CONFIG_GARP=y
> CONFIG_MRP=y
> CONFIG_BRIDGE=m
> CONFIG_BRIDGE_IGMP_SNOOPING=y
> CONFIG_BRIDGE_VLAN_FILTERING=y
> # CONFIG_BRIDGE_MRP is not set
> # CONFIG_BRIDGE_CFM is not set
> # CONFIG_NET_DSA is not set
> CONFIG_VLAN_8021Q=y
> CONFIG_VLAN_8021Q_GVRP=y
> CONFIG_VLAN_8021Q_MVRP=y
> CONFIG_LLC=y
> # CONFIG_LLC2 is not set
> # CONFIG_ATALK is not set
> # CONFIG_X25 is not set
> # CONFIG_LAPB is not set
> # CONFIG_PHONET is not set
> # CONFIG_6LOWPAN is not set
> # CONFIG_IEEE802154 is not set
> CONFIG_NET_SCHED=y
>
> #
> # Queueing/Scheduling
> #
> CONFIG_NET_SCH_HTB=m
> CONFIG_NET_SCH_HFSC=m
> CONFIG_NET_SCH_PRIO=m
> CONFIG_NET_SCH_MULTIQ=m
> CONFIG_NET_SCH_RED=m
> CONFIG_NET_SCH_SFB=m
> CONFIG_NET_SCH_SFQ=m
> CONFIG_NET_SCH_TEQL=m
> CONFIG_NET_SCH_TBF=m
> CONFIG_NET_SCH_CBS=m
> CONFIG_NET_SCH_ETF=m
> CONFIG_NET_SCH_MQPRIO_LIB=m
> CONFIG_NET_SCH_TAPRIO=m
> CONFIG_NET_SCH_GRED=m
> CONFIG_NET_SCH_NETEM=y
> CONFIG_NET_SCH_DRR=m
> CONFIG_NET_SCH_MQPRIO=m
> CONFIG_NET_SCH_SKBPRIO=m
> CONFIG_NET_SCH_CHOKE=m
> CONFIG_NET_SCH_QFQ=m
> CONFIG_NET_SCH_CODEL=m
> CONFIG_NET_SCH_FQ_CODEL=y
> CONFIG_NET_SCH_CAKE=m
> CONFIG_NET_SCH_FQ=m
> CONFIG_NET_SCH_HHF=m
> CONFIG_NET_SCH_PIE=m
> CONFIG_NET_SCH_FQ_PIE=m
> CONFIG_NET_SCH_INGRESS=y
> CONFIG_NET_SCH_PLUG=m
> CONFIG_NET_SCH_ETS=m
> CONFIG_NET_SCH_DEFAULT=y
> # CONFIG_DEFAULT_FQ is not set
> # CONFIG_DEFAULT_CODEL is not set
> CONFIG_DEFAULT_FQ_CODEL=y
> # CONFIG_DEFAULT_FQ_PIE is not set
> # CONFIG_DEFAULT_SFQ is not set
> # CONFIG_DEFAULT_PFIFO_FAST is not set
> CONFIG_DEFAULT_NET_SCH="fq_codel"
>
> #
> # Classification
> #
> CONFIG_NET_CLS=y
> CONFIG_NET_CLS_BASIC=m
> CONFIG_NET_CLS_ROUTE4=m
> CONFIG_NET_CLS_FW=m
> CONFIG_NET_CLS_U32=m
> CONFIG_CLS_U32_PERF=y
> CONFIG_CLS_U32_MARK=y
> CONFIG_NET_CLS_FLOW=m
> CONFIG_NET_CLS_CGROUP=y
> CONFIG_NET_CLS_BPF=m
> CONFIG_NET_CLS_FLOWER=m
> CONFIG_NET_CLS_MATCHALL=m
> CONFIG_NET_EMATCH=y
> CONFIG_NET_EMATCH_STACK=32
> CONFIG_NET_EMATCH_CMP=m
> CONFIG_NET_EMATCH_NBYTE=m
> CONFIG_NET_EMATCH_U32=m
> CONFIG_NET_EMATCH_META=m
> CONFIG_NET_EMATCH_TEXT=m
> CONFIG_NET_EMATCH_CANID=m
> CONFIG_NET_EMATCH_IPSET=m
> CONFIG_NET_EMATCH_IPT=m
> CONFIG_NET_CLS_ACT=y
> CONFIG_NET_ACT_POLICE=m
> CONFIG_NET_ACT_GACT=m
> CONFIG_GACT_PROB=y
> CONFIG_NET_ACT_MIRRED=m
> CONFIG_NET_ACT_SAMPLE=m
> CONFIG_NET_ACT_IPT=m
> CONFIG_NET_ACT_NAT=m
> CONFIG_NET_ACT_PEDIT=m
> CONFIG_NET_ACT_SIMP=m
> CONFIG_NET_ACT_SKBEDIT=m
> CONFIG_NET_ACT_CSUM=m
> CONFIG_NET_ACT_MPLS=m
> CONFIG_NET_ACT_VLAN=m
> CONFIG_NET_ACT_BPF=m
> CONFIG_NET_ACT_CONNMARK=m
> CONFIG_NET_ACT_CTINFO=m
> CONFIG_NET_ACT_SKBMOD=m
> CONFIG_NET_ACT_IFE=m
> CONFIG_NET_ACT_TUNNEL_KEY=m
> CONFIG_NET_ACT_CT=m
> CONFIG_NET_ACT_GATE=m
> CONFIG_NET_IFE_SKBMARK=m
> CONFIG_NET_IFE_SKBPRIO=m
> CONFIG_NET_IFE_SKBTCINDEX=m
> # CONFIG_NET_TC_SKB_EXT is not set
> CONFIG_NET_SCH_FIFO=y
> CONFIG_DCB=y
> CONFIG_DNS_RESOLVER=m
> # CONFIG_BATMAN_ADV is not set
> CONFIG_OPENVSWITCH=m
> CONFIG_OPENVSWITCH_GRE=m
> CONFIG_OPENVSWITCH_VXLAN=m
> CONFIG_VSOCKETS=m
> CONFIG_VSOCKETS_DIAG=m
> CONFIG_VSOCKETS_LOOPBACK=m
> CONFIG_VIRTIO_VSOCKETS=m
> CONFIG_VIRTIO_VSOCKETS_COMMON=m
> CONFIG_HYPERV_VSOCKETS=m
> CONFIG_NETLINK_DIAG=m
> CONFIG_MPLS=y
> CONFIG_NET_MPLS_GSO=y
> CONFIG_MPLS_ROUTING=m
> CONFIG_MPLS_IPTUNNEL=m
> CONFIG_NET_NSH=y
> # CONFIG_HSR is not set
> CONFIG_NET_SWITCHDEV=y
> CONFIG_NET_L3_MASTER_DEV=y
> # CONFIG_QRTR is not set
> # CONFIG_NET_NCSI is not set
> CONFIG_PCPU_DEV_REFCNT=y
> CONFIG_MAX_SKB_FRAGS=17
> CONFIG_RPS=y
> CONFIG_RFS_ACCEL=y
> CONFIG_SOCK_RX_QUEUE_MAPPING=y
> CONFIG_XPS=y
> CONFIG_CGROUP_NET_PRIO=y
> CONFIG_CGROUP_NET_CLASSID=y
> CONFIG_NET_RX_BUSY_POLL=y
> CONFIG_BQL=y
> CONFIG_BPF_STREAM_PARSER=y
> CONFIG_NET_FLOW_LIMIT=y
>
> #
> # Network testing
> #
> CONFIG_NET_PKTGEN=m
> CONFIG_NET_DROP_MONITOR=y
> # end of Network testing
> # end of Networking options
>
> # CONFIG_HAMRADIO is not set
> CONFIG_CAN=m
> CONFIG_CAN_RAW=m
> CONFIG_CAN_BCM=m
> CONFIG_CAN_GW=m
> # CONFIG_CAN_J1939 is not set
> # CONFIG_CAN_ISOTP is not set
> # CONFIG_BT is not set
> # CONFIG_AF_RXRPC is not set
> # CONFIG_AF_KCM is not set
> CONFIG_STREAM_PARSER=y
> # CONFIG_MCTP is not set
> CONFIG_FIB_RULES=y
> CONFIG_WIRELESS=y
> CONFIG_CFG80211=m
> # CONFIG_NL80211_TESTMODE is not set
> # CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
> # CONFIG_CFG80211_CERTIFICATION_ONUS is not set
> CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
> CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
> CONFIG_CFG80211_DEFAULT_PS=y
> # CONFIG_CFG80211_DEBUGFS is not set
> CONFIG_CFG80211_CRDA_SUPPORT=y
> # CONFIG_CFG80211_WEXT is not set
> CONFIG_MAC80211=m
> CONFIG_MAC80211_HAS_RC=y
> CONFIG_MAC80211_RC_MINSTREL=y
> CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
> CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
> # CONFIG_MAC80211_MESH is not set
> CONFIG_MAC80211_LEDS=y
> CONFIG_MAC80211_DEBUGFS=y
> # CONFIG_MAC80211_MESSAGE_TRACING is not set
> # CONFIG_MAC80211_DEBUG_MENU is not set
> CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
> CONFIG_RFKILL=m
> CONFIG_RFKILL_LEDS=y
> CONFIG_RFKILL_INPUT=y
> # CONFIG_RFKILL_GPIO is not set
> CONFIG_NET_9P=y
> CONFIG_NET_9P_FD=y
> CONFIG_NET_9P_VIRTIO=y
> # CONFIG_NET_9P_DEBUG is not set
> # CONFIG_CAIF is not set
> CONFIG_CEPH_LIB=m
> # CONFIG_CEPH_LIB_PRETTYDEBUG is not set
> CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
> CONFIG_NFC=m
> # CONFIG_NFC_DIGITAL is not set
> CONFIG_NFC_NCI=m
> # CONFIG_NFC_NCI_SPI is not set
> # CONFIG_NFC_NCI_UART is not set
> # CONFIG_NFC_HCI is not set
>
> #
> # Near Field Communication (NFC) devices
> #
> CONFIG_NFC_VIRTUAL_NCI=m
> # CONFIG_NFC_FDP is not set
> # CONFIG_NFC_PN533_USB is not set
> # CONFIG_NFC_PN533_I2C is not set
> # CONFIG_NFC_MRVL_USB is not set
> # CONFIG_NFC_ST_NCI_I2C is not set
> # CONFIG_NFC_ST_NCI_SPI is not set
> # CONFIG_NFC_NXP_NCI is not set
> # CONFIG_NFC_S3FWRN5_I2C is not set
> # end of Near Field Communication (NFC) devices
>
> CONFIG_PSAMPLE=m
> CONFIG_NET_IFE=m
> CONFIG_LWTUNNEL=y
> CONFIG_LWTUNNEL_BPF=y
> CONFIG_DST_CACHE=y
> CONFIG_GRO_CELLS=y
> CONFIG_SOCK_VALIDATE_XMIT=y
> CONFIG_NET_SELFTESTS=y
> CONFIG_NET_SOCK_MSG=y
> CONFIG_NET_DEVLINK=y
> CONFIG_PAGE_POOL=y
> CONFIG_PAGE_POOL_STATS=y
> CONFIG_FAILOVER=m
> CONFIG_ETHTOOL_NETLINK=y
>
> #
> # Device Drivers
> #
> CONFIG_HAVE_EISA=y
> # CONFIG_EISA is not set
> CONFIG_HAVE_PCI=y
> CONFIG_PCI=y
> CONFIG_PCI_DOMAINS=y
> CONFIG_PCIEPORTBUS=y
> CONFIG_HOTPLUG_PCI_PCIE=y
> CONFIG_PCIEAER=y
> CONFIG_PCIEAER_INJECT=m
> CONFIG_PCIE_ECRC=y
> CONFIG_PCIEASPM=y
> CONFIG_PCIEASPM_DEFAULT=y
> # CONFIG_PCIEASPM_POWERSAVE is not set
> # CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
> # CONFIG_PCIEASPM_PERFORMANCE is not set
> CONFIG_PCIE_PME=y
> CONFIG_PCIE_DPC=y
> # CONFIG_PCIE_PTM is not set
> # CONFIG_PCIE_EDR is not set
> CONFIG_PCI_MSI=y
> CONFIG_PCI_QUIRKS=y
> # CONFIG_PCI_DEBUG is not set
> # CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
> CONFIG_PCI_STUB=y
> CONFIG_PCI_PF_STUB=m
> CONFIG_PCI_ATS=y
> CONFIG_PCI_LOCKLESS_CONFIG=y
> CONFIG_PCI_IOV=y
> CONFIG_PCI_PRI=y
> CONFIG_PCI_PASID=y
> # CONFIG_PCI_P2PDMA is not set
> CONFIG_PCI_LABEL=y
> CONFIG_PCI_HYPERV=m
> # CONFIG_PCIE_BUS_TUNE_OFF is not set
> CONFIG_PCIE_BUS_DEFAULT=y
> # CONFIG_PCIE_BUS_SAFE is not set
> # CONFIG_PCIE_BUS_PERFORMANCE is not set
> # CONFIG_PCIE_BUS_PEER2PEER is not set
> CONFIG_VGA_ARB=y
> CONFIG_VGA_ARB_MAX_GPUS=64
> CONFIG_HOTPLUG_PCI=y
> CONFIG_HOTPLUG_PCI_ACPI=y
> CONFIG_HOTPLUG_PCI_ACPI_IBM=m
> # CONFIG_HOTPLUG_PCI_CPCI is not set
> CONFIG_HOTPLUG_PCI_SHPC=y
>
> #
> # PCI controller drivers
> #
> CONFIG_VMD=y
> CONFIG_PCI_HYPERV_INTERFACE=m
>
> #
> # Cadence-based PCIe controllers
> #
> # end of Cadence-based PCIe controllers
>
> #
> # DesignWare-based PCIe controllers
> #
> # CONFIG_PCI_MESON is not set
> # CONFIG_PCIE_DW_PLAT_HOST is not set
> # end of DesignWare-based PCIe controllers
>
> #
> # Mobiveil-based PCIe controllers
> #
> # end of Mobiveil-based PCIe controllers
> # end of PCI controller drivers
>
> #
> # PCI Endpoint
> #
> # CONFIG_PCI_ENDPOINT is not set
> # end of PCI Endpoint
>
> #
> # PCI switch controller drivers
> #
> # CONFIG_PCI_SW_SWITCHTEC is not set
> # end of PCI switch controller drivers
>
> # CONFIG_CXL_BUS is not set
> # CONFIG_PCCARD is not set
> # CONFIG_RAPIDIO is not set
>
> #
> # Generic Driver Options
> #
> CONFIG_AUXILIARY_BUS=y
> # CONFIG_UEVENT_HELPER is not set
> CONFIG_DEVTMPFS=y
> CONFIG_DEVTMPFS_MOUNT=y
> # CONFIG_DEVTMPFS_SAFE is not set
> CONFIG_STANDALONE=y
> CONFIG_PREVENT_FIRMWARE_BUILD=y
>
> #
> # Firmware loader
> #
> CONFIG_FW_LOADER=y
> CONFIG_FW_LOADER_DEBUG=y
> CONFIG_FW_LOADER_PAGED_BUF=y
> CONFIG_FW_LOADER_SYSFS=y
> CONFIG_EXTRA_FIRMWARE=""
> CONFIG_FW_LOADER_USER_HELPER=y
> # CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
> # CONFIG_FW_LOADER_COMPRESS is not set
> CONFIG_FW_CACHE=y
> CONFIG_FW_UPLOAD=y
> # end of Firmware loader
>
> CONFIG_ALLOW_DEV_COREDUMP=y
> # CONFIG_DEBUG_DRIVER is not set
> # CONFIG_DEBUG_DEVRES is not set
> # CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
> CONFIG_HMEM_REPORTING=y
> # CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
> CONFIG_GENERIC_CPU_AUTOPROBE=y
> CONFIG_GENERIC_CPU_VULNERABILITIES=y
> CONFIG_REGMAP=y
> CONFIG_REGMAP_I2C=m
> CONFIG_REGMAP_SPI=m
> CONFIG_DMA_SHARED_BUFFER=y
> # CONFIG_DMA_FENCE_TRACE is not set
> # CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT is not set
> # end of Generic Driver Options
>
> #
> # Bus devices
> #
> # CONFIG_MHI_BUS is not set
> # CONFIG_MHI_BUS_EP is not set
> # end of Bus devices
>
> CONFIG_CONNECTOR=y
> CONFIG_PROC_EVENTS=y
>
> #
> # Firmware Drivers
> #
>
> #
> # ARM System Control and Management Interface Protocol
> #
> # end of ARM System Control and Management Interface Protocol
>
> CONFIG_EDD=m
> # CONFIG_EDD_OFF is not set
> CONFIG_FIRMWARE_MEMMAP=y
> CONFIG_DMIID=y
> CONFIG_DMI_SYSFS=y
> CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
> # CONFIG_ISCSI_IBFT is not set
> CONFIG_FW_CFG_SYSFS=y
> # CONFIG_FW_CFG_SYSFS_CMDLINE is not set
> CONFIG_SYSFB=y
> # CONFIG_SYSFB_SIMPLEFB is not set
> # CONFIG_GOOGLE_FIRMWARE is not set
>
> #
> # EFI (Extensible Firmware Interface) Support
> #
> CONFIG_EFI_ESRT=y
> CONFIG_EFI_VARS_PSTORE=y
> CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
> CONFIG_EFI_SOFT_RESERVE=y
> CONFIG_EFI_DXE_MEM_ATTRIBUTES=y
> CONFIG_EFI_RUNTIME_WRAPPERS=y
> # CONFIG_EFI_BOOTLOADER_CONTROL is not set
> # CONFIG_EFI_CAPSULE_LOADER is not set
> # CONFIG_EFI_TEST is not set
> # CONFIG_APPLE_PROPERTIES is not set
> # CONFIG_RESET_ATTACK_MITIGATION is not set
> # CONFIG_EFI_RCI2_TABLE is not set
> # CONFIG_EFI_DISABLE_PCI_DMA is not set
> CONFIG_EFI_EARLYCON=y
> CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y
> # CONFIG_EFI_DISABLE_RUNTIME is not set
> # CONFIG_EFI_COCO_SECRET is not set
> # end of EFI (Extensible Firmware Interface) Support
>
> CONFIG_UEFI_CPER=y
> CONFIG_UEFI_CPER_X86=y
>
> #
> # Tegra firmware driver
> #
> # end of Tegra firmware driver
> # end of Firmware Drivers
>
> # CONFIG_GNSS is not set
> # CONFIG_MTD is not set
> # CONFIG_OF is not set
> CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
> CONFIG_PARPORT=m
> CONFIG_PARPORT_PC=m
> CONFIG_PARPORT_SERIAL=m
> # CONFIG_PARPORT_PC_FIFO is not set
> # CONFIG_PARPORT_PC_SUPERIO is not set
> CONFIG_PARPORT_1284=y
> CONFIG_PNP=y
> # CONFIG_PNP_DEBUG_MESSAGES is not set
>
> #
> # Protocols
> #
> CONFIG_PNPACPI=y
> CONFIG_BLK_DEV=y
> CONFIG_BLK_DEV_NULL_BLK=m
> # CONFIG_BLK_DEV_FD is not set
> CONFIG_CDROM=m
> # CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
> CONFIG_ZRAM=m
> CONFIG_ZRAM_DEF_COMP_LZORLE=y
> # CONFIG_ZRAM_DEF_COMP_LZO is not set
> CONFIG_ZRAM_DEF_COMP="lzo-rle"
> CONFIG_ZRAM_WRITEBACK=y
> # CONFIG_ZRAM_MEMORY_TRACKING is not set
> # CONFIG_ZRAM_MULTI_COMP is not set
> CONFIG_BLK_DEV_LOOP=m
> CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
> # CONFIG_BLK_DEV_DRBD is not set
> CONFIG_BLK_DEV_NBD=m
> CONFIG_BLK_DEV_RAM=m
> CONFIG_BLK_DEV_RAM_COUNT=16
> CONFIG_BLK_DEV_RAM_SIZE=16384
> CONFIG_CDROM_PKTCDVD=m
> CONFIG_CDROM_PKTCDVD_BUFFERS=8
> # CONFIG_CDROM_PKTCDVD_WCACHE is not set
> # CONFIG_ATA_OVER_ETH is not set
> CONFIG_VIRTIO_BLK=m
> CONFIG_BLK_DEV_RBD=m
> # CONFIG_BLK_DEV_UBLK is not set
>
> #
> # NVME Support
> #
> CONFIG_NVME_CORE=m
> CONFIG_BLK_DEV_NVME=m
> CONFIG_NVME_MULTIPATH=y
> # CONFIG_NVME_VERBOSE_ERRORS is not set
> # CONFIG_NVME_HWMON is not set
> # CONFIG_NVME_FC is not set
> # CONFIG_NVME_TCP is not set
> # CONFIG_NVME_AUTH is not set
> # CONFIG_NVME_TARGET is not set
> # end of NVME Support
>
> #
> # Misc devices
> #
> # CONFIG_AD525X_DPOT is not set
> # CONFIG_DUMMY_IRQ is not set
> # CONFIG_IBM_ASM is not set
> # CONFIG_PHANTOM is not set
> CONFIG_TIFM_CORE=m
> CONFIG_TIFM_7XX1=m
> # CONFIG_ICS932S401 is not set
> CONFIG_ENCLOSURE_SERVICES=m
> # CONFIG_SGI_XP is not set
> CONFIG_HP_ILO=m
> # CONFIG_SGI_GRU is not set
> CONFIG_APDS9802ALS=m
> CONFIG_ISL29003=m
> CONFIG_ISL29020=m
> CONFIG_SENSORS_TSL2550=m
> CONFIG_SENSORS_BH1770=m
> CONFIG_SENSORS_APDS990X=m
> # CONFIG_HMC6352 is not set
> # CONFIG_DS1682 is not set
> # CONFIG_LATTICE_ECP3_CONFIG is not set
> # CONFIG_SRAM is not set
> # CONFIG_DW_XDATA_PCIE is not set
> # CONFIG_PCI_ENDPOINT_TEST is not set
> # CONFIG_XILINX_SDFEC is not set
> # CONFIG_C2PORT is not set
>
> #
> # EEPROM support
> #
> # CONFIG_EEPROM_AT24 is not set
> # CONFIG_EEPROM_AT25 is not set
> CONFIG_EEPROM_LEGACY=m
> CONFIG_EEPROM_MAX6875=m
> CONFIG_EEPROM_93CX6=m
> # CONFIG_EEPROM_93XX46 is not set
> # CONFIG_EEPROM_IDT_89HPESX is not set
> # CONFIG_EEPROM_EE1004 is not set
> # end of EEPROM support
>
> # CONFIG_CB710_CORE is not set
>
> #
> # Texas Instruments shared transport line discipline
> #
> # CONFIG_TI_ST is not set
> # end of Texas Instruments shared transport line discipline
>
> # CONFIG_SENSORS_LIS3_I2C is not set
> # CONFIG_ALTERA_STAPL is not set
> CONFIG_INTEL_MEI=m
> CONFIG_INTEL_MEI_ME=m
> # CONFIG_INTEL_MEI_TXE is not set
> # CONFIG_INTEL_MEI_GSC is not set
> # CONFIG_INTEL_MEI_HDCP is not set
> # CONFIG_INTEL_MEI_PXP is not set
> # CONFIG_VMWARE_VMCI is not set
> # CONFIG_GENWQE is not set
> # CONFIG_ECHO is not set
> # CONFIG_BCM_VK is not set
> # CONFIG_MISC_ALCOR_PCI is not set
> # CONFIG_MISC_RTSX_PCI is not set
> # CONFIG_MISC_RTSX_USB is not set
> # CONFIG_UACCE is not set
> CONFIG_PVPANIC=y
> # CONFIG_PVPANIC_MMIO is not set
> # CONFIG_PVPANIC_PCI is not set
> # CONFIG_GP_PCI1XXXX is not set
> # end of Misc devices
>
> #
> # SCSI device support
> #
> CONFIG_SCSI_MOD=y
> CONFIG_RAID_ATTRS=m
> CONFIG_SCSI_COMMON=y
> CONFIG_SCSI=y
> CONFIG_SCSI_DMA=y
> CONFIG_SCSI_NETLINK=y
> CONFIG_SCSI_PROC_FS=y
>
> #
> # SCSI support type (disk, tape, CD-ROM)
> #
> CONFIG_BLK_DEV_SD=m
> CONFIG_CHR_DEV_ST=m
> CONFIG_BLK_DEV_SR=m
> CONFIG_CHR_DEV_SG=m
> CONFIG_BLK_DEV_BSG=y
> CONFIG_CHR_DEV_SCH=m
> CONFIG_SCSI_ENCLOSURE=m
> CONFIG_SCSI_CONSTANTS=y
> CONFIG_SCSI_LOGGING=y
> CONFIG_SCSI_SCAN_ASYNC=y
>
> #
> # SCSI Transports
> #
> CONFIG_SCSI_SPI_ATTRS=m
> CONFIG_SCSI_FC_ATTRS=m
> CONFIG_SCSI_ISCSI_ATTRS=m
> CONFIG_SCSI_SAS_ATTRS=m
> CONFIG_SCSI_SAS_LIBSAS=m
> CONFIG_SCSI_SAS_ATA=y
> CONFIG_SCSI_SAS_HOST_SMP=y
> CONFIG_SCSI_SRP_ATTRS=m
> # end of SCSI Transports
>
> CONFIG_SCSI_LOWLEVEL=y
> # CONFIG_ISCSI_TCP is not set
> # CONFIG_ISCSI_BOOT_SYSFS is not set
> # CONFIG_SCSI_CXGB3_ISCSI is not set
> # CONFIG_SCSI_CXGB4_ISCSI is not set
> # CONFIG_SCSI_BNX2_ISCSI is not set
> # CONFIG_BE2ISCSI is not set
> # CONFIG_BLK_DEV_3W_XXXX_RAID is not set
> # CONFIG_SCSI_HPSA is not set
> # CONFIG_SCSI_3W_9XXX is not set
> # CONFIG_SCSI_3W_SAS is not set
> # CONFIG_SCSI_ACARD is not set
> # CONFIG_SCSI_AACRAID is not set
> # CONFIG_SCSI_AIC7XXX is not set
> # CONFIG_SCSI_AIC79XX is not set
> # CONFIG_SCSI_AIC94XX is not set
> # CONFIG_SCSI_MVSAS is not set
> # CONFIG_SCSI_MVUMI is not set
> # CONFIG_SCSI_ADVANSYS is not set
> # CONFIG_SCSI_ARCMSR is not set
> # CONFIG_SCSI_ESAS2R is not set
> CONFIG_MEGARAID_NEWGEN=y
> CONFIG_MEGARAID_MM=m
> CONFIG_MEGARAID_MAILBOX=m
> CONFIG_MEGARAID_LEGACY=m
> CONFIG_MEGARAID_SAS=m
> CONFIG_SCSI_MPT3SAS=m
> CONFIG_SCSI_MPT2SAS_MAX_SGE=128
> CONFIG_SCSI_MPT3SAS_MAX_SGE=128
> # CONFIG_SCSI_MPT2SAS is not set
> # CONFIG_SCSI_MPI3MR is not set
> # CONFIG_SCSI_SMARTPQI is not set
> # CONFIG_SCSI_HPTIOP is not set
> # CONFIG_SCSI_BUSLOGIC is not set
> # CONFIG_SCSI_MYRB is not set
> # CONFIG_SCSI_MYRS is not set
> # CONFIG_VMWARE_PVSCSI is not set
> CONFIG_HYPERV_STORAGE=m
> # CONFIG_LIBFC is not set
> # CONFIG_SCSI_SNIC is not set
> # CONFIG_SCSI_DMX3191D is not set
> # CONFIG_SCSI_FDOMAIN_PCI is not set
> CONFIG_SCSI_ISCI=m
> # CONFIG_SCSI_IPS is not set
> # CONFIG_SCSI_INITIO is not set
> # CONFIG_SCSI_INIA100 is not set
> # CONFIG_SCSI_PPA is not set
> # CONFIG_SCSI_IMM is not set
> # CONFIG_SCSI_STEX is not set
> # CONFIG_SCSI_SYM53C8XX_2 is not set
> # CONFIG_SCSI_IPR is not set
> # CONFIG_SCSI_QLOGIC_1280 is not set
> # CONFIG_SCSI_QLA_FC is not set
> # CONFIG_SCSI_QLA_ISCSI is not set
> # CONFIG_SCSI_LPFC is not set
> # CONFIG_SCSI_DC395x is not set
> # CONFIG_SCSI_AM53C974 is not set
> # CONFIG_SCSI_WD719X is not set
> CONFIG_SCSI_DEBUG=m
> # CONFIG_SCSI_PMCRAID is not set
> # CONFIG_SCSI_PM8001 is not set
> # CONFIG_SCSI_BFA_FC is not set
> # CONFIG_SCSI_VIRTIO is not set
> # CONFIG_SCSI_CHELSIO_FCOE is not set
> CONFIG_SCSI_DH=y
> CONFIG_SCSI_DH_RDAC=y
> CONFIG_SCSI_DH_HP_SW=y
> CONFIG_SCSI_DH_EMC=y
> CONFIG_SCSI_DH_ALUA=y
> # end of SCSI device support
>
> CONFIG_ATA=m
> CONFIG_SATA_HOST=y
> CONFIG_PATA_TIMINGS=y
> CONFIG_ATA_VERBOSE_ERROR=y
> CONFIG_ATA_FORCE=y
> CONFIG_ATA_ACPI=y
> # CONFIG_SATA_ZPODD is not set
> CONFIG_SATA_PMP=y
>
> #
> # Controllers with non-SFF native interface
> #
> CONFIG_SATA_AHCI=m
> CONFIG_SATA_MOBILE_LPM_POLICY=0
> CONFIG_SATA_AHCI_PLATFORM=m
> # CONFIG_AHCI_DWC is not set
> # CONFIG_SATA_INIC162X is not set
> # CONFIG_SATA_ACARD_AHCI is not set
> # CONFIG_SATA_SIL24 is not set
> CONFIG_ATA_SFF=y
>
> #
> # SFF controllers with custom DMA interface
> #
> # CONFIG_PDC_ADMA is not set
> # CONFIG_SATA_QSTOR is not set
> # CONFIG_SATA_SX4 is not set
> CONFIG_ATA_BMDMA=y
>
> #
> # SATA SFF controllers with BMDMA
> #
> CONFIG_ATA_PIIX=m
> # CONFIG_SATA_DWC is not set
> # CONFIG_SATA_MV is not set
> # CONFIG_SATA_NV is not set
> # CONFIG_SATA_PROMISE is not set
> # CONFIG_SATA_SIL is not set
> # CONFIG_SATA_SIS is not set
> # CONFIG_SATA_SVW is not set
> # CONFIG_SATA_ULI is not set
> # CONFIG_SATA_VIA is not set
> # CONFIG_SATA_VITESSE is not set
>
> #
> # PATA SFF controllers with BMDMA
> #
> # CONFIG_PATA_ALI is not set
> # CONFIG_PATA_AMD is not set
> # CONFIG_PATA_ARTOP is not set
> # CONFIG_PATA_ATIIXP is not set
> # CONFIG_PATA_ATP867X is not set
> # CONFIG_PATA_CMD64X is not set
> # CONFIG_PATA_CYPRESS is not set
> # CONFIG_PATA_EFAR is not set
> # CONFIG_PATA_HPT366 is not set
> # CONFIG_PATA_HPT37X is not set
> # CONFIG_PATA_HPT3X2N is not set
> # CONFIG_PATA_HPT3X3 is not set
> # CONFIG_PATA_IT8213 is not set
> # CONFIG_PATA_IT821X is not set
> # CONFIG_PATA_JMICRON is not set
> # CONFIG_PATA_MARVELL is not set
> # CONFIG_PATA_NETCELL is not set
> # CONFIG_PATA_NINJA32 is not set
> # CONFIG_PATA_NS87415 is not set
> # CONFIG_PATA_OLDPIIX is not set
> # CONFIG_PATA_OPTIDMA is not set
> # CONFIG_PATA_PDC2027X is not set
> # CONFIG_PATA_PDC_OLD is not set
> # CONFIG_PATA_RADISYS is not set
> # CONFIG_PATA_RDC is not set
> # CONFIG_PATA_SCH is not set
> # CONFIG_PATA_SERVERWORKS is not set
> # CONFIG_PATA_SIL680 is not set
> # CONFIG_PATA_SIS is not set
> # CONFIG_PATA_TOSHIBA is not set
> # CONFIG_PATA_TRIFLEX is not set
> # CONFIG_PATA_VIA is not set
> # CONFIG_PATA_WINBOND is not set
>
> #
> # PIO-only SFF controllers
> #
> # CONFIG_PATA_CMD640_PCI is not set
> # CONFIG_PATA_MPIIX is not set
> # CONFIG_PATA_NS87410 is not set
> # CONFIG_PATA_OPTI is not set
> # CONFIG_PATA_RZ1000 is not set
> # CONFIG_PATA_PARPORT is not set
>
> #
> # Generic fallback / legacy drivers
> #
> # CONFIG_PATA_ACPI is not set
> CONFIG_ATA_GENERIC=m
> # CONFIG_PATA_LEGACY is not set
> CONFIG_MD=y
> CONFIG_BLK_DEV_MD=y
> CONFIG_MD_AUTODETECT=y
> CONFIG_MD_LINEAR=m
> CONFIG_MD_RAID0=m
> CONFIG_MD_RAID1=m
> CONFIG_MD_RAID10=m
> CONFIG_MD_RAID456=m
> # CONFIG_MD_MULTIPATH is not set
> CONFIG_MD_FAULTY=m
> # CONFIG_BCACHE is not set
> CONFIG_BLK_DEV_DM_BUILTIN=y
> CONFIG_BLK_DEV_DM=m
> CONFIG_DM_DEBUG=y
> CONFIG_DM_BUFIO=m
> # CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
> CONFIG_DM_BIO_PRISON=m
> CONFIG_DM_PERSISTENT_DATA=m
> # CONFIG_DM_UNSTRIPED is not set
> CONFIG_DM_CRYPT=m
> CONFIG_DM_SNAPSHOT=m
> CONFIG_DM_THIN_PROVISIONING=m
> CONFIG_DM_CACHE=m
> CONFIG_DM_CACHE_SMQ=m
> CONFIG_DM_WRITECACHE=m
> # CONFIG_DM_EBS is not set
> CONFIG_DM_ERA=m
> # CONFIG_DM_CLONE is not set
> CONFIG_DM_MIRROR=m
> CONFIG_DM_LOG_USERSPACE=m
> CONFIG_DM_RAID=m
> CONFIG_DM_ZERO=m
> CONFIG_DM_MULTIPATH=m
> CONFIG_DM_MULTIPATH_QL=m
> CONFIG_DM_MULTIPATH_ST=m
> # CONFIG_DM_MULTIPATH_HST is not set
> # CONFIG_DM_MULTIPATH_IOA is not set
> CONFIG_DM_DELAY=m
> # CONFIG_DM_DUST is not set
> CONFIG_DM_UEVENT=y
> CONFIG_DM_FLAKEY=m
> CONFIG_DM_VERITY=m
> # CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG is not set
> # CONFIG_DM_VERITY_FEC is not set
> CONFIG_DM_SWITCH=m
> CONFIG_DM_LOG_WRITES=m
> CONFIG_DM_INTEGRITY=m
> CONFIG_DM_AUDIT=y
> # CONFIG_TARGET_CORE is not set
> # CONFIG_FUSION is not set
>
> #
> # IEEE 1394 (FireWire) support
> #
> CONFIG_FIREWIRE=m
> CONFIG_FIREWIRE_OHCI=m
> CONFIG_FIREWIRE_SBP2=m
> CONFIG_FIREWIRE_NET=m
> # CONFIG_FIREWIRE_NOSY is not set
> # end of IEEE 1394 (FireWire) support
>
> CONFIG_MACINTOSH_DRIVERS=y
> CONFIG_MAC_EMUMOUSEBTN=y
After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] that the latest version of the selftests is expected to run
on old kernels without issues.
Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace.
In addition to that, the current MPTCP selftests run a lot of different
sub-tests but the TAP13 protocol used in the selftests doesn't support
sub-tests: one failure in sub-tests implies that the whole selftest is
seen as failed at the end because sub-tests are not tracked. It is then
important to skip sub-tests not supported by old kernels.
To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftests or just some sub-tests before starting them. This cannot be
applied in all cases.
This second part focuses on marking different sub-tests as skipped if
some MPTCP features are not supported. A few techniques are used here:
- Before starting some tests:
- Check if a file (sysctl knob) is present: that's what patch 13/14 is
doing for the userspace PM feature.
- Check if a symbol is present in /proc/kallsyms: patch 1/14 adds some
helpers in mptcp_lib.sh to ease its use. Then these helpers are used
in patches 2, 3, 4, 10, 11 and 14/14.
- Set a flag and get the status to check if a feature is supported:
patch 8/14 is doing that with the 'fullmesh' flag.
- After having launched the tests:
- Retrieve the counters after a test and check if they are different
than 0. Similar to the check with the flag, that's not ideal but in
this case, the counters were already present before the introduction
of MPTCP but they have been supported by MPTCP sockets only later.
Patches 5 and 6/14 are using this technique.
Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
value is checked: if it is set to 1, the test is marked as "failed"
instead of "skipped". MPTCP public CI expects to have all features
supported and it sets this env var to 1 to catch regressions in these
new checks.
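
The skip-or-fail decision described above can be sketched as follows. This is
only an illustration in Python, not the code from mptcp_lib.sh: the function
names and the sample symbol are hypothetical, and only the
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES variable name and the use of
/proc/kallsyms come from the description above.

```python
import os


def symbol_available(symbol, kallsyms_text):
    """Return True if `symbol` is listed as a name in kallsyms-style text."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Each line looks like: "<address> <type> <name> [module]"
        if len(fields) >= 3 and fields[2] == symbol:
            return True
    return False


def result_when_missing(env=None):
    """Decide what to report when a feature is missing (hypothetical helper)."""
    env = os.environ if env is None else env
    # CI sets this variable to 1 so that a missing feature is a regression.
    if env.get("SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES") == "1":
        return "fail"
    return "skip"


def check_feature(symbol, kallsyms_text, env=None):
    """Return "run", "skip" or "fail" for a sub-test depending on `symbol`."""
    if symbol_available(symbol, kallsyms_text):
        return "run"
    return result_when_missing(env)
```

On a real system, the text would come from reading /proc/kallsyms once and
reusing it for every sub-test.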
Patches 7/14 and 9/14 are a bit different because they don't skip tests:
- Patch 7/14 retrieves the default values instead of using hardcoded
ones because these default values have been modified at some points.
Then the comparisons are done with the default values.
- Patch 9/14 relaxes the expected returned size from MPTCP's getsockopt
because the different structures gathering various info can get new
fields and get bigger over time. We cannot expect that the userspace
is using the same structure as the kernel.
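
One way to picture the relaxed size check is shown below. It is a hypothetical
Python sketch, not the change made in mptcp_sockopt.c: the two layouts are
invented for the example, and the idea is simply to parse the common prefix
instead of requiring an exact size match.

```python
import struct

# Hypothetical layouts: this userspace knows two 32-bit fields, while a newer
# kernel may append more fields and an older one may return fewer.
USER_FMT = "=II"


def parse_info(buf, user_fmt=USER_FMT):
    """Parse the fields userspace knows about, whatever size was returned."""
    known = struct.calcsize(user_fmt)
    # Drop trailing fields we do not know; zero-fill fields the kernel lacks.
    common = buf[:known].ljust(known, b"\0")
    return struct.unpack(user_fmt, common)
```

For example, a 12-byte reply from a "newer" kernel and a 4-byte reply from an
"older" one are both accepted, with unknown fields ignored and missing ones
read as 0.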
Patch 12/14 marks the test as "skipped" instead of "failed" if the "ip"
tool is not available.
In this second part, the "mptcp_join" selftest is not modified yet. This
will come soon after in the third part with quite a few patches.
Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrb… [1]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (14):
selftests: mptcp: lib: skip if missing symbol
selftests: mptcp: connect: skip transp tests if not supported
selftests: mptcp: connect: skip disconnect tests if not supported
selftests: mptcp: connect: skip TFO tests if not supported
selftests: mptcp: diag: skip listen tests if not supported
selftests: mptcp: diag: skip inuse tests if not supported
selftests: mptcp: pm nl: remove hardcoded default limits
selftests: mptcp: pm nl: skip fullmesh flag checks if not supported
selftests: mptcp: sockopt: relax expected returned size
selftests: mptcp: sockopt: skip getsockopt checks if not supported
selftests: mptcp: sockopt: skip TCP_INQ checks if not supported
selftests: mptcp: userspace pm: skip if 'ip' tool is unavailable
selftests: mptcp: userspace pm: skip if not supported
selftests: mptcp: userspace pm: skip PM listener events tests if unavailable
tools/testing/selftests/net/mptcp/config | 1 +
tools/testing/selftests/net/mptcp/diag.sh | 42 +++++++++-------------
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 20 +++++++++++
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 38 ++++++++++++++++++++
tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 18 ++++++----
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 20 +++++++++--
tools/testing/selftests/net/mptcp/pm_netlink.sh | 27 ++++++++------
tools/testing/selftests/net/mptcp/userspace_pm.sh | 13 ++++++-
8 files changed, 135 insertions(+), 44 deletions(-)
---
base-commit: 6c0ec7ab5aaff3706657dd4946798aed483b9471
change-id: 20230608-upstream-net-20230608-mptcp-selftests-support-old-kernels-part-2-6e337e1f047d
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Hi,
Enclosed are a pair of patches for an oops that can occur if an exception is
generated while a bpf subprogram is running. One of the bpf_prog_aux entries
for the subprograms is missing an extable. This can lead to an exception that
would otherwise be handled turning into a NULL pointer bug.
These changes were tested via the verifier and progs selftests and no
regressions were observed.
Changes from v3:
- Selftest style fixups (Feedback from Yonghong Song)
- Selftest needs to assert that test bpf program executed (Feedback from
Yonghong Song)
- Selftest should combine open and load using open_and_load (Feedback from
Yonghong Song)
Changes from v2:
- Insert only the main program's kallsyms (Feedback from Yonghong Song and
Alexei Starovoitov)
- Selftest should use ASSERT instead of CHECK (Feedback from Yonghong Song)
- Selftest needs some cleanup (Feedback from Yonghong Song)
- Switch patch order (Feedback from Alexei Starovoitov)
Changes from v1:
- Add a selftest (Feedback From Alexei Starovoitov)
- Move to a 1-line verifier change instead of searching multiple extables
Krister Johansen (2):
bpf: ensure main program has an extable
selftests/bpf: add a test for subprogram extables
kernel/bpf/verifier.c | 6 ++-
.../bpf/prog_tests/subprogs_extable.c | 29 +++++++++++
.../bpf/progs/test_subprogs_extable.c | 51 +++++++++++++++++++
3 files changed, 84 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/subprogs_extable.c
create mode 100644 tools/testing/selftests/bpf/progs/test_subprogs_extable.c
--
2.25.1
For cases like IPv6 addresses, having a means to supply tracing
predicates for fields with more than 8 bytes would be convenient.
This series provides a simple way to support this by allowing
simple ==, != memory comparison with the predicate supplied when
the size of the field exceeds 8 bytes. For example, to trace
::1, the predicate
"dst == 0x00000000000000000000000000000001"
..could be used. Patch 1 implements this.
As a convenience, support for IPv4, IPv6 and MAC addresses is
also included; patches 2-4 cover these and allow simpler
comparisons which do not require getting the exact number of
bytes right; for example
"dst == ::1"
"src != 127.0.0.1"
"mac_addr == ab:cd:ef:01:23:45"
Patch 5 adds tests for existing and new filter predicates, and patch 6
documents the fact that, for the various addresses supported and
the >8 byte memory comparison, only == and != are supported.
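
To show how the raw >8 byte form relates to the address form, here is a small
Python sketch that builds the hex predicate string for an IPv6 address. The
helper name is hypothetical; it only formats a string and does not talk to the
tracing filter interface.

```python
import ipaddress


def ipv6_to_predicate(field, address, op="=="):
    """Build a raw memory-comparison predicate for a 16-byte IPv6 field."""
    if op not in ("==", "!="):
        raise ValueError("only == and != are supported for >8 byte fields")
    # .packed is the 16-byte address in network byte order.
    packed = ipaddress.IPv6Address(address).packed
    return f"{field} {op} 0x{packed.hex()}"
```

Calling ipv6_to_predicate("dst", "::1") yields the long hexadecimal form of
the "dst == ::1" predicate: 31 zero digits followed by a 1.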
Changes since v1 [1]:
- added support for IPv4, IPv6 and MAC addresses (patches 2-4)
(Masami and Steven)
- added selftests for IPv4, IPv6 and MAC addresses and updated
docs accordingly (patches 5,6)
Changes since RFC [2]:
- originally a fix was intermixed with the new functionality as
patch 1 in series [2]; the fix landed separately
- small tweaks to how filter predicates are defined via fn_num as
opposed to via fn directly
[1] https://lore.kernel.org/linux-trace-kernel/1682414197-13173-1-git-send-emai…
[2] https://lore.kernel.org/lkml/1659910883-18223-1-git-send-email-alan.maguire…
Alan Maguire (6):
tracing: support > 8 byte array filter predicates
tracing: support IPv4 address filter predicate
tracing: support IPv6 filter predicates
tracing: support MAC address filter predicates
selftests/ftrace: add test coverage for filter predicates
tracing: document IPv4, IPv6, MAC address and > 8 byte numeric
filtering support
Documentation/trace/events.rst | 21 +++
kernel/trace/trace_events_filter.c | 164 +++++++++++++++++-
.../selftests/ftrace/test.d/event/filter.tc | 91 ++++++++++
3 files changed, 275 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/event/filter.tc
--
2.31.1
Some test cases from net/tls, net/fcnal-test and net/vrf-xfrm-tests
that rely on cryptographic functions to work and use algorithms that
are not FIPS-compliant fail in FIPS mode.
In order to allow these tests to pass in a wider set of kernels,
- for net/tls, skip the test variants that use the ChaCha20-Poly1305
and SM4 algorithms, when FIPS mode is enabled;
- for net/fcnal-test, skip the MD5 tests, when FIPS mode is enabled;
- for net/vrf-xfrm-tests, replace the algorithms that are not
FIPS-compliant with compliant ones.
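
The skip logic can be sketched like this. It is a Python illustration only
(the real tests are written in C and shell); the function names and the
variant list are assumptions made for the example, while the
/proc/sys/crypto/fips_enabled path is the one the series reads.

```python
def fips_enabled(path="/proc/sys/crypto/fips_enabled"):
    """Return True if the kernel reports FIPS mode; False if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # File missing or unreadable: treat as "FIPS mode not enabled".
        return False


def variants_to_run(variants, fips):
    """Keep every variant, except non-compliant ones when FIPS mode is on."""
    return [name for name, non_compliant in variants
            if not (fips and non_compliant)]
```

The state is read once and passed around, so every variant is judged against
the same value.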
Changes in v2:
- Add R-b tags.
- Put fips_non_compliant into the variants.
- Turn fips_enabled into a static global variable.
- Read /proc/sys/crypto/fips_enabled only once at main().
v1: https://lore.kernel.org/netdev/20230607174302.19542-1-magali.lemes@canonica…
Magali Lemes (3):
selftests: net: tls: check if FIPS mode is enabled
selftests: net: vrf-xfrm-tests: change authentication and encryption
algos
selftests: net: fcnal-test: check if FIPS mode is enabled
tools/testing/selftests/net/fcnal-test.sh | 27 ++-
tools/testing/selftests/net/tls.c | 175 +++++++++++++++++-
tools/testing/selftests/net/vrf-xfrm-tests.sh | 32 ++--
3 files changed, 209 insertions(+), 25 deletions(-)
--
2.34.1
Currently the config fragment for cpufreq enables a lot of generic
lock debugging. While these options are useful when testing cpufreq,
they aren't actually required to run the tests and are therefore out of
scope for the cpufreq fragment; they are more of a thing that it's good
to enable while doing testing than an actual requirement for cpufreq
testing specifically. Having these debugging options enabled,
especially the mutex and spinlock instrumentation, means that any build
that includes the cpufreq fragment is both very much larger than a
standard defconfig (eg, I'm seeing 35% on x86_64) and also slower at
runtime.
This is causing real problems for CI systems. In order to avoid
building large numbers of kernels they try to group kselftest fragments
together, frequently just grouping all the kselftest fragments into a
single block. The increased size is an issue for memory constrained
systems and is also problematic for systems with fixed storage
allocations for kernel images (eg, typical u-boot systems) where it
frequently causes the kernel to overflow the storage space allocated for
kernels. The reduced performance isn't too bad with real hardware but
can be disruptive on emulated platforms.
In order to avoid these issues remove these generic instrumentation
options from the cpufreq fragment, bringing the cpufreq fragment into
line with other fragments which generally set requirements for testing
rather than nice to haves.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/cpufreq/config | 8 --------
1 file changed, 8 deletions(-)
diff --git a/tools/testing/selftests/cpufreq/config b/tools/testing/selftests/cpufreq/config
index 75e900793e8a..ce5068f5a6a2 100644
--- a/tools/testing/selftests/cpufreq/config
+++ b/tools/testing/selftests/cpufreq/config
@@ -5,11 +5,3 @@ CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
-CONFIG_DEBUG_RT_MUTEXES=y
-CONFIG_DEBUG_PLIST=y
-CONFIG_DEBUG_SPINLOCK=y
-CONFIG_DEBUG_MUTEXES=y
-CONFIG_DEBUG_LOCK_ALLOC=y
-CONFIG_PROVE_LOCKING=y
-CONFIG_LOCKDEP=y
-CONFIG_DEBUG_ATOMIC_SLEEP=y
---
base-commit: ac9a78681b921877518763ba0e89202254349d1b
change-id: 20230605-kselftest-cpufreq-options-2fd6d4742333
Best regards,
--
Mark Brown <broonie(a)kernel.org>
While KUnit tests that cannot be built as a loadable module must depend
on "KUNIT=y", this is not true for modular tests, where it adds an
unnecessary limitation.
Fix this by relaxing the dependency to "KUNIT".
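
The difference between the two forms can be modelled with Kconfig's tristate
ordering (n < m < y). The Python sketch below is an illustration with
hypothetical helper names, not part of the patch: a comparison such as
"KUNIT=y" evaluates to y or n only, while a plain "KUNIT" dependency carries
the symbol's own value, and a tristate symbol cannot exceed its dependency.

```python
ORDER = {"n": 0, "m": 1, "y": 2}


def depends_on_symbol(value):
    """'depends on SYM': the dependency has SYM's own tristate value."""
    return value


def depends_on_equals_y(value):
    """'depends on SYM=y': a comparison, 'y' if it holds and 'n' otherwise."""
    return "y" if value == "y" else "n"


def max_allowed(requested, dependency):
    """A tristate symbol is capped by the value of its dependency."""
    return min(requested, dependency, key=ORDER.__getitem__)
```

With KUNIT=m, max_allowed("m", depends_on_equals_y("m")) gives "n", so the
test cannot be built at all, whereas max_allowed("m", depends_on_symbol("m"))
gives "m".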
Fixes: 08809e482a1c44d9 ("HID: uclogic: KUnit best practices and naming conventions")
Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Reviewed-by: David Gow <davidgow(a)google.com>
Reviewed-by: José Expósito <jose.exposito89(a)gmail.com>
---
v2:
- Add Reviewed-by.
---
drivers/hid/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/Kconfig b/drivers/hid/Kconfig
index 4ce012f83253ec9f..b977450cac75265d 100644
--- a/drivers/hid/Kconfig
+++ b/drivers/hid/Kconfig
@@ -1285,7 +1285,7 @@ config HID_MCP2221
config HID_KUNIT_TEST
tristate "KUnit tests for HID" if !KUNIT_ALL_TESTS
- depends on KUNIT=y
+ depends on KUNIT
depends on HID_BATTERY_STRENGTH
depends on HID_UCLOGIC
default KUNIT_ALL_TESTS
--
2.34.1
From: Menglong Dong <imagedong(a)tencent.com>
For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
on kernel functions whose argument count is less than 6. This is not
friendly at all, as many functions have more than 6 arguments.
Therefore, let's enhance it by increasing the function argument count
allowed in arch_prepare_bpf_trampoline(), for now only on x86_64.
In the 1st patch, we make arch_prepare_bpf_trampoline() support to copy
function arguments in stack for x86 arch. Therefore, the maximum
arguments can be up to MAX_BPF_FUNC_ARGS for FENTRY and FEXIT.
In the 2nd patch, we clean garbage value in upper bytes of the trampoline
when we store the arguments from regs into stack.
And the 3rd patches are for the testcases of the 1st patch.
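For readers less familiar with the calling convention, here is a small
sketch of where arguments past the sixth live on x86_64. It is not the
trampoline code from the series; stack_arg_rbp_offset() and NR_REG_ARGS
are hypothetical names, and the sketch assumes the System V ABI with
every argument fitting in one 8-byte slot (no structs, no floats).

```c
#include <assert.h>

/*
 * System V x86-64 ABI: the first six integer/pointer arguments travel in
 * rdi, rsi, rdx, rcx, r8 and r9; any further ones are passed on the stack.
 */
#define NR_REG_ARGS 6

/*
 * Hypothetical helper: offset from %rbp (after the usual
 * "push %rbp; mov %rsp,%rbp" prologue) of the 0-based argument arg_idx,
 * or -1 when that argument is passed in a register.
 */
int stack_arg_rbp_offset(int arg_idx)
{
	assert(arg_idx >= 0);
	if (arg_idx < NR_REG_ARGS)
		return -1;
	/* Saved %rbp (8 bytes) and the return address (8 bytes) come first. */
	return 16 + 8 * (arg_idx - NR_REG_ARGS);
}
```

With this layout the 7th argument (index 6) sits at 16(%rbp) and the
12th (index 11) at 56(%rbp), which is the kind of copy the trampoline
has to perform for arguments that do not fit in registers.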
Changes since v2:
- keep MAX_BPF_FUNC_ARGS still
- clean garbage value in upper bytes in the 2nd patch
- move bpf_fentry_test{7,12} to bpf_testmod.c and rename them to
bpf_testmod_fentry_test{7,12} meanwhile in the 3rd patch
Changes since v1:
- change the maximum function arguments to 14 from 12
- add testcases (Jiri Olsa)
- use EMIT3_off32 instead of EMIT4 for "lea" to prevent overflow
Menglong Dong (3):
bpf, x86: allow function arguments up to 12 for TRACING
bpf, x86: clean garbage value in the stack of trampoline
selftests/bpf: add testcase for FENTRY/FEXIT with 6+ arguments
arch/x86/net/bpf_jit_comp.c | 105 +++++++++++++++---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 19 +++-
.../selftests/bpf/prog_tests/fentry_fexit.c | 4 +-
.../selftests/bpf/prog_tests/fentry_test.c | 2 +
.../selftests/bpf/prog_tests/fexit_test.c | 2 +
.../testing/selftests/bpf/progs/fentry_test.c | 21 ++++
.../testing/selftests/bpf/progs/fexit_test.c | 33 ++++++
7 files changed, 169 insertions(+), 17 deletions(-)
--
2.40.1
Some test cases from net/tls, net/fcnal-test and net/vrf-xfrm-tests
rely on cryptographic functions and use algorithms that are not
FIPS-compliant, so they fail in FIPS mode.
In order to allow these tests to pass in a wider set of kernels,
- for net/tls, skip the test variants that use the ChaCha20-Poly1305
and SM4 algorithms, when FIPS mode is enabled;
- for net/fcnal-test, skip the MD5 tests, when FIPS mode is enabled;
- for net/vrf-xfrm-tests, replace the algorithms that are not
FIPS-compliant with compliant ones.
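A minimal sketch of how a C selftest could detect FIPS mode before
deciding to skip a variant. It assumes the usual procfs knob
/proc/sys/crypto/fips_enabled; the helper names fips_enabled_at() and
fips_enabled() are made up for illustration and are not taken from the
patches.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 when the file at path contains "1", 0 otherwise. A missing
 * file (for example a kernel built without FIPS support) counts as
 * "not enabled".
 */
int fips_enabled_at(const char *path)
{
	FILE *f;
	int flag = 0;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fscanf(f, "%d", &flag) != 1)
		flag = 0;
	fclose(f);
	return flag == 1;
}

/* Convenience wrapper for the assumed default location. */
int fips_enabled(void)
{
	return fips_enabled_at("/proc/sys/crypto/fips_enabled");
}
```

A test variant that needs ChaCha20-Poly1305, SM4 or MD5 could call
fips_enabled() in its setup and report a skip instead of a failure.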
Magali Lemes (3):
selftests: net: tls: check if FIPS mode is enabled
selftests: net: vrf-xfrm-tests: change authentication and encryption
algos
selftests: net: fcnal-test: check if FIPS mode is enabled
tools/testing/selftests/net/fcnal-test.sh | 27 +-
tools/testing/selftests/net/tls.c | 265 +++++++++++++++++-
tools/testing/selftests/net/vrf-xfrm-tests.sh | 32 +--
3 files changed, 298 insertions(+), 26 deletions(-)
--
2.34.1
KVM_GET_REG_LIST dumps all register IDs that are available to
KVM_GET/SET_ONE_REG, and it is very useful for identifying platform
regression issues during VM migration.
Patches 1-7 restructure the get-reg-list test in aarch64 to turn part of
the code into a common test framework that can be shared by riscv.
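As an illustration of the API itself, the usual two-step calling pattern
can be sketched as below. This is only a sketch: get_reg_list() is a
hypothetical helper, vcpu_fd is assumed to be an already created vCPU
file descriptor, and the E2BIG probing behaviour is the one described
for KVM_GET_REG_LIST in the KVM API documentation.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Returns a malloc'ed register list for the vCPU, or NULL on error.
 * The caller owns the result and must free() it.
 */
struct kvm_reg_list *get_reg_list(int vcpu_fd)
{
	struct kvm_reg_list probe = { .n = 0 };
	struct kvm_reg_list *list;

	/*
	 * With n == 0 the kernel is expected to fail with E2BIG and to
	 * store the required number of entries in probe.n.
	 */
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe) == 0)
		probe.n = 0;		/* no registers to report */
	else if (errno != E2BIG)
		return NULL;

	list = calloc(1, sizeof(*list) + probe.n * sizeof(list->reg[0]));
	if (!list)
		return NULL;

	list->n = probe.n;
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}
```

Each list->reg[i] is then an ID that can be passed to KVM_GET_ONE_REG or
KVM_SET_ONE_REG, which is what a get-reg-list style test iterates over.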
Patch 8 enabled the KVM_GET_REG_LIST API in riscv and patch 9-11 added
the corresponding kselftest for checking possible register regressions.
The get-reg-list kvm selftest was ported from aarch64 and tested with
Linux 6.4-rc1 on a Qemu riscv virt machine.
---
Changed since v1:
* rebase to Andrew's changes
* fix coding style
Andrew Jones (7):
KVM: arm64: selftests: Replace str_with_index with strdup_printf
KVM: arm64: selftests: Drop SVE cap check in print_reg
KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config
KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h
KVM: arm64: selftests: Delete core_reg_fixup
KVM: arm64: selftests: Split get-reg-list test code
KVM: arm64: selftests: Finish generalizing get-reg-list
Haibo Xu (4):
KVM: riscv: Add KVM_GET_REG_LIST API support
KVM: riscv: selftests: Make check_supported arch specific
KVM: riscv: selftests: Skip some registers set operation
KVM: riscv: selftests: Add get-reg-list test
Documentation/virt/kvm/api.rst | 2 +-
arch/riscv/kvm/vcpu.c | 372 ++++++++++++
tools/testing/selftests/kvm/Makefile | 13 +-
.../selftests/kvm/aarch64/get-reg-list.c | 540 ++----------------
tools/testing/selftests/kvm/get-reg-list.c | 426 ++++++++++++++
.../selftests/kvm/include/kvm_util_base.h | 16 +
.../selftests/kvm/include/riscv/processor.h | 3 +
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 15 +
.../selftests/kvm/riscv/get-reg-list.c | 539 +++++++++++++++++
10 files changed, 1428 insertions(+), 500 deletions(-)
create mode 100644 tools/testing/selftests/kvm/get-reg-list.c
create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
--
2.34.1
Hi,
Enclosed are a pair of patches for an oops that can occur if an exception is
generated while a bpf subprogram is running. One of the bpf_prog_aux entries
for the subprograms is missing an extable. This can lead to an exception that
would otherwise be handled turning into a NULL pointer bug.
The bulk of the change here is simply adding a pair of programs for the
selftest. The proposed fix in this iteration is a 1-line change.
These changes were tested via the verifier and progs selftests and no
regressions were observed.
Changes from v1:
- Add a selftest (Feedback From Alexei Starovoitov)
- Move to a 1-line verifier change instead of searching multiple extables
Krister Johansen (2):
Add a selftest for subprogram extables
bpf: ensure main program has an extable
kernel/bpf/verifier.c | 1 +
.../bpf/prog_tests/subprogs_extable.c | 35 +++++++++
.../bpf/progs/test_subprogs_extable.c | 71 +++++++++++++++++++
3 files changed, 107 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/subprogs_extable.c
create mode 100644 tools/testing/selftests/bpf/progs/test_subprogs_extable.c
--
2.25.1
Hi,
This series is on top of kvmarm/next as I also needed to modify the eager
page splitting logic in the clear-dirty-log API. Eager page splitting is
not present in Linux 6.4-rc4.
Also, I had to change the selftests patches (1 to 5) as some commits were
removed from the kvm/queue remote. This caused issues due to different APIs
being present in dirty_log_perf_test when I was rebasing v2. Those
removed commits are now back in Sean's kvm-x86 branch [1] but not in
kvmarm/next or kvm/queue. I didn't want to wait for review of v2, so I
changed dirty_log_perf_test to work with the kvmarm/next branch. When
Sean's kvm-x86 branch is merged, the selftests in this patch series need to
be modified to use the new APIs, or whoever merges last needs to take care
of that.
This patch series modifies the clear-dirty-log operation to run under the
MMU read lock. It write-protects SPTEs and splits huge pages using the MMU
read lock instead of the MMU write lock.
Use of MMU read lock is made possible by using shared page table
walkers. Currently only page fault handlers use shared page table
walkers, with this series, clear-dirty-log operation will also use
shared page table walkers.
Patches 1 to 5:
These patches are modifying dirty_log_perf_test. Intent is to mimic
production scenarios where guest keeps on executing while userspace
thread collects and clears dirty logs independently.
Three new command line options are added:
1. j: Allows to run guest vCPUs and main thread collecting dirty logs
independently of each other after initialization is complete.
2. k: Allows to clear dirty logs in smaller chunks compared to existing
whole memslot clear in one call.
3. l: Allows to add customizable wait time between consecutive clear
dirty log calls to mimic sending dirty memory to destination.
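To make the chunking option concrete, here is a small stand-alone sketch
of the loop shape. It is not the dirty_log_perf_test code:
clear_in_chunks(), clear_fn and count_pages() are hypothetical names, and
the callback stands in for whatever issues the real clear-dirty-log call
(a real caller would presumably also sleep between calls to model the
wait-time option).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the operation that clears one range of dirty pages. */
typedef void (*clear_fn)(uint64_t first_page, uint64_t num_pages, void *ctx);

/*
 * Walk [0, total_pages) in chunks of at most chunk_pages pages and call
 * clear() once per chunk. Returns the number of calls made.
 */
uint64_t clear_in_chunks(uint64_t total_pages, uint64_t chunk_pages,
			 clear_fn clear, void *ctx)
{
	uint64_t first, calls = 0;

	assert(chunk_pages > 0);
	for (first = 0; first < total_pages; first += chunk_pages) {
		uint64_t left = total_pages - first;
		uint64_t n = left < chunk_pages ? left : chunk_pages;

		clear(first, n, ctx);
		calls++;
	}
	return calls;
}

/* Example callback: only adds up how many pages it was asked to clear. */
void count_pages(uint64_t first_page, uint64_t num_pages, void *ctx)
{
	(void)first_page;
	*(uint64_t *)ctx += num_pages;
}
```

For a 1000-page memslot cleared 256 pages at a time this makes four
calls, the last one covering the 232-page tail.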
Patch 7-16:
These patches refactor code to move MMU lock operations to arch specific
code, refactor Arm's page table walker APIs, and change MMU write lock
for clearing dirty logs to read lock. Patch 16 has results showing
improvements based on dirty_log_perf_test.
1. https://lore.kernel.org/lkml/168565341087.666819.6731422637224460050.b4-ty@…
v2:
- Fix compile warning for mips and riscv.
- Added logic to continue or retry shared page walks that do not come
from the fault handler.
- Huge page split also changed to run under MMU read lock.
- Added more explanations in commit logs.
- Selftests are modified because a commit series was reverted in
dirty_log_perf_test on kvm/queue.
v1: https://lore.kernel.org/lkml/20230421165305.804301-1-vipinsh@google.com/
Vipin Sharma (16):
KVM: selftests: Clear dirty logs in user defined chunks sizes in
dirty_log_perf_test
KVM: selftests: Add optional delay between consecutive clear-dirty-log
calls
KVM: selftests: Pass the count of read and write accesses from guest
to host
KVM: selftests: Print read-write progress by vCPUs in
dirty_log_perf_test
KVM: selftests: Allow independent execution of vCPUs in
dirty_log_perf_test
KVM: arm64: Correct the kvm_pgtable_stage2_flush() documentation
KVM: mmu: Move mmu lock/unlock to arch code for clear dirty log
KMV: arm64: Pass page table walker flags to stage2_apply_range_*()
KVM: arm64: Document the page table walker actions based on the
callback's return value
KVM: arm64: Return -ENOENT if PTE is not valid in stage2_attr_walker
KVM: arm64: Use KVM_PGTABLE_WALK_SHARED flag instead of
KVM_PGTABLE_WALK_HANDLE_FAULT
KVM: arm64: Retry shared page table walks outside of fault handler
KVM: arm64: Run clear-dirty-log under MMU read lock
KVM: arm64: Pass page walker flags from callers of stage 2 split
walker
KVM: arm64: Provide option to pass page walker flag for huge page
splits
KVM: arm64: Split huge pages during clear-dirty-log under MMU read
lock
arch/arm64/include/asm/kvm_pgtable.h | 42 +++--
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 68 ++++++--
arch/arm64/kvm/mmu.c | 65 +++++---
arch/mips/kvm/mmu.c | 2 +
arch/riscv/kvm/mmu.c | 2 +
arch/x86/kvm/mmu/mmu.c | 3 +
.../selftests/kvm/dirty_log_perf_test.c | 147 ++++++++++++++----
tools/testing/selftests/kvm/lib/memstress.c | 13 +-
virt/kvm/dirty_ring.c | 2 -
virt/kvm/kvm_main.c | 4 -
11 files changed, 265 insertions(+), 87 deletions(-)
base-commit: 532b2ecfa547f02b1825108711565eff026bce5a
--
2.41.0.rc0.172.g3f132b7071-goog
Hello Paul,
Thomas and Zhangjin have provided significant nolibc cleanups, and
fixes, as well as preparation work to later support riscv32.
These consist in the following main series:
- generalization of stackprotector to other archs that were not
previously supported (riscv, mips, loongarch, arm, arm64)
- general cleanups of the makefile, test report output, deduplication
of certain tests
- slightly better compliance of some tests performed on certain syscalls
(e.g. no longer pass (void*)1 to gettimeofday() since glibc hates it).
- add support for nanoseconds in stat() and statx()
- fixes for some syscalls (e.g. ppoll() has 5 arguments not 4)
- fixes around limits.h and INT_MAX / INT_FAST64_MAX
I rebased the whole series on top of your latest dev branch (d19a9ca3d5)
and it works fine for all archs.
I don't know if you're still planning on merging new stuff in this area
for 6.5 or not (since I know that it involves new series of tests on your
side as well), but given that Zhangjin will engage into deeper changes
later for riscv32 that will likely imply to update more syscalls to use
the time64 ones, I would prefer to split the cleanups from the hard stuff,
but I'll let you judge based on the current state of what's pending for
6.5.
In any case I'm putting all this here for now (not for merge yet):
git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/nolibc.git 20230604-nolibc-rv32+stkp6
I'd like Thomas and Zhangjin to perform a last check to confirm they're
OK with this final integration.
Thanks!
Willy
Fixes: 8e3ab529bef9 ("tools/nolibc/unistd: add syscall()")
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
Hi, Willy
Since this may be OK for v6.5, I based it directly on your
20230606-nolibc-rv32+stkp7a branch.
This may conflict with the reviewed series [1]; if required, I can respin
that series too.
[1]: https://lore.kernel.org/linux-riscv/cover.1686135913.git.falcon@tinylab.org/
tools/include/nolibc/unistd.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/include/nolibc/unistd.h b/tools/include/nolibc/unistd.h
index c20b2fbf065e..0e832e10a0b2 100644
--- a/tools/include/nolibc/unistd.h
+++ b/tools/include/nolibc/unistd.h
@@ -66,10 +66,10 @@ int tcsetpgrp(int fd, pid_t pid)
_ret; \
})
-#define _sycall_narg(...) __syscall_narg(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
+#define _syscall_narg(...) __syscall_narg(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
#define __syscall_narg(_0, _1, _2, _3, _4, _5, _6, N, ...) N
#define _syscall_n(N, ...) _syscall(N, __VA_ARGS__)
-#define syscall(...) _syscall_n(_sycall_narg(__VA_ARGS__), ##__VA_ARGS__)
+#define syscall(...) _syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__)
/* make sure to include all global symbols */
#include "nolibc.h"
--
2.25.1
User space applications watch for timestamp changes on character device
files in order to determine idle time of a given terminal session. For
example, "w" program uses this information to populate the IDLE column
of its output [1]. Similarly, systemd-logind has an optional feature where
it uses the atime of the tty character device to determine if there was
activity on the terminal associated with logind's session object. If
there was no activity for a configured period of time then logind will
terminate such a session [2].
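For illustration, an observer of this kind can be sketched as below. This
is not the code from procps or systemd; idle_seconds() is a made-up
helper that derives an idle time from the atime of a device node using
plain stat().

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Seconds elapsed since the last recorded access (atime) of the node at
 * path, or -1 if the node cannot be stat()ed.
 */
long idle_seconds(const char *path)
{
	struct stat st;
	time_t now = time(NULL);

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return -1;
	/* Clamp to 0 in case the clock and the timestamp disagree slightly. */
	return now > st.st_atime ? (long)(now - st.st_atime) : 0;
}
```

Calling idle_seconds("/dev/pts/0") only gives a meaningful answer if
every access to that terminal updates the timestamp on that node.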
Now, usually (e.g. bash running on the terminal) the use of the terminal
will update timestamps (atime and mtime) on the corresponding terminal
character device. However, if access to the terminal, e.g. /dev/pts/0,
is performed through magic character device /dev/tty then such access
obviously changes the state of the terminal, however timestamps on the
device that correspond to the terminal (/dev/pts/0) are not updated.
This patch makes sure that we update timestamps on *all* character
devices that correspond to the given tty, because outside observers (w,
systemd-logind) are maybe checking these timestamps. Obviously, they can
not check timestamps on /dev/tty as that has per-process meaning.
[1] https://gitlab.com/procps-ng/procps/-/blob/v4.0.0/w.c#L286
[2] https://github.com/systemd/systemd/blob/v252/NEWS#L477
Signed-off-by: Michal Sekletar <msekleta(a)redhat.com>
---
drivers/tty/tty_io.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 36fb945fdad4..48e0148b0f3e 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -101,6 +101,7 @@
#include <linux/compat.h>
#include <linux/uaccess.h>
#include <linux/termios_internal.h>
+#include <linux/fs.h>
#include <linux/kbd_kern.h>
#include <linux/vt_kern.h>
@@ -811,18 +812,27 @@ void start_tty(struct tty_struct *tty)
}
EXPORT_SYMBOL(start_tty);
-static void tty_update_time(struct timespec64 *time)
+static void tty_update_time(struct tty_struct *tty, int tstamp)
{
+ struct tty_file_private *priv;
time64_t sec = ktime_get_real_seconds();
- /*
- * We only care if the two values differ in anything other than the
- * lower three bits (i.e every 8 seconds). If so, then we can update
- * the time of the tty device, otherwise it could be construded as a
- * security leak to let userspace know the exact timing of the tty.
- */
- if ((sec ^ time->tv_sec) & ~7)
- time->tv_sec = sec;
+ spin_lock(&tty->files_lock);
+ list_for_each_entry(priv, &tty->tty_files, list) {
+ struct file *filp = priv->file;
+ struct inode *inode = file_inode(filp);
+ struct timespec64 *time = tstamp == S_MTIME ? &inode->i_mtime : &inode->i_atime;
+
+ /*
+ * We only care if the two values differ in anything other than the
+ * lower three bits (i.e every 8 seconds). If so, then we can update
+ * the time of the tty device, otherwise it could be construded as a
+ * security leak to let userspace know the exact timing of the tty.
+ */
+ if ((sec ^ time->tv_sec) & ~7)
+ time->tv_sec = sec;
+ }
+ spin_unlock(&tty->files_lock);
}
/*
@@ -928,7 +938,7 @@ static ssize_t tty_read(struct kiocb *iocb, struct iov_iter *to)
tty_ldisc_deref(ld);
if (i > 0)
- tty_update_time(&inode->i_atime);
+ tty_update_time(tty, S_ATIME);
return i;
}
@@ -1036,7 +1046,7 @@ static inline ssize_t do_tty_write(
cond_resched();
}
if (written) {
- tty_update_time(&file_inode(file)->i_mtime);
+ tty_update_time(tty, S_MTIME);
ret = written;
}
out:
--
2.39.2
From: Maxim Mikityanskiy <maxim(a)isovalent.com>
See the details in the commit message (TL;DR: under CAP_BPF, the
verifier can incorrectly conclude that a scalar is zero while in
fact it can be crafted to a predefined number.)
v1 and v2 were sent off-list.
v2 changes:
Added more tests, migrated them to inline asm, started using
bpf_get_prandom_u32, switched to a more bulletproof dead branch check
and modified the failing spill test scenarios so that an unauthorized
access attempt is performed in both branches.
v3 changes:
Dropped an improvement not necessary for the fix, changed the Fixes tag.
v4 changes:
Dropped supposedly redundant tests, kept the ones that result in
different verifier verdicts. Dropped the variable that is not yet
useful in this patch. Rephrased the commit message with Daniel's
suggestions.
Maxim Mikityanskiy (2):
bpf: Fix verifier id tracking of scalars on spill
selftests/bpf: Add test cases to assert proper ID tracking on spill
kernel/bpf/verifier.c | 3 +
.../selftests/bpf/progs/verifier_spill_fill.c | 79 +++++++++++++++++++
2 files changed, 82 insertions(+)
--
2.40.1
Willy, Thomas
This is the revision of the v2 syscall helpers [1], it is based on
20230606-nolibc-rv32+stkp7a of [2]. It doesn't conflict with the v4 of
-ENOSYS patchset [3], so, it is ok to simply merge both of them.
This revision mainly applies Thomas' method: it removes the __syscall()
helper and replaces it with __sysret() instead, because __syscall()
looks like _syscall() and syscall() and may mislead developers.
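For readers who have not followed the earlier revisions, the idea of such
a return helper can be sketched in ordinary userspace C as below. This is
a simplified illustration and not the nolibc code itself; the name
sysret_sketch() is made up, and it assumes the Linux convention that a
raw syscall returns a negative errno value on failure.

```c
#include <errno.h>

/*
 * Convert a raw syscall return value into the libc convention:
 * on failure store the error code in errno and return -1,
 * otherwise pass the value through unchanged.
 */
long sysret_sketch(long ret)
{
	if (ret < 0) {
		errno = (int)-ret;
		ret = -1;
	}
	return ret;
}
```

Wrapping each raw syscall in one helper like this is what lets the
per-syscall wrappers shrink to a single return statement.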
Changes from v2 -> v3:
* tools/nolibc: sys.h: add a syscall return helper
* The __syscall() is removed.
* Align the code style of __sysret() with the others, and use
__inline__ instead of inline (like stdlib.h) to let it work with
the default -std=c89 in tools/testing/selftests/nolibc/Makefile
* tools/nolibc: unistd.h: apply __sysret() helper
As v2.
* tools/nolibc: sys.h: apply __sysret() helper
replaced __syscall() with __sysret() and merged two separated patches of v2 to one.
Did run-user tests for rv32 (with [3]), rv64 and arm64.
BTW, two questions for Thomas,
* This commit 659a49abc9c2 ("tools/nolibc: validate C89 compatibility")
enables -std=c89, why not gnu11 used by kernel ? ;-)
* Do we need to tune the order of the macros in unistd.h like this:
#define _syscall(N, ...) __sysret(my_syscall##N(__VA_ARGS__))
#define _syscall_n(N, ...) _syscall(N, __VA_ARGS__)
#define __syscall_narg(_0, _1, _2, _3, _4, _5, _6, N, ...) N
#define _sycall_narg(...) __syscall_narg(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
#define syscall(...) _syscall_n(_sycall_narg(__VA_ARGS__), ##__VA_ARGS__)
Before, it works but the macros do not seem to be listed in order of use:
#define _syscall(N, ...) __sysret(my_syscall##N(__VA_ARGS__))
#define _sycall_narg(...) __syscall_narg(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
#define __syscall_narg(_0, _1, _2, _3, _4, _5, _6, N, ...) N
#define _syscall_n(N, ...) _syscall(N, __VA_ARGS__)
#define syscall(...) _syscall_n(_sycall_narg(__VA_ARGS__), ##__VA_ARGS__)
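On the ordering question, as far as the C preprocessor is concerned a
macro body is only expanded where the macro is used, so the definitions
can appear in either order. The counting trick can be tried in isolation
with the sketch below; COUNT_EXTRA, COUNT_PICK and the extras_for_*()
functions are made-up names for this example only.

```c
/*
 * COUNT_EXTRA refers to COUNT_PICK before COUNT_PICK is defined. That is
 * fine: the body is only expanded at the point of use further down.
 */
#define COUNT_EXTRA(...) COUNT_PICK(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
#define COUNT_PICK(_0, _1, _2, _3, _4, _5, _6, N, ...) N

/* The first argument plays the role of the syscall number. */
int extras_for_one(void)   { return COUNT_EXTRA(100); }
int extras_for_three(void) { return COUNT_EXTRA(100, 1, 2); }
int extras_for_seven(void) { return COUNT_EXTRA(100, 1, 2, 3, 4, 5, 6); }
```

Every extra argument shifts the literal list 6, 5, ..., 0 one slot to the
right, so the value that lands in the N position is the number of
arguments after the first one.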
Thanks.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/linux-riscv/cover.1686036862.git.falcon@tinylab.org/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/nolibc.git
[3]: https://lore.kernel.org/linux-riscv/cover.1686128703.git.falcon@tinylab.org…
Zhangjin Wu (3):
tools/nolibc: sys.h: add a syscall return helper
tools/nolibc: unistd.h: apply __sysret() helper
tools/nolibc: sys.h: apply __sysret() helper
tools/include/nolibc/sys.h | 364 +++++-----------------------------
tools/include/nolibc/unistd.h | 11 +-
2 files changed, 55 insertions(+), 320 deletions(-)
--
2.25.1
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the
Windows GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the
addresses of the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. It is
currently being emulated in a pretty slow manner in userspace. Our purpose
is to enhance the kernel such that we can emulate it efficiently.
Currently some out-of-tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches,
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But using the soft-dirty flag, sometimes we get extra
pages which weren't even written. They had become soft-dirty because of VMA
merging and the VM_SOFTDIRTY flag. This breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect etc messes up the
soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening
until [6] got introduced. We discussed if we can revert these patches, but
we could not reach any conclusion. So at this point, I made a couple of
attempts to solve this whole VM_SOFTDIRTY issue by correcting the
soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty behind, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where
the kernel resolves the faults itself when the WP_ASYNC feature is used. It
was straightforward to add the WP_ASYNC feature to userfaultfd. Now we get
only those pages reported as dirty or written-to which were really written.
(PS: another userfaultfd feature, WP_UNPOPULATED, is also required to
avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to make the
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature, which we have given up on using,
we use the written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic "get soft-dirty/written-to status and clear"
operation present in the kernel.
- The pages which have been written-to can not be found in an accurate way.
(Kernel's soft-dirty PTE bit + soft-dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
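To make the "(!PM_UFFD_WP)" remark concrete, here is a hedged sketch that
reads one entry from the existing /proc/self/pagemap interface and
decodes it. It does not use the new ioctl from this series. The bit
positions follow Documentation/admin-guide/mm/pagemap.rst; the helper
names pagemap_entry() and written_since_wp() are made up for this
example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit positions as documented in pagemap.rst. */
#define PM_SOFT_DIRTY	(1ULL << 55)
#define PM_UFFD_WP	(1ULL << 57)
#define PM_PRESENT	(1ULL << 63)

/*
 * Read the 64-bit pagemap entry of the page containing addr.
 * Returns 0 on success and -1 if the entry cannot be read.
 */
int pagemap_entry(const void *addr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t off;
	ssize_t n;
	int fd;

	assert(entry != NULL);
	if (page_size <= 0)
		return -1;
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;
	/* One 8-byte entry per virtual page, indexed by page number. */
	off = (off_t)((uintptr_t)addr / (uintptr_t)page_size) * sizeof(*entry);
	n = pread(fd, entry, sizeof(*entry), off);
	close(fd);
	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/*
 * Only meaningful for a page that was write-protected beforehand in a
 * range registered with the async uffd-wp mode: once the kernel has
 * resolved a write fault, the uffd-wp bit is no longer set.
 */
int written_since_wp(uint64_t entry)
{
	return !(entry & PM_UFFD_WP);
}
```

Scanning a large range this way costs one 8-byte read per page, which is
the kind of overhead an in-kernel scan is meant to avoid.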
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than
the vec_size. This restriction is needed to handle the worst case, when
one page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 505 ++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1459 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2266 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Hi,
This follows the discussion here:
https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@hou…
This shows a couple of inconsistencies with regard to how device-managed
resources are cleaned up. Basically, devm resources will only be cleaned up
if the device is attached to a bus and bound to a driver. Failing any of
these cases, a call to device_unregister will not end up in the devm
resources being released.
We had to work around it in DRM to provide helpers to create a device for
kunit tests, but the current discussion around creating similar, generic,
helpers for kunit resumed interest in fixing this.
This can be tested using the command:
./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
---
Maxime Ripard (2):
drivers: base: Add basic devm tests for root devices
drivers: base: Add basic devm tests for platform devices
drivers/base/test/.kunitconfig | 2 +
drivers/base/test/Kconfig | 4 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform-device-test.c | 278 +++++++++++++++++++++++++++++++
drivers/base/test/root-device-test.c | 120 +++++++++++++
5 files changed, 407 insertions(+)
---
base-commit: a6faf7ea9fcb7267d06116d4188947f26e00e57e
change-id: 20230329-kunit-devm-inconsistencies-test-5e5a7d01e60d
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Add documentation for the new Virtual PCM Test Driver. It covers all
possible usage cases: errors and delay injections, random and
pattern-based data generation, playback and ioctl redefinition
functionalities testing.
We have a lot of different virtual media drivers, which can be used for
testing of the userspace applications and media subsystem middle layer.
However, all of them are aimed at testing the video functionality and
simulating the video devices. For audio devices we have only the snd-dummy
module, which is good at simulating the correct behavior of an ALSA device.
I decided to write a tool, which would help to test the userspace ALSA
programs (and the PCM middle layer as well) under unusual circumstances
to figure out how they would behave. So I came up with this Virtual PCM
Test Driver.
This new Virtual PCM Test Driver has several features which can be useful
during the userspace ALSA applications testing/fuzzing, or testing/fuzzing
of the PCM middle layer. Not all of them can be implemented using the
existing virtual drivers (like dummy or loopback). Here is what this
driver can do:
- Simulate both capture and playback processes
- Check the playback stream for containing the looped pattern
- Generate random or pattern-based capture data
- Inject delays into the playback and capturing processes
- Inject errors during the PCM callbacks
Also, this driver can check the playback stream for containing the
predefined pattern, which is used in the corresponding selftest to check
the PCM middle layer data transferring functionality. Additionally, this
driver redefines the default RESET ioctl, and the selftest covers this PCM
API functionality as well.
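
As a rough illustration of the looped-pattern idea described above, here is a
minimal userspace sketch in C. It is not the driver's code: the function names
fill_looped() and is_looped() are hypothetical, samples are treated as single
bytes, and the check is assumed to start at pattern offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Fill buf with the pattern repeated end to end; the last copy may be cut short. */
static void fill_looped(char *buf, size_t buf_len,
			const char *pattern, size_t pat_len)
{
	size_t i;

	assert(buf && pattern);
	if (pat_len == 0)
		return;
	for (i = 0; i < buf_len; i++)
		buf[i] = pattern[i % pat_len];
}

/* Return 1 if buf holds nothing but the looped pattern, 0 otherwise. */
static int is_looped(const char *buf, size_t buf_len,
		     const char *pattern, size_t pat_len)
{
	size_t i;

	assert(buf && pattern);
	if (pat_len == 0)
		return 0;
	for (i = 0; i < buf_len; i++)
		if (buf[i] != pattern[i % pat_len])
			return 0;
	return 1;
}
```

Filling a buffer with fill_looped() models pattern-based capture data, and
running is_looped() over written data models the playback check.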
The driver supports both interleaved and non-interleaved access modes, and
has separate pattern buffers for each channel. The driver supports up to
4 channels and up to 8 substreams.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
V1 -> V2:
- Rename the driver from 'valsa' to 'pcmtest'.
- Implement support for interleaved and non-interleaved access modes
- Add support for 8 substreams and 4 channels
- Extend supported formats
- Extend and rewrite in C the selftest for the driver
V2 -> V3:
- Add separate pattern buffers for each channel
- Speed up the capture data generation when using interleaved access mode
- Extend the corresponding selftest to cover the multiple channels
capturing and playback functionalities when using interleaved access mode.
- Fix documentation issues
Documentation/sound/cards/index.rst | 1 +
Documentation/sound/cards/pcmtest.rst | 120 ++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 Documentation/sound/cards/pcmtest.rst
diff --git a/Documentation/sound/cards/index.rst b/Documentation/sound/cards/index.rst
index c016f8c3b88b..49c1f2f688f8 100644
--- a/Documentation/sound/cards/index.rst
+++ b/Documentation/sound/cards/index.rst
@@ -17,3 +17,4 @@ Card-Specific Information
hdspm
serial-u16550
img-spdif-in
+ pcmtest
diff --git a/Documentation/sound/cards/pcmtest.rst b/Documentation/sound/cards/pcmtest.rst
new file mode 100644
index 000000000000..e163522f3205
--- /dev/null
+++ b/Documentation/sound/cards/pcmtest.rst
@@ -0,0 +1,120 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Virtual PCM Test Driver
+===========================
+
+The Virtual PCM Test Driver emulates a generic PCM device, and can be used for
+testing/fuzzing of the userspace ALSA applications, as well as for testing/fuzzing of
+the PCM middle layer. Additionally, it can be used for simulating hard to reproduce
+problems with PCM devices.
+
+What can this driver do?
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+At this moment the driver can do the following things:
+ * Simulate both capture and playback processes
+ * Generate random or pattern-based capturing data
+ * Inject delays into the playback and capturing processes
+ * Inject errors during the PCM callbacks
+
+It supports up to 8 substreams and 4 channels. Also it supports both interleaved and
+non-interleaved access modes.
+
+Also, this driver can check the playback stream for containing the predefined pattern,
+which is used in the corresponding selftest (alsa/pcmtest-test.sh) to check the PCM middle
+layer data transferring functionality. Additionally, this driver redefines the default
+RESET ioctl, and the selftest covers this PCM API functionality as well.
+
+Configuration
+-------------
+
+The driver has several parameters besides the common ALSA module parameters:
+
+ * fill_mode (bool) - Buffer fill mode (see below)
+ * inject_delay (int)
+ * inject_hwpars_err (bool)
+ * inject_prepare_err (bool)
+ * inject_trigger_err (bool)
+
+
+Capture Data Generation
+-----------------------
+
+The driver has two modes of data generation: the first (0 in the fill_mode parameter)
+means random data generation, the second (1 in the fill_mode) - pattern-based
+data generation. Let's look at the second mode.
+
+First of all, you may want to specify the pattern for data generation. You can do it
+by writing the pattern to the debugfs file. There are pattern buffer debugfs entries
+for each channel, as well as entries which contain the pattern buffer length.
+
+ * /sys/kernel/debug/pcmtest/fill_pattern[0-3]
+ * /sys/kernel/debug/pcmtest/fill_pattern[0-3]_len
+
+To set the pattern for the channel 0 you can execute the following command:
+
+.. code-block:: bash
+
+ echo -n mycoolpattern > /sys/kernel/debug/pcmtest/fill_pattern0
+
+Then, after every capture action performed on the 'pcmtest' device the buffer for the
+channel 0 will contain 'mycoolpatternmycoolpatternmycoolpatternmy...'.
+
+The pattern itself can be up to 4096 bytes long.
+
+Delay injection
+---------------
+
+The driver has an 'inject_delay' parameter, which has a very self-descriptive name and
+can be used for time delay/speedup simulations. The parameter has an integer type, and
+it sets the delay added between the module's internal timer ticks.
+
+If the 'inject_delay' value is positive, the buffer will be filled slower, if it is
+negative - faster. You can try it yourself by starting a recording in any
+audiorecording application (like Audacity) and selecting the 'pcmtest' device as a
+source.
+
+This parameter can be also used for generating a huge amount of sound data in a very
+short period of time (with the negative 'inject_delay' value).
+
+Errors injection
+----------------
+
+This module can be used for injecting errors into the PCM communication process. This
+action can help you to figure out how the userspace ALSA program behaves under unusual
+circumstances.
+
+For example, you can make all 'hw_params' PCM callback calls return EBUSY error by
+writing '1' to the 'inject_hwpars_err' module parameter:
+
+.. code-block:: bash
+
+ echo 1 > /sys/module/snd_pcmtest/parameters/inject_hwpars_err
+
+Errors can be injected into the following PCM callbacks:
+
+ * hw_params (EBUSY)
+ * prepare (EINVAL)
+ * trigger (EINVAL)
+
+Playback test
+-------------
+
+This driver can be also used for the playback functionality testing - every time you
+write the playback data to the 'pcmtest' PCM device and close it, the driver checks the
+buffer for containing the looped pattern (which is specified in the fill_pattern
+debugfs file for each channel). If the playback buffer content represents the looped
+pattern, the 'pc_test' debugfs entry is set to '1'. Otherwise, the driver sets it to '0'.
+
+ioctl redefinition test
+-----------------------
+
+The driver redefines the 'reset' ioctl, which is default for all PCM devices. To test
+this functionality, we can trigger the reset ioctl and check the 'ioctl_test' debugfs
+entry:
+
+.. code-block:: bash
+
+ cat /sys/kernel/debug/pcmtest/ioctl_test
+
+If the ioctl is triggered successfully, this file will contain '1', and '0' otherwise.
--
2.34.1
From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 upstream.
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from
config_num_requests_store(), which takes test_fw_mutex around its check,
and from config_read_fw_idx_store(), which does not. Both can be called
asynchronously because they are driver methods, while
test_dev_config_update_u8() and its siblings modify the value pointed to
by u8 *cfg or a similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
A similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
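
The split described above can be modelled in userspace. The following is only
a sketch under stated assumptions, not the driver's code: it uses a pthread
mutex in place of the kernel mutex, strtoul() in place of kstrtou8() (so
trailing characters are tolerated), and the names cfg_mutex,
cfg_update_u8_nolock(), cfg_update_u8(), num_requests_store() and
requests_in_flight are hypothetical stand-ins. The "_nolock" suffix plays the
role of the kernel's "__" prefix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

static pthread_mutex_t cfg_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint8_t num_requests;
static int requests_in_flight;	/* stands in for test_fw_config->reqs */

/* Unlocked primitive: the caller must already hold cfg_mutex. */
static long cfg_update_u8_nolock(const char *buf, size_t size, uint8_t *cfg)
{
	char *end;
	unsigned long val;

	assert(buf && cfg);
	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf || val > UINT8_MAX)
		return -EINVAL;

	*cfg = (uint8_t)val;
	/* Report the full write size, as the driver helper does. */
	return (long)size;
}

/* Locked wrapper for callers that do not hold cfg_mutex. */
static long cfg_update_u8(const char *buf, size_t size, uint8_t *cfg)
{
	long ret;

	pthread_mutex_lock(&cfg_mutex);
	ret = cfg_update_u8_nolock(buf, size, cfg);
	pthread_mutex_unlock(&cfg_mutex);
	return ret;
}

/* The check and the update share one lock hold, so no window opens between them. */
static long num_requests_store(const char *buf, size_t count)
{
	long rc;

	pthread_mutex_lock(&cfg_mutex);
	if (requests_in_flight)
		rc = -EINVAL;
	else
		rc = cfg_update_u8_nolock(buf, count, &num_requests);
	pthread_mutex_unlock(&cfg_mutex);
	return rc;
}
```

Callers that need a state check before the update use the unlocked primitive
under their own lock hold. Simple setters go through the locked wrapper.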
The unlocked versions __test_dev_config_update_bool(),
__test_dev_config_update_u8() and __test_dev_config_update_size_t()
were introduced so that they can be called from locked contexts
without releasing the main driver's lock and thereby causing a race
condition.
The locked versions test_dev_config_update_bool(),
test_dev_config_update_u8() and test_dev_config_update_size_t()
are called from driver methods. This avoids repeating the locking and
unlocking code in each method and complicating the code by saving the
return value across the lock.
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/test_firmware.c | 52 +++++++++++++++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 17 deletions(-)
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -353,16 +353,26 @@ static ssize_t config_test_show_str(char
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (kstrtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -373,7 +383,8 @@ static ssize_t test_dev_config_show_bool
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+ const char *buf,
size_t size,
size_t *cfg)
{
@@ -384,9 +395,7 @@ static int test_dev_config_update_size_t
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(size_t *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
@@ -402,7 +411,7 @@ static ssize_t test_dev_config_show_int(
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
@@ -411,14 +420,23 @@ static int test_dev_config_update_u8(con
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 val)
{
return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -471,10 +489,10 @@ static ssize_t config_num_requests_store
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -518,10 +536,10 @@ static ssize_t config_buf_size_store(str
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->buf_size);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->buf_size);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -548,10 +566,10 @@ static ssize_t config_file_offset_store(
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->file_offset);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->file_offset);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
Hi, Willy
Thanks very much for merging the v3 generic part1 of rv32. I just
tested your latest 20230604-nolibc-rv32+stkp6 branch, and everything works
well except a trivial test report regression on the 'run' target.
Besides the fixup, a standalone test-report target is added so that the
report logic is shared among run, run-user and re-run, and so that the test
report can be checked independently via a direct 'make test-report'.
Best regards,
Zhangjin
---
Zhangjin Wu (4):
selftests/nolibc: add a test-report target
selftests/nolibc: allow run test-report directly
selftests/nolibc: always print the log file
selftests/nolibc: fix up test-report for run target
tools/testing/selftests/nolibc/Makefile | 30 ++++++++++++-------------
1 file changed, 15 insertions(+), 15 deletions(-)
--
2.25.1
Add documentation for the new Virtual PCM Test Driver. It covers all
possible usage cases: errors and delay injections, random and
pattern-based data generation, playback and ioctl redefinition
functionalities testing.
We have a lot of different virtual media drivers, which can be used for
testing of the userspace applications and media subsystem middle layer.
However, all of them are aimed at testing the video functionality and
simulating the video devices. For audio devices we have only the snd-dummy
module, which is good at simulating the correct behavior of an ALSA device.
I decided to write a tool, which would help to test the userspace ALSA
programs (and the PCM middle layer as well) under unusual circumstances
to figure out how they would behave. So I came up with this Virtual PCM
Test Driver.
This new Virtual PCM Test Driver has several features which can be useful
during the userspace ALSA applications testing/fuzzing, or testing/fuzzing
of the PCM middle layer. Not all of them can be implemented using the
existing virtual drivers (like dummy or loopback). Here is what this
driver can do:
- Simulate both capture and playback processes
- Check the playback stream for containing the looped pattern
- Generate random or pattern-based capture data
- Inject delays into the playback and capturing processes
- Inject errors during the PCM callbacks
Also, this driver can check the playback stream for containing the
predefined pattern, which is used in the corresponding selftest to check
the PCM middle layer data transferring functionality. Additionally, this
driver redefines the default RESET ioctl, and the selftest covers this PCM
API functionality as well.
The driver supports both interleaved and non-interleaved access modes, and
has separate pattern buffers for each channel. The driver supports up to
4 channels and up to 8 substreams.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
V1 -> V2:
- Rename the driver from 'valsa' to 'pcmtest'.
- Implement support for interleaved and non-interleaved access modes
- Add support for 8 substreams and 4 channels
- Extend supported formats
- Extend and rewrite in C the selftest for the driver
V2 -> V3:
- Add separate pattern buffers for each channel
- Speed up the capture data generation when using interleaved access mode
- Extend the corresponding selftest to cover the multiple channels
capturing and playback functionalities when using interleaved access mode.
- Fix documentation issues
V3 -> V4:
- Fix issue in the selftest: there was a typo in the fscanf argument.
Documentation/sound/cards/index.rst | 1 +
Documentation/sound/cards/pcmtest.rst | 120 ++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 Documentation/sound/cards/pcmtest.rst
diff --git a/Documentation/sound/cards/index.rst b/Documentation/sound/cards/index.rst
index c016f8c3b88b..49c1f2f688f8 100644
--- a/Documentation/sound/cards/index.rst
+++ b/Documentation/sound/cards/index.rst
@@ -17,3 +17,4 @@ Card-Specific Information
hdspm
serial-u16550
img-spdif-in
+ pcmtest
diff --git a/Documentation/sound/cards/pcmtest.rst b/Documentation/sound/cards/pcmtest.rst
new file mode 100644
index 000000000000..e163522f3205
--- /dev/null
+++ b/Documentation/sound/cards/pcmtest.rst
@@ -0,0 +1,120 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Virtual PCM Test Driver
+===========================
+
+The Virtual PCM Test Driver emulates a generic PCM device, and can be used for
+testing/fuzzing of the userspace ALSA applications, as well as for testing/fuzzing of
+the PCM middle layer. Additionally, it can be used for simulating hard to reproduce
+problems with PCM devices.
+
+What can this driver do?
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+At this moment the driver can do the following things:
+ * Simulate both capture and playback processes
+ * Generate random or pattern-based capturing data
+ * Inject delays into the playback and capturing processes
+ * Inject errors during the PCM callbacks
+
+It supports up to 8 substreams and 4 channels. Also it supports both interleaved and
+non-interleaved access modes.
+
+Also, this driver can check the playback stream for containing the predefined pattern,
+which is used in the corresponding selftest (alsa/pcmtest-test.sh) to check the PCM middle
+layer data transferring functionality. Additionally, this driver redefines the default
+RESET ioctl, and the selftest covers this PCM API functionality as well.
+
+Configuration
+-------------
+
+The driver has several parameters besides the common ALSA module parameters:
+
+ * fill_mode (bool) - Buffer fill mode (see below)
+ * inject_delay (int)
+ * inject_hwpars_err (bool)
+ * inject_prepare_err (bool)
+ * inject_trigger_err (bool)
+
+
+Capture Data Generation
+-----------------------
+
+The driver has two modes of data generation: the first (0 in the fill_mode parameter)
+means random data generation, the second (1 in the fill_mode) - pattern-based
+data generation. Let's look at the second mode.
+
+First of all, you may want to specify the pattern for data generation. You can do it
+by writing the pattern to the debugfs file. There are pattern buffer debugfs entries
+for each channel, as well as entries which contain the pattern buffer length.
+
+ * /sys/kernel/debug/pcmtest/fill_pattern[0-3]
+ * /sys/kernel/debug/pcmtest/fill_pattern[0-3]_len
+
+To set the pattern for the channel 0 you can execute the following command:
+
+.. code-block:: bash
+
+ echo -n mycoolpattern > /sys/kernel/debug/pcmtest/fill_pattern0
+
+Then, after every capture action performed on the 'pcmtest' device the buffer for the
+channel 0 will contain 'mycoolpatternmycoolpatternmycoolpatternmy...'.
+
+The pattern itself can be up to 4096 bytes long.
+
+Delay injection
+---------------
+
+The driver has an 'inject_delay' parameter, which has a very self-descriptive name and
+can be used for time delay/speedup simulations. The parameter has an integer type, and
+it sets the delay added between the module's internal timer ticks.
+
+If the 'inject_delay' value is positive, the buffer will be filled slower, if it is
+negative - faster. You can try it yourself by starting a recording in any
+audiorecording application (like Audacity) and selecting the 'pcmtest' device as a
+source.
+
+This parameter can be also used for generating a huge amount of sound data in a very
+short period of time (with the negative 'inject_delay' value).
+
+Errors injection
+----------------
+
+This module can be used for injecting errors into the PCM communication process. This
+action can help you to figure out how the userspace ALSA program behaves under unusual
+circumstances.
+
+For example, you can make all 'hw_params' PCM callback calls return EBUSY error by
+writing '1' to the 'inject_hwpars_err' module parameter:
+
+.. code-block:: bash
+
+ echo 1 > /sys/module/snd_pcmtest/parameters/inject_hwpars_err
+
+Errors can be injected into the following PCM callbacks:
+
+ * hw_params (EBUSY)
+ * prepare (EINVAL)
+ * trigger (EINVAL)
+
+Playback test
+-------------
+
+This driver can be also used for the playback functionality testing - every time you
+write the playback data to the 'pcmtest' PCM device and close it, the driver checks the
+buffer for containing the looped pattern (which is specified in the fill_pattern
+debugfs file for each channel). If the playback buffer content represents the looped
+pattern, the 'pc_test' debugfs entry is set to '1'. Otherwise, the driver sets it to '0'.
+
+ioctl redefinition test
+-----------------------
+
+The driver redefines the 'reset' ioctl, which is default for all PCM devices. To test
+this functionality, we can trigger the reset ioctl and check the 'ioctl_test' debugfs
+entry:
+
+.. code-block:: bash
+
+ cat /sys/kernel/debug/pcmtest/ioctl_test
+
+If the ioctl is triggered successfully, this file will contain '1', and '0' otherwise.
--
2.34.1