selftests: pidfd: pidfd_wait hangs on linux next kernel on x86_64,
i386 and arm64 Juno-r2
These devices are using NFS mounted rootfs.
I have tested pidfd testcases independently and all test PASS.
The Hang or exit from test run noticed when run by run_kselftest.sh
pidfd_wait.c:208:wait_nonblock:Expected sys_waitid(P_PIDFD, pidfd,
&info, WSTOPPED, NULL) (-1) == 0 (0)
wait_nonblock: Test terminated by assertion
metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git commit: e64997027d5f171148687e58b78c8b3c869a6158
git describe: next-20200922
make_kernelversion: 5.9.0-rc6
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-core2-32/lkft…
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Test output log:
---------------------
[ 1385.104983] audit: type=1701 audit(1600804535.960:87865):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=31268
comm=\"pidfd_wait\"
exe=\"/opt/kselftests/default-in-kernel/pidfd/pidfd_wait\" sig=6 res=1
# selftests: pidfd: pidfd_wait
# TAP version 13
# 1..3
# # Starting 3 tests from 1 test cases.
# # RUN global.wait_simple ...
# # OK global.wait_simple
# ok 1 global.wait_simple
# # RUN global.wait_states ...
# # OK global.wait_states
# ok 2 global.wait_states
# # RUN global.wait_nonblock ...
# # pidfd_wait.c:208:wait_nonblock:Expected sys_waitid(P_PIDFD, pidfd,
&info, WSTOPPED, NULL) (-1) == 0 (0)
# # wait_nonblock: Test terminated by assertion
# # FAIL global.wait_nonblock
# not ok 3 global.wait_nonblock
# # FAILED: 2 / 3 tests passed.
# # Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0
Marking unfinished test run as failed
ref:
https://lkft.validation.linaro.org/scheduler/job/1782129#L11737https://lkft.validation.linaro.org/scheduler/job/1782130#L12735https://lkft.validation.linaro.org/scheduler/job/1782138#L14178
--
Linaro LKFT
https://lkft.linaro.org
In case of errors, this message was printed:
(...)
balanced bwidth with unbalanced delay 5233 max 5005 [ fail ]
client exit code 0, server 0
\nnetns ns3-0-EwnkPH socket stat for 10003:
(...)
Obviously, the idea was to add a new line before the socket stat and not
print "\nnetns".
The commit 8b974778f998 ("selftests: mptcp: interpret \n as a new line")
is very similar to this one. But the modification in simult_flows.sh was
missed because this commit above was done in parallel to one here below.
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 0d88225daa02..2f649b431456 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -200,9 +200,9 @@ do_transfer()
echo " [ fail ]"
echo "client exit code $retc, server $rets" 1>&2
- echo "\nnetns ${ns3} socket stat for $port:" 1>&2
+ echo -e "\nnetns ${ns3} socket stat for $port:" 1>&2
ip netns exec ${ns3} ss -nita 1>&2 -o "sport = :$port"
- echo "\nnetns ${ns1} socket stat for $port:" 1>&2
+ echo -e "\nnetns ${ns1} socket stat for $port:" 1>&2
ip netns exec ${ns1} ss -nita 1>&2 -o "dport = :$port"
ls -l $sin $cout
ls -l $cin $sout
--
2.27.0
This silences a static checker warning due to the unusual macro
construction of EXPECT_*() by adding explicit {}s around the enclosing
while loop.
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Fixes: 7f657d5bf507 ("selftests: tls: add selftests for TLS sockets")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
v2: rebase to v5.9-rc2; oops, I lost this patch and just found it again
v1: https://lore.kernel.org/lkml/20190108214159.GA33292@beast/
---
tools/testing/selftests/net/tls.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index b599f1fa99b5..44984741bd41 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -387,8 +387,9 @@ TEST_F(tls, sendmsg_large)
EXPECT_EQ(sendmsg(self->cfd, &msg, 0), send_len);
}
- while (recvs++ < sends)
+ while (recvs++ < sends) {
EXPECT_NE(recv(self->fd, mem, send_len, 0), -1);
+ }
free(mem);
}
--
2.25.1
This patch series is a result of discussion at the refcount_t BOF
the Linux Plumbers Conference. In this discussion, we identified
a need for looking closely and investigating atomic_t usages in
the kernel when it is used strictly as a counter without it
controlling object lifetimes and state changes.
There are a number of atomic_t usages in the kernel where atomic_t api
is used strictly for counting and not for managing object lifetime. In
some cases, atomic_t might not even be needed.
The purpose of these counters is to clearly differentiate atomic_t
counters from atomic_t usages that guard object lifetimes, hence prone
to overflow and underflow errors. It allows tools that scan for underflow
and overflow on atomic_t usages to detect overflow and underflows to scan
just the cases that are prone to errors.
Simple atomic counters api provides interfaces for simple atomic counters
that just count, and don't guard resource lifetimes. Counter will wrap
around to 0 when it overflows and should not be used to guard resource
lifetimes, device usage and open counts that control state changes, and
pm states.
Using counter_atomic* to guard lifetimes could lead to use-after free
when it overflows and undefined behavior when used to manage state
changes and device usage/open states.
This patch series introduces Simple atomic counters. Counter atomic ops
leverage atomic_t and provide a sub-set of atomic_t ops.
In addition this patch series converts a few drivers to use the new api.
The following criteria is used for select variables for conversion:
1. Variable doesn't guard object lifetimes, manage state changes e.g:
device usage counts, device open counts, and pm states.
2. Variable is used for stats and counters.
3. The conversion doesn't change the overflow behavior.
Changes since Patch v1
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
the patches with the tags.
-- Addressed Kees's and Joel's comments:
1. Removed dec_return interfaces (Patch 1/11)
2. Removed counter_simple interfaces to be added later with changes
to drivers that use them (if any) (Patch 1/11)
3. Comment and Changelogs updates to Patch 2/11
Kees, if this series is good, would you like to take this through your
tree or would you like to take this through mine?
Changes since RFC:
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
the patches with the tags.
-- Addressed Kees's comments:
1. Non-atomic counters renamed to counter_simple32 and counter_simple64
to clearly indicate size.
2. Added warning for counter_simple* usage and it should be used only
when there is no need for atomicity.
3. Renamed counter_atomic to counter_atomic32 to clearly indicate size.
4. Renamed counter_atomic_long to counter_atomic64 and it now uses
atomic64_t ops and indicates size.
5. Test updated for the API renames.
6. Added helper functions for test results printing
7. Verified that the test module compiles in kunit env. and test
module can be loaded to run the test.
8. Updated Documentation to reflect the intent to make the API
restricted so it can never be used to guard object lifetimes
and state management. I left _return ops for now, inc_return
is necessary for now as per the discussion we had on this topic.
-- Updated driver patches with API name changes.
-- We discussed if binder counters can be non-atomic. For now I left
them the same as the RFC patch - using counter_atomic32
-- Unrelated to this patch series:
The patch series review uncovered improvements could be made to
test_async_driver_probe and vmw_vmci/vmci_guest. I will track
these for fixing later.
Shuah Khan (11):
counters: Introduce counter_atomic* counters
selftests:lib:test_counters: add new test for counters
drivers/base: convert deferred_trigger_count and probe_count to
counter_atomic32
drivers/base/devcoredump: convert devcd_count to counter_atomic32
drivers/acpi: convert seqno counter_atomic32
drivers/acpi/apei: convert seqno counter_atomic32
drivers/android/binder: convert stats, transaction_log to
counter_atomic32
drivers/base/test/test_async_driver_probe: convert to use
counter_atomic32
drivers/char/ipmi: convert stats to use counter_atomic32
drivers/misc/vmw_vmci: convert num guest devices counter to
counter_atomic32
drivers/edac: convert pci counters to counter_atomic32
Documentation/core-api/counters.rst | 103 +++++++++++
MAINTAINERS | 8 +
drivers/acpi/acpi_extlog.c | 5 +-
drivers/acpi/apei/ghes.c | 5 +-
drivers/android/binder.c | 41 ++---
drivers/android/binder_internal.h | 3 +-
drivers/base/dd.c | 19 +-
drivers/base/devcoredump.c | 5 +-
drivers/base/test/test_async_driver_probe.c | 23 +--
drivers/char/ipmi/ipmi_msghandler.c | 9 +-
drivers/char/ipmi/ipmi_si_intf.c | 9 +-
drivers/edac/edac_pci.h | 5 +-
drivers/edac/edac_pci_sysfs.c | 28 +--
drivers/misc/vmw_vmci/vmci_guest.c | 9 +-
include/linux/counters.h | 173 +++++++++++++++++++
lib/Kconfig | 10 ++
lib/Makefile | 1 +
lib/test_counters.c | 157 +++++++++++++++++
tools/testing/selftests/lib/Makefile | 1 +
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/test_counters.sh | 5 +
21 files changed, 546 insertions(+), 74 deletions(-)
create mode 100644 Documentation/core-api/counters.rst
create mode 100644 include/linux/counters.h
create mode 100644 lib/test_counters.c
create mode 100755 tools/testing/selftests/lib/test_counters.sh
--
2.25.1
This patch reduces the running time for hmm-tests from about 10+
seconds, to just under 1.0 second, for an approximately 10x speedup.
That brings it in line with most of the other tests in selftests/vm,
which mostly run in < 1 sec.
This is done with a one-line change that simply reduces the number of
iterations of several tests, from 256, to 10. Thanks to Ralph Campbell
for suggesting changing NTIMES as a way to get the speedup.
Suggested-by: Ralph Campbell <rcampbell(a)nvidia.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
This is based on mmotm.
tools/testing/selftests/vm/hmm-tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 6b79723d7dc6..5d1ac691b9f4 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -49,7 +49,7 @@ struct hmm_buffer {
#define TWOMEG (1 << 21)
#define HMM_BUFFER_SIZE (1024 << 12)
#define HMM_PATH_MAX 64
-#define NTIMES 256
+#define NTIMES 10
#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
--
2.28.0
v3 -> v4:
- Rebasing
- Cast bpf_[per|this]_cpu_ptr's parameter to void __percpu * before
passing into per_cpu_ptr.
v2 -> v3:
- Rename functions and variables in verifier for better readability.
- Stick to logging message convention in libbpf.
- Move bpf_per_cpu_ptr and bpf_this_cpu_ptr from trace-specific
helper set to base helper set.
- More specific test in ksyms_btf.
- Fix return type cast in bpf_*_cpu_ptr.
- Fix btf leak in ksyms_btf selftest.
- Fix return error code for kallsyms_find().
v1 -> v2:
- Move check_pseudo_btf_id from check_ld_imm() to
replace_map_fd_with_map_ptr() and rename the latter.
- Add bpf_this_cpu_ptr().
- Use bpf_core_types_are_compat() in libbpf.c for checking type
compatibility.
- Rewrite typed ksym extern type in BTF with int to save space.
- Minor revision of bpf_per_cpu_ptr()'s comments.
- Avoid using long in tests that use skeleton.
- Refactored test_ksyms.c by moving kallsyms_find() to trace_helpers.c
- Fold the patches that sync include/linux/uapi and
tools/include/linux/uapi.
rfc -> v1:
- Encode VAR's btf_id for PSEUDO_BTF_ID.
- More checks in verifier. Checking the btf_id passed as
PSEUDO_BTF_ID is valid VAR, its name and type.
- Checks in libbpf on type compatibility of ksyms.
- Add bpf_per_cpu_ptr() to access kernel percpu vars. Introduced
new ARG and RET types for this helper.
This patch series extends the previously added __ksym externs with
btf support.
Right now the __ksym externs are treated as pure 64-bit scalar value.
Libbpf replaces ld_imm64 insn of __ksym by its kernel address at load
time. This patch series extend those externs with their btf info. Note
that btf support for __ksym must come with the kernel btf that has
VARs encoded to work properly. The corresponding chagnes in pahole
is available at [1] (with a fix at [2] for gcc 4.9+).
The first 3 patches in this series add support for general kernel
global variables, which include verifier checking (01/06), libpf
support (02/06) and selftests for getting typed ksym extern's kernel
address (03/06).
The next 3 patches extends that capability further by introducing
helpers bpf_per_cpu_ptr() and bpf_this_cpu_ptr(), which allows accessing
kernel percpu variables correctly (04/06 and 05/06).
The tests of this feature were performed against pahole that is extended
with [1] and [2]. For kernel BTF that does not have VARs encoded, the
selftests will be skipped.
[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=f3d9054ba…
[2] https://www.spinics.net/lists/dwarves/msg00451.html
Hao Luo (6):
bpf: Introduce pseudo_btf_id
bpf/libbpf: BTF support for typed ksyms
selftests/bpf: ksyms_btf to test typed ksyms
bpf: Introduce bpf_per_cpu_ptr()
bpf: Introducte bpf_this_cpu_ptr()
bpf/selftests: Test for bpf_per_cpu_ptr() and bpf_this_cpu_ptr()
include/linux/bpf.h | 6 +
include/linux/bpf_verifier.h | 7 +
include/linux/btf.h | 26 +++
include/uapi/linux/bpf.h | 67 +++++-
kernel/bpf/btf.c | 25 ---
kernel/bpf/helpers.c | 32 +++
kernel/bpf/verifier.c | 190 ++++++++++++++++--
kernel/trace/bpf_trace.c | 4 +
tools/include/uapi/linux/bpf.h | 67 +++++-
tools/lib/bpf/libbpf.c | 112 +++++++++--
.../testing/selftests/bpf/prog_tests/ksyms.c | 38 ++--
.../selftests/bpf/prog_tests/ksyms_btf.c | 88 ++++++++
.../selftests/bpf/progs/test_ksyms_btf.c | 55 +++++
tools/testing/selftests/bpf/trace_helpers.c | 27 +++
tools/testing/selftests/bpf/trace_helpers.h | 4 +
15 files changed, 653 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c
--
2.28.0.709.gb0816b6eb0-goog
This patch series is a result of discussion at the refcount_t BOF
the Linux Plumbers Conference. In this discussion, we identified
a need for looking closely and investigating atomic_t usages in
the kernel when it is used strictly as a counter without it
controlling object lifetimes and state changes.
There are a number of atomic_t usages in the kernel where atomic_t api
is used strictly for counting and not for managing object lifetime. In
some cases, atomic_t might not even be needed.
The purpose of these counters is twofold: 1. clearly differentiate
atomic_t counters from atomic_t usages that guard object lifetimes,
hence prone to overflow and underflow errors. It allows tools that scan
for underflow and overflow on atomic_t usages to detect overflow and
underflows to scan just the cases that are prone to errors. 2. provides
non-atomic counters for cases where atomic isn't necessary.
Simple atomic and non-atomic counters api provides interfaces for simple
atomic and non-atomic counters that just count, and don't guard resource
lifetimes. Counters will wrap around to 0 when it overflows and should
not be used to guard resource lifetimes, device usage and open counts
that control state changes, and pm states.
Using counter_atomic to guard lifetimes could lead to use-after free
when it overflows and undefined behavior when used to manage state
changes and device usage/open states.
This patch series introduces Simple atomic and non-atomic counters.
Counter atomic ops leverage atomic_t and provide a sub-set of atomic_t
ops.
In addition this patch series converts a few drivers to use the new api.
The following criteria is used for select variables for conversion:
1. Variable doesn't guard object lifetimes, manage state changes e.g:
device usage counts, device open counts, and pm states.
2. Variable is used for stats and counters.
3. The conversion doesn't change the overflow behavior.
Changes since RFC:
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
the patches with the tags.
-- Addressed Kees's comments:
1. Non-atomic counters renamed to counter_simple32 and counter_simple64
to clearly indicate size.
2. Added warning for counter_simple* usage and it should be used only
when there is no need for atomicity.
3. Renamed counter_atomic to counter_atomic32 to clearly indicate size.
4. Renamed counter_atomic_long to counter_atomic64 and it now uses
atomic64_t ops and indicates size.
5. Test updated for the API renames.
6. Added helper functions for test results printing
7. Verified that the test module compiles in kunit env. and test
module can be loaded to run the test.
8. Updated Documentation to reflect the intent to make the API
restricted so it can never be used to guard object lifetimes
and state management. I left _return ops for now, inc_return
is necessary for now as per the discussion we had on this topic.
-- Updated driver patches with API name changes.
-- We discussed if binder counters can be non-atomic. For now I left
them the same as the RFC patch - using counter_atomic32
-- Unrelated to this patch series:
The patch series review uncovered improvements could be made to
test_async_driver_probe and vmw_vmci/vmci_guest. I will track
these for fixing later.
Shuah Khan (11):
counters: Introduce counter_simple* and counter_atomic* counters
selftests:lib:test_counters: add new test for counters
drivers/base: convert deferred_trigger_count and probe_count to
counter_atomic32
drivers/base/devcoredump: convert devcd_count to counter_atomic32
drivers/acpi: convert seqno counter_atomic32
drivers/acpi/apei: convert seqno counter_atomic32
drivers/android/binder: convert stats, transaction_log to
counter_atomic32
drivers/base/test/test_async_driver_probe: convert to use
counter_atomic32
drivers/char/ipmi: convert stats to use counter_atomic32
drivers/misc/vmw_vmci: convert num guest devices counter to
counter_atomic32
drivers/edac: convert pci counters to counter_atomic32
Documentation/core-api/counters.rst | 174 +++++++++
MAINTAINERS | 8 +
drivers/acpi/acpi_extlog.c | 5 +-
drivers/acpi/apei/ghes.c | 5 +-
drivers/android/binder.c | 41 +--
drivers/android/binder_internal.h | 3 +-
drivers/base/dd.c | 19 +-
drivers/base/devcoredump.c | 5 +-
drivers/base/test/test_async_driver_probe.c | 23 +-
drivers/char/ipmi/ipmi_msghandler.c | 9 +-
drivers/char/ipmi/ipmi_si_intf.c | 9 +-
drivers/edac/edac_pci.h | 5 +-
drivers/edac/edac_pci_sysfs.c | 28 +-
drivers/misc/vmw_vmci/vmci_guest.c | 9 +-
include/linux/counters.h | 350 +++++++++++++++++++
lib/Kconfig | 10 +
lib/Makefile | 1 +
lib/test_counters.c | 276 +++++++++++++++
tools/testing/selftests/lib/Makefile | 1 +
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/test_counters.sh | 5 +
21 files changed, 913 insertions(+), 74 deletions(-)
create mode 100644 Documentation/core-api/counters.rst
create mode 100644 include/linux/counters.h
create mode 100644 lib/test_counters.c
create mode 100755 tools/testing/selftests/lib/test_counters.sh
--
2.25.1
These patch series adds below kselftests to test the user-space support for the
ARMv8.5 Memory Tagging Extension present in arm64 tree [1]. This patch
series is based on Linux v5.9-rc3.
1) This test-case verifies that the memory allocated by kernel mmap interface
can support tagged memory access. It first checks the presence of tags at
address[56:59] and then proceeds with read and write. The pass criteria for
this test is that tag fault exception should not happen.
2) This test-case crosses the valid memory to the invalid memory. In this
memory area valid tags are not inserted so read and write should not pass. The
pass criteria for this test is that tag fault exception should happen for all
the illegal addresses. This test also verfies that PSTATE.TCO works properly.
3) This test-case verifies that the memory inherited by child process from
parent process should have same tags copied. The pass criteria for this test is
that tag fault exception should not happen.
4) This test checks different mmap flags with PROT_MTE memory protection.
5) This testcase checks that KSM should not merge pages containing different
MTE tag values. However, if the tags are same then the pages may merge. This
testcase uses the generic ksm sysfs interfaces to verify the MTE behaviour, so
this testcase is not fullproof and may be impacted due to other load in the system.
6) Fifth test verifies that syscalls read/write etc works by considering that
user pointer has valid/invalid allocation tags.
Changes since v1 [2]:
* Redefined MTE kernel header definitions to decouple kselftest compilations.
* Removed gmi masking instructions in mte_insert_random_tag assembly
function. This simplifies the tag inclusion mask test with only GCR
mask register used.
* Created a new mte_insert_random_tag function with gmi instruction.
This is useful for the 6th test which reuses the original tag.
* Now use /dev/shm/* to hold temporary files.
* Updated the 6th test to handle the error properly in case of failure
in accessing memory with invalid tag in kernel.
* Code and comment clean-ups.
Thanks,
Amit Daniel
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mte
[2]: https://patchwork.kernel.org/patch/11747791/
Amit Daniel Kachhap (6):
kselftest/arm64: Add utilities and a test to validate mte memory
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Check mte tagged user address in kernel
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/mte/.gitignore | 6 +
tools/testing/selftests/arm64/mte/Makefile | 29 ++
.../selftests/arm64/mte/check_buffer_fill.c | 475 ++++++++++++++++++
.../selftests/arm64/mte/check_child_memory.c | 195 +++++++
.../selftests/arm64/mte/check_ksm_options.c | 159 ++++++
.../selftests/arm64/mte/check_mmap_options.c | 262 ++++++++++
.../arm64/mte/check_tags_inclusion.c | 185 +++++++
.../selftests/arm64/mte/check_user_mem.c | 111 ++++
.../selftests/arm64/mte/mte_common_util.c | 341 +++++++++++++
.../selftests/arm64/mte/mte_common_util.h | 118 +++++
tools/testing/selftests/arm64/mte/mte_def.h | 60 +++
.../testing/selftests/arm64/mte/mte_helper.S | 128 +++++
13 files changed, 2070 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/mte/.gitignore
create mode 100644 tools/testing/selftests/arm64/mte/Makefile
create mode 100644 tools/testing/selftests/arm64/mte/check_buffer_fill.c
create mode 100644 tools/testing/selftests/arm64/mte/check_child_memory.c
create mode 100644 tools/testing/selftests/arm64/mte/check_ksm_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_mmap_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_tags_inclusion.c
create mode 100644 tools/testing/selftests/arm64/mte/check_user_mem.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_def.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_helper.S
--
2.17.1
This version 3 of the mremap speed up patches previously posted at:
v1 - https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
v2 - https://lore.kernel.org/r/20201002162101.665549-1-kaleshsingh@google.com
mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and
PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
x86. Other architectures where this type of move is supported and known to
be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD
and HAVE_MOVE_PUD.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
Changes in v2:
- Reduce mremap_test time by only validating a configurable
threshold of the remapped region, as per John.
- Use a random pattern for mremap validation. Provide pattern
seed in test output, as per John.
- Moved set_pud_at() to separate patch, per Kirill.
- Use switch() instead of ifs in move_pgt_entry(), per Kirill.
- Update commit message with description of Android
garbage collector use case for HAVE_MOVE_PUD, as per Joel.
- Fix build test error reported by kernel test robot in [1].
Changes in v3:
- Make lines 80 cols or less where they don’t need to be longer,
per John.
- Removed unused PATTERN_SIZE in mremap_test
- Added Reviewed-by tag for patch 1/5 (mremap kselftest patch).
- Use switch() instead of ifs in get_extent(), per Kirill
- Add BUILD_BUG() is get_extent() default case.
- Move get_old_pud() and alloc_new_pud() out of
#ifdef CONFIG_HAVE_MOVE_PUD, per Kirill.
- Have get_old_pmd() and alloc_new_pmd() use get_old_pud() and
alloc_old_pud(), per Kirill.
- Replace #ifdef CONFIG_HAVE_MOVE_PMD / PUD in move_page_tables()
with IS_ENABLED(CONFIG_HAVE_MOVE_PMD / PUD), per Kirill.
- Fold Add set_pud_at() patch into patch 4/5, per Kirill.
[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4F…
Kalesh Singh (5):
kselftests: vm: Add mremap tests
arm64: mremap speedup - Enable HAVE_MOVE_PMD
mm: Speedup mremap on 1GB or larger regions
arm64: mremap speedup - Enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
arch/Kconfig | 7 +
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable.h | 1 +
arch/x86/Kconfig | 1 +
mm/mremap.c | 230 ++++++++++++---
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/mremap_test.c | 344 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 11 +
9 files changed, 558 insertions(+), 40 deletions(-)
create mode 100644 tools/testing/selftests/vm/mremap_test.c
base-commit: 549738f15da0e5a00275977623be199fbbf7df50
--
2.28.0.806.g8561365e88-goog
v4: https://lkml.org/lkml/2020/9/2/356
v4-->v5
Based on comments from Artem Bityutskiy, evaluation of timer based
wakeup latencies may not be a fruitful measurement especially on the x86
platform which has the capability to pre-arm a CPU when a timer is set.
Hence, including only the IPI based tests for latency measurement to
acheive expected behaviour across platforms.
kernel module + bash selftest approach which presents lower deviations
and higher accuracy: https://lkml.org/lkml/2020/7/21/567
---
The patch series introduces a mechanism to measure wakeup latency for
IPI based interrupts.
The motivation behind this series is to find significant deviations
behind advertised latency values
To achieve this in the userspace, IPI latencies are calculated by
sending information through pipes and inducing a wakeup.
To account for delays from kernel-userspace interactions baseline
observations are taken on a 100% busy CPU and subsequent obervations
must be considered relative to that.
One downside of the userspace approach in contrast to the kernel
implementation is that the run to run variance can turn out to be high
in the order of ms; which is the scope of the experiments at times.
Another downside of the userspace approach is that it takes much longer
to run and hence a command-line option quick and full are added to make
sure quick 1 CPU tests can be carried out when needed and otherwise it
can carry out a full system comprehensive test.
Usage
---
./cpuidle --mode <full / quick / num_cpus> --output <output location>
full: runs on all CPUS
quick: run on a random CPU
num_cpus: Limit the number of CPUS to run on
Sample output snippet
---------------------
--IPI Latency Test---
SRC_CPU DEST_CPU IPI_Latency(ns)
...
0 5 256178
0 6 478161
0 7 285445
0 8 273553
Expected IPI latency(ns): 100000
Observed Average IPI latency(ns): 248334
Pratik Rajesh Sampat (1):
selftests/cpuidle: Add support for cpuidle latency measurement
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 7 +
tools/testing/selftests/cpuidle/cpuidle.c | 479 ++++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 1 +
4 files changed, 488 insertions(+)
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100644 tools/testing/selftests/cpuidle/cpuidle.c
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.26.2
Changes since v1:
* check_config.sh now invokes the compiler via the Makefile's ($CC),
thanks to Jason Gunthorpe for calling that out.
* Removed a misleading sentence from patch #6, as identified by Ira
Weiny.
* Removed a forward-looking sentence, about using -lpthread in
gup_test.c soon, from the commit message in patch #4, since I'm not yet
sure if my local pthread-based stress tests are actually worthwhile or
not.
Original cover letter, still accurate at this point:
This is based on the latest mmotm.
Summary: This series provides two main things, and a number of smaller
supporting goodies. The two main points are:
1) Add a new sub-test to gup_test, which in turn is a renamed version of
gup_benchmark. This sub-test allows nicer testing of dump_pages(), at
least on user-space pages.
For quite a while, I was doing a quick hack to gup_test.c whenever I
wanted to try out changes to dump_page(). Then Matthew Wilcox asked me
what I meant when I said "I used my dump_page() unit test", and I
realized that it might be nice to check in a polished up version of
that.
Details about how it works and how to use it are in the commit
description for patch #6.
2) Fixes a limitation of hmm-tests: these tests are incredibly useful,
but only if people actually build and run them. And it turns out that
libhugetlbfs is a little too effective at throwing a wrench in the
works, there. So I've added a little configuration check that removes
just two of the 21 hmm-tests, if libhugetlbfs is not available.
Further details in the commit description of patch #8.
Other smaller things that this series does:
a) Remove code duplication by creating gup_test.h.
b) Clear up the sub-test organization, and their invocation within
run_vmtests.sh.
c) Other minor assorted improvements.
John Hubbard (8):
mm/gup_benchmark: rename to mm/gup_test
selftests/vm: use a common gup_test.h
selftests/vm: rename run_vmtests --> run_vmtests.sh
selftests/vm: minor cleanup: Makefile and gup_test.c
selftests/vm: only some gup_test items are really benchmarks
selftests/vm: gup_test: introduce the dump_pages() sub-test
selftests/vm: run_vmtest.sh: update and clean up gup_test invocation
selftests/vm: hmm-tests: remove the libhugetlbfs dependency
Documentation/core-api/pin_user_pages.rst | 6 +-
arch/s390/configs/debug_defconfig | 2 +-
arch/s390/configs/defconfig | 2 +-
mm/Kconfig | 21 +-
mm/Makefile | 2 +-
mm/{gup_benchmark.c => gup_test.c} | 109 ++++++----
mm/gup_test.h | 32 +++
tools/testing/selftests/vm/.gitignore | 3 +-
tools/testing/selftests/vm/Makefile | 38 +++-
tools/testing/selftests/vm/check_config.sh | 31 +++
tools/testing/selftests/vm/config | 2 +-
tools/testing/selftests/vm/gup_benchmark.c | 137 -------------
tools/testing/selftests/vm/gup_test.c | 188 ++++++++++++++++++
tools/testing/selftests/vm/hmm-tests.c | 10 +-
.../vm/{run_vmtests => run_vmtest.sh} | 24 ++-
15 files changed, 404 insertions(+), 203 deletions(-)
rename mm/{gup_benchmark.c => gup_test.c} (59%)
create mode 100644 mm/gup_test.h
create mode 100755 tools/testing/selftests/vm/check_config.sh
delete mode 100644 tools/testing/selftests/vm/gup_benchmark.c
create mode 100644 tools/testing/selftests/vm/gup_test.c
rename tools/testing/selftests/vm/{run_vmtests => run_vmtest.sh} (91%)
--
2.28.0
This version 2 of the mremap speed up patches previously posted at:
https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and
PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
x86. Other architectures where this type of move is supported and known to
be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD
and HAVE_MOVE_PUD.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
Changes in v2:
- Reduce mremap_test time by only validating a configurable
threshold of the remapped region, as per John.
- Use a random pattern for mremap validation. Provide pattern
seed in test output, as per John.
- Moved set_pud_at() to separate patch, per Kirill.
- Use switch() instead of ifs in move_pgt_entry(), per Kirill.
- Update commit message with description of Android
garbage collector use case for HAVE_MOVE_PUD, as per Joel.
- Fix build test error reported by kernel test robot in [1].
[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4F…
Kalesh Singh (6):
kselftests: vm: Add mremap tests
arm64: mremap speedup - Enable HAVE_MOVE_PMD
mm: Speedup mremap on 1GB or larger regions
arm64: Add set_pud_at() functions
arm64: mremap speedup - Enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
arch/Kconfig | 7 +
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable.h | 1 +
arch/x86/Kconfig | 1 +
mm/mremap.c | 220 +++++++++++++--
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/mremap_test.c | 333 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 11 +
9 files changed, 547 insertions(+), 30 deletions(-)
create mode 100644 tools/testing/selftests/vm/mremap_test.c
base-commit: 472e5b056f000a778abb41f1e443de58eb259783
--
2.28.0.806.g8561365e88-goog
Hi,
Maybe this should really be an RFC, given that I don't fully understand
why the compaction_test.c program was mmap'ing 1 MB at a time. So
apologies in advance if I've mucked up something important, but if so,
maybe we can still find a way to get this fixed up to something better.
Anyway: there are 20+ tests in tools/testing/selftests/vm/. The entire
running time for these (via run_vmtest.sh) is about 56 seconds, of which
over half is due to just one test: compaction_test, which takes 27 sec!
(A runner-up is HMM, at 11 sec, so it's up for a look next.) The other
tests mostly take a few ms, and a few take 1.0 sec.
This drops the compaction_test run time from 27, to 3.3 sec. Enjoy. :)
thanks,
John Hubbard
NVIDIA
John Hubbard (1):
selftests/vm: 8x compaction_test speedup
tools/testing/selftests/vm/compaction_test.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--
2.28.0
From: Colin Ian King <colin.king(a)canonical.com>
More recent libc implementations are now using openat/openat2 system
calls so also add do_sys_openat2 to the tracing so that the test
passes on these systems because do_sys_open may not be called.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
.../testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc
index a30a9c07290d..cf1b4c3e9e6b 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc
@@ -9,6 +9,8 @@ grep -A10 "fetcharg:" README | grep -q '\[u\]<offset>' || exit_unsupported
:;: "user-memory access syntax and ustring working on user memory";:
echo 'p:myevent do_sys_open path=+0($arg2):ustring path2=+u0($arg2):string' \
> kprobe_events
+echo 'p:myevent2 do_sys_openat2 path=+0($arg2):ustring path2=+u0($arg2):string' \
+ > kprobe_events
grep myevent kprobe_events | \
grep -q 'path=+0($arg2):ustring path2=+u0($arg2):string'
--
2.27.0
mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and
PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
x86. Other architectures where this type of move is supported and known to
be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD
and HAVE_MOVE_PUD.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
Kalesh Singh (5):
kselftests: vm: Add mremap tests
arm64: mremap speedup - Enable HAVE_MOVE_PMD
mm: Speedup mremap on 1GB or larger regions
arm64: mremap speedup - Enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
arch/Kconfig | 7 +
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable.h | 1 +
arch/x86/Kconfig | 1 +
mm/mremap.c | 211 +++++++++++++++++---
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/mremap_test.c | 243 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 11 +
9 files changed, 448 insertions(+), 30 deletions(-)
create mode 100644 tools/testing/selftests/vm/mremap_test.c
--
2.28.0.709.gb0816b6eb0-goog
# Background
KUnit currently lacks any first-class support for mocking.
For an overview and discussion on the pros and cons, see
https://martinfowler.com/articles/mocksArentStubs.html
This patch set introduces the basic machinery needed for mocking:
setting and validating expectations, setting default actions, etc.
Using that basic infrastructure, we add macros for "class mocking", as
it's probably the easiest type of mocking to start with.
## Class mocking
By "class mocking", we're referring mocking out function pointers stored
in structs like:
struct sender {
int (*send)(struct sender *sender, int data);
};
After the necessary DEFINE_* macros, we can then write code like
struct MOCK(sender) mock_sender = CONSTRUCT_MOCK(sender, test);
/* Fake an error for a specific input. */
handle = KUNIT_EXPECT_CALL(send(<omitted>, kunit_int_eq(42)));
handle->action = kunit_int_return(test, -EINVAL);
/* Pass the mocked object to some code under test. */
KUNIT_EXPECT_EQ(test, -EINVAL, send_message(...));
I.e. the goal is to make it easier to test
1) with less dependencies (we don't need to setup a real `sender`)
2) unusual/error conditions more easily.
In the future, we hope to build upon this to support mocking in more
contexts, e.g. standalone funcs, etc.
# TODOs
## Naming
This introduces a number of new macros for dealing with mocks,
e.g:
DEFINE_STRUCT_CLASS_MOCK(METHOD(foo), CLASS(example),
RETURNS(int),
PARAMS(struct example *, int));
...
KUNIT_EXPECT_CALL(foo(mock_get_ctrl(mock_example), ...);
For consistency, we could prefix everything with KUNIT, e.g.
`KUNIT_DEFINE_STRUCT_CLASS_MOCK` and `kunit_mock_get_ctrl`, but it feels
like the names might be long enough that they would hinder readability.
## Usage
For now the only use of class mocking is in kunit-example-test.c
As part of changing this from an RFC to a real patch set, we're hoping
to include at least one example.
Pointers to bits of code where this would be useful that aren't too
hairy would be appreciated.
E.g. could easily add a test for tools/perf/ui/progress.h, e.g. that
ui_progress__init() calls ui_progress_ops.init(), but that likely isn't
useful to anyone.
Brendan Higgins (9):
kunit: test: add kunit_stream a std::stream like logger
kunit: test: add concept of post conditions
checkpatch: add support for struct MOCK(foo) syntax
kunit: mock: add parameter list manipulation macros
kunit: mock: add internal mock infrastructure
kunit: mock: add basic matchers and actions
kunit: mock: add class mocking support
kunit: mock: add struct param matcher
kunit: mock: implement nice, strict and naggy mock distinctions
Daniel Latypov (2):
Revert "kunit: move string-stream.h to lib/kunit"
kunit: expose kunit_set_failure() for use by mocking
Marcelo Schmitt (1):
kunit: mock: add macro machinery to pick correct format args
include/kunit/assert.h | 3 +-
include/kunit/kunit-stream.h | 94 +++
include/kunit/mock.h | 902 +++++++++++++++++++++++++
include/kunit/params.h | 305 +++++++++
{lib => include}/kunit/string-stream.h | 2 +
include/kunit/test.h | 9 +
lib/kunit/Makefile | 9 +-
lib/kunit/assert.c | 2 -
lib/kunit/common-mocks.c | 409 +++++++++++
lib/kunit/kunit-example-test.c | 90 +++
lib/kunit/kunit-stream.c | 110 +++
lib/kunit/mock-macro-test.c | 241 +++++++
lib/kunit/mock-test.c | 531 +++++++++++++++
lib/kunit/mock.c | 370 ++++++++++
lib/kunit/string-stream-test.c | 3 +-
lib/kunit/string-stream.c | 5 +-
lib/kunit/test.c | 15 +-
scripts/checkpatch.pl | 4 +
18 files changed, 3091 insertions(+), 13 deletions(-)
create mode 100644 include/kunit/kunit-stream.h
create mode 100644 include/kunit/mock.h
create mode 100644 include/kunit/params.h
rename {lib => include}/kunit/string-stream.h (95%)
create mode 100644 lib/kunit/common-mocks.c
create mode 100644 lib/kunit/kunit-stream.c
create mode 100644 lib/kunit/mock-macro-test.c
create mode 100644 lib/kunit/mock-test.c
create mode 100644 lib/kunit/mock.c
base-commit: 10b82d5176488acee2820e5a2cf0f2ec5c3488b6
--
2.28.0.681.g6f77f65b4e-goog
CalledProcessError stores the output of the failed process as `bytes`,
not a `str`.
So when we log it on build error, the make output is all crammed into
one line with "\n" instead of actually printing new lines.
After this change, we get readable output with new lines, e.g.
> CC lib/kunit/kunit-example-test.o
> In file included from ../lib/kunit/test.c:9:
> ../include/kunit/test.h:22:1: error: unknown type name ‘invalid_type_that_causes_compile’
> 22 | invalid_type_that_causes_compile errors;
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> make[3]: *** [../scripts/Makefile.build:283: lib/kunit/test.o] Error 1
Secondly, trying to concat exceptions to strings will fail with
> TypeError: can only concatenate str (not "OSError") to str
so fix this with an explicit cast to str.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit_kernel.py | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index e20e2056cb38..0e19089f62f0 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -36,9 +36,9 @@ class LinuxSourceTreeOperations(object):
try:
subprocess.check_output(['make', 'mrproper'], stderr=subprocess.STDOUT)
except OSError as e:
- raise ConfigError('Could not call make command: ' + e)
+ raise ConfigError('Could not call make command: ' + str(e))
except subprocess.CalledProcessError as e:
- raise ConfigError(e.output)
+ raise ConfigError(e.output.decode())
def make_olddefconfig(self, build_dir, make_options):
command = ['make', 'ARCH=um', 'olddefconfig']
@@ -49,9 +49,9 @@ class LinuxSourceTreeOperations(object):
try:
subprocess.check_output(command, stderr=subprocess.STDOUT)
except OSError as e:
- raise ConfigError('Could not call make command: ' + e)
+ raise ConfigError('Could not call make command: ' + str(e))
except subprocess.CalledProcessError as e:
- raise ConfigError(e.output)
+ raise ConfigError(e.output.decode())
def make_allyesconfig(self):
kunit_parser.print_with_timestamp(
@@ -79,9 +79,9 @@ class LinuxSourceTreeOperations(object):
try:
subprocess.check_output(command, stderr=subprocess.STDOUT)
except OSError as e:
- raise BuildError('Could not call execute make: ' + e)
+ raise BuildError('Could not call execute make: ' + str(e))
except subprocess.CalledProcessError as e:
- raise BuildError(e.output)
+ raise BuildError(e.output.decode())
def linux_bin(self, params, timeout, build_dir, outfile):
"""Runs the Linux UML binary. Must be named 'linux'."""
base-commit: ccc1d052eff9f3cfe59d201263903fe1d46c79a5
--
2.28.0.709.gb0816b6eb0-goog
v2 -> v3:
- Rename functions and variables in verifier for better readability.
- Stick to logging message convention in libbpf.
- Move bpf_per_cpu_ptr and bpf_this_cpu_ptr from trace-specific
helper set to base helper set.
- More specific test in ksyms_btf.
- Fix return type cast in bpf_*_cpu_ptr.
- Fix btf leak in ksyms_btf selftest.
- Fix return error code for kallsyms_find().
v1 -> v2:
- Move check_pseudo_btf_id from check_ld_imm() to
replace_map_fd_with_map_ptr() and rename the latter.
- Add bpf_this_cpu_ptr().
- Use bpf_core_types_are_compat() in libbpf.c for checking type
compatibility.
- Rewrite typed ksym extern type in BTF with int to save space.
- Minor revision of bpf_per_cpu_ptr()'s comments.
- Avoid using long in tests that use skeleton.
- Refactored test_ksyms.c by moving kallsyms_find() to trace_helpers.c
- Fold the patches that sync include/linux/uapi and
tools/include/linux/uapi.
rfc -> v1:
- Encode VAR's btf_id for PSEUDO_BTF_ID.
- More checks in verifier. Checking the btf_id passed as
PSEUDO_BTF_ID is valid VAR, its name and type.
- Checks in libbpf on type compatibility of ksyms.
- Add bpf_per_cpu_ptr() to access kernel percpu vars. Introduced
new ARG and RET types for this helper.
This patch series extends the previously added __ksym externs with
btf support.
Right now the __ksym externs are treated as pure 64-bit scalar value.
Libbpf replaces ld_imm64 insn of __ksym by its kernel address at load
time. This patch series extend those externs with their btf info. Note
that btf support for __ksym must come with the kernel btf that has
VARs encoded to work properly. The corresponding chagnes in pahole
is available at [1] (with a fix at [2] for gcc 4.9+).
The first 3 patches in this series add support for general kernel
global variables, which include verifier checking (01/06), libpf
support (02/06) and selftests for getting typed ksym extern's kernel
address (03/06).
The next 3 patches extends that capability further by introducing
helpers bpf_per_cpu_ptr() and bpf_this_cpu_ptr(), which allows accessing
kernel percpu variables correctly (04/06 and 05/06).
The tests of this feature were performed against pahole that is extended
with [1] and [2]. For kernel BTF that does not have VARs encoded, the
selftests will be skipped.
[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=f3d9054ba…
[2] https://www.spinics.net/lists/dwarves/msg00451.html
Hao Luo (6):
bpf: Introduce pseudo_btf_id
bpf/libbpf: BTF support for typed ksyms
selftests/bpf: ksyms_btf to test typed ksyms
bpf: Introduce bpf_per_cpu_ptr()
bpf: Introduce bpf_this_cpu_ptr()
bpf/selftests: Test for bpf_per_cpu_ptr() and bpf_this_cpu_ptr()
include/linux/bpf.h | 6 +
include/linux/bpf_verifier.h | 7 +
include/linux/btf.h | 26 +++
include/uapi/linux/bpf.h | 67 +++++-
kernel/bpf/btf.c | 25 ---
kernel/bpf/helpers.c | 32 +++
kernel/bpf/verifier.c | 190 ++++++++++++++++--
kernel/trace/bpf_trace.c | 4 +
tools/include/uapi/linux/bpf.h | 67 +++++-
tools/lib/bpf/libbpf.c | 112 +++++++++--
.../testing/selftests/bpf/prog_tests/ksyms.c | 38 ++--
.../selftests/bpf/prog_tests/ksyms_btf.c | 88 ++++++++
.../selftests/bpf/progs/test_ksyms_btf.c | 55 +++++
tools/testing/selftests/bpf/trace_helpers.c | 27 +++
tools/testing/selftests/bpf/trace_helpers.h | 4 +
15 files changed, 653 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c
--
2.28.0.618.gf4bc123cb7-goog
This is based on the latest mmotm.
Summary: This series provides two main things, and a number of smaller
supporting goodies. The two main points are:
1) Add a new sub-test to gup_test, which in turn is a renamed version of
gup_benchmark. This sub-test allows nicer testing of dump_pages(), at
least on user-space pages.
For quite a while, I was doing a quick hack to gup_test.c whenever I
wanted to try out changes to dump_page(). Then Matthew Wilcox asked me
what I meant when I said "I used my dump_page() unit test", and I
realized that it might be nice to check in a polished up version of
that.
Details about how it works and how to use it are in the commit
description for patch #6.
2) Fixes a limitation of hmm-tests: these tests are incredibly useful,
but only if people actually build and run them. And it turns out that
libhugetlbfs is a little too effective at throwing a wrench in the
works, there. So I've added a little configuration check that removes
just two of the 21 hmm-tests, if libhugetlbfs is not available.
Further details in the commit description of patch #8.
Other smaller things that this series does:
a) Remove code duplication by creating gup_test.h.
b) Clear up the sub-test organization, and their invocation within
run_vmtests.sh.
c) Other minor assorted improvements.
John Hubbard (8):
mm/gup_benchmark: rename to mm/gup_test
selftests/vm: use a common gup_test.h
selftests/vm: rename run_vmtests --> run_vmtests.sh
selftests/vm: minor cleanup: Makefile and gup_test.c
selftests/vm: only some gup_test items are really benchmarks
selftests/vm: gup_test: introduce the dump_pages() sub-test
selftests/vm: run_vmtest.sh: update and clean up gup_test invocation
selftests/vm: hmm-tests: remove the libhugetlbfs dependency
Documentation/core-api/pin_user_pages.rst | 6 +-
arch/s390/configs/debug_defconfig | 2 +-
arch/s390/configs/defconfig | 2 +-
mm/Kconfig | 21 +-
mm/Makefile | 2 +-
mm/{gup_benchmark.c => gup_test.c} | 109 ++++++----
mm/gup_test.h | 32 +++
tools/testing/selftests/vm/.gitignore | 3 +-
tools/testing/selftests/vm/Makefile | 38 +++-
tools/testing/selftests/vm/check_config.sh | 30 +++
tools/testing/selftests/vm/config | 2 +-
tools/testing/selftests/vm/gup_benchmark.c | 137 -------------
tools/testing/selftests/vm/gup_test.c | 188 ++++++++++++++++++
tools/testing/selftests/vm/hmm-tests.c | 10 +-
.../vm/{run_vmtests => run_vmtest.sh} | 24 ++-
15 files changed, 403 insertions(+), 203 deletions(-)
rename mm/{gup_benchmark.c => gup_test.c} (59%)
create mode 100644 mm/gup_test.h
create mode 100755 tools/testing/selftests/vm/check_config.sh
delete mode 100644 tools/testing/selftests/vm/gup_benchmark.c
create mode 100644 tools/testing/selftests/vm/gup_test.c
rename tools/testing/selftests/vm/{run_vmtests => run_vmtest.sh} (91%)
--
2.28.0
On Mon, Sep 14 2020 at 15:24, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 2:55 PM Thomas Gleixner <tglx(a)linutronix.de> wrote:
>>
>> Yes it does generate better code, but I tried hard to spot a difference
>> in various metrics exposed by perf. It's all in the noise and I only
>> can spot a difference when the actual preemption check after the
>> decrement
>
> I'm somewhat more worried about the small-device case.
I just checked on one of my old UP ARM toys which I run at home. The .text
increase is about 2% (75k) and none of the tests I ran showed any
significant difference. Couldn't verify with perf though as the PMU on
that piece of art is unusable.
> That said, the diffstat certainly has its very clear charm, and I do
> agree that it makes things simpler.
>
> I'm just not convinced people should ever EVER do things like that "if
> (preemptible())" garbage. It sounds like somebody is doing seriously
> bad things.
OTOH, having a working 'preemptible()' or maybe better named
'can_schedule()' check makes tons of sense to make decisions about
allocation modes or other things.
We're currently looking through all of in_atomic(), in_interrupt()
etc. usage sites and quite some of them are historic and have the clear
intent of checking whether the code is called from task context or
hard/softirq context. Lots of them are completely broken or just work by
chance.
But there is clearly historic precendence that context checks are
useful, but they only can be useful if we have a consistent mechanism
which works everywhere.
Of course we could mandate that every interface which might be called
from one or the other context has a context argument or provides two
variants of the same thing. But I'm not really convinced whether that's
a win over having a consistent and reliable set of checks.
Thanks,
tglx
This series attempts to provide a simple way for BPF programs (and in
future other consumers) to utilize BPF Type Format (BTF) information
to display kernel data structures in-kernel. The use case this
functionality is applied to here is to support a snprintf()-like
helper to copy a BTF representation of kernel data to a string,
and a BPF seq file helper to display BTF data for an iterator.
There is already support in kernel/bpf/btf.c for "show" functionality;
the changes here generalize that support from seq-file specific
verifier display to the more generic case and add another specific
use case; rather than seq_printf()ing the show data, it is copied
to a supplied string using a snprintf()-like function. Other future
consumers of the show functionality could include a bpf_printk_btf()
function which printk()ed the data instead. Oops messaging in
particular would be an interesting application for such functionality.
The above potential use case hints at a potential reply to
a reasonable objection that such typed display should be
solved by tracing programs, where the in-kernel tracing records
data and the userspace program prints it out. While this
is certainly the recommended approach for most cases, I
believe having an in-kernel mechanism would be valuable
also. Critically in BPF programs it greatly simplifies
debugging and tracing of such data to invoking a simple
helper.
One challenge raised in an earlier iteration of this work -
where the BTF printing was implemented as a printk() format
specifier - was that the amount of data printed per
printk() was large, and other format specifiers were far
simpler. Here we sidestep that concern by printing
components of the BTF representation as we go for the
seq file case, and in the string case the snprintf()-like
operation is intended to be a basis for perf event or
ringbuf output. The reasons for avoiding bpf_trace_printk
are that
1. bpf_trace_printk() strings are restricted in size and
cannot display anything beyond trivial data structures; and
2. bpf_trace_printk() is for debugging purposes only.
As Alexei suggested, a bpf_trace_puts() helper could solve
this in the future but it still would be limited by the
1000 byte limit for traced strings.
Default output for an sk_buff looks like this (zeroed fields
are omitted):
(struct sk_buff){
.transport_header = (__u16)65535,
.mac_header = (__u16)65535,
.end = (sk_buff_data_t)192,
.head = (unsigned char *)0x000000007524fd8b,
.data = (unsigned char *)0x000000007524fd8b,
.truesize = (unsigned int)768,
.users = (refcount_t){
.refs = (atomic_t){
.counter = (int)1,
},
},
}
Flags can modify aspects of output format; see patch 3
for more details.
Changes since v6:
- Updated safe data size to 32, object name size to 80.
This increases the number of safe copies done, but performance is
not a key goal here. WRT name size the largest type name length
in bpf-next according to "pahole -s" is 64 bytes, so that still gives
room for additional type qualifiers, parens etc within the name limit
(Alexei, patch 2)
- Remove inlines and converted as many #defines to functions as was
possible. In a few cases - btf_show_type_value[s]() specifically -
I left these as macros as btf_show_type_value[s]() prepends and
appends format strings to the format specifier (in order to include
indentation, delimiters etc so a macro makes that simpler (Alexei,
patch 2)
- Handle btf_resolve_size() error in btf_show_obj_safe() (Alexei, patch 2)
- Removed clang loop unroll in BTF snprintf test (Alexei)
- switched to using bpf_core_type_id_kernel(type) as suggested by Andrii,
and Alexei noted that __builtin_btf_type_id(,1) should be used (patch 4)
- Added skip logic if __builtin_btf_type_id is not available (patches 4,8)
- Bumped limits on bpf iters to support printing larger structures (Alexei,
patch 5)
- Updated overflow bpf_iter tests to reflect new iter max size (patch 6)
- Updated seq helper to use type id only (Alexei, patch 7)
- Updated BTF task iter test to use task struct instead of struct fs_struct
since new limits allow a task_struct to be displayed (patch 8)
- Fixed E2BIG handling in iter task (Alexei, patch 8)
Changes since v5:
- Moved btf print prepare into patch 3, type show seq
with flags into patch 2 (Alexei, patches 2,3)
- Fixed build bot warnings around static declarations
and printf attributes
- Renamed functions to snprintf_btf/seq_printf_btf
(Alexei, patches 3-6)
Changes since v4:
- Changed approach from a BPF trace event-centric design to one
utilizing a snprintf()-like helper and an iter helper (Alexei,
patches 3,5)
- Added tests to verify BTF output (patch 4)
- Added support to tests for verifying BTF type_id-based display
as well as type name via __builtin_btf_type_id (Andrii, patch 4).
- Augmented task iter tests to cover the BTF-based seq helper.
Because a task_struct's BTF-based representation would overflow
the PAGE_SIZE limit on iterator data, the "struct fs_struct"
(task->fs) is displayed for each task instead (Alexei, patch 6).
Changes since v3:
- Moved to RFC since the approach is different (and bpf-next is
closed)
- Rather than using a printk() format specifier as the means
of invoking BTF-enabled display, a dedicated BPF helper is
used. This solves the issue of printk() having to output
large amounts of data using a complex mechanism such as
BTF traversal, but still provides a way for the display of
such data to be achieved via BPF programs. Future work could
include a bpf_printk_btf() function to invoke display via
printk() where the elements of a data structure are printk()ed
one at a time. Thanks to Petr Mladek, Andy Shevchenko and
Rasmus Villemoes who took time to look at the earlier printk()
format-specifier-focused version of this and provided feedback
clarifying the problems with that approach.
- Added trace id to the bpf_trace_printk events as a means of
separating output from standard bpf_trace_printk() events,
ensuring it can be easily parsed by the reader.
- Added bpf_trace_btf() helper tests which do simple verification
of the various display options.
Changes since v2:
- Alexei and Yonghong suggested it would be good to use
probe_kernel_read() on to-be-shown data to ensure safety
during operation. Safe copy via probe_kernel_read() to a
buffer object in "struct btf_show" is used to support
this. A few different approaches were explored
including dynamic allocation and per-cpu buffers. The
downside of dynamic allocation is that it would be done
during BPF program execution for bpf_trace_printk()s using
%pT format specifiers. The problem with per-cpu buffers
is we'd have to manage preemption and since the display
of an object occurs over an extended period and in printk
context where we'd rather not change preemption status,
it seemed tricky to manage buffer safety while considering
preemption. The approach of utilizing stack buffer space
via the "struct btf_show" seemed like the simplest approach.
The stack size of the associated functions which have a
"struct btf_show" on their stack to support show operation
(btf_type_snprintf_show() and btf_type_seq_show()) stays
under 500 bytes. The compromise here is the safe buffer we
use is small - 256 bytes - and as a result multiple
probe_kernel_read()s are needed for larger objects. Most
objects of interest are smaller than this (e.g.
"struct sk_buff" is 224 bytes), and while task_struct is a
notable exception at ~8K, performance is not the priority for
BTF-based display. (Alexei and Yonghong, patch 2).
- safe buffer use is the default behaviour (and is mandatory
for BPF) but unsafe display - meaning no safe copy is done
and we operate on the object itself - is supported via a
'u' option.
- pointers are prefixed with 0x for clarity (Alexei, patch 2)
- added additional comments and explanations around BTF show
code, especially around determining whether objects such
zeroed. Also tried to comment safe object scheme used. (Yonghong,
patch 2)
- added late_initcall() to initialize vmlinux BTF so that it would
not have to be initialized during printk operation (Alexei,
patch 5)
- removed CONFIG_BTF_PRINTF config option as it is not needed;
CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
determining behaviour of type-based printk can be done via
retrieval of BTF data; if it's not there BTF was unavailable
or broken (Alexei, patches 4,6)
- fix bpf_trace_printk test to use vmlinux.h and globals via
skeleton infrastructure, removing need for perf events
(Andrii, patch 8)
Changes since v1:
- changed format to be more drgn-like, rendering indented type info
along with type names by default (Alexei)
- zeroed values are omitted (Arnaldo) by default unless the '0'
modifier is specified (Alexei)
- added an option to print pointer values without obfuscation.
The reason to do this is the sysctls controlling pointer display
are likely to be irrelevant in many if not most tracing contexts.
Some questions on this in the outstanding questions section below...
- reworked printk format specifer so that we no longer rely on format
%pT<type> but instead use a struct * which contains type information
(Rasmus). This simplifies the printk parsing, makes use more dynamic
and also allows specification by BTF id as well as name.
- removed incorrect patch which tried to fix dereferencing of resolved
BTF info for vmlinux; instead we skip modifiers for the relevant
case (array element type determination) (Alexei).
- fixed issues with negative snprintf format length (Rasmus)
- added test cases for various data structure formats; base types,
typedefs, structs, etc.
- tests now iterate through all typedef, enum, struct and unions
defined for vmlinux BTF and render a version of the target dummy
value which is either all zeros or all 0xff values; the idea is this
exercises the "skip if zero" and "print everything" cases.
- added support in BPF for using the %pT format specifier in
bpf_trace_printk()
- added BPF tests which ensure %pT format specifier use works (Alexei).
Alan Maguire (8):
bpf: provide function to get vmlinux BTF information
bpf: move to generic BTF show support, apply it to seq files/strings
bpf: add bpf_snprintf_btf helper
selftests/bpf: add bpf_snprintf_btf helper tests
bpf: bump iter seq size to support BTF representation of large data
structures
selftests/bpf: fix overflow tests to reflect iter size increase
bpf: add bpf_seq_printf_btf helper
selftests/bpf: add test for bpf_seq_printf_btf helper
include/linux/bpf.h | 3 +
include/linux/btf.h | 39 +
include/uapi/linux/bpf.h | 76 ++
kernel/bpf/bpf_iter.c | 4 +-
kernel/bpf/btf.c | 1007 ++++++++++++++++++--
kernel/bpf/core.c | 2 +
kernel/bpf/helpers.c | 4 +
kernel/bpf/verifier.c | 18 +-
kernel/trace/bpf_trace.c | 98 ++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 76 ++
tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 88 +-
.../selftests/bpf/prog_tests/snprintf_btf.c | 60 ++
.../selftests/bpf/progs/bpf_iter_task_btf.c | 50 +
.../selftests/bpf/progs/netif_receive_skb.c | 249 +++++
15 files changed, 1659 insertions(+), 117 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/snprintf_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/netif_receive_skb.c
--
1.8.3.1
This series attempts to provide a simple way for BPF programs (and in
future other consumers) to utilize BPF Type Format (BTF) information
to display kernel data structures in-kernel. The use case this
functionality is applied to here is to support a snprintf()-like
helper to copy a BTF representation of kernel data to a string,
and a BPF seq file helper to display BTF data for an iterator.
There is already support in kernel/bpf/btf.c for "show" functionality;
the changes here generalize that support from seq-file specific
verifier display to the more generic case and add another specific
use case; rather than seq_printf()ing the show data, it is copied
to a supplied string using a snprintf()-like function. Other future
consumers of the show functionality could include a bpf_printk_btf()
function which printk()ed the data instead. Oops messaging in
particular would be an interesting application for such functionality.
The above potential use case hints at a potential reply to
a reasonable objection that such typed display should be
solved by tracing programs, where the in-kernel tracing records
data and the userspace program prints it out. While this
is certainly the recommended approach for most cases, I
believe having an in-kernel mechanism would be valuable
also. Critically in BPF programs it greatly simplifies
debugging and tracing of such data to invoking a simple
helper.
One challenge raised in an earlier iteration of this work -
where the BTF printing was implemented as a printk() format
specifier - was that the amount of data printed per
printk() was large, and other format specifiers were far
simpler. Here we sidestep that concern by printing
components of the BTF representation as we go for the
seq file case, and in the string case the snprintf()-like
operation is intended to be a basis for perf event or
ringbuf output. The reasons for avoiding bpf_trace_printk
are that
1. bpf_trace_printk() strings are restricted in size and
cannot display anything beyond trivial data structures; and
2. bpf_trace_printk() is for debugging purposes only.
As Alexei suggested, a bpf_trace_puts() helper could solve
this in the future but it still would be limited by the
1000 byte limit for traced strings.
Default output for an sk_buff looks like this (zeroed fields
are omitted):
(struct sk_buff){
.transport_header = (__u16)65535,
.mac_header = (__u16)65535,
.end = (sk_buff_data_t)192,
.head = (unsigned char *)0x000000007524fd8b,
.data = (unsigned char *)0x000000007524fd8b,
.truesize = (unsigned int)768,
.users = (refcount_t){
.refs = (atomic_t){
.counter = (int)1,
},
},
}
Flags can modify aspects of output format; see patch 3
for more details.
Changes since v5:
- Moved btf print prepare into patch 3, type show seq
with flags into patch 2 (Alexei, patches 2,3)
- Fixed build bot warnings around static declarations
and printf attributes
- Renamed functions to snprintf_btf/seq_printf_btf
(Alexei, patches 3-6)
Changes since v4:
- Changed approach from a BPF trace event-centric design to one
utilizing a snprintf()-like helper and an iter helper (Alexei,
patches 3,5)
- Added tests to verify BTF output (patch 4)
- Added support to tests for verifying BTF type_id-based display
as well as type name via __builtin_btf_type_id (Andrii, patch 4).
- Augmented task iter tests to cover the BTF-based seq helper.
Because a task_struct's BTF-based representation would overflow
the PAGE_SIZE limit on iterator data, the "struct fs_struct"
(task->fs) is displayed for each task instead (Alexei, patch 6).
Changes since v3:
- Moved to RFC since the approach is different (and bpf-next is
closed)
- Rather than using a printk() format specifier as the means
of invoking BTF-enabled display, a dedicated BPF helper is
used. This solves the issue of printk() having to output
large amounts of data using a complex mechanism such as
BTF traversal, but still provides a way for the display of
such data to be achieved via BPF programs. Future work could
include a bpf_printk_btf() function to invoke display via
printk() where the elements of a data structure are printk()ed
one at a time. Thanks to Petr Mladek, Andy Shevchenko and
Rasmus Villemoes who took time to look at the earlier printk()
format-specifier-focused version of this and provided feedback
clarifying the problems with that approach.
- Added trace id to the bpf_trace_printk events as a means of
separating output from standard bpf_trace_printk() events,
ensuring it can be easily parsed by the reader.
- Added bpf_trace_btf() helper tests which do simple verification
of the various display options.
Changes since v2:
- Alexei and Yonghong suggested it would be good to use
probe_kernel_read() on to-be-shown data to ensure safety
during operation. Safe copy via probe_kernel_read() to a
buffer object in "struct btf_show" is used to support
this. A few different approaches were explored
including dynamic allocation and per-cpu buffers. The
downside of dynamic allocation is that it would be done
during BPF program execution for bpf_trace_printk()s using
%pT format specifiers. The problem with per-cpu buffers
is we'd have to manage preemption and since the display
of an object occurs over an extended period and in printk
context where we'd rather not change preemption status,
it seemed tricky to manage buffer safety while considering
preemption. The approach of utilizing stack buffer space
via the "struct btf_show" seemed like the simplest approach.
The stack size of the associated functions which have a
"struct btf_show" on their stack to support show operation
(btf_type_snprintf_show() and btf_type_seq_show()) stays
under 500 bytes. The compromise here is the safe buffer we
use is small - 256 bytes - and as a result multiple
probe_kernel_read()s are needed for larger objects. Most
objects of interest are smaller than this (e.g.
"struct sk_buff" is 224 bytes), and while task_struct is a
notable exception at ~8K, performance is not the priority for
BTF-based display. (Alexei and Yonghong, patch 2).
- safe buffer use is the default behaviour (and is mandatory
for BPF) but unsafe display - meaning no safe copy is done
and we operate on the object itself - is supported via a
'u' option.
- pointers are prefixed with 0x for clarity (Alexei, patch 2)
- added additional comments and explanations around BTF show
code, especially around determining whether objects such
zeroed. Also tried to comment safe object scheme used. (Yonghong,
patch 2)
- added late_initcall() to initialize vmlinux BTF so that it would
not have to be initialized during printk operation (Alexei,
patch 5)
- removed CONFIG_BTF_PRINTF config option as it is not needed;
CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
determining behaviour of type-based printk can be done via
retrieval of BTF data; if it's not there BTF was unavailable
or broken (Alexei, patches 4,6)
- fix bpf_trace_printk test to use vmlinux.h and globals via
skeleton infrastructure, removing need for perf events
(Andrii, patch 8)
Changes since v1:
- changed format to be more drgn-like, rendering indented type info
along with type names by default (Alexei)
- zeroed values are omitted (Arnaldo) by default unless the '0'
modifier is specified (Alexei)
- added an option to print pointer values without obfuscation.
The reason to do this is the sysctls controlling pointer display
are likely to be irrelevant in many if not most tracing contexts.
Some questions on this in the outstanding questions section below...
- reworked printk format specifer so that we no longer rely on format
%pT<type> but instead use a struct * which contains type information
(Rasmus). This simplifies the printk parsing, makes use more dynamic
and also allows specification by BTF id as well as name.
- removed incorrect patch which tried to fix dereferencing of resolved
BTF info for vmlinux; instead we skip modifiers for the relevant
case (array element type determination) (Alexei).
- fixed issues with negative snprintf format length (Rasmus)
- added test cases for various data structure formats; base types,
typedefs, structs, etc.
- tests now iterate through all typedef, enum, struct and unions
defined for vmlinux BTF and render a version of the target dummy
value which is either all zeros or all 0xff values; the idea is this
exercises the "skip if zero" and "print everything" cases.
- added support in BPF for using the %pT format specifier in
bpf_trace_printk()
- added BPF tests which ensure %pT format specifier use works (Alexei).
Alan Maguire (6):
bpf: provide function to get vmlinux BTF information
bpf: move to generic BTF show support, apply it to seq files/strings
bpf: add bpf_snprintf_btf helper
selftests/bpf: add bpf_snprintf_btf helper tests
bpf: add bpf_seq_printf_btf helper
selftests/bpf: add test for bpf_seq_printf_btf helper
include/linux/bpf.h | 3 +
include/linux/btf.h | 39 +
include/uapi/linux/bpf.h | 78 ++
kernel/bpf/btf.c | 980 ++++++++++++++++++---
kernel/bpf/core.c | 2 +
kernel/bpf/helpers.c | 4 +
kernel/bpf/verifier.c | 18 +-
kernel/trace/bpf_trace.c | 134 +++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 78 ++
tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 66 ++
.../selftests/bpf/prog_tests/snprintf_btf.c | 54 ++
.../selftests/bpf/progs/bpf_iter_task_btf.c | 49 ++
.../selftests/bpf/progs/netif_receive_skb.c | 260 ++++++
14 files changed, 1659 insertions(+), 108 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/snprintf_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/netif_receive_skb.c
--
1.8.3.1
From: Thomas Gleixner <tglx(a)linutronix.de>
CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Cleanup the leftovers before doing so.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: rcu(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
---
tools/testing/selftests/rcutorture/configs/rcu/SRCU-t | 1 -
tools/testing/selftests/rcutorture/configs/rcu/SRCU-u | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TINY01 | 1 -
tools/testing/selftests/rcutorture/doc/TINY_RCU.txt | 5 ++---
tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 1 -
tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h | 1 -
6 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
index 6c78022..553cf65 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
@@ -7,4 +7,3 @@ CONFIG_RCU_TRACE=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_DEBUG_ATOMIC_SLEEP=y
-#CHECK#CONFIG_PREEMPT_COUNT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
index c15ada8..99563da 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
@@ -7,4 +7,3 @@ CONFIG_RCU_TRACE=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TINY01 b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
index 6db705e..9b22b8e 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TINY01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
@@ -10,4 +10,3 @@ CONFIG_RCU_TRACE=n
#CHECK#CONFIG_RCU_STALL_COMMON=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
index a75b169..d30cedf 100644
--- a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
+++ b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
@@ -3,11 +3,10 @@ This document gives a brief rationale for the TINY_RCU test cases.
Kconfig Parameters:
-CONFIG_DEBUG_LOCK_ALLOC -- Do all three and none of the three.
-CONFIG_PREEMPT_COUNT
+CONFIG_DEBUG_LOCK_ALLOC -- Do both and none of the two.
CONFIG_RCU_TRACE
-The theory here is that randconfig testing will hit the other six possible
+The theory here is that randconfig testing will hit the other two possible
combinations of these parameters.
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index 1b96d68..cfdd48f 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -43,7 +43,6 @@ CONFIG_64BIT
Used only to check CONFIG_RCU_FANOUT value, inspection suffices.
-CONFIG_PREEMPT_COUNT
CONFIG_PREEMPT_RCU
Redundant with CONFIG_PREEMPT, ignore.
diff --git a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
index 283d710..d0d485d 100644
--- a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
+++ b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
@@ -8,7 +8,6 @@
#undef CONFIG_HOTPLUG_CPU
#undef CONFIG_MODULES
#undef CONFIG_NO_HZ_FULL_SYSIDLE
-#undef CONFIG_PREEMPT_COUNT
#undef CONFIG_PREEMPT_RCU
#undef CONFIG_PROVE_RCU
#undef CONFIG_RCU_NOCB_CPU
--
2.9.5
Hi!
I really like Hangbin Liu's intent[1] but I think we need to be a little
more clean about the implementation. This extracts run_kselftest.sh from
the Makefile so it can actually be changed without embeds, etc. Instead,
generate the test list into a text file. Everything gets much simpler.
:)
And in patch 2, I add back Hangin Liu's new options (with some extra
added) with knowledge of "collections" (i.e. Makefile TARGETS) and
subtests. This should work really well with LAVA too, which needs to
manipulate the lists of tests being run.
Thoughts?
-Kees
[1] https://lore.kernel.org/lkml/20200914022227.437143-1-liuhangbin@gmail.com/
Kees Cook (2):
selftests: Extract run_kselftest.sh and generate stand-alone test list
selftests/run_kselftest.sh: Make each test individually selectable
tools/testing/selftests/Makefile | 26 ++-----
tools/testing/selftests/lib.mk | 5 +-
tools/testing/selftests/run_kselftest.sh | 89 ++++++++++++++++++++++++
3 files changed, 98 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/run_kselftest.sh
--
2.25.1
Right now .kunitconfig and the build dir are automatically created if
the build dir does not exists; however, if the build dir is present and
.kunitconfig is not, kunit_tool will crash.
Fix this by checking for both the build dir as well as the .kunitconfig.
NOTE: This depends on commit 5578d008d9e0 ("kunit: tool: fix running
kunit_tool from outside kernel tree")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/c…
Signed-off-by: Brendan Higgins <brendanhiggins(a)google.com>
---
tools/testing/kunit/kunit.py | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index e2caf4e24ecb2..8ab17e21a3578 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -243,6 +243,8 @@ def main(argv, linux=None):
if cli_args.subcommand == 'run':
if not os.path.exists(cli_args.build_dir):
os.mkdir(cli_args.build_dir)
+
+ if not os.path.exists(kunit_kernel.kunitconfig_path):
create_default_kunitconfig()
if not linux:
@@ -258,10 +260,12 @@ def main(argv, linux=None):
if result.status != KunitStatus.SUCCESS:
sys.exit(1)
elif cli_args.subcommand == 'config':
- if cli_args.build_dir:
- if not os.path.exists(cli_args.build_dir):
- os.mkdir(cli_args.build_dir)
- create_default_kunitconfig()
+ if cli_args.build_dir and (
+ not os.path.exists(cli_args.build_dir)):
+ os.mkdir(cli_args.build_dir)
+
+ if not os.path.exists(kunit_kernel.kunitconfig_path):
+ create_default_kunitconfig()
if not linux:
linux = kunit_kernel.LinuxSourceTree()
base-commit: d96fe1a5485fa978a6e3690adc4dbe4d20b5baa4
--
2.28.0.681.g6f77f65b4e-goog
Changes since v2:
- added struct xfrm_translator as API to register xfrm_compat.ko with
xfrm_state.ko. This allows compilation of translator as a loadable
module
- fixed indention and collected reviewed-by (Johannes Berg)
- moved boilerplate from commit messages into cover-letter (Steffen
Klassert)
- found on KASAN build and fixed non-initialised stack variable usage
in the translator
The resulting v2/v3 diff can be found here:
https://gist.github.com/0x7f454c46/8f68311dfa1f240959fdbe7c77ed2259
Patches as a .git branch:
https://github.com/0x7f454c46/linux/tree/xfrm-compat-v3
Changes since v1:
- reworked patches set to use translator
- separated the compat layer into xfrm_compat.c,
compiled under XFRM_USER_COMPAT config
- 32-bit messages now being sent in frag_list (like wext-core does)
- instead of __packed add compat_u64 members in compat structures
- selftest reworked to kselftest lib API
- added netlink dump testing to the selftest
XFRM is disabled for compatible users because of the UABI difference.
The difference is in structures paddings and in the result the size
of netlink messages differ.
Possibility for compatible application to manage xfrm tunnels was
disabled by: the commmit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
userspace socket policies on 64 bit systems") and the commit 74005991b78a
("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").
This is my second attempt to resolve the xfrm/compat problem by adding
the 64=>32 and 32=>64 bit translators those non-visibly to a user
provide translation between compatible user and kernel.
Previous attempt was to interrupt the message ABI according to a syscall
by xfrm_user, which resulted in over-complicated code [1].
Florian Westphal provided the idea of translator and some draft patches
in the discussion. In these patches, his idea is reused and some of his
initial code is also present.
There were a couple of attempts to solve xfrm compat problem:
https://lkml.org/lkml/2017/1/20/733https://patchwork.ozlabs.org/patch/44600/http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctl…
All the discussions end in the conclusion that xfrm should have a full
compatible layer to correctly work with 32-bit applications on 64-bit
kernels:
https://lkml.org/lkml/2017/1/23/413https://patchwork.ozlabs.org/patch/433279/
In some recent lkml discussion, Linus said that it's worth to fix this
problem and not giving people an excuse to stay on 32-bit kernel:
https://lkml.org/lkml/2018/2/13/752
There is also an selftest for ipsec tunnels.
It doesn't depend on any library and compat version can be easy
build with: make CFLAGS=-m32 net/ipsec
[1]: https://lkml.kernel.org/r/20180726023144.31066-1-dima@arista.com
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Florian Westphal <fw(a)strlen.de>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Stephen Suryaputra <ssuryaextr(a)gmail.com>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: netdev(a)vger.kernel.org
Dmitry Safonov (7):
xfrm: Provide API to register translator module
xfrm/compat: Add 64=>32-bit messages translator
xfrm/compat: Attach xfrm dumps to 64=>32 bit translator
netlink/compat: Append NLMSG_DONE/extack to frag_list
xfrm/compat: Add 32=>64-bit messages translator
xfrm/compat: Translate 32-bit user_policy from sockptr
selftest/net/xfrm: Add test for ipsec tunnel
MAINTAINERS | 1 +
include/net/xfrm.h | 33 +
net/netlink/af_netlink.c | 47 +-
net/xfrm/Kconfig | 11 +
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_compat.c | 625 +++++++
net/xfrm/xfrm_state.c | 77 +-
net/xfrm/xfrm_user.c | 110 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ipsec.c | 2195 ++++++++++++++++++++++++
11 files changed, 3066 insertions(+), 36 deletions(-)
create mode 100644 net/xfrm/xfrm_compat.c
create mode 100644 tools/testing/selftests/net/ipsec.c
base-commit: ba4f184e126b751d1bffad5897f263108befc780
--
2.28.0
Hi,
The v6 of this patch series include only the type change requested by
Andy on the vdso patch, but since v5 included some bigger changes, I'm
documenting them in this cover letter as well.
Please note this applies on top of Linus tree, and it succeeds seccomp
and syscall user dispatch selftests.
v5 cover letter
--------------
This is v5 of Syscall User Dispatch. It has some big changes in
comparison to v4.
First of all, it allows the vdso trampoline code for architectures that
support it. This is exposed through an arch hook. It also addresses
the concern about what happens when a bad selector is provided, instead
of SIGSEGV, we fail with SIGSYS, which is more debug-able.
Another major change is that it is now based on top of Gleixner's common
syscall entry work, and is supposed to only be used by that code.
Therefore, the entry symbol is not exported outside of kernel/entry/ code.
The biggest change in this version is the attempt to avoid using one of
the final TIF flags on x86 32 bit, without increasing the size of that
variable to 64 bit. My expectation is that, with this work, plus the
removal of TIF_IA32, TIF_X32 and TIF_FORCE_TF, we might be able to avoid
changing this field to 64 bits at all. Instead, this follows the
suggestion by Andy to have a generic TIF flag for SECCOMP and this
mechanism, and use another field to decide which one is enabled. The
code for this is not complex, so it seems like a viable approach.
Finally, this version adds some documentation to the feature.
Kees, I dropped your reviewed-by on patch 5, given the amount of
changes.
Thanks,
Previous submissions are archived at:
RFC/v1: https://lkml.org/lkml/2020/7/8/96
v2: https://lkml.org/lkml/2020/7/9/17
v3: https://lkml.org/lkml/2020/7/12/4
v4: https://www.spinics.net/lists/linux-kselftest/msg16377.html
v5: https://lkml.org/lkml/2020/8/10/1320
Gabriel Krisman Bertazi (9):
kernel: Support TIF_SYSCALL_INTERCEPT flag
kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code
x86: vdso: Expose sigreturn address on vdso to the kernel
signal: Expose SYS_USER_DISPATCH si_code type
kernel: Implement selective syscall userspace redirection
kernel: entry: Support Syscall User Dispatch for common syscall entry
x86: Enable Syscall User Dispatch
selftests: Add kselftest for syscall user dispatch
doc: Document Syscall User Dispatch
.../admin-guide/syscall-user-dispatch.rst | 87 ++++++
arch/Kconfig | 21 ++
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vdso2c.c | 2 +
arch/x86/entry/vdso/vdso32/sigreturn.S | 2 +
arch/x86/entry/vdso/vma.c | 15 +
arch/x86/include/asm/elf.h | 1 +
arch/x86/include/asm/thread_info.h | 4 +-
arch/x86/include/asm/vdso.h | 2 +
arch/x86/kernel/signal_compat.c | 2 +-
fs/exec.c | 8 +
include/linux/entry-common.h | 6 +-
include/linux/sched.h | 8 +-
include/linux/seccomp.h | 20 +-
include/linux/syscall_intercept.h | 71 +++++
include/linux/syscall_user_dispatch.h | 29 ++
include/uapi/asm-generic/siginfo.h | 3 +-
include/uapi/linux/prctl.h | 5 +
kernel/entry/Makefile | 1 +
kernel/entry/common.c | 32 +-
kernel/entry/common.h | 15 +
kernel/entry/syscall_user_dispatch.c | 101 ++++++
kernel/fork.c | 10 +-
kernel/seccomp.c | 7 +-
kernel/sys.c | 5 +
tools/testing/selftests/Makefile | 1 +
.../syscall_user_dispatch/.gitignore | 2 +
.../selftests/syscall_user_dispatch/Makefile | 9 +
.../selftests/syscall_user_dispatch/config | 1 +
.../syscall_user_dispatch.c | 292 ++++++++++++++++++
30 files changed, 744 insertions(+), 19 deletions(-)
create mode 100644 Documentation/admin-guide/syscall-user-dispatch.rst
create mode 100644 include/linux/syscall_intercept.h
create mode 100644 include/linux/syscall_user_dispatch.h
create mode 100644 kernel/entry/common.h
create mode 100644 kernel/entry/syscall_user_dispatch.c
create mode 100644 tools/testing/selftests/syscall_user_dispatch/.gitignore
create mode 100644 tools/testing/selftests/syscall_user_dispatch/Makefile
create mode 100644 tools/testing/selftests/syscall_user_dispatch/config
create mode 100644 tools/testing/selftests/syscall_user_dispatch/syscall_user_dispatch.c
--
2.28.0
This patchset is based on Google-internal RSEQ
work done by Paul Turner and Andrew Hunter.
When working with per-CPU RSEQ-based memory allocations,
it is sometimes important to make sure that a global
memory location is no longer accessed from RSEQ critical
sections. For example, there can be two per-CPU lists,
one is "active" and accessed per-CPU, while another one
is inactive and worked on asynchronously "off CPU" (e.g.
garbage collection is performed). Then at some point
the two lists are swapped, and a fast RCU-like mechanism
is required to make sure that the previously active
list is no longer accessed.
This patch introduces such a mechanism: in short,
membarrier() syscall issues an IPI to a CPU, restarting
a potentially active RSEQ critical section on the CPU.
v1->v2:
- removed the ability to IPI all CPUs in a single sycall;
- use task->mm rather than task->group_leader to identify
tasks belonging to the same process.
v2->v3:
- re-added the ability to IPI all CPUs in a single syscall;
- integrated with membarrier_private_expedited() to
make sure only CPUs running tasks with the same mm as
the current task are interrupted;
- also added MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ;
- flags in membarrier_private_expedited are never actually
bit flags but always distinct values (i.e. never two flags
are combined), so I modified bit testing to full equation
comparison for simplicity (otherwise the code needs to
work when several bits are set, for example).
v3->v4:
- added the third parameter to membarrier syscall: @cpu_id:
if @flags == MEMBARRIER_CMD_FLAG_CPU, then @cpu_id indicates
the cpu on which RSEQ CS should be restarted.
v4->v5:
- added @cpu_id parameter to sys_membarrier in syscalls.h.
v5->v6:
- made membarrier_private_expedited more efficient in a
single-cpu case;
- a couple of minor refactorings.
v6->v7:
- made @flags an unsigned int in sys_membarrier;
- a couple of minor refactorings.
v7->v8:
- replaced BUG_ON with WARN_ON_ONCE in membarrier.c.
The second patch in the patchset adds a selftest
of this feature.
Signed-off-by: Peter Oskolkov <posk(a)google.com>
---
include/linux/sched/mm.h | 3 +
include/linux/syscalls.h | 2 +-
include/uapi/linux/membarrier.h | 26 ++++++
kernel/sched/membarrier.c | 136 +++++++++++++++++++++++++-------
4 files changed, 136 insertions(+), 31 deletions(-)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f889e332912f..15bfb06f2884 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -348,10 +348,13 @@ enum {
MEMBARRIER_STATE_GLOBAL_EXPEDITED = (1U << 3),
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY = (1U << 4),
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE = (1U << 5),
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY = (1U << 6),
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ = (1U << 7),
};
enum {
MEMBARRIER_FLAG_SYNC_CORE = (1U << 0),
+ MEMBARRIER_FLAG_RSEQ = (1U << 1),
};
#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 75ac7f8ae93c..466c993e52bf 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -974,7 +974,7 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
asmlinkage long sys_userfaultfd(int flags);
-asmlinkage long sys_membarrier(int cmd, int flags);
+asmlinkage long sys_membarrier(int cmd, int flags, int cpu_id);
asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
asmlinkage long sys_copy_file_range(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 5891d7614c8c..737605897f36 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -114,6 +114,26 @@
* If this command is not implemented by an
* architecture, -EINVAL is returned.
* Returns 0 on success.
+ * @MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+ * Ensure the caller thread, upon return from
+ * system call, that all its running thread
+ * siblings have any currently running rseq
+ * critical sections restarted if @flags
+ * parameter is 0; if @flags parameter is
+ * MEMBARRIER_CMD_FLAG_CPU,
+ * then this operation is performed only
+ * on CPU indicated by @cpu_id. If this command is
+ * not implemented by an architecture, -EINVAL
+ * is returned. A process needs to register its
+ * intent to use the private expedited rseq
+ * command prior to using it, otherwise
+ * this command returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
+ * Register the process intent to use
+ * MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
+ * If this command is not implemented by an
+ * architecture, -EINVAL is returned.
+ * Returns 0 on success.
* @MEMBARRIER_CMD_SHARED:
* Alias to MEMBARRIER_CMD_GLOBAL. Provided for
* header backward compatibility.
@@ -131,9 +151,15 @@ enum membarrier_cmd {
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = (1 << 4),
MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 5),
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 6),
+ MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ = (1 << 7),
+ MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ = (1 << 8),
/* Alias for header backward compatibility. */
MEMBARRIER_CMD_SHARED = MEMBARRIER_CMD_GLOBAL,
};
+enum membarrier_cmd_flag {
+ MEMBARRIER_CMD_FLAG_CPU = (1 << 0),
+};
+
#endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 168479a7d61b..e23e74d52db5 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -18,6 +18,14 @@
#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK 0
#endif
+#ifdef CONFIG_RSEQ
+#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK \
+ (MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ \
+ | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_BITMASK)
+#else
+#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK 0
+#endif
+
#define MEMBARRIER_CMD_BITMASK \
(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
@@ -30,6 +38,11 @@ static void ipi_mb(void *info)
smp_mb(); /* IPIs should be serializing but paranoid. */
}
+static void ipi_rseq(void *info)
+{
+ rseq_preempt(current);
+}
+
static void ipi_sync_rq_state(void *info)
{
struct mm_struct *mm = (struct mm_struct *) info;
@@ -129,19 +142,27 @@ static int membarrier_global_expedited(void)
return 0;
}
-static int membarrier_private_expedited(int flags)
+static int membarrier_private_expedited(int flags, int cpu_id)
{
- int cpu;
cpumask_var_t tmpmask;
struct mm_struct *mm = current->mm;
+ smp_call_func_t ipi_func = ipi_mb;
- if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+ if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
if (!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
return -EPERM;
+ } else if (flags == MEMBARRIER_FLAG_RSEQ) {
+ if (!IS_ENABLED(CONFIG_RSEQ))
+ return -EINVAL;
+ if (!(atomic_read(&mm->membarrier_state) &
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY))
+ return -EPERM;
+ ipi_func = ipi_rseq;
} else {
+ WARN_ON_ONCE(flags);
if (!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
return -EPERM;
@@ -156,35 +177,59 @@ static int membarrier_private_expedited(int flags)
*/
smp_mb(); /* system call entry is not a mb. */
- if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+ if (cpu_id < 0 && !zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
return -ENOMEM;
cpus_read_lock();
- rcu_read_lock();
- for_each_online_cpu(cpu) {
+
+ if (cpu_id >= 0) {
struct task_struct *p;
- /*
- * Skipping the current CPU is OK even through we can be
- * migrated at any point. The current CPU, at the point
- * where we read raw_smp_processor_id(), is ensured to
- * be in program order with respect to the caller
- * thread. Therefore, we can skip this CPU from the
- * iteration.
- */
- if (cpu == raw_smp_processor_id())
- continue;
- p = rcu_dereference(cpu_rq(cpu)->curr);
- if (p && p->mm == mm)
- __cpumask_set_cpu(cpu, tmpmask);
+ if (cpu_id >= nr_cpu_ids || !cpu_online(cpu_id))
+ goto out;
+ if (cpu_id == raw_smp_processor_id())
+ goto out;
+ rcu_read_lock();
+ p = rcu_dereference(cpu_rq(cpu_id)->curr);
+ if (!p || p->mm != mm) {
+ rcu_read_unlock();
+ goto out;
+ }
+ rcu_read_unlock();
+ } else {
+ int cpu;
+
+ rcu_read_lock();
+ for_each_online_cpu(cpu) {
+ struct task_struct *p;
+
+ /*
+ * Skipping the current CPU is OK even through we can be
+ * migrated at any point. The current CPU, at the point
+ * where we read raw_smp_processor_id(), is ensured to
+ * be in program order with respect to the caller
+ * thread. Therefore, we can skip this CPU from the
+ * iteration.
+ */
+ if (cpu == raw_smp_processor_id())
+ continue;
+ p = rcu_dereference(cpu_rq(cpu)->curr);
+ if (p && p->mm == mm)
+ __cpumask_set_cpu(cpu, tmpmask);
+ }
+ rcu_read_unlock();
}
- rcu_read_unlock();
preempt_disable();
- smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+ if (cpu_id >= 0)
+ smp_call_function_single(cpu_id, ipi_func, NULL, 1);
+ else
+ smp_call_function_many(tmpmask, ipi_func, NULL, 1);
preempt_enable();
- free_cpumask_var(tmpmask);
+out:
+ if (cpu_id < 0)
+ free_cpumask_var(tmpmask);
cpus_read_unlock();
/*
@@ -283,11 +328,18 @@ static int membarrier_register_private_expedited(int flags)
set_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED,
ret;
- if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+ if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
ready_state =
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+ } else if (flags == MEMBARRIER_FLAG_RSEQ) {
+ if (!IS_ENABLED(CONFIG_RSEQ))
+ return -EINVAL;
+ ready_state =
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY;
+ } else {
+ WARN_ON_ONCE(flags);
}
/*
@@ -299,6 +351,8 @@ static int membarrier_register_private_expedited(int flags)
return 0;
if (flags & MEMBARRIER_FLAG_SYNC_CORE)
set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE;
+ if (flags & MEMBARRIER_FLAG_RSEQ)
+ set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ;
atomic_or(set_state, &mm->membarrier_state);
ret = sync_runqueues_membarrier_state(mm);
if (ret)
@@ -310,8 +364,15 @@ static int membarrier_register_private_expedited(int flags)
/**
* sys_membarrier - issue memory barriers on a set of threads
- * @cmd: Takes command values defined in enum membarrier_cmd.
- * @flags: Currently needs to be 0. For future extensions.
+ * @cmd: Takes command values defined in enum membarrier_cmd.
+ * @flags: Currently needs to be 0 for all commands other than
+ * MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ: in the latter
+ * case it can be MEMBARRIER_CMD_FLAG_CPU, indicating that @cpu_id
+ * contains the CPU on which to interrupt (= restart)
+ * the RSEQ critical section.
+ * @cpu_id: if @flags == MEMBARRIER_CMD_FLAG_CPU, indicates the cpu on which
+ * RSEQ CS should be interrupted (@cmd must be
+ * MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ).
*
* If this system call is not implemented, -ENOSYS is returned. If the
* command specified does not exist, not available on the running
@@ -337,10 +398,21 @@ static int membarrier_register_private_expedited(int flags)
* smp_mb() X O O
* sys_membarrier() O O O
*/
-SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
+SYSCALL_DEFINE3(membarrier, int, cmd, unsigned int, flags, int, cpu_id)
{
- if (unlikely(flags))
- return -EINVAL;
+ switch (cmd) {
+ case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+ if (unlikely(flags && flags != MEMBARRIER_CMD_FLAG_CPU))
+ return -EINVAL;
+ break;
+ default:
+ if (unlikely(flags))
+ return -EINVAL;
+ }
+
+ if (!(flags & MEMBARRIER_CMD_FLAG_CPU))
+ cpu_id = -1;
+
switch (cmd) {
case MEMBARRIER_CMD_QUERY:
{
@@ -362,13 +434,17 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
return membarrier_register_global_expedited();
case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
- return membarrier_private_expedited(0);
+ return membarrier_private_expedited(0, cpu_id);
case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
return membarrier_register_private_expedited(0);
case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
- return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+ return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE, cpu_id);
case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+ case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+ return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ, cpu_id);
+ case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
+ return membarrier_register_private_expedited(MEMBARRIER_FLAG_RSEQ);
default:
return -EINVAL;
}
--
2.28.0.709.gb0816b6eb0-goog
[ Please CC me I am not subscribed to all MLs ]
[ CC Sami ]
Hi Bill,
I have tested your patch on top of Sami's latest clang-cfi Git branch.
Feel free to add...
Tested-by: Sedat Dilek <sedat.dilek(a)gmail.com> # LLVM toolchain
version 11.0.0-rc3 on x86-64
Thanks for the patch.
Regards,
- Sedat -