This patch reduces the running time for hmm-tests from over 10
seconds to just under 1.0 second: an approximately 10x speedup.
That brings it in line with the other tests in selftests/vm, most
of which run in under 1 second.
This is done with a one-line change that simply reduces the number of
iterations of several tests from 256 to 10. Thanks to Ralph Campbell
for suggesting changing NTIMES as a way to get the speedup.
Suggested-by: Ralph Campbell <rcampbell(a)nvidia.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
This is based on mmotm.
tools/testing/selftests/vm/hmm-tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 6b79723d7dc6..5d1ac691b9f4 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -49,7 +49,7 @@ struct hmm_buffer {
#define TWOMEG (1 << 21)
#define HMM_BUFFER_SIZE (1024 << 12)
#define HMM_PATH_MAX 64
-#define NTIMES 256
+#define NTIMES 10
#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
--
2.28.0
v3 -> v4:
- Rebasing
- Cast bpf_[per|this]_cpu_ptr's parameter to void __percpu * before
passing into per_cpu_ptr.
v2 -> v3:
- Rename functions and variables in verifier for better readability.
- Stick to logging message convention in libbpf.
- Move bpf_per_cpu_ptr and bpf_this_cpu_ptr from trace-specific
helper set to base helper set.
- More specific test in ksyms_btf.
- Fix return type cast in bpf_*_cpu_ptr.
- Fix btf leak in ksyms_btf selftest.
- Fix return error code for kallsyms_find().
v1 -> v2:
- Move check_pseudo_btf_id from check_ld_imm() to
replace_map_fd_with_map_ptr() and rename the latter.
- Add bpf_this_cpu_ptr().
- Use bpf_core_types_are_compat() in libbpf.c for checking type
compatibility.
- Rewrite typed ksym externs' type in BTF as int to save space.
- Minor revision of bpf_per_cpu_ptr()'s comments.
- Avoid using long in tests that use skeleton.
- Refactored test_ksyms.c by moving kallsyms_find() to trace_helpers.c
- Fold the patches that sync include/linux/uapi and
tools/include/linux/uapi.
rfc -> v1:
- Encode VAR's btf_id for PSEUDO_BTF_ID.
- More checks in verifier: check that the btf_id passed as
  PSEUDO_BTF_ID is a valid VAR, and check its name and type.
- Checks in libbpf on type compatibility of ksyms.
- Add bpf_per_cpu_ptr() to access kernel percpu vars. Introduced
new ARG and RET types for this helper.
This patch series extends the previously added __ksym externs with
BTF support.
Right now the __ksym externs are treated as pure 64-bit scalar values.
Libbpf replaces the ld_imm64 insn of a __ksym with its kernel address
at load time. This patch series extends those externs with their BTF
info. Note that BTF support for __ksym requires a kernel BTF that has
VARs encoded to work properly. The corresponding changes in pahole
are available at [1] (with a fix at [2] for gcc 4.9+).
The first 3 patches in this series add support for general kernel
global variables, which includes verifier checking (01/06), libbpf
support (02/06) and selftests for getting a typed ksym extern's kernel
address (03/06).
The next 3 patches extend that capability further by introducing the
helpers bpf_per_cpu_ptr() and bpf_this_cpu_ptr(), which allow accessing
kernel percpu variables correctly (04/06 and 05/06); a short usage
sketch follows the links below.
The tests of this feature were performed against a pahole extended
with [1] and [2]. For kernel BTF that does not have VARs encoded, the
selftests will be skipped.
[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=f3d9054ba…
[2] https://www.spinics.net/lists/dwarves/msg00451.html
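For illustration, a minimal sketch of how a typed __ksym extern and
bpf_per_cpu_ptr() fit together. The variable 'runqueues' matches what
the selftests use; the program section and the dump logic are
illustrative assumptions, not part of this series:

  /* Sketch only: needs vmlinux.h generated from a kernel BTF that
   * has percpu VARs encoded (see the pahole changes at [1][2]).
   */
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  /* Typed ksym extern: libbpf resolves both the kernel address and
   * the BTF type at load time.
   */
  extern const struct rq runqueues __ksym;

  SEC("raw_tp/sys_enter")
  int dump_rq(const void *ctx)
  {
          /* Resolve CPU 0's copy of the percpu variable; the helper
           * may return NULL for an invalid CPU.
           */
          struct rq *rq = bpf_per_cpu_ptr(&runqueues, 0);

          if (!rq)
                  return 0;
          /* rq now points into CPU 0's per-CPU area and its fields
           * can be read like any BTF-typed pointer.
           */
          return 0;
  }

  char _license[] SEC("license") = "GPL";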
Hao Luo (6):
bpf: Introduce pseudo_btf_id
bpf/libbpf: BTF support for typed ksyms
selftests/bpf: ksyms_btf to test typed ksyms
bpf: Introduce bpf_per_cpu_ptr()
bpf: Introduce bpf_this_cpu_ptr()
bpf/selftests: Test for bpf_per_cpu_ptr() and bpf_this_cpu_ptr()
include/linux/bpf.h | 6 +
include/linux/bpf_verifier.h | 7 +
include/linux/btf.h | 26 +++
include/uapi/linux/bpf.h | 67 +++++-
kernel/bpf/btf.c | 25 ---
kernel/bpf/helpers.c | 32 +++
kernel/bpf/verifier.c | 190 ++++++++++++++++--
kernel/trace/bpf_trace.c | 4 +
tools/include/uapi/linux/bpf.h | 67 +++++-
tools/lib/bpf/libbpf.c | 112 +++++++++--
.../testing/selftests/bpf/prog_tests/ksyms.c | 38 ++--
.../selftests/bpf/prog_tests/ksyms_btf.c | 88 ++++++++
.../selftests/bpf/progs/test_ksyms_btf.c | 55 +++++
tools/testing/selftests/bpf/trace_helpers.c | 27 +++
tools/testing/selftests/bpf/trace_helpers.h | 4 +
15 files changed, 653 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c
--
2.28.0.709.gb0816b6eb0-goog
This patch series is a result of discussion at the refcount_t BOF at
the Linux Plumbers Conference. In this discussion, we identified a
need to look closely at and investigate atomic_t usages in the kernel
where atomic_t is used strictly as a counter, without controlling
object lifetimes or state changes.
There are a number of atomic_t usages in the kernel where the atomic_t
API is used strictly for counting and not for managing object
lifetime. In some cases, atomic_t might not even be needed.
The purpose of these counters is twofold: 1. clearly differentiate
atomic_t counters from atomic_t usages that guard object lifetimes and
are hence prone to overflow and underflow errors, which allows tools
that scan for overflow and underflow on atomic_t usages to restrict
themselves to just the cases that are prone to such errors; 2. provide
non-atomic counters for cases where atomicity isn't necessary.
The simple atomic and non-atomic counters API provides interfaces for
counters that just count and don't guard resource lifetimes. These
counters wrap around to 0 on overflow, so they must not be used to
guard resource lifetimes, device usage and open counts that control
state changes, or pm states.
Using counter_atomic to guard lifetimes could lead to use-after-free
when the counter overflows, and to undefined behavior when it is used
to manage state changes and device usage/open states.
This patch series introduces simple atomic and non-atomic counters.
The counter atomic ops leverage atomic_t and provide a subset of the
atomic_t ops.
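As a usage sketch (the op and init names here are assumed to mirror
atomic_t naming; include/linux/counters.h in this series is the
authoritative interface):

  #include <linux/counters.h>   /* names assumed from this series */
  #include <linux/interrupt.h>

  /* A stats-only counter: it wraps to 0 on overflow and never
   * guards a lifetime or a state change.
   */
  static struct counter_atomic32 intr_count = COUNTER_ATOMIC_INIT(0);

  static irqreturn_t foo_irq_handler(int irq, void *data)
  {
          counter_atomic32_inc(&intr_count);      /* just counts */
          return IRQ_HANDLED;
  }

  static int foo_read_intr_stats(void)
  {
          return counter_atomic32_read(&intr_count);
  }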
In addition, this patch series converts a few drivers to use the new
API. The following criteria are used to select variables for
conversion:
1. The variable doesn't guard object lifetimes or manage state
   changes, e.g. device usage counts, device open counts, and pm
   states.
2. The variable is used for stats and counters.
3. The conversion doesn't change the overflow behavior.
Changes since RFC:
-- Thanks for the reviews and the Reviewed-by and Acked-by tags.
Updated the patches with the tags.
-- Addressed Kees's comments:
1. Non-atomic counters renamed to counter_simple32 and counter_simple64
to clearly indicate size.
2. Added a warning that counter_simple* should be used only when
there is no need for atomicity.
3. Renamed counter_atomic to counter_atomic32 to clearly indicate size.
4. Renamed counter_atomic_long to counter_atomic64 and it now uses
atomic64_t ops and indicates size.
5. Test updated for the API renames.
6. Added helper functions for test results printing
7. Verified that the test module compiles in kunit env. and test
module can be loaded to run the test.
8. Updated Documentation to reflect the intent to make the API
restricted so it can never be used to guard object lifetimes
and state management. I left the _return ops for now; inc_return
is necessary as per the discussion we had on this topic.
-- Updated driver patches with API name changes.
-- We discussed whether binder counters can be non-atomic. For now I
left them the same as in the RFC patch, using counter_atomic32.
-- Unrelated to this patch series:
The patch series review uncovered improvements that could be made to
test_async_driver_probe and vmw_vmci/vmci_guest. I will track
these for fixing later.
Shuah Khan (11):
counters: Introduce counter_simple* and counter_atomic* counters
selftests:lib:test_counters: add new test for counters
drivers/base: convert deferred_trigger_count and probe_count to
counter_atomic32
drivers/base/devcoredump: convert devcd_count to counter_atomic32
drivers/acpi: convert seqno counter_atomic32
drivers/acpi/apei: convert seqno counter_atomic32
drivers/android/binder: convert stats, transaction_log to
counter_atomic32
drivers/base/test/test_async_driver_probe: convert to use
counter_atomic32
drivers/char/ipmi: convert stats to use counter_atomic32
drivers/misc/vmw_vmci: convert num guest devices counter to
counter_atomic32
drivers/edac: convert pci counters to counter_atomic32
Documentation/core-api/counters.rst | 174 +++++++++
MAINTAINERS | 8 +
drivers/acpi/acpi_extlog.c | 5 +-
drivers/acpi/apei/ghes.c | 5 +-
drivers/android/binder.c | 41 +--
drivers/android/binder_internal.h | 3 +-
drivers/base/dd.c | 19 +-
drivers/base/devcoredump.c | 5 +-
drivers/base/test/test_async_driver_probe.c | 23 +-
drivers/char/ipmi/ipmi_msghandler.c | 9 +-
drivers/char/ipmi/ipmi_si_intf.c | 9 +-
drivers/edac/edac_pci.h | 5 +-
drivers/edac/edac_pci_sysfs.c | 28 +-
drivers/misc/vmw_vmci/vmci_guest.c | 9 +-
include/linux/counters.h | 350 +++++++++++++++++++
lib/Kconfig | 10 +
lib/Makefile | 1 +
lib/test_counters.c | 276 +++++++++++++++
tools/testing/selftests/lib/Makefile | 1 +
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/test_counters.sh | 5 +
21 files changed, 913 insertions(+), 74 deletions(-)
create mode 100644 Documentation/core-api/counters.rst
create mode 100644 include/linux/counters.h
create mode 100644 lib/test_counters.c
create mode 100755 tools/testing/selftests/lib/test_counters.sh
--
2.25.1
This patch series adds the kselftests below to test user-space support
for the ARMv8.5 Memory Tagging Extension present in the arm64 tree [1].
This patch series is based on Linux v5.9-rc3.
1) This test-case verifies that memory allocated by the kernel mmap
interface can support tagged memory access. It first checks the
presence of tags at address bits [56:59] (see the sketch after this
list) and then proceeds with read and write. The pass criterion for
this test is that no tag fault exception happens.
2) This test-case crosses from valid memory into invalid memory. In the
invalid memory area valid tags are not inserted, so reads and writes
should not pass. The pass criterion for this test is that a tag fault
exception happens for every illegal address. This test also verifies
that PSTATE.TCO works properly.
3) This test-case verifies that memory inherited by a child process
from its parent has the same tags copied. The pass criterion for this
test is that no tag fault exception happens.
4) This test checks different mmap flags with PROT_MTE memory
protection.
5) This test-case checks that KSM does not merge pages containing
different MTE tag values. However, if the tags are the same, then the
pages may merge. This test-case uses the generic KSM sysfs interfaces
to verify the MTE behaviour, so it is not foolproof and may be affected
by other load on the system.
6) This test verifies that syscalls such as read/write work when the
user pointer has valid/invalid allocation tags.
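For reference, a sketch of the pointer-tagging idea these tests build
on; the C helper below is hypothetical (the tests use assembly helpers
such as mte_insert_random_tag instead):

  #include <stdint.h>

  #define MTE_TAG_SHIFT  56
  #define MTE_TAG_MASK   (0xfUL << MTE_TAG_SHIFT)

  /* Place logical tag 'tag' (0-15) into bits [59:56] of a pointer.
   * With MTE enabled, a load or store through the result faults
   * unless the allocation tag of the target granule matches.
   */
  static void *set_logical_tag(void *ptr, uint8_t tag)
  {
          uintptr_t p = (uintptr_t)ptr & ~MTE_TAG_MASK;

          return (void *)(p | (((uintptr_t)tag & 0xf) << MTE_TAG_SHIFT));
  }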
Changes since v1 [2]:
* Redefined MTE kernel header definitions to decouple kselftest compilations.
* Removed gmi masking instructions in mte_insert_random_tag assembly
function. This simplifies the tag inclusion mask test with only GCR
mask register used.
* Created a new mte_insert_random_tag function with gmi instruction.
This is useful for the 6th test which reuses the original tag.
* Now use /dev/shm/* to hold temporary files.
* Updated the 6th test to handle the error properly in case of failure
in accessing memory with invalid tag in kernel.
* Code and comment clean-ups.
Thanks,
Amit Daniel
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mte
[2]: https://patchwork.kernel.org/patch/11747791/
Amit Daniel Kachhap (6):
kselftest/arm64: Add utilities and a test to validate mte memory
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Check mte tagged user address in kernel
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/mte/.gitignore | 6 +
tools/testing/selftests/arm64/mte/Makefile | 29 ++
.../selftests/arm64/mte/check_buffer_fill.c | 475 ++++++++++++++++++
.../selftests/arm64/mte/check_child_memory.c | 195 +++++++
.../selftests/arm64/mte/check_ksm_options.c | 159 ++++++
.../selftests/arm64/mte/check_mmap_options.c | 262 ++++++++++
.../arm64/mte/check_tags_inclusion.c | 185 +++++++
.../selftests/arm64/mte/check_user_mem.c | 111 ++++
.../selftests/arm64/mte/mte_common_util.c | 341 +++++++++++++
.../selftests/arm64/mte/mte_common_util.h | 118 +++++
tools/testing/selftests/arm64/mte/mte_def.h | 60 +++
.../testing/selftests/arm64/mte/mte_helper.S | 128 +++++
13 files changed, 2070 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/mte/.gitignore
create mode 100644 tools/testing/selftests/arm64/mte/Makefile
create mode 100644 tools/testing/selftests/arm64/mte/check_buffer_fill.c
create mode 100644 tools/testing/selftests/arm64/mte/check_child_memory.c
create mode 100644 tools/testing/selftests/arm64/mte/check_ksm_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_mmap_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_tags_inclusion.c
create mode 100644 tools/testing/selftests/arm64/mte/check_user_mem.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_def.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_helper.S
--
2.17.1
This is version 3 of the mremap speedup patches, previously posted at:
v1 - https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
v2 - https://lore.kernel.org/r/20201002162101.665549-1-kaleshsingh@google.com
mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and
PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
x86. Other architectures where this type of move is supported and known
to be safe can also opt in to these optimizations by enabling
HAVE_MOVE_PMD and HAVE_MOVE_PUD; a userspace sketch of a PUD-level move
follows the numbers below.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
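As noted above, a hedged userspace sketch of the kind of remap that can
exercise the PUD-level path (the hint addresses are illustrative; the
real test lives in mremap_test.c):

  #define _GNU_SOURCE
  #include <sys/mman.h>

  int main(void)
  {
          size_t len = 1UL << 30; /* 1GB: PUD-sized on x86_64/arm64 */
          void *src, *dst;

          /* Hint a 1GB-aligned source so old_addr can be PUD-aligned
           * (a hint only; the real test validates the placement).
           */
          src = mmap((void *)(1UL << 40), len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (src == MAP_FAILED)
                  return 1;

          /* PUD-aligned fixed destination: with HAVE_MOVE_PUD the
           * kernel can move a single PUD entry instead of 262144
           * individual PTEs.
           */
          dst = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED,
                       (void *)(2UL << 40));
          return dst == MAP_FAILED ? 1 : 0;
  }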
Changes in v2:
- Reduce mremap_test time by only validating a configurable
threshold of the remapped region, as per John.
- Use a random pattern for mremap validation. Provide pattern
seed in test output, as per John.
- Moved set_pud_at() to separate patch, per Kirill.
- Use switch() instead of ifs in move_pgt_entry(), per Kirill.
- Update commit message with description of Android
garbage collector use case for HAVE_MOVE_PUD, as per Joel.
- Fix build test error reported by kernel test robot in [1].
Changes in v3:
- Make lines 80 cols or less where they don’t need to be longer,
per John.
- Removed unused PATTERN_SIZE in mremap_test
- Added Reviewed-by tag for patch 1/5 (mremap kselftest patch).
- Use switch() instead of ifs in get_extent(), per Kirill
- Add BUILD_BUG() in get_extent()'s default case.
- Move get_old_pud() and alloc_new_pud() out of
#ifdef CONFIG_HAVE_MOVE_PUD, per Kirill.
- Have get_old_pmd() and alloc_new_pmd() use get_old_pud() and
alloc_new_pud(), per Kirill.
- Replace #ifdef CONFIG_HAVE_MOVE_PMD / PUD in move_page_tables()
with IS_ENABLED(CONFIG_HAVE_MOVE_PMD / PUD), per Kirill.
- Fold Add set_pud_at() patch into patch 4/5, per Kirill.
[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4F…
Kalesh Singh (5):
kselftests: vm: Add mremap tests
arm64: mremap speedup - Enable HAVE_MOVE_PMD
mm: Speedup mremap on 1GB or larger regions
arm64: mremap speedup - Enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
arch/Kconfig | 7 +
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable.h | 1 +
arch/x86/Kconfig | 1 +
mm/mremap.c | 230 ++++++++++++---
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/mremap_test.c | 344 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 11 +
9 files changed, 558 insertions(+), 40 deletions(-)
create mode 100644 tools/testing/selftests/vm/mremap_test.c
base-commit: 549738f15da0e5a00275977623be199fbbf7df50
--
2.28.0.806.g8561365e88-goog
v4: https://lkml.org/lkml/2020/9/2/356
v4-->v5
Based on comments from Artem Bityutskiy, evaluating timer-based wakeup
latencies may not be a fruitful measurement, especially on the x86
platform, which has the capability to pre-arm a CPU when a timer is
set. Hence, only the IPI based tests are included for latency
measurement, to achieve the expected behaviour across platforms.
A kernel module + bash selftest approach, which presents lower
deviations and higher accuracy: https://lkml.org/lkml/2020/7/21/567
---
The patch series introduces a mechanism to measure wakeup latency for
IPI based interrupts.
The motivation behind this series is to find significant deviations
from advertised latency values.
To achieve this in userspace, IPI latencies are calculated by sending
information through pipes and inducing a wakeup.
To account for delays from kernel-userspace interactions, baseline
observations are taken on a 100% busy CPU, and subsequent observations
must be considered relative to that.
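A minimal sketch of that pipe-wakeup timing idea (not the actual
cpuidle.c; CPU pinning via sched_setaffinity() and error handling are
elided):

  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  static long long ns_now(void)
  {
          struct timespec ts;

          clock_gettime(CLOCK_MONOTONIC, &ts);
          return ts.tv_sec * 1000000000LL + ts.tv_nsec;
  }

  int main(void)
  {
          long long t0, t1;
          int fds[2];

          pipe(fds);
          if (fork() == 0) {              /* waker (source CPU) */
                  sleep(1);               /* let the reader block/idle */
                  t0 = ns_now();
                  write(fds[1], &t0, sizeof(t0));
                  _exit(0);
          }
          /* Target: blocks in read() and is woken by the write. */
          read(fds[0], &t0, sizeof(t0));
          t1 = ns_now();
          printf("pipe wakeup latency: %lld ns\n", t1 - t0);
          return 0;
  }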
One downside of the userspace approach, in contrast to a kernel
implementation, is that the run-to-run variance can turn out to be
high, on the order of ms, which at times is the same order as the
quantities being measured.
Another downside of the userspace approach is that it takes much longer
to run; hence the command-line options quick and full are added, so
that a quick 1-CPU test can be carried out when needed and a
comprehensive full-system test otherwise.
Usage
---
./cpuidle --mode <full / quick / num_cpus> --output <output location>
full: runs on all CPUs
quick: runs on a random CPU
num_cpus: limit the number of CPUs to run on
Sample output snippet
---------------------
--IPI Latency Test---
SRC_CPU DEST_CPU IPI_Latency(ns)
...
0 5 256178
0 6 478161
0 7 285445
0 8 273553
Expected IPI latency(ns): 100000
Observed Average IPI latency(ns): 248334
Pratik Rajesh Sampat (1):
selftests/cpuidle: Add support for cpuidle latency measurement
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 7 +
tools/testing/selftests/cpuidle/cpuidle.c | 479 ++++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 1 +
4 files changed, 488 insertions(+)
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100644 tools/testing/selftests/cpuidle/cpuidle.c
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.26.2