May 2021 - Linux-kselftest-mirror

[PATCH v5 0/2] CPU-Idle latency selftest framework

by Pratik R. Sampat

Changelog RFC v4 --> PATCH v5:
1. Added a CPU online check prior to parsing the CPU topology to avoid
   parsing topologies for CPUs unavailable for the latency test
2. Added comment describing the selftest in cpuidle.sh

As I have made changes to how cpuidle.sh works, I am dropping the
"Reviewed-by" from Doug Smythies for the second patch, while retaining
it for the first patch.

RFC v4: https://lkml.org/lkml/2021/4/12/99

---

A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations from the advertised latency and
residency values.

The patchset measures latencies for two kinds of events: IPIs and
Timers.

As this is a software-only mechanism, there will be additional
latencies from the kernel-firmware-hardware interactions. To account
for that, the program also measures a baseline latency on a 100 percent
loaded CPU, and the latencies achieved must be viewed relative to that.

To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.

The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,

IPI test:
    ipi_cpu_dest = Destination CPU for the IPI
    ipi_cpu_src = Origin of the IPI
    ipi_latency_ns = Measured latency time in ns
Timeout test:
    timeout_cpu_src = CPU on which the timer is to be queued
    timeout_expected_ns = Timer duration
    timeout_diff_ns = Difference of actual duration vs expected timer

Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024

Things to keep in mind:

1. This kernel module + bash driver does not guarantee idleness on a
   core when the IPI and the Timer are armed. It only invokes sleep and
   hopes that the core is idle once the IPI/Timer is invoked onto it.
   Hence this program must be run on a completely idle system for best
   results.

2. Even on a completely idle system, there may be book-keeping tasks or
   jitter tasks that can run on the core we want idle. This can create
   outliers in the latency measurement. Thankfully, these outliers
   should be large enough to easily weed them out.

3. A userspace only selftest variant was also sent out as RFC based on
   suggestions over the previous patchset to simplify the kernel
   complexity. However, a userspace only approach had more noise in
   the latency measurement due to userspace-kernel interactions, which
   led to run to run variance and a less accurate test.
   Another downside of the nature of a userspace program is that it
   takes orders of magnitude longer to complete a full system test
   compared to the kernel framework.
   RFC patch: https://lkml.org/lkml/2020/9/2/356

4. For Intel Systems, the Timer based latencies don't exactly give out
   the measure of idle latencies. This is because of a hardware
   optimization mechanism that pre-arms a CPU when a timer is set to
   wakeup. That doesn't make this metric useless for Intel systems, it
   just means that it is measuring IPI/Timer responding latency rather
   than idle wakeup latencies.
   (Source: https://lkml.org/lkml/2020/9/2/610)
   As a solution to this problem, a hardware based latency analyzer has
   been devised by Artem Bityutskiy from Intel.
   https://youtu.be/Opk92aQyvt0?t=8266
   https://intel.github.io/wult/

Pratik R. Sampat (2):
  cpuidle: Extract IPI based and timer based wakeup latency from idle
    states
  selftest/cpuidle: Add support for cpuidle latency measurement

 drivers/cpuidle/Makefile                   |   1 +
 drivers/cpuidle/test-cpuidle_latency.c     | 157 ++++++++
 lib/Kconfig.debug                          |  10 +
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/cpuidle/Makefile   |   6 +
 tools/testing/selftests/cpuidle/cpuidle.sh | 414 +++++++++++++++++++++
 tools/testing/selftests/cpuidle/settings   |   2 +
 7 files changed, 591 insertions(+)
 create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
 create mode 100644 tools/testing/selftests/cpuidle/Makefile
 create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
 create mode 100644 tools/testing/selftests/cpuidle/settings

-- 
2.17.1
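The cover letter above asks for observed latencies to be read relative to the busy-CPU baseline. A minimal sketch of that arithmetic in C, using the POWER9 IPI sample numbers quoted in the message; the helper names `wakeup_overhead_ns` and `print_ipi_overheads` are hypothetical and are not part of the patchset:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patchset): extra latency attributed
 * to an idle state, i.e. the observed latency minus the baseline measured
 * on a 100 percent loaded CPU. Clamps to 0 when noise puts a sample below
 * the baseline.
 */
static uint64_t wakeup_overhead_ns(uint64_t observed_ns, uint64_t baseline_ns)
{
	return observed_ns > baseline_ns ? observed_ns - baseline_ns : 0;
}

/* Print the per-state overhead for the IPI sample output quoted above. */
static void print_ipi_overheads(void)
{
	static const uint64_t baseline_ns = 3114;
	static const uint64_t observed_ns[] = {
		3265, 3507, 3739, 3807, 17070, 1038174, 1068784
	};
	size_t i;

	for (i = 0; i < sizeof(observed_ns) / sizeof(observed_ns[0]); i++)
		printf("State%zu: +%llu ns over baseline\n", i,
		       (unsigned long long)wakeup_overhead_ns(observed_ns[i],
							      baseline_ns));
}
```

Read this way, the shallow states in the sample add only a few hundred nanoseconds over the baseline, while State4 and deeper states dominate the wakeup cost.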

4 years, 7 months


[PATCH v4 1/2] selftests/sgx: Rename 'eenter' and 'sgx_call_vdso'

by Jarkko Sakkinen

Rename symbols for better clarity:

* 'eenter' -> 'vdso_sgx_enter_enclave'
* 'sgx_call_vdso' -> 'sgx_enter_enclave'

Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v2:
Refined the renames just a bit.

 tools/testing/selftests/sgx/call.S |  6 +++---
 tools/testing/selftests/sgx/main.c | 25 +++++++++++++------------
 tools/testing/selftests/sgx/main.h |  4 ++--
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/sgx/call.S b/tools/testing/selftests/sgx/call.S
index 4ecadc7490f4..b09a25890f3b 100644
--- a/tools/testing/selftests/sgx/call.S
+++ b/tools/testing/selftests/sgx/call.S
@@ -5,8 +5,8 @@
 
 	.text
 
-	.global sgx_call_vdso
-sgx_call_vdso:
+	.global sgx_enter_enclave
+sgx_enter_enclave:
 	.cfi_startproc
 	push	%r15
 	.cfi_adjust_cfa_offset	8
@@ -27,7 +27,7 @@ sgx_call_vdso:
 	.cfi_adjust_cfa_offset	8
 	push	0x38(%rsp)
 	.cfi_adjust_cfa_offset	8
-	call	*eenter(%rip)
+	call	*vdso_sgx_enter_enclave(%rip)
 	add	$0x10, %rsp
 	.cfi_adjust_cfa_offset	-0x10
 	pop	%rbx
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index d304a4044eb9..43da68388e25 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -21,7 +21,7 @@
 #include "../kselftest.h"
 
 static const uint64_t MAGIC = 0x1122334455667788ULL;
-vdso_sgx_enter_enclave_t eenter;
+vdso_sgx_enter_enclave_t vdso_sgx_enter_enclave;
 
 struct vdso_symtab {
 	Elf64_Sym *elf_symtab;
@@ -149,7 +149,7 @@ int main(int argc, char *argv[])
 {
 	struct sgx_enclave_run run;
 	struct vdso_symtab symtab;
-	Elf64_Sym *eenter_sym;
+	Elf64_Sym *sgx_enter_enclave_sym;
 	uint64_t result = 0;
 	struct encl encl;
 	unsigned int i;
@@ -194,29 +194,30 @@ int main(int argc, char *argv[])
 	if (!vdso_get_symtab(addr, &symtab))
 		goto err;
 
-	eenter_sym = vdso_symtab_get(&symtab, "__vdso_sgx_enter_enclave");
-	if (!eenter_sym)
+	sgx_enter_enclave_sym = vdso_symtab_get(&symtab, "__vdso_sgx_enter_enclave");
+	if (!sgx_enter_enclave_sym)
 		goto err;
 
-	eenter = addr + eenter_sym->st_value;
+	vdso_sgx_enter_enclave = addr + sgx_enter_enclave_sym->st_value;
 
-	ret = sgx_call_vdso((void *)&MAGIC, &result, 0, EENTER, NULL, NULL, &run);
-	if (!report_results(&run, ret, result, "sgx_call_vdso"))
+	ret = sgx_enter_enclave((void *)&MAGIC, &result, 0, EENTER,
+					    NULL, NULL, &run);
+	if (!report_results(&run, ret, result, "sgx_enter_enclave_unclobbered"))
 		goto err;
 
 
 	/* Invoke the vDSO directly. */
 	result = 0;
-	ret = eenter((unsigned long)&MAGIC, (unsigned long)&result, 0, EENTER,
-		     0, 0, &run);
-	if (!report_results(&run, ret, result, "eenter"))
+	ret = vdso_sgx_enter_enclave((unsigned long)&MAGIC, (unsigned long)&result,
+				     0, EENTER, 0, 0, &run);
+	if (!report_results(&run, ret, result, "sgx_enter_enclave"))
 		goto err;
 
 	/* And with an exit handler. */
 	run.user_handler = (__u64)user_handler;
 	run.user_data = 0xdeadbeef;
-	ret = eenter((unsigned long)&MAGIC, (unsigned long)&result, 0, EENTER,
-		     0, 0, &run);
+	ret = vdso_sgx_enter_enclave((unsigned long)&MAGIC, (unsigned long)&result,
+				     0, EENTER, 0, 0, &run);
 	if (!report_results(&run, ret, result, "user_handler"))
 		goto err;
 
diff --git a/tools/testing/selftests/sgx/main.h b/tools/testing/selftests/sgx/main.h
index 67211a708f04..68672fd86cf9 100644
--- a/tools/testing/selftests/sgx/main.h
+++ b/tools/testing/selftests/sgx/main.h
@@ -35,7 +35,7 @@ bool encl_load(const char *path, struct encl *encl);
 bool encl_measure(struct encl *encl);
 bool encl_build(struct encl *encl);
 
-int sgx_call_vdso(void *rdi, void *rsi, long rdx, u32 function, void *r8, void *r9,
-		  struct sgx_enclave_run *run);
+int sgx_enter_enclave(void *rdi, void *rsi, long rdx, u32 function, void *r8, void *r9,
+		      struct sgx_enclave_run *run);
 
 #endif /* MAIN_H */
-- 
2.31.1

4 years, 7 months


[PATCHv2] selftests: xfrm: put cleanup code into an exit trap

by Po-Hsu Lin

If the xfrm_policy.sh script gets terminated for any reason, the netns
namespace files created by the test will be left behind. In this case a
second attempt to run this test will fail with:

  # Cannot create namespace file "/run/netns/ns1": File exists

Move the netns cleanup code into an exit trap so that we can ensure
these files will be removed in the end.

Changes in V2:
  - Update commit message and patch title.

Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
 tools/testing/selftests/net/xfrm_policy.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/xfrm_policy.sh b/tools/testing/selftests/net/xfrm_policy.sh
index bdf450e..bb4632b 100755
--- a/tools/testing/selftests/net/xfrm_policy.sh
+++ b/tools/testing/selftests/net/xfrm_policy.sh
@@ -28,6 +28,11 @@ KEY_AES=0x0123456789abcdef0123456789012345
 SPI1=0x1
 SPI2=0x2
 
+cleanup() {
+	for i in 1 2 3 4;do ip netns del ns$i 2>/dev/null ;done
+}
+trap cleanup EXIT
+
 do_esp_policy() {
 	local ns=$1
 	local me=$2
@@ -481,6 +486,4 @@ check_hthresh_repeat "policies with repeated htresh change"
 
 check_random_order ns3 "policies inserted in random order"
 
-for i in 1 2 3 4;do ip netns del ns$i;done
-
 exit $ret
-- 
2.7.4

4 years, 7 months


[PATCH resend v2 3/5] MAINTAINERS: add tools/testing/selftests/vm/ to MEMORY MANAGEMENT

by David Hildenbrand

MEMORY MANAGEMENT seems to be a good fit.

Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9450e052f1b1..cd267d218e08 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11566,6 +11566,7 @@ F:	include/linux/mm.h
 F:	include/linux/mmzone.h
 F:	include/linux/vmalloc.h
 F:	mm/
+F:	tools/testing/selftests/vm/
 
 MEMORY TECHNOLOGY DEVICES (MTD)
 M:	Miquel Raynal <miquel.raynal(a)bootlin.com>
-- 
2.30.2

4 years, 7 months


[PATCH v4 0/4] KVM statistics data fd-based binary interface

by Jing Zhang

This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, each statistic is treated as having the following
attributes:
  * architecture dependent or common
  * VM statistics data or VCPU statistics data
  * type: cumulative, instantaneous
  * unit: none for simple counter, nanosecond, microsecond,
    millisecond, second, Byte, KiByte, MiByte, GiByte, clock cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data are
still being updated by KVM subsystems while they are read out.

---

* v3 -> v4
  - Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
    between install_new_memslots and MMU notifier")
  - Use C-style comments in the whole patch
  - Fix wrong count for x86 VCPU stats descriptors
  - Fix KVM stats data size counting and validity check in selftest

* v2 -> v3
  - Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
    between install_new_memslots and MMU notifier")
  - Resolve some nitpicks about format

* v1 -> v2
  - Use ARRAY_SIZE to count the number of stats descriptors
  - Fix missing `size` field initialization in macro STATS_DESC

[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com

---

Jing Zhang (4):
  KVM: stats: Separate common stats from architecture specific ones
  KVM: stats: Add fd-based API to read binary stats data
  KVM: stats: Add documentation for statistics data binary interface
  KVM: selftests: Add selftest for KVM statistics data binary interface

 Documentation/virt/kvm/api.rst                | 171 ++++++++
 arch/arm64/include/asm/kvm_host.h             |   9 +-
 arch/arm64/kvm/guest.c                        |  42 +-
 arch/mips/include/asm/kvm_host.h              |   9 +-
 arch/mips/kvm/mips.c                          |  67 ++-
 arch/powerpc/include/asm/kvm_host.h           |   9 +-
 arch/powerpc/kvm/book3s.c                     |  68 +++-
 arch/powerpc/kvm/book3s_hv.c                  |  12 +-
 arch/powerpc/kvm/book3s_pr.c                  |   2 +-
 arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
 arch/powerpc/kvm/booke.c                      |  63 ++-
 arch/s390/include/asm/kvm_host.h              |   9 +-
 arch/s390/kvm/kvm-s390.c                      | 133 +++++-
 arch/x86/include/asm/kvm_host.h               |   9 +-
 arch/x86/kvm/x86.c                            |  71 +++-
 include/linux/kvm_host.h                      | 132 +++++-
 include/linux/kvm_types.h                     |  12 +
 include/uapi/linux/kvm.h                      |  50 +++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../testing/selftests/kvm/include/kvm_util.h  |   3 +
 .../selftests/kvm/kvm_bin_form_stats.c        | 380 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  11 +
 virt/kvm/kvm_main.c                           | 237 ++++++++++-
 24 files changed, 1415 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c

base-commit: 9f242010c3b46e63bc62f08fff42cef992d3801b
-- 
2.31.1.527.g47e6f16901-goog

4 years, 7 months


[PATCH resend v2 5/5] selftests/vm: add test for MADV_POPULATE_(READ|WRITE)

by David Hildenbrand

Let's add a simple test for MADV_POPULATE_READ and MADV_POPULATE_WRITE,
verifying some error handling, that population works, and that softdirty
tracking works as expected. For now, limit the test to private anonymous
memory.

Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Richard Henderson <rth(a)twiddle.net>
Cc: Ivan Kokshaysky <ink(a)jurassic.park.msu.ru>
Cc: Matt Turner <mattst88(a)gmail.com>
Cc: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley(a)HansenPartnership.com>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Chris Zankel <chris(a)zankel.net>
Cc: Max Filippov <jcmvbkbc(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rolf Eike Beer <eike-kernel(a)sf-tec.de>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-alpha(a)vger.kernel.org
Cc: linux-mips(a)vger.kernel.org
Cc: linux-parisc(a)vger.kernel.org
Cc: linux-xtensa(a)linux-xtensa.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: Linux API <linux-api(a)vger.kernel.org>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
 tools/testing/selftests/vm/.gitignore      |   1 +
 tools/testing/selftests/vm/Makefile        |   1 +
 tools/testing/selftests/vm/madv_populate.c | 342 +++++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh  |  16 +
 4 files changed, 360 insertions(+)
 create mode 100644 tools/testing/selftests/vm/madv_populate.c

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index b4fc0148360e..c9a5dd1adf7d 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -24,3 +24,4 @@ hmm-tests
 local_config.*
 protection_keys_32
 protection_keys_64
+madv_populate
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 8b0cd421ebd3..04b6650c1924 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -42,6 +42,7 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+TEST_GEN_FILES += madv_populate
 
 ifeq ($(MACHINE),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_32bit_program.c -m32)
diff --git a/tools/testing/selftests/vm/madv_populate.c b/tools/testing/selftests/vm/madv_populate.c
new file mode 100644
index 000000000000..b959e4ebdad4
--- /dev/null
+++ b/tools/testing/selftests/vm/madv_populate.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
+ *
+ * Copyright 2021, Red Hat, Inc.
+ *
+ * Author(s): David Hildenbrand <david(a)redhat.com>
+ */
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+
+#include "../kselftest.h"
+
+#if defined(MADV_POPULATE_READ) && defined(MADV_POPULATE_WRITE)
+
+/*
+ * For now, we're using 2 MiB of private anonymous memory for all tests.
+ */
+#define SIZE (2 * 1024 * 1024)
+
+static size_t pagesize;
+
+static uint64_t pagemap_get_entry(int fd, char *start)
+{
+	const unsigned long pfn = (unsigned long)start / pagesize;
+	uint64_t entry;
+	int ret;
+
+	ret = pread(fd, &entry, sizeof(entry), pfn * sizeof(entry));
+	if (ret != sizeof(entry))
+		ksft_exit_fail_msg("reading pagemap failed\n");
+	return entry;
+}
+
+static bool pagemap_is_populated(int fd, char *start)
+{
+	uint64_t entry = pagemap_get_entry(fd, start);
+
+	/* Present or swapped. */
+	return entry & 0xc000000000000000ull;
+}
+
+static bool pagemap_is_softdirty(int fd, char *start)
+{
+	uint64_t entry = pagemap_get_entry(fd, start);
+
+	return entry & 0x0080000000000000ull;
+}
+
+static void sense_support(void)
+{
+	char *addr;
+	int ret;
+
+	addr = mmap(0, pagesize, PROT_READ | PROT_WRITE,
+		    MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+	if (addr == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed\n");
+
+	ret = madvise(addr, pagesize, MADV_POPULATE_READ);
+	if (ret)
+		ksft_exit_skip("MADV_POPULATE_READ is not available\n");
+
+	ret = madvise(addr, pagesize, MADV_POPULATE_WRITE);
+	if (ret)
+		ksft_exit_skip("MADV_POPULATE_WRITE is not available\n");
+
+	munmap(addr, pagesize);
+}
+
+static void test_prot_read(void)
+{
+	char *addr;
+	int ret;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	addr = mmap(0, SIZE, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+	if (addr == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed\n");
+
+	ret = madvise(addr, SIZE, MADV_POPULATE_READ);
+	ksft_test_result(!ret, "MADV_POPULATE_READ with PROT_READ\n");
+
+	ret = madvise(addr, SIZE, MADV_POPULATE_WRITE);
+	ksft_test_result(ret == -1 && errno == EINVAL,
+			 "MADV_POPULATE_WRITE with PROT_READ\n");
+
+	munmap(addr, SIZE);
+}
+
+static void test_prot_write(void)
+{
+	char *addr;
+	int ret;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	addr = mmap(0, SIZE, PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+	if (addr == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed\n");
+
+	ret = madvise(addr, SIZE, MADV_POPULATE_READ);
+	ksft_test_result(ret == -1 && errno == EINVAL,
+			 "MADV_POPULATE_READ with PROT_WRITE\n");
+
+	ret = madvise(addr, SIZE, MADV_POPULATE_WRITE);
+	ksft_test_result(!ret, "MADV_POPULATE_WRITE with PROT_WRITE\n");
+
+	munmap(addr, SIZE);
+}
+
+static void test_holes(void)
+{
+	char *addr;
+	int ret;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	addr = mmap(0, SIZE, PROT_READ | PROT_WRITE,
+		    MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+	if (addr == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed\n");
+	ret = munmap(addr + pagesize, pagesize);
+	if (ret)
+		ksft_exit_fail_msg("munmap failed\n");
+
+	/* Hole in the middle */
+	ret = madvise(addr, SIZE, MADV_POPULATE_READ);
+	ksft_test_result(ret == -1 && errno == ENOMEM,
+			 "MADV_POPULATE_READ with holes in the middle\n");
+	ret = madvise(addr, SIZE, MADV_POPULATE_WRITE);
+	ksft_test_result(ret == -1 && errno == ENOMEM,
+			 "MADV_POPULATE_WRITE with holes in the middle\n");
+
+	/* Hole at end */
+	ret = madvise(addr, 2 * pagesize, MADV_POPULATE_READ);
+	ksft_test_result(ret == -1 && errno == ENOMEM,
+			 "MADV_POPULATE_READ with holes at the end\n");
+	ret = madvise(addr, 2 * pagesize, MADV_POPULATE_WRITE);
+	ksft_test_result(ret == -1 && errno == ENOMEM,
+			 "MADV_POPULATE_WRITE with holes at the end\n");
+
+	/* Hole at beginning */
+	ret = madvise(addr + pagesize, pagesize, MADV_POPULATE_READ);
+	ksft_test_result(ret == -1 && errno == ENOMEM,
+			 "MADV_POPULATE_READ with holes at the beginning\n");
+	ret = madvise(addr + pagesize, pagesize, MADV_POPULATE_WRITE);
+	ksft_test_result(ret == -1 && errno == ENOMEM,
+			 "MADV_POPULATE_WRITE with holes at the beginning\n");
+
+	munmap(addr, SIZE);
+}
+
+static bool range_is_populated(char *start, ssize_t size)
+{
+	int fd = open("/proc/self/pagemap", O_RDONLY);
+	bool ret = true;
+
+	if (fd < 0)
+		ksft_exit_fail_msg("opening pagemap failed\n");
+	for (; size > 0 && ret; size -= pagesize, start += pagesize)
+		if (!pagemap_is_populated(fd, start))
+			ret = false;
+	close(fd);
+	return ret;
+}
+
+static bool range_is_not_populated(char *start, ssize_t size)
+{
+	int fd = open("/proc/self/pagemap", O_RDONLY);
+	bool ret = true;
+
+	if (fd < 0)
+		ksft_exit_fail_msg("opening pagemap failed\n");
+	for (; size > 0 && ret; size -= pagesize, start += pagesize)
+		if (pagemap_is_populated(fd, start))
+			ret = false;
+	close(fd);
+	return ret;
+}
+
+static void test_populate_read(void)
+{
+	char *addr;
+	int ret;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	addr = mmap(0, SIZE, PROT_READ | PROT_WRITE,
+		    MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+	if (addr == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed\n");
+	ksft_test_result(range_is_not_populated(addr, SIZE),
+			 "range initially not populated\n");
+
+	ret = madvise(addr, SIZE, MADV_POPULATE_READ);
+	ksft_test_result(!ret, "MADV_POPULATE_READ\n");
+	ksft_test_result(range_is_populated(addr, SIZE),
+			 "range is populated\n");
+
+	munmap(addr, SIZE);
+}
+
+static void test_populate_write(void)
+{
+	char *addr;
+	int ret;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	addr = mmap(0, SIZE, PROT_READ | PROT_WRITE,
+		    MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+	if (addr == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed\n");
+	ksft_test_result(range_is_not_populated(addr, SIZE),
+			 "range initially not populated\n");
+
+	ret = madvise(addr, SIZE, MADV_POPULATE_WRITE);
+	ksft_test_result(!ret, "MADV_POPULATE_WRITE\n");
+	ksft_test_result(range_is_populated(addr, SIZE),
+			 "range is populated\n");
+
+	munmap(addr, SIZE);
+}
+
+static bool range_is_softdirty(char *start, ssize_t size)
+{
+	int fd = open("/proc/self/pagemap", O_RDONLY);
+	bool ret = true;
+
+	if (fd < 0)
+		ksft_exit_fail_msg("opening pagemap failed\n");
+	for (; size > 0 && ret; size -= pagesize, start += pagesize)
+		if (!pagemap_is_softdirty(fd, start))
+			ret = false;
+	close(fd);
+	return ret;
+}
+
+static bool range_is_not_softdirty(char *start, ssize_t size)
+{
+	int fd = open("/proc/self/pagemap", O_RDONLY);
+	bool ret = true;
+
+	if (fd < 0)
+		ksft_exit_fail_msg("opening pagemap failed\n");
+	for (; size > 0 && ret; size -= pagesize, start += pagesize)
+		if (pagemap_is_softdirty(fd, start))
+			ret = false;
+	close(fd);
+	return ret;
+}
+
+static void clear_softdirty(void)
+{
+	int fd = open("/proc/self/clear_refs", O_WRONLY);
+	const char *ctrl = "4";
+	int ret;
+
+	if (fd < 0)
+		ksft_exit_fail_msg("opening clear_refs failed\n");
+	ret = write(fd, ctrl, strlen(ctrl));
+	if (ret != strlen(ctrl))
+		ksft_exit_fail_msg("writing clear_refs failed\n");
+	close(fd);
+}
+
+static void test_softdirty(void)
+{
+	char *addr;
+	int ret;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	addr = mmap(0, SIZE, PROT_READ | PROT_WRITE,
+		    MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+	if (addr == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed\n");
+
+	/* Clear any softdirty bits. */
+	clear_softdirty();
+	ksft_test_result(range_is_not_softdirty(addr, SIZE),
+			 "range is not softdirty\n");
+
+	/* Populating READ should not set softdirty. */
+	ret = madvise(addr, SIZE, MADV_POPULATE_READ);
+	ksft_test_result(!ret, "MADV_POPULATE_READ\n");
+	ksft_test_result(range_is_not_softdirty(addr, SIZE),
+			 "range is not softdirty\n");
+
+	/* Populating WRITE should set softdirty. */
+	ret = madvise(addr, SIZE, MADV_POPULATE_WRITE);
+	ksft_test_result(!ret, "MADV_POPULATE_WRITE\n");
+	ksft_test_result(range_is_softdirty(addr, SIZE),
+			 "range is softdirty\n");
+
+	munmap(addr, SIZE);
+}
+
+int main(int argc, char **argv)
+{
+	int err;
+
+	pagesize = getpagesize();
+
+	ksft_print_header();
+	ksft_set_plan(21);
+
+	sense_support();
+	test_prot_read();
+	test_prot_write();
+	test_holes();
+	test_populate_read();
+	test_populate_write();
+	test_softdirty();
+
+	err = ksft_get_fail_cnt();
+	if (err)
+		ksft_exit_fail_msg("%d out of %d tests failed\n",
+				   err, ksft_test_num());
+	return ksft_exit_pass();
+}
+
+#else /* defined(MADV_POPULATE_READ) && defined(MADV_POPULATE_WRITE) */
+
+#warning "missing MADV_POPULATE_READ or MADV_POPULATE_WRITE definition"
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+	ksft_exit_skip("MADV_POPULATE_READ or MADV_POPULATE_WRITE not defined\n");
+}
+
+#endif /* defined(MADV_POPULATE_READ) && defined(MADV_POPULATE_WRITE) */
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index e953f3cd9664..955782d138ab 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -346,4 +346,20 @@ else
 	exitcode=1
 fi
 
+echo "--------------------------------------------------------"
+echo "running MADV_POPULATE_READ and MADV_POPULATE_WRITE tests"
+echo "--------------------------------------------------------"
+./madv_populate
+ret_val=$?
+
+if [ $ret_val -eq 0 ]; then
+	echo "[PASS]"
+elif [ $ret_val -eq $ksft_skip ]; then
+	echo "[SKIP]"
+	exitcode=$ksft_skip
+else
+	echo "[FAIL]"
+	exitcode=1
+fi
+
 exit $exitcode
-- 
2.30.2
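The test above decodes /proc/self/pagemap entries with raw 64-bit masks. As a rough sketch, the same checks can be written with named bits. The bit positions (63 present, 62 swapped, 55 soft-dirty) follow the kernel's pagemap documentation; the macro and function names below are illustrative only and do not appear in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; the patch itself uses the raw masks directly. */
#define PM_SOFT_DIRTY	(1ULL << 55)	/* PTE is soft-dirty */
#define PM_SWAP		(1ULL << 62)	/* page is swapped out */
#define PM_PRESENT	(1ULL << 63)	/* page is present in RAM */

/*
 * Byte offset of the 64-bit pagemap entry describing virtual address
 * addr: one entry per virtual page, indexed by virtual page number.
 */
static uint64_t pagemap_offset(uintptr_t addr, uint64_t pagesize)
{
	return (addr / pagesize) * sizeof(uint64_t);
}

/* Equivalent to the 0xc000000000000000ull check: present or swapped. */
static bool entry_is_populated(uint64_t entry)
{
	return (entry & (PM_PRESENT | PM_SWAP)) != 0;
}

/* Equivalent to the 0x0080000000000000ull check. */
static bool entry_is_softdirty(uint64_t entry)
{
	return (entry & PM_SOFT_DIRTY) != 0;
}
```

Note that the variable called `pfn` in the test is really a virtual page number used as an index into the pagemap file, which is what `pagemap_offset()` computes here.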

4 years, 7 months


[PATCH resend v2 4/5] selftests/vm: add protection_keys_32 / protection_keys_64 to gitignore

by David Hildenbrand

We forgot to add two binaries to gitignore.

Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ram Pai <linuxram(a)us.ibm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
 tools/testing/selftests/vm/.gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index 9a35c3f6a557..b4fc0148360e 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -22,3 +22,5 @@ map_fixed_noreplace
 write_to_hugetlbfs
 hmm-tests
 local_config.*
+protection_keys_32
+protection_keys_64
-- 
2.30.2

4 years, 7 months


[PATCH v18 0/9] mm: introduce memfd_secret system call to create "secret" memory areas

by Mike Rapoport

From: Mike Rapoport <rppt(a)linux.ibm.com>

Hi,

@Andrew, this is based on v5.12-rc1, I can rebase whatever way you prefer.

This is an implementation of "secret" mappings backed by a file descriptor.

The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using the flags parameter of the system call. The
mmap() of the file descriptor created with memfd_secret() will create a
"secret" memory mapping. The pages in that mapping will be marked as not
present in the direct map and will be present only in the page table of
the owning mm.

Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.

Additionally, in the future the secret mappings may be used as a means to
protect guest memory in a virtual machine host.

For demonstration of secret memory usage we've created a userspace library

https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…

that does two things: the first is to act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory, meaning any secret
keys get automatically protected this way, and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.

Hiding secret memory mappings behind an anonymous file allows usage of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.

The anonymous file may also be used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.

Removing the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory, which
affects the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to have
secretmem disabled by default with the ability of a system administrator to
enable it at boot time.

In addition, there is also a long term goal to improve management of the
direct map.

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux…

v18:
* rebase on v5.12-rc1
* merge kfence fix into the original patch
* massage commit message of the patch introducing the memfd_secret syscall

v17: https://lore.kernel.org/lkml/20210208084920.2884-1-rppt@kernel.org
* Remove pool of large pages backing secretmem allocations, per Michal Hocko
* Add secretmem pages to unevictable LRU, per Michal Hocko
* Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko
* Make secretmem an opt-in feature that is disabled by default

v16: https://lore.kernel.org/lkml/20210121122723.3446-1-rppt@kernel.org
* Fix memory leak introduced in v15
* Clean the data left from previous page user before handing the page to
  the userspace

v15: https://lore.kernel.org/lkml/20210120180612.1058-1-rppt@kernel.org
* Add riscv/Kconfig update to disable set_memory operations for nommu
  builds (patch 3)
* Update the code around add_to_page_cache() per Matthew's comments
  (patches 6,7)
* Add fixups for build/checkpatch errors discovered by CI systems

v14: https://lore.kernel.org/lkml/20201203062949.5484-1-rppt@kernel.org
* Finally s/mod_node_page_state/mod_lruvec_page_state/

v13: https://lore.kernel.org/lkml/20201201074559.27742-1-rppt@kernel.org
* Added Reviewed-by, thanks Catalin and David
* s/mod_node_page_state/mod_lruvec_page_state/ as Shakeel suggested

Older history:
v12: https://lore.kernel.org/lkml/20201125092208.12544-1-rppt@kernel.org
v11: https://lore.kernel.org/lkml/20201124092556.12009-1-rppt@kernel.org
v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
rfc-v0: https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@kernel.o…

Mike Rapoport (9):
  mm: add definition of PMD_PAGE_ORDER
  mmap: make mlock_future_check() global
  riscv/Kconfig: make direct map manipulation options depend on MMU
  set_memory: allow set_direct_map_*_noflush() for multiple pages
  set_memory: allow querying whether set_direct_map_*() is actually enabled
  mm: introduce memfd_secret system call to create "secret" memory areas
  PM: hibernate: disable when there are active secretmem users
  arch, mm: wire up memfd_secret system call where relevant
  secretmem: test: add basic selftest for memfd_secret(2)

 arch/arm64/include/asm/Kbuild
| 1 - arch/arm64/include/asm/cacheflush.h | 6 - arch/arm64/include/asm/kfence.h | 2 +- arch/arm64/include/asm/set_memory.h | 17 ++ arch/arm64/include/uapi/asm/unistd.h | 1 + arch/arm64/kernel/machine_kexec.c | 1 + arch/arm64/mm/mmu.c | 6 +- arch/arm64/mm/pageattr.c | 23 +- arch/riscv/Kconfig | 4 +- arch/riscv/include/asm/set_memory.h | 4 +- arch/riscv/include/asm/unistd.h | 1 + arch/riscv/mm/pageattr.c | 8 +- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/asm/set_memory.h | 4 +- arch/x86/mm/pat/set_memory.c | 8 +- fs/dax.c | 11 +- include/linux/pgtable.h | 3 + include/linux/secretmem.h | 30 +++ include/linux/set_memory.h | 16 +- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 6 +- include/uapi/linux/magic.h | 1 + kernel/power/hibernate.c | 5 +- kernel/power/snapshot.c | 4 +- kernel/sys_ni.c | 2 + mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 10 + mm/internal.h | 3 + mm/mlock.c | 3 +- mm/mmap.c | 5 +- mm/secretmem.c | 261 +++++++++++++++++++ mm/vmalloc.c | 5 +- scripts/checksyscalls.sh | 4 + tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 3 +- tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++ tools/testing/selftests/vm/run_vmtests.sh | 17 ++ 39 files changed, 726 insertions(+), 53 deletions(-) create mode 100644 arch/arm64/include/asm/set_memory.h create mode 100644 include/linux/secretmem.h create mode 100644 mm/secretmem.c create mode 100644 tools/testing/selftests/vm/memfd_secret.c -- 2.28.0


[PATCH] kselftest/arm64: Add missing stddef.h include to BTI tests

by Mark Brown

Explicitly include stddef.h when building the BTI tests so that we have a definition of NULL; with at least some toolchains this is not done implicitly by anything else:

test.c: In function ‘start’:
test.c:214:25: error: ‘NULL’ undeclared (first use in this function)
  214 |  sigaction(SIGILL, &sa, NULL);
      |                         ^~~~
test.c:20:1: note: ‘NULL’ is defined in header ‘<stddef.h>’; did you forget to ‘#include <stddef.h>’?

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/selftests/arm64/bti/test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/arm64/bti/test.c b/tools/testing/selftests/arm64/bti/test.c
index 656b04976ccc..67b77ab83c20 100644
--- a/tools/testing/selftests/arm64/bti/test.c
+++ b/tools/testing/selftests/arm64/bti/test.c
@@ -6,6 +6,7 @@
 
 #include "system.h"
 
+#include <stddef.h>
 #include <linux/errno.h>
 #include <linux/auxvec.h>
 #include <linux/signal.h>
--
2.20.1


[PATCH 1/1] mnt: Delete two unneeded bool conversions

by Zhen Lei

The result of an expression consisting of a single relational operator is already of the bool type and does not need to be evaluated explicitly.

No functional change.

Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
---
 tools/testing/selftests/mount/unprivileged-remount-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 584dc6bc3b06679..d2917054fe3ae56 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -204,7 +204,7 @@ bool test_unpriv_remount(const char *fstype, const char *mount_options,
 		if (!WIFEXITED(status)) {
 			die("child did not terminate cleanly\n");
 		}
-		return WEXITSTATUS(status) == EXIT_SUCCESS ? true : false;
+		return WEXITSTATUS(status) == EXIT_SUCCESS;
 	}
 
 	create_and_enter_userns();
@@ -282,7 +282,7 @@ static bool test_priv_mount_unpriv_remount(void)
 		if (!WIFEXITED(status)) {
 			die("child did not terminate cleanly\n");
 		}
-		return WEXITSTATUS(status) == EXIT_SUCCESS ? true : false;
+		return WEXITSTATUS(status) == EXIT_SUCCESS;
 	}
 
 	orig_mnt_flags = read_mnt_flags(orig_path);
--
2.26.0.106.g9fadedd

