- Linux-kselftest-mirror - lists.linaro.org

[PATCH] selftest/net/ipsec.c: Remove unneeded semicolon

by Xu Wang

fix semicolon.cocci warning: tools/testing/selftests/net/ipsec.c:1788:2-3: Unneeded semicolon Signed-off-by: Xu Wang <vulab(a)iscas.ac.cn> --- tools/testing/selftests/net/ipsec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c index 17ced7d6ce25..f23438d512c5 100644 --- a/tools/testing/selftests/net/ipsec.c +++ b/tools/testing/selftests/net/ipsec.c @@ -1785,7 +1785,7 @@ static void grand_child_serv(unsigned int nr, int cmd_fd, void *buf, break; default: printk("got unknown msg type %d", msg->type); - }; + } } static int grand_child_f(unsigned int nr, int cmd_fd, void *buf) -- 2.17.1

4 years, 10 months

2
1
0 0

[PATCH] selftests: timers: set-timer-lat: remove unneeded semicolon

by Jiapeng Chong

Fix the following coccicheck warnings: ./tools/testing/selftests/timers/set-timer-lat.c:83:2-3: Unneeded semicolon. Reported-by: Abaci Robot <abaci(a)linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com> --- tools/testing/selftests/timers/set-timer-lat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c index 50da454..d60bbca 100644 --- a/tools/testing/selftests/timers/set-timer-lat.c +++ b/tools/testing/selftests/timers/set-timer-lat.c @@ -80,7 +80,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } -- 1.8.3.1

4 years, 10 months

1
0
0 0

[PATCH] selftests/bpf: Simplify the calculation of variables

by Jiapeng Chong

Fix the following coccicheck warnings: ./tools/testing/selftests/bpf/test_sockmap.c:735:35-37: WARNING !A || A && B is equivalent to !A || B. Reported-by: Abaci Robot <abaci(a)linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com> --- tools/testing/selftests/bpf/test_sockmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 427ca00..eefd445 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -732,7 +732,7 @@ static int sendmsg_test(struct sockmap_options *opt) * socket is not a valid test. So in this case lets not * enable kTLS but still run the test. */ - if (!txmsg_redir || (txmsg_redir && txmsg_ingress)) { + if (!txmsg_redir || txmsg_ingress) { err = sockmap_init_ktls(opt->verbose, rx_fd); if (err) return err; -- 1.8.3.1

4 years, 10 months

2
1
0 0

Re: [security/brute] cfe92ab6a3: WARNING:inconsistent_lock_state

by John Wood

On Tue, Mar 02, 2021 at 01:49:41PM +0800, kernel test robot wrote: > > > Greeting, > > FYI, we noticed the following commit (built with gcc-9): > > commit: cfe92ab6a3ea700c08ba673b46822d51f38d6b40 ("[PATCH v5 2/8] security/brute: Define a LSM and manage statistical data") > url: https://github.com/0day-ci/linux/commits/John-Wood/Fork-brute-force-attack-… > base: https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next > > in testcase: trinity > version: trinity-static-i386-x86_64-f93256fb_2019-08-28 > with following parameters: > > group: ["group-00", "group-01", "group-02", "group-03", "group-04"] > > test-description: Trinity is a linux system call fuzz tester. > test-url: http://codemonkey.org.uk/projects/trinity/ > > > on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): > > > +-------------------------------------------------------------------------+------------+------------+ > | | 1d53b7aac6 | cfe92ab6a3 | > +-------------------------------------------------------------------------+------------+------------+ > | WARNING:inconsistent_lock_state | 0 | 6 | > | inconsistent{IN-SOFTIRQ-W}->{SOFTIRQ-ON-W}usage | 0 | 6 | > +-------------------------------------------------------------------------+------------+------------+ > > > If you fix the issue, kindly add following tag > Reported-by: kernel test robot <oliver.sang(a)intel.com> > > > [ 116.852721] ================================ > [ 116.853120] WARNING: inconsistent lock state > [ 116.853120] 5.11.0-rc7-00013-gcfe92ab6a3ea #1 Tainted: G S > [ 116.853120] -------------------------------- > > [...] Thanks for the report. I will work on this for the next version. > Thanks, > Oliver Sang Thanks, John Wood

4 years, 10 months

1
0
0 0

[RFC PATCH v3 0/7] KVM: selftests: some improvement and a new test for kvm page table

by Yanan Wang

Hi, This v3 series can mainly include two parts. Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/ Links of v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/ In the first part, all the known hugetlb backing src types specified with different hugepage sizes are listed, so that we can specify use of hugetlb source of the exact granularity that we want, instead of the system default ones. And as all the known hugetlb page sizes are listed, it's appropriate for all architectures. Besides, a helper that can get granularity of different backing src types(anonumous/thp/hugetlb) is added, so that we can use the accurate backing src granularity for kinds of alignment or guest memory accessing of vcpus. In the second part, a new test is added: This test is added to serve as a performance tester and a bug reproducer for kvm page table code (GPA->HPA mappings), it gives guidance for the people trying to make some improvement for kvm. And the following explains what we can exactly do through this test. The function guest_code() can cover the conditions where a single vcpu or multiple vcpus access guest pages within the same memory region, in three VM stages(before dirty logging, during dirty logging, after dirty logging). Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested memory region can be specified by users, which means normal page mappings or block mappings can be chosen by users to be created in the test. If ANONYMOUS memory is specified, kvm will create normal page mappings for the tested memory region before dirty logging, and update attributes of the page mappings from RO to RW during dirty logging. If THP/HUGETLB memory is specified, kvm will create block mappings for the tested memory region before dirty logging, and split the blcok mappings into normal page mappings during dirty logging, and coalesce the page mappings back into block mappings after dirty logging is stopped. So in summary, as a performance tester, this test can present the performance of kvm creating/updating normal page mappings, or the performance of kvm creating/splitting/recovering block mappings, through execution time. When we need to coalesce the page mappings back to block mappings after dirty logging is stopped, we have to firstly invalidate *all* the TLB entries for the page mappings right before installation of the block entry, because a TLB conflict abort error could occur if we can't invalidate the TLB entries fully. We have hit this TLB conflict twice on aarch64 software implementation and fixed it. As this test can imulate process from dirty logging enabled to dirty logging stopped of a VM with block mappings, so it can also reproduce this TLB conflict abort due to inadequate TLB invalidation when coalescing tables. Links about the TLB conflict abort: https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/ --- Change logs: v2->v3: - Add tags of Suggested-by, Reviewed-by in the patches - Add a generic micro to get hugetlb page sizes - Some changes for suggestions about v2 series v1->v2: - Add a patch to sync header files - Add helpers to get granularity of different backing src types - Some changes for suggestions about v1 series --- Yanan Wang (7): tools headers: sync headers of asm-generic/hugetlb_encode.h tools headers: Add a macro to get HUGETLB page sizes for mmap KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing KVM: selftests: Make a generic helper to get vm guest mode strings KVM: selftests: Add a helper to get system configured THP page size KVM: selftests: List all hugetlb src types specified with page sizes KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers KVM: selftests: Add a test for kvm page table code include/uapi/linux/mman.h | 2 + tools/include/asm-generic/hugetlb_encode.h | 3 + tools/include/uapi/linux/mman.h | 2 + tools/testing/selftests/kvm/Makefile | 3 + .../selftests/kvm/demand_paging_test.c | 8 +- .../selftests/kvm/dirty_log_perf_test.c | 14 +- .../testing/selftests/kvm/include/kvm_util.h | 4 +- .../testing/selftests/kvm/include/test_util.h | 21 +- .../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++- tools/testing/selftests/kvm/lib/test_util.c | 92 +++- tools/testing/selftests/kvm/steal_time.c | 4 +- 12 files changed, 628 insertions(+), 60 deletions(-) create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c -- 2.19.1

4 years, 10 months

3
10
0 0

[PATCH] kunit: fix checkpatch warning

by Lucas Pires Stankus

Tidy up code by fixing the following checkpatch warnings: CHECK: Alignment should match open parenthesis CHECK: Lines should not end with a '(' Signed-off-by: Lucas Stankus <lucas.p.stankus(a)gmail.com> --- lib/kunit/assert.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/lib/kunit/assert.c b/lib/kunit/assert.c index 33acdaa28a7d..309f49d70b2f 100644 --- a/lib/kunit/assert.c +++ b/lib/kunit/assert.c @@ -25,7 +25,7 @@ void kunit_base_assert_format(const struct kunit_assert *assert, } string_stream_add(stream, "%s FAILED at %s:%d\n", - expect_or_assert, assert->file, assert->line); + expect_or_assert, assert->file, assert->line); } EXPORT_SYMBOL_GPL(kunit_base_assert_format); @@ -48,8 +48,9 @@ EXPORT_SYMBOL_GPL(kunit_fail_assert_format); void kunit_unary_assert_format(const struct kunit_assert *assert, struct string_stream *stream) { - struct kunit_unary_assert *unary_assert = container_of( - assert, struct kunit_unary_assert, assert); + struct kunit_unary_assert *unary_assert; + + unary_assert = container_of(assert, struct kunit_unary_assert, assert); kunit_base_assert_format(assert, stream); if (unary_assert->expected_true) @@ -67,8 +68,10 @@ EXPORT_SYMBOL_GPL(kunit_unary_assert_format); void kunit_ptr_not_err_assert_format(const struct kunit_assert *assert, struct string_stream *stream) { - struct kunit_ptr_not_err_assert *ptr_assert = container_of( - assert, struct kunit_ptr_not_err_assert, assert); + struct kunit_ptr_not_err_assert *ptr_assert; + + ptr_assert = container_of(assert, struct kunit_ptr_not_err_assert, + assert); kunit_base_assert_format(assert, stream); if (!ptr_assert->value) { @@ -88,8 +91,10 @@ EXPORT_SYMBOL_GPL(kunit_ptr_not_err_assert_format); void kunit_binary_assert_format(const struct kunit_assert *assert, struct string_stream *stream) { - struct kunit_binary_assert *binary_assert = container_of( - assert, struct kunit_binary_assert, assert); + struct kunit_binary_assert *binary_assert; + + binary_assert = container_of(assert, struct kunit_binary_assert, + assert); kunit_base_assert_format(assert, stream); string_stream_add(stream, @@ -110,8 +115,10 @@ EXPORT_SYMBOL_GPL(kunit_binary_assert_format); void kunit_binary_ptr_assert_format(const struct kunit_assert *assert, struct string_stream *stream) { - struct kunit_binary_ptr_assert *binary_assert = container_of( - assert, struct kunit_binary_ptr_assert, assert); + struct kunit_binary_ptr_assert *binary_assert; + + binary_assert = container_of(assert, struct kunit_binary_ptr_assert, + assert); kunit_base_assert_format(assert, stream); string_stream_add(stream, @@ -132,8 +139,10 @@ EXPORT_SYMBOL_GPL(kunit_binary_ptr_assert_format); void kunit_binary_str_assert_format(const struct kunit_assert *assert, struct string_stream *stream) { - struct kunit_binary_str_assert *binary_assert = container_of( - assert, struct kunit_binary_str_assert, assert); + struct kunit_binary_str_assert *binary_assert; + + binary_assert = container_of(assert, struct kunit_binary_str_assert, + assert); kunit_base_assert_format(assert, stream); string_stream_add(stream, -- 2.30.1

4 years, 10 months

3
2
0 0

[PATCH 5.10 355/663] kselftests: dmabuf-heaps: Fix Makefiles inclusion of the kernels usr/include dir

by Greg Kroah-Hartman

From: John Stultz <john.stultz(a)linaro.org> [ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ] Copied in from somewhere else, the makefile was including the kerne's usr/include dir, which caused the asm/ioctl.h file to be used. Unfortunately, that file has different values for _IOC_SIZEBITS and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then causes the _IOCW macros to give the wrong ioctl numbers, specifically for DMA_BUF_IOCTL_SYNC. This patch simply removes the extra include from the Makefile Cc: Shuah Khan <shuah(a)kernel.org> Cc: Brian Starkey <brian.starkey(a)arm.com> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Cc: Laura Abbott <labbott(a)kernel.org> Cc: Hridya Valsaraju <hridya(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: Sandeep Patil <sspatil(a)google.com> Cc: Daniel Mentz <danielmentz(a)google.com> Cc: linux-media(a)vger.kernel.org Cc: dri-devel(a)lists.freedesktop.org Cc: linux-kselftest(a)vger.kernel.org Fixes: a8779927fd86c ("kselftests: Add dma-heap test") Signed-off-by: John Stultz <john.stultz(a)linaro.org> Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/dmabuf-heaps/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile index 607c2acd20829..604b43ece15f5 100644 --- a/tools/testing/selftests/dmabuf-heaps/Makefile +++ b/tools/testing/selftests/dmabuf-heaps/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include +CFLAGS += -static -O3 -Wl,-no-as-needed -Wall TEST_GEN_PROGS = dmabuf-heap -- 2.27.0

4 years, 10 months

1
0
0 0

[PATCH 5.11 430/775] kselftests: dmabuf-heaps: Fix Makefiles inclusion of the kernels usr/include dir

by Greg Kroah-Hartman

From: John Stultz <john.stultz(a)linaro.org> [ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ] Copied in from somewhere else, the makefile was including the kerne's usr/include dir, which caused the asm/ioctl.h file to be used. Unfortunately, that file has different values for _IOC_SIZEBITS and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then causes the _IOCW macros to give the wrong ioctl numbers, specifically for DMA_BUF_IOCTL_SYNC. This patch simply removes the extra include from the Makefile Cc: Shuah Khan <shuah(a)kernel.org> Cc: Brian Starkey <brian.starkey(a)arm.com> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Cc: Laura Abbott <labbott(a)kernel.org> Cc: Hridya Valsaraju <hridya(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: Sandeep Patil <sspatil(a)google.com> Cc: Daniel Mentz <danielmentz(a)google.com> Cc: linux-media(a)vger.kernel.org Cc: dri-devel(a)lists.freedesktop.org Cc: linux-kselftest(a)vger.kernel.org Fixes: a8779927fd86c ("kselftests: Add dma-heap test") Signed-off-by: John Stultz <john.stultz(a)linaro.org> Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/dmabuf-heaps/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile index 607c2acd20829..604b43ece15f5 100644 --- a/tools/testing/selftests/dmabuf-heaps/Makefile +++ b/tools/testing/selftests/dmabuf-heaps/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include +CFLAGS += -static -O3 -Wl,-no-as-needed -Wall TEST_GEN_PROGS = dmabuf-heap -- 2.27.0

4 years, 10 months

1
0
0 0

[PATCH v4 0/8] Fork brute force attack mitigation

by John Wood

Attacks against vulnerable userspace applications with the purpose to break ASLR or bypass canaries traditionally use some level of brute force with the help of the fork system call. This is possible since when creating a new process using fork its memory contents are the same as those of the parent process (the process that called the fork system call). So, the attacker can test the memory infinite times to find the correct memory values or the correct memory addresses without worrying about crashing the application. Based on the above scenario it would be nice to have this detected and mitigated, and this is the goal of this patch serie. Specifically the following attacks are expected to be detected: 1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a desirable memory layout is got (e.g. Stack Clash). 2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until a desirable memory layout is got (e.g. what CTFs do for simple network service). 3.- Launching processes without exec() (e.g. Android Zygote) and exposing state to attack a sibling. 4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until the previously shared memory layout of all the other children is exposed (e.g. kind of related to HeartBleed). In each case, a privilege boundary has been crossed: Case 1: setuid/setgid process Case 2: network to local Case 3: privilege changes Case 4: network to local So, what will really be detected are fork/exec brute force attacks that cross any of the commented bounds. The implementation details and comparison against other existing implementations can be found in the "Documentation" patch. This v4 version has changed a lot from the v2. Basically the application crash period is now compute on an on-going basis using an exponential moving average (EMA), a detection of a brute force attack through the "execve" system call has been added and the crossing of the commented privilege bounds are taken into account. Also, the fine tune has also been removed and now, all this kind of attacks are detected without administrator intervention. In the v2 version Kees Cook suggested to study if the statistical data shared by all the fork hierarchy processes can be tracked in some other way. Specifically the question was if this info can be hold by the family hierarchy of the mm struct. After studying this hierarchy I think it is not suitable for the Brute LSM since they are totally copied on fork() and in this case we want that they are shared. So I leave this road. So, knowing all this information I will explain now the different patches: The 1/8 patch defines a new LSM hook to get the fatal signal of a task. This will be useful during the attack detection phase. The 2/8 patch defines a new LSM and manages the statistical data shared by all the fork hierarchy processes. The 3/8 patch detects a fork/exec brute force attack. The 4/8 patch narrows the detection taken into account the privilege boundary crossing. The 5/8 patch mitigates a brute force attack. The 6/8 patch adds self-tests to validate the Brute LSM expectations. The 7/8 patch adds the documentation to explain this implementation. The 8/8 patch updates the maintainers file. This patch serie is a task of the KSPP [1] and can also be accessed from my github tree [2] in the "brute_v4" branch. [1] https://github.com/KSPP/linux/issues/39 [2] https://github.com/johwood/linux/ The previous versions can be found in: RFC https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@… Version 2 https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm… Version 3 https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/ Changelog RFC -> v2 ------------------- - Rename this feature with a more suitable name (Jann Horn, Kees Cook). - Convert the code to an LSM (Kees Cook). - Add locking to avoid data races (Jann Horn). - Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees Cook). - Add the last crashes timestamps list to avoid false positives in the attack detection (Jann Horn). - Use "period" instead of "rate" (Jann Horn). - Other minor changes suggested (Jann Horn, Kees Cook). Changelog v2 -> v3 ------------------ - Compute the application crash period on an on-going basis (Kees Cook). - Detect a brute force attack through the execve system call (Kees Cook). - Detect an slow brute force attack (Randy Dunlap). - Fine tuning the detection taken into account privilege boundary crossing (Kees Cook). - Taken into account only fatal signals delivered by the kernel (Kees Cook). - Remove the sysctl attributes to fine tuning the detection (Kees Cook). - Remove the prctls to allow per process enabling/disabling (Kees Cook). - Improve the documentation (Kees Cook). - Fix some typos in the documentation (Randy Dunlap). - Add self-test to validate the expectations (Kees Cook). Changelog v3 -> v4 ------------------ - Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy Dunlap). Any constructive comments are welcome. Thanks. John Wood (8): security: Add LSM hook at the point where a task gets a fatal signal security/brute: Define a LSM and manage statistical data securtiy/brute: Detect a brute force attack security/brute: Fine tuning the attack detection security/brute: Mitigate a brute force attack selftests/brute: Add tests for the Brute LSM Documentation: Add documentation for the Brute LSM MAINTAINERS: Add a new entry for the Brute LSM Documentation/admin-guide/LSM/Brute.rst | 224 +++++ Documentation/admin-guide/LSM/index.rst | 1 + MAINTAINERS | 7 + include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 4 + kernel/signal.c | 1 + security/Kconfig | 11 +- security/Makefile | 4 + security/brute/Kconfig | 13 + security/brute/Makefile | 2 + security/brute/brute.c | 1102 ++++++++++++++++++++++ security/security.c | 5 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/brute/.gitignore | 2 + tools/testing/selftests/brute/Makefile | 5 + tools/testing/selftests/brute/config | 1 + tools/testing/selftests/brute/exec.c | 44 + tools/testing/selftests/brute/test.c | 507 ++++++++++ tools/testing/selftests/brute/test.sh | 226 +++++ 20 files changed, 2160 insertions(+), 5 deletions(-) create mode 100644 Documentation/admin-guide/LSM/Brute.rst create mode 100644 security/brute/Kconfig create mode 100644 security/brute/Makefile create mode 100644 security/brute/brute.c create mode 100644 tools/testing/selftests/brute/.gitignore create mode 100644 tools/testing/selftests/brute/Makefile create mode 100644 tools/testing/selftests/brute/config create mode 100644 tools/testing/selftests/brute/exec.c create mode 100644 tools/testing/selftests/brute/test.c create mode 100755 tools/testing/selftests/brute/test.sh -- 2.25.1

4 years, 10 months

1
1
0 0

[PATCH v6 6/6] selftest/x86/signal: Include test cases for validating sigaltstack

by Chang S. Bae

The test measures the kernel's signal delivery with different (enough vs. insufficient) stack sizes. Signed-off-by: Chang S. Bae <chang.seok.bae(a)intel.com> Reviewed-by: Len Brown <len.brown(a)intel.com> Cc: x86(a)kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org --- Changes from v3: * Revised test messages again (Borislav Petkov) Changes from v2: * Revised test messages (Borislav Petkov) --- tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/sigaltstack.c | 128 ++++++++++++++++++++++ 2 files changed, 129 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/sigaltstack.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 333980375bc7..65bba2ae86ee 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -13,7 +13,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ test_vsyscall mov_ss_trap \ - syscall_arg_fault fsgsbase_restore + syscall_arg_fault fsgsbase_restore sigaltstack TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer diff --git a/tools/testing/selftests/x86/sigaltstack.c b/tools/testing/selftests/x86/sigaltstack.c new file mode 100644 index 000000000000..f689af75e979 --- /dev/null +++ b/tools/testing/selftests/x86/sigaltstack.c @@ -0,0 +1,128 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#define _GNU_SOURCE +#include <signal.h> +#include <stdio.h> +#include <stdbool.h> +#include <string.h> +#include <err.h> +#include <errno.h> +#include <limits.h> +#include <sys/mman.h> +#include <sys/auxv.h> +#include <sys/prctl.h> +#include <sys/resource.h> +#include <setjmp.h> + +/* sigaltstack()-enforced minimum stack */ +#define ENFORCED_MINSIGSTKSZ 2048 + +#ifndef AT_MINSIGSTKSZ +# define AT_MINSIGSTKSZ 51 +#endif + +static int nerrs; + +static bool sigalrm_expected; + +static unsigned long at_minstack_size; + +static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), + int flags) +{ + struct sigaction sa; + + memset(&sa, 0, sizeof(sa)); + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO | flags; + sigemptyset(&sa.sa_mask); + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); +} + +static void clearhandler(int sig) +{ + struct sigaction sa; + + memset(&sa, 0, sizeof(sa)); + sa.sa_handler = SIG_DFL; + sigemptyset(&sa.sa_mask); + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); +} + +static int setup_altstack(void *start, unsigned long size) +{ + stack_t ss; + + memset(&ss, 0, sizeof(ss)); + ss.ss_size = size; + ss.ss_sp = start; + + return sigaltstack(&ss, NULL); +} + +static jmp_buf jmpbuf; + +static void sigsegv(int sig, siginfo_t *info, void *ctx_void) +{ + if (sigalrm_expected) { + printf("[FAIL]\tWrong signal delivered: SIGSEGV (expected SIGALRM)."); + nerrs++; + } else { + printf("[OK]\tSIGSEGV signal delivered.\n"); + } + + siglongjmp(jmpbuf, 1); +} + +static void sigalrm(int sig, siginfo_t *info, void *ctx_void) +{ + if (!sigalrm_expected) { + printf("[FAIL]\tWrong signal delivered: SIGALRM (expected SIGSEGV)."); + nerrs++; + } else { + printf("[OK]\tSIGALRM signal delivered.\n"); + } +} + +static void test_sigaltstack(void *altstack, unsigned long size) +{ + if (setup_altstack(altstack, size)) + err(1, "sigaltstack()"); + + sigalrm_expected = (size > at_minstack_size) ? true : false; + + sethandler(SIGSEGV, sigsegv, 0); + sethandler(SIGALRM, sigalrm, SA_ONSTACK); + + if (!sigsetjmp(jmpbuf, 1)) { + printf("[RUN]\tTest an alternate signal stack of %ssufficient size.\n", + sigalrm_expected ? "" : "in"); + printf("\tRaise SIGALRM. %s is expected to be delivered.\n", + sigalrm_expected ? "It" : "SIGSEGV"); + raise(SIGALRM); + } + + clearhandler(SIGALRM); + clearhandler(SIGSEGV); +} + +int main(void) +{ + void *altstack; + + at_minstack_size = getauxval(AT_MINSIGSTKSZ); + + altstack = mmap(NULL, at_minstack_size + SIGSTKSZ, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); + if (altstack == MAP_FAILED) + err(1, "mmap()"); + + if ((ENFORCED_MINSIGSTKSZ + 1) < at_minstack_size) + test_sigaltstack(altstack, ENFORCED_MINSIGSTKSZ + 1); + + test_sigaltstack(altstack, at_minstack_size + SIGSTKSZ); + + return nerrs == 0 ? 0 : 1; +} -- 2.17.1

4 years, 10 months

1
0
0 0

[PATCH v6 4/6] selftest/sigaltstack: Use the AT_MINSIGSTKSZ aux vector if available

by Chang S. Bae

The SIGSTKSZ constant may not represent enough stack size in some architectures as the hardware state size grows. Use getauxval(AT_MINSIGSTKSZ) to increase the stack size. Signed-off-by: Chang S. Bae <chang.seok.bae(a)intel.com> Reviewed-by: Len Brown <len.brown(a)intel.com> Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org --- Changes from v5: * Added as a new patch. --- tools/testing/selftests/sigaltstack/sas.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/sigaltstack/sas.c b/tools/testing/selftests/sigaltstack/sas.c index 8934a3766d20..c53b070755b6 100644 --- a/tools/testing/selftests/sigaltstack/sas.c +++ b/tools/testing/selftests/sigaltstack/sas.c @@ -17,6 +17,7 @@ #include <string.h> #include <assert.h> #include <errno.h> +#include <sys/auxv.h> #include "../kselftest.h" @@ -24,6 +25,11 @@ #define SS_AUTODISARM (1U << 31) #endif +#ifndef AT_MINSIGSTKSZ +#define AT_MINSIGSTKSZ 51 +#endif + +static unsigned int stack_size; static void *sstack, *ustack; static ucontext_t uc, sc; static const char *msg = "[OK]\tStack preserved"; @@ -47,7 +53,7 @@ void my_usr1(int sig, siginfo_t *si, void *u) #endif if (sp < (unsigned long)sstack || - sp >= (unsigned long)sstack + SIGSTKSZ) { + sp >= (unsigned long)sstack + stack_size) { ksft_exit_fail_msg("SP is not on sigaltstack\n"); } /* put some data on stack. other sighandler will try to overwrite it */ @@ -108,6 +114,10 @@ int main(void) stack_t stk; int err; + /* Make sure more than the required minimum. */ + stack_size = getauxval(AT_MINSIGSTKSZ) + SIGSTKSZ; + ksft_print_msg("[NOTE]\tthe stack size is %lu\n", stack_size); + ksft_print_header(); ksft_set_plan(3); @@ -117,7 +127,7 @@ int main(void) sigaction(SIGUSR1, &act, NULL); act.sa_sigaction = my_usr2; sigaction(SIGUSR2, &act, NULL); - sstack = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE, + sstack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); if (sstack == MAP_FAILED) { ksft_exit_fail_msg("mmap() - %s\n", strerror(errno)); @@ -139,7 +149,7 @@ int main(void) } stk.ss_sp = sstack; - stk.ss_size = SIGSTKSZ; + stk.ss_size = stack_size; stk.ss_flags = SS_ONSTACK | SS_AUTODISARM; err = sigaltstack(&stk, NULL); if (err) { @@ -161,7 +171,7 @@ int main(void) } } - ustack = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE, + ustack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); if (ustack == MAP_FAILED) { ksft_exit_fail_msg("mmap() - %s\n", strerror(errno)); @@ -170,7 +180,7 @@ int main(void) getcontext(&uc); uc.uc_link = NULL; uc.uc_stack.ss_sp = ustack; - uc.uc_stack.ss_size = SIGSTKSZ; + uc.uc_stack.ss_size = stack_size; makecontext(&uc, switch_fn, 0); raise(SIGUSR1); -- 2.17.1

4 years, 10 months

1
0
0 0

[PATCH] kunit: tool: Fix a python tuple typing error

by David Gow

The first argument to namedtuple() should match the name of the type, which wasn't the case for KconfigEntryBase. Fixing this is enough to make mypy show no python typing errors again. Fixes 97752c39bd ("kunit: kunit_tool: Allow .kunitconfig to disable config items") Signed-off-by: David Gow <davidgow(a)google.com> --- tools/testing/kunit/kunit_config.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py index 0b550cbd667d..1e2683dcc0e7 100644 --- a/tools/testing/kunit/kunit_config.py +++ b/tools/testing/kunit/kunit_config.py @@ -13,7 +13,7 @@ from typing import List, Set CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$' CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$' -KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value']) +KconfigEntryBase = collections.namedtuple('KconfigEntryBase', ['name', 'value']) class KconfigEntry(KconfigEntryBase): -- 2.30.0.617.g56c4b15f3c-goog

4 years, 10 months

3
2
0 0

[PATCH] kunit: tool: Disable PAGE_POISONING under --alltests

by David Gow

kunit_tool maintains a list of config options which are broken under UML, which we exclude from an otherwise 'make ARCH=um allyesconfig' build used to run all tests with the --alltests option. Something in UML allyesconfig is causing segfaults when page poisining is enabled (and is poisoning with a non-zero value). Previously, this didn't occur, as allyesconfig enabled the CONFIG_PAGE_POISONING_ZERO option, which worked around the problem by zeroing memory. This option has since been removed, and memory is now poisoned with 0xAA, which triggers segfaults in many different codepaths, preventing UML from booting. Note that we have to disable both CONFIG_PAGE_POISONING and CONFIG_DEBUG_PAGEALLOC, as the latter will 'select' the former on architectures (such as UML) which don't implement __kernel_map_pages(). Ideally, we'd fix this properly by tracking down the real root cause, but since this is breaking KUnit's --alltests feature, it's worth disabling there in the meantime so the kernel can boot to the point where tests can actually run. Fixes: f289041ed4 ("mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO") Signed-off-by: David Gow <davidgow(a)google.com> --- As described above, 'make ARCH=um allyesconfig' is broken. KUnit has been maintaining a list of configs to force-disable for this in tools/testing/kunit/configs/broken_on_uml.config. The kernels we've built with this have broken since CONFIG_PAGE_POISONING_ZERO was removed, panic-ing on startup with: <0>[ 0.100000][ T11] Kernel panic - not syncing: Segfault with no mm <4>[ 0.100000][ T11] CPU: 0 PID: 11 Comm: kdevtmpfs Not tainted 5.11.0-rc7-00003-g63381dc6f5f1-dirty #4 <4>[ 0.100000][ T11] Stack: <4>[ 0.100000][ T11] 677d3d40 677d3f10 0000000e 600c0bc0 <4>[ 0.100000][ T11] 677d3d90 603c99be 677d3d90 62529b93 <4>[ 0.100000][ T11] 603c9ac0 677d3f10 62529b00 603c98a0 <4>[ 0.100000][ T11] Call Trace: <4>[ 0.100000][ T11] [<600c0bc0>] ? set_signals+0x0/0x60 <4>[ 0.100000][ T11] [<603c99be>] lookup_mnt+0x11e/0x220 <4>[ 0.100000][ T11] [<62529b93>] ? down_write+0x93/0x180 <4>[ 0.100000][ T11] [<603c9ac0>] ? lock_mount+0x0/0x160 <4>[ 0.100000][ T11] [<62529b00>] ? down_write+0x0/0x180 <4>[ 0.100000][ T11] [<603c98a0>] ? lookup_mnt+0x0/0x220 <4>[ 0.100000][ T11] [<603c8160>] ? namespace_unlock+0x0/0x1a0 <4>[ 0.100000][ T11] [<603c9b25>] lock_mount+0x65/0x160 <4>[ 0.100000][ T11] [<6012f360>] ? up_write+0x0/0x40 <4>[ 0.100000][ T11] [<603cbbd2>] do_new_mount_fc+0xd2/0x220 <4>[ 0.100000][ T11] [<603eb560>] ? vfs_parse_fs_string+0x0/0xa0 <4>[ 0.100000][ T11] [<603cbf04>] do_new_mount+0x1e4/0x260 <4>[ 0.100000][ T11] [<603ccae9>] path_mount+0x1c9/0x6e0 <4>[ 0.100000][ T11] [<603a9f4f>] ? getname_kernel+0xaf/0x1a0 <4>[ 0.100000][ T11] [<603ab280>] ? kern_path+0x0/0x60 <4>[ 0.100000][ T11] [<600238ee>] 0x600238ee <4>[ 0.100000][ T11] [<62523baa>] devtmpfsd+0x52/0xb8 <4>[ 0.100000][ T11] [<62523b58>] ? devtmpfsd+0x0/0xb8 <4>[ 0.100000][ T11] [<600fffd8>] kthread+0x1d8/0x200 <4>[ 0.100000][ T11] [<600a4ea6>] new_thread_handler+0x86/0xc0 Disabling PAGE_POISONING fixes this. The issue can't be repoduced with just PAGE_POISONING, there's clearly something (or several things) also enabled by allyesconfig which contribute. Ideally, we'd track these down and fix this at its root cause, but in the meantime it'd be nice to disable PAGE_POISONING so we can at least get the kernel to boot. One way would be to add a 'depends on !UML' or similar, but since PAGE_POISONING does seem to work in the non-allyesconfig case, adding it to our list of broken configs seemed the better choice. Thoughts? (Note that to reproduce this, you'll want to run ./tools/testing/kunit/kunit.py run --alltests --raw_output It also depends on a couple of other fixes which are not upstream yet: https://www.spinics.net/lists/linux-rtc/msg08294.html https://lore.kernel.org/linux-i3c/20210127040636.1535722-1-davidgow@google.… Cheers, -- David tools/testing/kunit/configs/broken_on_uml.config | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/kunit/configs/broken_on_uml.config b/tools/testing/kunit/configs/broken_on_uml.config index a7f0603d33f6..690870043ac0 100644 --- a/tools/testing/kunit/configs/broken_on_uml.config +++ b/tools/testing/kunit/configs/broken_on_uml.config @@ -40,3 +40,5 @@ # CONFIG_RESET_BRCMSTB_RESCAL is not set # CONFIG_RESET_INTEL_GW is not set # CONFIG_ADI_AXI_ADC is not set +# CONFIG_DEBUG_PAGEALLOC is not set +# CONFIG_PAGE_POISONING is not set -- 2.30.0.478.g8a0d178c01-goog

4 years, 10 months

3
2
0 0

general protection fault in kvm_hv_irq_routing_update

by syzbot

Hello, syzbot found the following issue on: HEAD commit: a99163e9 Merge tag 'devicetree-for-5.12' of git://git.kern.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=12d72682d00000 kernel config: https://syzkaller.appspot.com/x/.config?x=7a875029a795d230 dashboard link: https://syzkaller.appspot.com/bug?extid=6987f3b2dbd9eda95f12 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12faef12d00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=163342ccd00000 The issue was bisected to: commit 8f014550dfb114cc7f42a517d20d2cf887a0b771 Author: Vitaly Kuznetsov <vkuznets(a)redhat.com> Date: Tue Jan 26 13:48:14 2021 +0000 KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10df16a8d00000 final oops: https://syzkaller.appspot.com/x/report.txt?x=12df16a8d00000 console output: https://syzkaller.appspot.com/x/log.txt?x=14df16a8d00000 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+6987f3b2dbd9eda95f12(a)syzkaller.appspotmail.com Fixes: 8f014550dfb1 ("KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional") L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details. general protection fault, probably for non-canonical address 0xdffffc0000000028: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000140-0x0000000000000147] CPU: 1 PID: 8370 Comm: syz-executor859 Not tainted 5.11.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline] RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline] RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498 Code: 80 19 00 00 48 89 f8 48 c1 e8 03 80 3c 28 00 0f 85 ff 01 00 00 4d 8b ad 80 19 00 00 49 8d bd 40 01 00 00 48 89 f8 48 c1 e8 03 <0f> b6 04 28 84 c0 74 06 0f 8e d2 01 00 00 45 0f b6 bd 40 01 00 00 RSP: 0018:ffffc90001b3fac0 EFLAGS: 00010206 RAX: 0000000000000028 RBX: ffff888012df5900 RCX: 0000000000000000 RDX: ffff888022193780 RSI: ffffffff81174d43 RDI: 0000000000000140 RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffc900018819eb R10: ffffffff81170f3e R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 FS: 0000000000a73300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557e8c876888 CR3: 0000000013c0b000 CR4: 00000000001526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: kvm_set_irq_routing+0x69b/0x940 arch/x86/kvm/../../../virt/kvm/irqchip.c:223 kvm_vm_ioctl+0x12d0/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3959 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x43ef29 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe391eb808 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043ef29 RDX: 0000000020000140 RSI: 000000004008ae6a RDI: 0000000000000004 RBP: 0000000000402f10 R08: 0000000000400488 R09: 0000000000400488 R10: 0000000000400488 R11: 0000000000000246 R12: 0000000000402fa0 R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488 Modules linked in: ---[ end trace 2aa75ec1dd148710 ]--- RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline] RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline] RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498 Code: 80 19 00 00 48 89 f8 48 c1 e8 03 80 3c 28 00 0f 85 ff 01 00 00 4d 8b ad 80 19 00 00 49 8d bd 40 01 00 00 48 89 f8 48 c1 e8 03 <0f> b6 04 28 84 c0 74 06 0f 8e d2 01 00 00 45 0f b6 bd 40 01 00 00 RSP: 0018:ffffc90001b3fac0 EFLAGS: 00010206 RAX: 0000000000000028 RBX: ffff888012df5900 RCX: 0000000000000000 RDX: ffff888022193780 RSI: ffffffff81174d43 RDI: 0000000000000140 RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffc900018819eb R10: ffffffff81170f3e R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 FS: 0000000000a73300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557e8c876888 CR3: 0000000013c0b000 CR4: 00000000001526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller(a)googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection syzbot can test patches for this issue, for details see: https://goo.gl/tpsmEJ#testing-patches

4 years, 10 months

2
1
0 0

[RFC PATCH v2 0/7] Some improvement and a new test for kvm page table

by Yanan Wang

Hi, This v2 series can mainly include two parts. Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/ In the first part, all the known hugetlb backing src types specified with different hugepage sizes are listed, so that we can specify use of hugetlb source of the exact granularity that we want, instead of the system default ones. And as all the known hugetlb page sizes are listed, it's appropriate for all architectures. Besides, a helper that can get granularity of different backing src types(anonumous/thp/hugetlb) is added, so that we can use the accurate backing src granularity for kinds of alignment or guest memory accessing of vcpus. In the second part, a new test is added: This test is added to serve as a performance tester and a bug reproducer for kvm page table code (GPA->HPA mappings), it gives guidance for the people trying to make some improvement for kvm. And the following explains what we can exactly do through this test. The function guest_code() can cover the conditions where a single vcpu or multiple vcpus access guest pages within the same memory region, in three VM stages(before dirty logging, during dirty logging, after dirty logging). Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested memory region can be specified by users, which means normal page mappings or block mappings can be chosen by users to be created in the test. If ANONYMOUS memory is specified, kvm will create normal page mappings for the tested memory region before dirty logging, and update attributes of the page mappings from RO to RW during dirty logging. If THP/HUGETLB memory is specified, kvm will create block mappings for the tested memory region before dirty logging, and split the blcok mappings into normal page mappings during dirty logging, and coalesce the page mappings back into block mappings after dirty logging is stopped. So in summary, as a performance tester, this test can present the performance of kvm creating/updating normal page mappings, or the performance of kvm creating/splitting/recovering block mappings, through execution time. When we need to coalesce the page mappings back to block mappings after dirty logging is stopped, we have to firstly invalidate *all* the TLB entries for the page mappings right before installation of the block entry, because a TLB conflict abort error could occur if we can't invalidate the TLB entries fully. We have hit this TLB conflict twice on aarch64 software implementation and fixed it. As this test can imulate process from dirty logging enabled to dirty logging stopped of a VM with block mappings, so it can also reproduce this TLB conflict abort due to inadequate TLB invalidation when coalescing tables. Links about the TLB conflict abort: https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/ Yanan Wang (7): tools include: sync head files of mmap flag encodings about hugetlb KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing KVM: selftests: Make a generic helper to get vm guest mode strings KVM: selftests: Add a helper to get system configured THP page size KVM: selftests: List all hugetlb src types specified with page sizes KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers KVM: selftests: Add a test for kvm page table code tools/include/asm-generic/hugetlb_encode.h | 3 + tools/testing/selftests/kvm/Makefile | 3 + .../selftests/kvm/demand_paging_test.c | 8 +- .../selftests/kvm/dirty_log_perf_test.c | 14 +- .../testing/selftests/kvm/include/kvm_util.h | 4 +- .../testing/selftests/kvm/include/test_util.h | 21 +- .../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 58 +-- tools/testing/selftests/kvm/lib/test_util.c | 92 +++- tools/testing/selftests/kvm/steal_time.c | 4 +- 10 files changed, 623 insertions(+), 60 deletions(-) create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c -- 2.19.1

4 years, 10 months

4
17
0 0

[PATCH 0/5] userfaultfd: support minor fault handling for shmem

by Axel Rasmussen

Base ==== This series is based on top of my series which adds minor fault handling for hugetlbfs [1]. (And, therefore, it is based on linux-next/akpm and Peter Xu's series for disabling huge pmd sharing as well.) [1] https://lore.kernel.org/patchwork/cover/1384095/ Overview ======== See my original series linked above for a detailed overview of minor fault handling in general. The feature in this series works exactly like the hugetblfs version (from userspace's perspective). I'm sending this as a separate series because: - The original minor fault handling series has been through several rounds of review and seems close to being merged, so it seems reasonable to start looking at this next step. - shmem is different enough that this series may require some additional work before it's ready, and I don't want to delay the original series unnecessarily by bundling them together. Use Case ======== In some cases it is useful to have VM memory backed by tmpfs instead of hugetlbfs. So, this feature will be used to support the same VM live migration use case described in my original series. Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to optimize the Android JVM garbage collector using this feature (a paper describing a somewhat similar approach: https://arxiv.org/pdf/1902.04738.pdf). Axel Rasmussen (5): userfaultfd: support minor fault handling for shmem userfaultfd/selftests: use memfd_create for shmem test type userfaultfd/selftests: create alias mappings in the shmem test userfaultfd/selftests: reinitialize test context in each test userfaultfd/selftests: exercise minor fault handling shmem support fs/userfaultfd.c | 6 +- include/linux/shmem_fs.h | 26 +- include/uapi/linux/userfaultfd.h | 4 +- mm/memory.c | 8 +- mm/shmem.c | 88 +++---- mm/userfaultfd.c | 27 +- tools/testing/selftests/vm/userfaultfd.c | 322 +++++++++++++++-------- 7 files changed, 293 insertions(+), 188 deletions(-) -- 2.30.0.617.g56c4b15f3c-goog

4 years, 10 months

1
6
0 0

[PATCH v29 00/12] Landlock LSM

by Mickaël Salaün

Hi, This patch series mainly fixes race condition issues, explains specific lock rules and improves code related to concurrent calls of hook_sb_delete() and release_inode(). I exploited these races to validate the fixes. Userspace tests are also improved along with some commit messages and comments. Serge Hallyn's review is taken into account and his Acked-by are added to the corresponding patches. The SLOC count is 1328 for security/landlock/ and 2539 for tools/testing/selftest/landlock/ . Test coverage for security/landlock/ is 93.6% of lines. The code not covered only deals with internal kernel errors (e.g. memory allocation) and race conditions. This series is being fuzzed by syzkaller (which may cover internal kernel errors), and patches are on their way: https://github.com/google/syzkaller/pull/2380 The compiled documentation is available here: https://landlock.io/linux-doc/landlock-v29/userspace-api/landlock.html This series can be applied on top of v5.11-7592-g1a3a9ffb27bb (Linus's master branch from Sunday). This can be tested with CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending "landlock," to CONFIG_LSM. This patch series can be found in a Git repository here: https://github.com/landlock-lsm/linux/commits/landlock-v29 This patch series seems ready for upstream and I would really appreciate final reviews. # Landlock LSM The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil. In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2]. Previous version: https://lore.kernel.org/lkml/20210202162710.657398-1-mic@digikod.net/ [1] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler… [2] https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.n… Casey Schaufler (1): LSM: Infrastructure management of the superblock Mickaël Salaün (11): landlock: Add object management landlock: Add ruleset and domain management landlock: Set up the security framework and manage credentials landlock: Add ptrace restrictions fs,security: Add sb_delete hook landlock: Support filesystem access-control landlock: Add syscall implementations arch: Wire up Landlock syscalls selftests/landlock: Add user space tests samples/landlock: Add a sandbox manager example landlock: Add user and kernel documentation Documentation/security/index.rst | 1 + Documentation/security/landlock.rst | 79 + Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/landlock.rst | 307 ++ MAINTAINERS | 15 + arch/Kconfig | 7 + arch/alpha/kernel/syscalls/syscall.tbl | 3 + arch/arm/tools/syscall.tbl | 3 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 6 + arch/ia64/kernel/syscalls/syscall.tbl | 3 + arch/m68k/kernel/syscalls/syscall.tbl | 3 + arch/microblaze/kernel/syscalls/syscall.tbl | 3 + arch/mips/kernel/syscalls/syscall_n32.tbl | 3 + arch/mips/kernel/syscalls/syscall_n64.tbl | 3 + arch/mips/kernel/syscalls/syscall_o32.tbl | 3 + arch/parisc/kernel/syscalls/syscall.tbl | 3 + arch/powerpc/kernel/syscalls/syscall.tbl | 3 + arch/s390/kernel/syscalls/syscall.tbl | 3 + arch/sh/kernel/syscalls/syscall.tbl | 3 + arch/sparc/kernel/syscalls/syscall.tbl | 3 + arch/um/Kconfig | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 3 + arch/x86/entry/syscalls/syscall_64.tbl | 3 + arch/xtensa/kernel/syscalls/syscall.tbl | 3 + fs/super.c | 1 + include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 4 + include/linux/syscalls.h | 7 + include/uapi/asm-generic/unistd.h | 8 +- include/uapi/linux/landlock.h | 128 + kernel/sys_ni.c | 5 + samples/Kconfig | 7 + samples/Makefile | 1 + samples/landlock/.gitignore | 1 + samples/landlock/Makefile | 13 + samples/landlock/sandboxer.c | 238 ++ security/Kconfig | 11 +- security/Makefile | 2 + security/landlock/Kconfig | 21 + security/landlock/Makefile | 4 + security/landlock/common.h | 20 + security/landlock/cred.c | 46 + security/landlock/cred.h | 58 + security/landlock/fs.c | 686 +++++ security/landlock/fs.h | 56 + security/landlock/limits.h | 21 + security/landlock/object.c | 67 + security/landlock/object.h | 91 + security/landlock/ptrace.c | 120 + security/landlock/ptrace.h | 14 + security/landlock/ruleset.c | 473 +++ security/landlock/ruleset.h | 165 + security/landlock/setup.c | 40 + security/landlock/setup.h | 18 + security/landlock/syscalls.c | 445 +++ security/security.c | 51 +- security/selinux/hooks.c | 58 +- security/selinux/include/objsec.h | 6 + security/selinux/ss/services.c | 3 +- security/smack/smack.h | 6 + security/smack/smack_lsm.c | 35 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/landlock/.gitignore | 2 + tools/testing/selftests/landlock/Makefile | 24 + tools/testing/selftests/landlock/base_test.c | 219 ++ tools/testing/selftests/landlock/common.h | 183 ++ tools/testing/selftests/landlock/config | 7 + tools/testing/selftests/landlock/fs_test.c | 2724 +++++++++++++++++ .../testing/selftests/landlock/ptrace_test.c | 337 ++ tools/testing/selftests/landlock/true.c | 5 + 72 files changed, 6827 insertions(+), 77 deletions(-) create mode 100644 Documentation/security/landlock.rst create mode 100644 Documentation/userspace-api/landlock.rst create mode 100644 include/uapi/linux/landlock.h create mode 100644 samples/landlock/.gitignore create mode 100644 samples/landlock/Makefile create mode 100644 samples/landlock/sandboxer.c create mode 100644 security/landlock/Kconfig create mode 100644 security/landlock/Makefile create mode 100644 security/landlock/common.h create mode 100644 security/landlock/cred.c create mode 100644 security/landlock/cred.h create mode 100644 security/landlock/fs.c create mode 100644 security/landlock/fs.h create mode 100644 security/landlock/limits.h create mode 100644 security/landlock/object.c create mode 100644 security/landlock/object.h create mode 100644 security/landlock/ptrace.c create mode 100644 security/landlock/ptrace.h create mode 100644 security/landlock/ruleset.c create mode 100644 security/landlock/ruleset.h create mode 100644 security/landlock/setup.c create mode 100644 security/landlock/setup.h create mode 100644 security/landlock/syscalls.c create mode 100644 tools/testing/selftests/landlock/.gitignore create mode 100644 tools/testing/selftests/landlock/Makefile create mode 100644 tools/testing/selftests/landlock/base_test.c create mode 100644 tools/testing/selftests/landlock/common.h create mode 100644 tools/testing/selftests/landlock/config create mode 100644 tools/testing/selftests/landlock/fs_test.c create mode 100644 tools/testing/selftests/landlock/ptrace_test.c create mode 100644 tools/testing/selftests/landlock/true.c base-commit: 31caf8b2a847214be856f843e251fc2ed2cd1075 -- 2.30.0

4 years, 10 months

1
11
0 0

[PATCH] selftests/timers: remove unneeded semicolon

by Jiapeng Chong

Fix the following coccicheck warnings: ./tools/testing/selftests/timers/nsleep-lat.c:75:2-3: Unneeded semicolon. Reported-by: Abaci Robot <abaci(a)linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com> --- tools/testing/selftests/timers/nsleep-lat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/timers/nsleep-lat.c b/tools/testing/selftests/timers/nsleep-lat.c index eb3e79e..a7ca982 100644 --- a/tools/testing/selftests/timers/nsleep-lat.c +++ b/tools/testing/selftests/timers/nsleep-lat.c @@ -72,7 +72,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } -- 1.8.3.1

4 years, 10 months

1
0
0 0

[PATCH] Documentation: kselftest: fix path to test module files

by Antonio Terceiro

The top-level kselftest directory is not called kselftest, but selftests. Signed-off-by: Antonio Terceiro <antonio.terceiro(a)linaro.org> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Jonathan Corbet <corbet(a)lwn.net> --- Documentation/dev-tools/kselftest.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index a901def730d9..dcefee707ccd 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -239,8 +239,8 @@ using a shell script test runner. ``kselftest/module.sh`` is designed to facilitate this process. There is also a header file provided to assist writing kernel modules that are for use with kselftest: -- ``tools/testing/kselftest/kselftest_module.h`` -- ``tools/testing/kselftest/kselftest/module.sh`` +- ``tools/testing/selftests/kselftest_module.h`` +- ``tools/testing/selftests/kselftest/module.sh`` How to use ---------- -- 2.30.1

4 years, 10 months

1
0
0 0

[PATCH AUTOSEL 5.11 10/67] selftests/bpf: Remove memory leak

by Sasha Levin

From: Björn Töpel <bjorn.topel(a)intel.com> [ Upstream commit 4896d7e37ea5217d42e210bfcf4d56964044704f ] The allocated entry is immediately overwritten by an assignment. Fix that. Signed-off-by: Björn Töpel <bjorn.topel(a)intel.com> Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-5-bjorn.topel@gmail.com Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/xdpxceiver.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c index 1e722ee76b1fc..e7945b6246c82 100644 --- a/tools/testing/selftests/bpf/xdpxceiver.c +++ b/tools/testing/selftests/bpf/xdpxceiver.c @@ -729,7 +729,6 @@ static void worker_pkt_validate(void) u32 payloadseqnum = -2; while (1) { - pkt_node_rx_q = malloc(sizeof(struct pkt)); pkt_node_rx_q = TAILQ_LAST(&head, head_s); if (!pkt_node_rx_q) break; -- 2.27.0

4 years, 10 months

1
0
0 0

[PATCH v5 0/3] Some optimizations related to sgx

by Tianjia Zhang

This is an optimization of a set of sgx-related codes, each of which is independent of the patch. Because the second and third patches have conflicting dependencies, these patches are put together. --- v5 changes: * Remove the two patches with no actual value * Typo fix in commit message v4 changes: * Improvements suggested by review v3 changes: * split free_cnt count and spin lock optimization into two patches v2 changes: * review suggested changes Tianjia Zhang (3): selftests/x86: Use getauxval() to simplify the code in sgx x86/sgx: Allows ioctl PROVISION to execute before CREATE x86/sgx: Remove redundant if conditions in sgx_encl_create arch/x86/kernel/cpu/sgx/driver.c | 1 + arch/x86/kernel/cpu/sgx/ioctl.c | 8 ++++---- tools/testing/selftests/sgx/main.c | 24 ++++-------------------- 3 files changed, 9 insertions(+), 24 deletions(-) -- 2.19.1.3.ge56e4f7

4 years, 10 months

2
5
0 0

[PATCH v3 0/8] Fork brute force attack mitigation

by John Wood

Attacks against vulnerable userspace applications with the purpose to break ASLR or bypass canaries traditionally use some level of brute force with the help of the fork system call. This is possible since when creating a new process using fork its memory contents are the same as those of the parent process (the process that called the fork system call). So, the attacker can test the memory infinite times to find the correct memory values or the correct memory addresses without worrying about crashing the application. Based on the above scenario it would be nice to have this detected and mitigated, and this is the goal of this patch serie. Specifically the following attacks are expected to be detected: 1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a desirable memory layout is got (e.g. Stack Clash). 2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until a desirable memory layout is got (e.g. what CTFs do for simple network service). 3.- Launching processes without exec() (e.g. Android Zygote) and exposing state to attack a sibling. 4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until the previously shared memory layout of all the other children is exposed (e.g. kind of related to HeartBleed). In each case, a privilege boundary has been crossed: Case 1: setuid/setgid process Case 2: network to local Case 3: privilege changes Case 4: network to local So, what will really be detected are fork/exec brute force attacks that cross any of the commented bounds. The implementation details and comparison against other existing implementations can be found in the "Documentation" patch. This v3 version has changed a lot from the v2. Basically the application crash period is now compute on an on-going basis using an exponential moving average (EMA), a detection of a brute force attack through the "execve" system call has been added and the crossing of the commented privilege bounds are taken into account. Also, the fine tune has also been removed and now, all this kind of attacks are detected without administrator intervention. In the v2 version Kees Cook suggested to study if the statistical data shared by all the fork hierarchy processes can be tracked in some other way. Specifically the question was if this info can be hold by the family hierarchy of the mm struct. After studying this hierarchy I think it is not suitable for the Brute LSM since they are totally copied on fork() and in this case we want that they are shared. So I leave this road. So, knowing all this information I will explain now the different patches: The 1/8 patch defines a new LSM hook to get the fatal signal of a task. This will be useful during the attack detection phase. The 2/8 patch defines a new LSM and manages the statistical data shared by all the fork hierarchy processes. The 3/8 patch detects a fork/exec brute force attack. The 4/8 patch narrows the detection taken into account the privilege boundary crossing. The 5/8 patch mitigates a brute force attack. The 6/8 patch adds self-tests to validate the Brute LSM expectations. The 7/8 patch adds the documentation to explain this implementation. The 8/8 patch updates the maintainers file. This patch serie is a task of the KSPP [1] and can also be accessed from my github tree [2] in the "brute_v3" branch. [1] https://github.com/KSPP/linux/issues/39 [2] https://github.com/johwood/linux/ The previous versions can be found in: https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@… https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm… Changelog RFC -> v2 ------------------- - Rename this feature with a more suitable name (Jann Horn, Kees Cook). - Convert the code to an LSM (Kees Cook). - Add locking to avoid data races (Jann Horn). - Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees Cook). - Add the last crashes timestamps list to avoid false positives in the attack detection (Jann Horn). - Use "period" instead of "rate" (Jann Horn). - Other minor changes suggested (Jann Horn, Kees Cook). Changelog v2 -> v3 ------------------ - Compute the application crash period on an on-going basis (Kees Cook). - Detect a brute force attack through the execve system call (Kees Cook). - Detect an slow brute force attack (Randy Dunlap). - Fine tuning the detection taken into account privilege boundary crossing (Kees Cook). - Taken into account only fatal signals delivered by the kernel (Kees Cook). - Remove the sysctl attributes to fine tuning the detection (Kees Cook). - Remove the prctls to allow per process enabling/disabling (Kees Cook). - Improve the documentation (Kees Cook). - Fix some typos in the documentation (Randy Dunlap). - Add self-test to validate the expectations (Kees Cook). John Wood (8): security: Add LSM hook at the point where a task gets a fatal signal security/brute: Define a LSM and manage statistical data securtiy/brute: Detect a brute force attack security/brute: Fine tuning the attack detection security/brute: Mitigate a brute force attack selftests/brute: Add tests for the Brute LSM Documentation: Add documentation for the Brute LSM MAINTAINERS: Add a new entry for the Brute LSM Documentation/admin-guide/LSM/Brute.rst | 224 +++++ Documentation/admin-guide/LSM/index.rst | 1 + MAINTAINERS | 7 + include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 4 + kernel/signal.c | 1 + security/Kconfig | 11 +- security/Makefile | 4 + security/brute/Kconfig | 13 + security/brute/Makefile | 2 + security/brute/brute.c | 1102 ++++++++++++++++++++++ security/security.c | 5 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/brute/.gitignore | 2 + tools/testing/selftests/brute/Makefile | 5 + tools/testing/selftests/brute/config | 1 + tools/testing/selftests/brute/exec.c | 44 + tools/testing/selftests/brute/test.c | 507 ++++++++++ tools/testing/selftests/brute/test.sh | 226 +++++ 20 files changed, 2160 insertions(+), 5 deletions(-) create mode 100644 Documentation/admin-guide/LSM/Brute.rst create mode 100644 security/brute/Kconfig create mode 100644 security/brute/Makefile create mode 100644 security/brute/brute.c create mode 100644 tools/testing/selftests/brute/.gitignore create mode 100644 tools/testing/selftests/brute/Makefile create mode 100644 tools/testing/selftests/brute/config create mode 100644 tools/testing/selftests/brute/exec.c create mode 100644 tools/testing/selftests/brute/test.c create mode 100755 tools/testing/selftests/brute/test.sh -- 2.25.1

4 years, 10 months

2
15
0 0

[PATCH] selftests: timers: remove unneeded semicolon

by Jiapeng Chong

Fix the following coccicheck warnings: ./tools/testing/selftests/timers/alarmtimer-suspend.c:82:2-3: Unneeded semicolon. Reported-by: Abaci Robot <abaci(a)linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com> --- tools/testing/selftests/timers/alarmtimer-suspend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/timers/alarmtimer-suspend.c b/tools/testing/selftests/timers/alarmtimer-suspend.c index 4da09db..54da4b08 100644 --- a/tools/testing/selftests/timers/alarmtimer-suspend.c +++ b/tools/testing/selftests/timers/alarmtimer-suspend.c @@ -79,7 +79,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } -- 1.8.3.1

4 years, 10 months

1
0
0 0

[PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas

by Mike Rapoport

From: Mike Rapoport <rppt(a)linux.ibm.com> Hi, @Andrew, this is based on v5.11-rc5-mmotm-2021-01-27-23-30, with secretmem and related patches dropped from there, I can rebase whatever way you prefer. This is an implementation of "secret" mappings backed by a file descriptor. The file descriptor backing secret memory mappings is created using a dedicated memfd_secret system call The desired protection mode for the memory is configured using flags parameter of the system call. The mmap() of the file descriptor created with memfd_secret() will create a "secret" memory mapping. The pages in that mapping will be marked as not present in the direct map and will be present only in the page table of the owning mm. Although normally Linux userspace mappings are protected from other users, such secret mappings are useful for environments where a hostile tenant is trying to trick the kernel into giving them access to other tenants mappings. Additionally, in the future the secret mappings may be used as a mean to protect guest memory in a virtual machine host. For demonstration of secret memory usage we've created a userspace library https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade… that does two things: the first is act as a preloader for openssl to redirect all the OPENSSL_malloc calls to secret memory meaning any secret keys get automatically protected this way and the other thing it does is expose the API to the user who needs it. We anticipate that a lot of the use cases would be like the openssl one: many toolkits that deal with secret keys already have special handling for the memory to try to give them greater protection, so this would simply be pluggable into the toolkits without any need for user application modification. Hiding secret memory mappings behind an anonymous file allows usage of the page cache for tracking pages allocated for the "secret" mappings as well as using address_space_operations for e.g. page migration callbacks. The anonymous file may be also used implicitly, like hugetlb files, to implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm ABIs in the future. Removing of the pages from the direct map may cause its fragmentation on architectures that use large pages to map the physical memory which affects the system performance. However, the original Kconfig text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1] showed that "... although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice". Hence, it is sufficient to have secretmem disabled by default with the ability of a system administrator to enable it at boot time. In addition, there is also a long term goal to improve management of the direct map. [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux… v17: * Remove pool of large pages backing secretmem allocations, per Michal Hocko * Add secretmem pages to unevictable LRU, per Michal Hocko * Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko * Make secretmem an opt-in feature that is disabled by default v16: * Fix memory leak intorduced in v15 * Clean the data left from previous page user before handing the page to the userspace v15: https://lore.kernel.org/lkml/20210120180612.1058-1-rppt@kernel.org * Add riscv/Kconfig update to disable set_memory operations for nommu builds (patch 3) * Update the code around add_to_page_cache() per Matthew's comments (patches 6,7) * Add fixups for build/checkpatch errors discovered by CI systems v14: https://lore.kernel.org/lkml/20201203062949.5484-1-rppt@kernel.org * Finally s/mod_node_page_state/mod_lruvec_page_state/ v13: https://lore.kernel.org/lkml/20201201074559.27742-1-rppt@kernel.org * Added Reviewed-by, thanks Catalin and David * s/mod_node_page_state/mod_lruvec_page_state/ as Shakeel suggested Older history: v12: https://lore.kernel.org/lkml/20201125092208.12544-1-rppt@kernel.org v11: https://lore.kernel.org/lkml/20201124092556.12009-1-rppt@kernel.org v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/ rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/ rfc-v0: https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@kernel.o… Arnd Bergmann (1): arm64: kfence: fix header inclusion Mike Rapoport (9): mm: add definition of PMD_PAGE_ORDER mmap: make mlock_future_check() global riscv/Kconfig: make direct map manipulation options depend on MMU set_memory: allow set_direct_map_*_noflush() for multiple pages set_memory: allow querying whether set_direct_map_*() is actually enabled mm: introduce memfd_secret system call to create "secret" memory areas PM: hibernate: disable when there are active secretmem users arch, mm: wire up memfd_secret system call where relevant secretmem: test: add basic selftest for memfd_secret(2) arch/arm64/include/asm/Kbuild | 1 - arch/arm64/include/asm/cacheflush.h | 6 - arch/arm64/include/asm/kfence.h | 2 +- arch/arm64/include/asm/set_memory.h | 17 ++ arch/arm64/include/uapi/asm/unistd.h | 1 + arch/arm64/kernel/machine_kexec.c | 1 + arch/arm64/mm/mmu.c | 6 +- arch/arm64/mm/pageattr.c | 23 +- arch/riscv/Kconfig | 4 +- arch/riscv/include/asm/set_memory.h | 4 +- arch/riscv/include/asm/unistd.h | 1 + arch/riscv/mm/pageattr.c | 8 +- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/asm/set_memory.h | 4 +- arch/x86/mm/pat/set_memory.c | 8 +- fs/dax.c | 11 +- include/linux/pgtable.h | 3 + include/linux/secretmem.h | 30 +++ include/linux/set_memory.h | 16 +- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 6 +- include/uapi/linux/magic.h | 1 + kernel/power/hibernate.c | 5 +- kernel/power/snapshot.c | 4 +- kernel/sys_ni.c | 2 + mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 10 + mm/internal.h | 3 + mm/mlock.c | 3 +- mm/mmap.c | 5 +- mm/secretmem.c | 261 +++++++++++++++++++ mm/vmalloc.c | 5 +- scripts/checksyscalls.sh | 4 + tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 3 +- tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++ tools/testing/selftests/vm/run_vmtests | 17 ++ 39 files changed, 726 insertions(+), 53 deletions(-) create mode 100644 arch/arm64/include/asm/set_memory.h create mode 100644 include/linux/secretmem.h create mode 100644 mm/secretmem.c create mode 100644 tools/testing/selftests/vm/memfd_secret.c -- 2.28.0

4 years, 10 months

7
72
0 0

[GIT PULL] Kselftest update for Linux 5.12-rc1

by Shuah Khan

Hi Linus, Please pull the following Kselftest update for Linux 5.12-rc1. This Kselftest update for Linux 5.12-rc1 consists of: - dmabuf-heaps test fixes and cleanups from John Stultz. - seccomp test fix to accept any valid fd in user_notification_addfd. - Minor fixes to breakpoints and vDSO tests. - Minor code cleanups to ipc and x86 tests. diff is attached. thanks, -- Shuah ---------------------------------------------------------------- The following changes since commit 92bf22614b21a2706f4993b278017e437f7785b3: Linux 5.11-rc7 (2021-02-07 13:57:38 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-next-5.12-rc1 for you to fetch changes up to e0c0840a46db9d50ba7391082d665d74f320c39f: selftests/seccomp: Accept any valid fd in user_notification_addfd (2021-02-09 17:39:01 -0700) ---------------------------------------------------------------- linux-kselftest-next-5.12-rc1 This Kselftest update for Linux 5.12-rc1 consists of: - dmabuf-heaps test fixes and cleanups from John Stultz. - seccomp test fix to accept any valid fd in user_notification_addfd. - Minor fixes to breakpoints and vDSO tests. - Minor code cleanups to ipc and x86 tests. ---------------------------------------------------------------- John Stultz (5): kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir kselftests: dmabuf-heaps: Add clearer checks on DMABUF_BEGIN/END_SYNC kselftests: dmabuf-heaps: Softly fail if don't find a vgem device kselftests: dmabuf-heaps: Cleanup test output kselftests: dmabuf-heaps: Add extra checking that allocated buffers are zeroed Seth Forshee (1): selftests/seccomp: Accept any valid fd in user_notification_addfd Tiezhu Yang (1): selftests: breakpoints: Use correct error messages in breakpoint_test_arm64.c Tobias Klauser (2): selftests/vDSO: fix ABI selftest on riscv selftests/timens: add futex binary to .gitignore Yang Li (2): selftests/ipc: remove unneeded semicolon selftests/x86/ldt_gdt: remove unneeded semicolon .../selftests/breakpoints/breakpoint_test_arm64.c | 4 +- tools/testing/selftests/dmabuf-heaps/Makefile | 2 +- tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 149 ++++++++++++++++----- tools/testing/selftests/ipc/msgque.c | 6 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 8 +- tools/testing/selftests/timens/.gitignore | 1 + tools/testing/selftests/vDSO/vdso_config.h | 4 +- tools/testing/selftests/x86/ldt_gdt.c | 2 +- 8 files changed, 132 insertions(+), 44 deletions(-) ----------------------------------------------------------------

4 years, 10 months

2
1
0 0

[GIT PULL] KUnit update for Linux 5.12-rc1

by Shuah Khan

Hi Linus, Please pull the following KUnit update for Linux 5.12-rc1. This KUnit update for Linux 5.12-rc1 consists of consists of: -- support for filtering test suites using glob from Daniel Latypov. "kunit_filter.glob" command line option is passed to the UML kernel, which currently only supports filtering by suite name. This support allows running different subsets of tests, e.g. $ ./tools/testing/kunit/kunit.py build $ ./tools/testing/kunit/kunit.py exec 'list*' $ ./tools/testing/kunit/kunit.py exec 'kunit*' -- several fixes and cleanups also from Daniel Latypov. diff is attached. thanks, -- Shuah ---------------------------------------------------------------- The following changes since commit 92bf22614b21a2706f4993b278017e437f7785b3: Linux 5.11-rc7 (2021-02-07 13:57:38 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-5.12-rc1 for you to fetch changes up to 7af29141a31a2a2350589471c8979ff5f22fb9b7: kunit: tool: fix unintentional statefulness in run_kernel() (2021-02-08 16:10:22 -0700) ---------------------------------------------------------------- linux-kselftest-kunit-5.12-rc1 This KUnit update for Linux 5.12-rc1 consists of consists of: -- support for filtering test suites using glob from Daniel Latypov. "kunit_filter.glob" command line option is passed to the UML kernel, which currently only supports filtering by suite name. This support allows running different subsets of tests, e.g. $ ./tools/testing/kunit/kunit.py build $ ./tools/testing/kunit/kunit.py exec 'list*' $ ./tools/testing/kunit/kunit.py exec 'kunit*' -- several fixes and cleanups also from Daniel Latypov. ---------------------------------------------------------------- Daniel Latypov (12): kunit: tool: fix unit test cleanup handling kunit: tool: stop using bare asserts in unit test kunit: tool: use `with open()` in unit test minor: kunit: tool: fix unit test so it can run from non-root dir kunit: tool: simplify kconfig is_subset_of() logic KUnit: Docs: make start.rst example Kconfig follow style.rst Documentation: kunit: add tips.rst for small examples kunit: make kunit_tool accept optional path to .kunitconfig fragment kunit: don't show `1 == 1` in failed assertion messages kunit: add kunit.filter_glob cmdline option to filter suites kunit: tool: add support for filtering suites by glob kunit: tool: fix unintentional statefulness in run_kernel() Documentation/dev-tools/kunit/index.rst | 2 + Documentation/dev-tools/kunit/start.rst | 7 +- Documentation/dev-tools/kunit/tips.rst | 115 ++++++++++++++++++ lib/kunit/Kconfig | 1 + lib/kunit/assert.c | 39 +++++- lib/kunit/executor.c | 93 +++++++++++++-- tools/testing/kunit/kunit.py | 30 +++-- tools/testing/kunit/kunit_config.py | 13 +- tools/testing/kunit/kunit_kernel.py | 18 ++- tools/testing/kunit/kunit_tool_test.py | 204 +++++++++++++++++--------------- 10 files changed, 390 insertions(+), 132 deletions(-) create mode 100644 Documentation/dev-tools/kunit/tips.rst ----------------------------------------------------------------

4 years, 10 months

2
1
0 0

[PATCH v4 21/22] x86/fpu/xstate: Support dynamic user state in the signal handling path

by Chang S. Bae

Entering a signal handler, the kernel saves xstate in signal frame. The dynamic user state is better to be saved only when used. fpu->state_mask can help to exclude unused states. Returning from a signal handler, XRSTOR re-initializes the excluded state components. Add a test case to verify in the signal handler that the signal frame excludes AMX data when the signaled thread has initialized AMX state. Signed-off-by: Chang S. Bae <chang.seok.bae(a)intel.com> Reviewed-by: Len Brown <len.brown(a)intel.com> Cc: x86(a)kernel.org Cc: linux-kernel(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org --- Changes from v3: * Removed 'no functional changes' in the changelog. (Borislav Petkov) Changes from v1: * Made it revertable (moved close to the end of the series). * Included the test case. --- arch/x86/include/asm/fpu/internal.h | 2 +- tools/testing/selftests/x86/amx.c | 66 +++++++++++++++++++++++++++++ 2 files changed, 67 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index c467312d38d8..090eb5bb277b 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -354,7 +354,7 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask) */ static inline int copy_xregs_to_user(struct xregs_state __user *buf) { - u64 mask = xfeatures_mask_user(); + u64 mask = current->thread.fpu.state_mask; u32 lmask = mask; u32 hmask = mask >> 32; int err; diff --git a/tools/testing/selftests/x86/amx.c b/tools/testing/selftests/x86/amx.c index f4ecdfd27ae9..a7386b886532 100644 --- a/tools/testing/selftests/x86/amx.c +++ b/tools/testing/selftests/x86/amx.c @@ -650,6 +650,71 @@ static void test_ptrace(void) test_tile_state_write(ptracee_loads_tiles); } +/* Signal handling test */ + +static int sigtrapped; +struct tile_data sig_tiles, sighdl_tiles; + +static void handle_sigtrap(int sig, siginfo_t *info, void *ctx_void) +{ + ucontext_t *uctxt = (ucontext_t *)ctx_void; + struct xsave_data xdata; + struct tile_config cfg; + struct tile_data tiles; + u64 header; + + header = __get_xsave_xstate_bv((void *)uctxt->uc_mcontext.fpregs); + + if (header & (1 << XFEATURE_XTILE_DATA)) + printf("[FAIL]\ttile data was written in sigframe\n"); + else + printf("[OK]\ttile data was skipped in sigframe\n"); + + set_tilecfg(&cfg); + load_tilecfg(&cfg); + init_xdata(&xdata); + + make_tiles(&tiles); + copy_tiles_to_xdata(&xdata, &tiles); + restore_xdata(&xdata); + + save_xdata(&xdata); + if (compare_xdata_tiles(&xdata, &tiles)) + err(1, "tile load file"); + + printf("\tsignal handler: load tile data\n"); + + sigtrapped = sig; +} + +static void test_signal_handling(void) +{ + struct xsave_data xdata = { 0 }; + struct tile_data tiles = { 0 }; + + sethandler(SIGTRAP, handle_sigtrap, 0); + sigtrapped = 0; + + printf("[RUN]\tCheck tile state management in handling signal\n"); + + printf("\tbefore signal: initial tile data state\n"); + + raise(SIGTRAP); + + if (sigtrapped == 0) + err(1, "sigtrap"); + + save_xdata(&xdata); + if (compare_xdata_tiles(&xdata, &tiles)) { + printf("[FAIL]\ttile data was not loaded at sigreturn\n"); + nerrs++; + } else { + printf("[OK]\ttile data was re-initialized at sigreturn\n"); + } + + clearhandler(SIGTRAP); +} + int main(void) { /* Check hardware availability at first */ @@ -672,6 +737,7 @@ int main(void) test_fork(); test_context_switch(); test_ptrace(); + test_signal_handling(); return nerrs ? 1 : 0; } -- 2.17.1

4 years, 10 months

1
0
0 0

[PATCH v4 20/22] selftest/x86/amx: Include test cases for the AMX state management

by Chang S. Bae

This selftest exercises the kernel's behavior not to inherit AMX state and the ability to switch the context by verifying that they retain unique data between multiple threads. Also, ptrace() is used to insert AMX state into existing threads -- both before and after the existing thread has initialized its AMX state. Collect the test cases of validating those operations together, as they share some common setup for the AMX state. These test cases do not depend on AMX compiler support, as they employ userspace-XSAVE directly to access AMX state. Signed-off-by: Chang S. Bae <chang.seok.bae(a)intel.com> Reviewed-by: Len Brown <len.brown(a)intel.com> Cc: linux-kernel(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org --- Changes from v2: * Updated the test messages and the changelog as tile data is not inherited to a child anymore. * Removed bytecode for the instructions already supported by binutils. * Changed to check the XSAVE availability in a reliable way. Changes from v1: * Removed signal testing code --- tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/amx.c | 677 +++++++++++++++++++++++++++ 2 files changed, 678 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/amx.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 333980375bc7..2f7feb03867b 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -17,7 +17,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer -TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering +TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering amx # Some selftests require 32bit support enabled also on 64bit systems TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall diff --git a/tools/testing/selftests/x86/amx.c b/tools/testing/selftests/x86/amx.c new file mode 100644 index 000000000000..f4ecdfd27ae9 --- /dev/null +++ b/tools/testing/selftests/x86/amx.c @@ -0,0 +1,677 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE +#include <err.h> +#include <elf.h> +#include <pthread.h> +#include <sched.h> +#include <setjmp.h> +#include <signal.h> +#include <stdio.h> +#include <string.h> +#include <stdbool.h> +#include <stdint.h> +#include <stdlib.h> +#include <time.h> +#include <malloc.h> +#include <unistd.h> +#include <ucontext.h> + +#include <linux/futex.h> + +#include <sys/ipc.h> +#include <sys/mman.h> +#include <sys/ptrace.h> +#include <sys/shm.h> +#include <sys/signal.h> +#include <sys/syscall.h> +#include <sys/time.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <sys/uio.h> +#include <sys/ucontext.h> + +#include <x86intrin.h> + +#ifndef __x86_64__ +# error This test is 64-bit only +#endif + +typedef uint8_t u8; +typedef uint16_t u16; +typedef uint32_t u32; +typedef uint64_t u64; + +#define PAGE_SIZE (1 << 12) + +#define NUM_TILES 8 +#define TILE_SIZE 1024 +#define XSAVE_SIZE ((NUM_TILES * TILE_SIZE) + PAGE_SIZE) + +struct xsave_data { + u8 area[XSAVE_SIZE]; +} __attribute__((aligned(64))); + +/* Tile configuration associated: */ +#define MAX_TILES 16 +#define RESERVED_BYTES 14 + +struct tile_config { + u8 palette_id; + u8 start_row; + u8 reserved[RESERVED_BYTES]; + u16 colsb[MAX_TILES]; + u8 rows[MAX_TILES]; +}; + +struct tile_data { + u8 data[NUM_TILES * TILE_SIZE]; +}; + +static inline u64 __xgetbv(u32 index) +{ + u32 eax, edx; + + asm volatile("xgetbv;" + : "=a" (eax), "=d" (edx) + : "c" (index)); + return eax + ((u64)edx << 32); +} + +static inline void __cpuid(u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) +{ + asm volatile("cpuid;" + : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx) + : "0" (*eax), "2" (*ecx)); +} + +/* Load tile configuration */ +static inline void __ldtilecfg(void *cfg) +{ + asm volatile(".byte 0xc4,0xe2,0x78,0x49,0x00" + : : "a"(cfg)); +} + +/* Load tile data to %tmm0 register only */ +static inline void __tileloadd(void *tile) +{ + asm volatile(".byte 0xc4,0xe2,0x7b,0x4b,0x04,0x10" + : : "a"(tile), "d"(0)); +} + +/* Save extended states */ +static inline void __xsave(void *buffer, u32 lo, u32 hi) +{ + asm volatile("xsave (%%rdi)" + : : "D" (buffer), "a" (lo), "d" (hi) + : "memory"); +} + +/* Restore extended states */ +static inline void __xrstor(void *buffer, u32 lo, u32 hi) +{ + asm volatile("xrstor (%%rdi)" + : : "D" (buffer), "a" (lo), "d" (hi)); +} + +/* Release tile states to init values */ +static inline void __tilerelease(void) +{ + asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0" ::); +} + +static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), + int flags) +{ + struct sigaction sa; + + memset(&sa, 0, sizeof(sa)); + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO | flags; + sigemptyset(&sa.sa_mask); + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); +} + +static void clearhandler(int sig) +{ + struct sigaction sa; + + memset(&sa, 0, sizeof(sa)); + sa.sa_handler = SIG_DFL; + sigemptyset(&sa.sa_mask); + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); +} + +/* Hardware info check: */ + +static jmp_buf jmpbuf; +static bool xsave_disabled; + +static void handle_sigill(int sig, siginfo_t *si, void *ctx_void) +{ + xsave_disabled = true; + siglongjmp(jmpbuf, 1); +} + +#define XFEATURE_XTILE_CFG 17 +#define XFEATURE_XTILE_DATA 18 +#define XFEATURE_MASK_XTILE ((1 << XFEATURE_XTILE_DATA) | \ + (1 << XFEATURE_XTILE_CFG)) + +static inline bool check_xsave_supports_xtile(void) +{ + bool supported = false; + + sethandler(SIGILL, handle_sigill, 0); + + if (!sigsetjmp(jmpbuf, 1)) + supported = __xgetbv(0) & XFEATURE_MASK_XTILE; + + clearhandler(SIGILL); + return supported; +} + +struct xtile_hwinfo { + struct { + u16 bytes_per_tile; + u16 bytes_per_row; + u16 max_names; + u16 max_rows; + } spec; + + struct { + u32 offset; + u32 size; + } xsave; +}; + +static struct xtile_hwinfo xtile; + +static bool __enum_xtile_config(void) +{ + u32 eax, ebx, ecx, edx; + u16 bytes_per_tile; + bool valid = false; + +#define TILE_CPUID 0x1d +#define TILE_PALETTE_CPUID_SUBLEAVE 0x1 + + eax = TILE_CPUID; + ecx = TILE_PALETTE_CPUID_SUBLEAVE; + + __cpuid(&eax, &ebx, &ecx, &edx); + if (!eax || !ebx || !ecx) + return valid; + + xtile.spec.max_names = ebx >> 16; + if (xtile.spec.max_names < NUM_TILES) + return valid; + + bytes_per_tile = eax >> 16; + if (bytes_per_tile < TILE_SIZE) + return valid; + + xtile.spec.bytes_per_row = ebx; + xtile.spec.max_rows = ecx; + valid = true; + + return valid; +} + +static bool __enum_xsave_tile(void) +{ + u32 eax, ebx, ecx, edx; + bool valid = false; + +#define XSTATE_CPUID 0xd +#define XSTATE_USER_STATE_SUBLEAVE 0x0 + + eax = XSTATE_CPUID; + ecx = XFEATURE_XTILE_DATA; + + __cpuid(&eax, &ebx, &ecx, &edx); + if (!eax || !ebx) + return valid; + + xtile.xsave.offset = ebx; + xtile.xsave.size = eax; + valid = true; + + return valid; +} + +static bool __check_xsave_size(void) +{ + u32 eax, ebx, ecx, edx; + bool valid = false; + + eax = XSTATE_CPUID; + ecx = XSTATE_USER_STATE_SUBLEAVE; + + __cpuid(&eax, &ebx, &ecx, &edx); + if (ebx && ebx <= XSAVE_SIZE) + valid = true; + + return valid; +} + +/* + * Check the hardware-provided tile state info and cross-check it with the + * hard-coded values: XSAVE_SIZE, NUM_TILES, and TILE_SIZE. + */ +static int check_xtile_hwinfo(void) +{ + bool success = false; + + if (!__check_xsave_size()) + return success; + + if (!__enum_xsave_tile()) + return success; + + if (!__enum_xtile_config()) + return success; + + if (sizeof(struct tile_data) >= xtile.xsave.size) + success = true; + + return success; +} + +/* The helpers for managing XSAVE buffer and tile states: */ + +/* Use the uncompacted format without 'init optimization' */ +static void save_xdata(void *data) +{ + __xsave(data, -1, -1); +} + +static void restore_xdata(void *data) +{ + __xrstor(data, -1, -1); +} + +static inline u64 __get_xsave_xstate_bv(void *data) +{ +#define XSAVE_HDR_OFFSET 512 + return *(u64 *)(data + XSAVE_HDR_OFFSET); +} + +static void set_tilecfg(struct tile_config *cfg) +{ + int i; + + memset(cfg, 0, sizeof(*cfg)); + /* The first implementation has one significant palette with id 1 */ + cfg->palette_id = 1; + for (i = 0; i < xtile.spec.max_names; i++) { + cfg->colsb[i] = xtile.spec.bytes_per_row; + cfg->rows[i] = xtile.spec.max_rows; + } +} + +static void load_tilecfg(struct tile_config *cfg) +{ + __ldtilecfg(cfg); +} + +static void make_tiles(void *tiles) +{ + u32 iterations = xtile.xsave.size / sizeof(u32); + static u32 value = 1; + u32 *ptr = tiles; + int i; + + for (i = 0, ptr = tiles; i < iterations; i++, ptr++) + *ptr = value; + value++; +} + +/* + * Initialize the XSAVE buffer: + * + * Make sure tile configuration loaded already. Load limited tile data (%tmm0 only) + * and save all the states. XSAVE buffer is ready to complete tile data. + */ +static void init_xdata(void *data) +{ + struct tile_data tiles; + + make_tiles(&tiles); + __tileloadd(&tiles); + __xsave(data, -1, -1); +} + +static inline void *__get_xsave_tile_data_addr(void *data) +{ + return data + xtile.xsave.offset; +} + +static void copy_tiles_to_xdata(void *xdata, void *tiles) +{ + void *dst = __get_xsave_tile_data_addr(xdata); + + memcpy(dst, tiles, xtile.xsave.size); +} + +static int compare_xdata_tiles(void *xdata, void *tiles) +{ + void *tile_data = __get_xsave_tile_data_addr(xdata); + + if (memcmp(tile_data, tiles, xtile.xsave.size)) + return 1; + + return 0; +} + +static int nerrs, errs; + +/* Testing tile data inheritance */ + +static void test_tile_data_inheritance(void) +{ + struct xsave_data xdata; + struct tile_data tiles; + struct tile_config cfg; + pid_t child; + int status; + + set_tilecfg(&cfg); + load_tilecfg(&cfg); + init_xdata(&xdata); + + make_tiles(&tiles); + copy_tiles_to_xdata(&xdata, &tiles); + restore_xdata(&xdata); + + errs = 0; + + child = fork(); + if (child < 0) + err(1, "fork"); + + if (child == 0) { + memset(&xdata, 0, sizeof(xdata)); + save_xdata(&xdata); + if (compare_xdata_tiles(&xdata, &tiles)) { + printf("[OK]\tchild didn't inherit tile data at fork()\n"); + } else { + printf("[FAIL]\tchild inherited tile data at fork()\n"); + nerrs++; + } + _exit(0); + } + wait(&status); +} + +static void test_fork(void) +{ + pid_t child; + int status; + + child = fork(); + if (child < 0) + err(1, "fork"); + + if (child == 0) { + test_tile_data_inheritance(); + _exit(0); + } + + wait(&status); +} + +/* Context switching test */ + +#define ITERATIONS 10 +#define NUM_THREADS 5 + +struct futex_info { + int current; + int next; + int *futex; +}; + +static inline void command_wait(struct futex_info *info, int value) +{ + do { + sched_yield(); + } while (syscall(SYS_futex, info->futex, FUTEX_WAIT, value, 0, 0, 0)); +} + +static inline void command_wake(struct futex_info *info, int value) +{ + do { + *info->futex = value; + while (!syscall(SYS_futex, info->futex, FUTEX_WAKE, 1, 0, 0, 0)) + sched_yield(); + } while (0); +} + +static inline int get_iterative_value(int id) +{ + return ((id << 1) & ~0x1); +} + +static inline int get_endpoint_value(int id) +{ + return ((id << 1) | 0x1); +} + +static void *check_tiles(void *info) +{ + struct futex_info *finfo = (struct futex_info *)info; + struct xsave_data xdata; + struct tile_data tiles; + struct tile_config cfg; + int i; + + set_tilecfg(&cfg); + load_tilecfg(&cfg); + init_xdata(&xdata); + + make_tiles(&tiles); + copy_tiles_to_xdata(&xdata, &tiles); + restore_xdata(&xdata); + + for (i = 0; i < ITERATIONS; i++) { + command_wait(finfo, get_iterative_value(finfo->current)); + + memset(&xdata, 0, sizeof(xdata)); + save_xdata(&xdata); + errs += compare_xdata_tiles(&xdata, &tiles); + + make_tiles(&tiles); + copy_tiles_to_xdata(&xdata, &tiles); + restore_xdata(&xdata); + + command_wake(finfo, get_iterative_value(finfo->next)); + } + + command_wait(finfo, get_endpoint_value(finfo->current)); + __tilerelease(); + return NULL; +} + +static int create_children(int num, struct futex_info *finfo) +{ + const int shm_id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0666); + int *futex = shmat(shm_id, NULL, 0); + pthread_t thread; + int i; + + for (i = 0; i < num; i++) { + finfo[i].futex = futex; + finfo[i].current = i + 1; + finfo[i].next = (i + 2) % (num + 1); + + if (pthread_create(&thread, NULL, check_tiles, &finfo[i])) { + err(1, "pthread_create"); + return 1; + } + } + return 0; +} + +static void test_context_switch(void) +{ + struct futex_info *finfo; + cpu_set_t cpuset; + int i; + + printf("[RUN]\t%u context switches of tile states in %d threads\n", + ITERATIONS * NUM_THREADS, NUM_THREADS); + + errs = 0; + + CPU_ZERO(&cpuset); + CPU_SET(0, &cpuset); + if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) + err(1, "sched_setaffinity to CPU 0"); + + finfo = malloc(sizeof(*finfo) * NUM_THREADS); + + if (create_children(NUM_THREADS, finfo)) + return; + + for (i = 0; i < ITERATIONS; i++) { + command_wake(finfo, get_iterative_value(1)); + command_wait(finfo, get_iterative_value(0)); + } + + for (i = 1; i <= NUM_THREADS; i++) + command_wake(finfo, get_endpoint_value(i)); + + if (errs) { + printf("[FAIL]\t%u incorrect tile states\n", errs); + nerrs += errs; + return; + } + + printf("[OK]\tall tile states are correct\n"); +} + +/* Ptrace test */ + +static inline long get_tile_state(pid_t child, struct iovec *iov) +{ + return ptrace(PTRACE_GETREGSET, child, (u32)NT_X86_XSTATE, iov); +} + +static inline long set_tile_state(pid_t child, struct iovec *iov) +{ + return ptrace(PTRACE_SETREGSET, child, (u32)NT_X86_XSTATE, iov); +} + +static int write_tile_state(bool load_tile, pid_t child) +{ + struct xsave_data xdata; + struct tile_data tiles; + struct iovec iov; + + iov.iov_base = &xdata; + iov.iov_len = sizeof(xdata); + + if (get_tile_state(child, &iov)) + err(1, "PTRACE_GETREGSET"); + + make_tiles(&tiles); + copy_tiles_to_xdata(&xdata, &tiles); + if (set_tile_state(child, &iov)) + err(1, "PTRACE_SETREGSET"); + + memset(&xdata, 0, sizeof(xdata)); + if (get_tile_state(child, &iov)) + err(1, "PTRACE_GETREGSET"); + + if (!load_tile) + memset(&tiles, 0, sizeof(tiles)); + + return compare_xdata_tiles(&xdata, &tiles); +} + +static void test_tile_state_write(bool load_tile) +{ + pid_t child; + int status; + + child = fork(); + if (child < 0) + err(1, "fork"); + + if (child == 0) { + printf("[RUN]\tPtrace-induced tile state write, "); + printf("%s tile data loaded\n", load_tile ? "with" : "without"); + + if (ptrace(PTRACE_TRACEME, 0, NULL, NULL)) + err(1, "PTRACE_TRACEME"); + + if (load_tile) { + struct tile_config cfg; + struct tile_data tiles; + + set_tilecfg(&cfg); + load_tilecfg(&cfg); + make_tiles(&tiles); + /* Load only %tmm0 but inducing the #NM */ + __tileloadd(&tiles); + } + + raise(SIGTRAP); + _exit(0); + } + + do { + wait(&status); + } while (WSTOPSIG(status) != SIGTRAP); + + errs = write_tile_state(load_tile, child); + if (errs) { + nerrs++; + printf("[FAIL]\t%s write\n", load_tile ? "incorrect" : "unexpected"); + } else { + printf("[OK]\t%s write\n", load_tile ? "correct" : "no"); + } + + ptrace(PTRACE_DETACH, child, NULL, NULL); + wait(&status); +} + +static void test_ptrace(void) +{ + bool ptracee_loads_tiles; + + ptracee_loads_tiles = true; + test_tile_state_write(ptracee_loads_tiles); + + ptracee_loads_tiles = false; + test_tile_state_write(ptracee_loads_tiles); +} + +int main(void) +{ + /* Check hardware availability at first */ + + if (!check_xsave_supports_xtile()) { + if (xsave_disabled) + printf("XSAVE disabled.\n"); + else + printf("Tile data not available.\n"); + return 0; + } + + if (!check_xtile_hwinfo()) { + printf("Available tile state size is insufficient to test.\n"); + return 0; + } + + nerrs = 0; + + test_fork(); + test_context_switch(); + test_ptrace(); + + return nerrs ? 1 : 0; +} -- 2.17.1

4 years, 10 months

1
0
0 0

[PATCH v28 00/12] Landlock LSM

by Mickaël Salaün

Hi, This patch series fixes a corner-case with non-overlapping access rights coming from different layers. This is now handled in a generic way and verified with new tests. A stricter check is enforced for landlock_add_rule(2) to forbid useless rules. Finally, the previous landlock_enforce_ruleset_self(2) is renamed to landlock_restrict_self(2), which is more consistent. The SLOC count is 1314 for security/landlock/ and 2484 for tools/testing/selftest/landlock/ . Test coverage for security/landlock/ is 94.7% of lines. The code not covered only deals with internal kernel errors (e.g. memory allocation) and race conditions. This series is being fuzzed by syzkaller, and patches are on their way: https://github.com/google/syzkaller/pull/2380 The compiled documentation is available here: https://landlock.io/linux-doc/landlock-v28/userspace-api/landlock.html This series can be applied on top of v5.11-rc6 . This can be tested with CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending "landlock," to CONFIG_LSM. This patch series can be found in a Git repository here: https://github.com/landlock-lsm/linux/commits/landlock-v28 This patch series seems ready for upstream and I would really appreciate final reviews. # Landlock LSM The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil. In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2]. Previous version: https://lore.kernel.org/lkml/20210121205119.793296-1-mic@digikod.net/ [1] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler… [2] https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.n… Casey Schaufler (1): LSM: Infrastructure management of the superblock Mickaël Salaün (11): landlock: Add object management landlock: Add ruleset and domain management landlock: Set up the security framework and manage credentials landlock: Add ptrace restrictions fs,security: Add sb_delete hook landlock: Support filesystem access-control landlock: Add syscall implementations arch: Wire up Landlock syscalls selftests/landlock: Add user space tests samples/landlock: Add a sandbox manager example landlock: Add user and kernel documentation Documentation/security/index.rst | 1 + Documentation/security/landlock.rst | 79 + Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/landlock.rst | 307 ++ MAINTAINERS | 15 + arch/Kconfig | 7 + arch/alpha/kernel/syscalls/syscall.tbl | 3 + arch/arm/tools/syscall.tbl | 3 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 6 + arch/ia64/kernel/syscalls/syscall.tbl | 3 + arch/m68k/kernel/syscalls/syscall.tbl | 3 + arch/microblaze/kernel/syscalls/syscall.tbl | 3 + arch/mips/kernel/syscalls/syscall_n32.tbl | 3 + arch/mips/kernel/syscalls/syscall_n64.tbl | 3 + arch/mips/kernel/syscalls/syscall_o32.tbl | 3 + arch/parisc/kernel/syscalls/syscall.tbl | 3 + arch/powerpc/kernel/syscalls/syscall.tbl | 3 + arch/s390/kernel/syscalls/syscall.tbl | 3 + arch/sh/kernel/syscalls/syscall.tbl | 3 + arch/sparc/kernel/syscalls/syscall.tbl | 3 + arch/um/Kconfig | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 3 + arch/x86/entry/syscalls/syscall_64.tbl | 3 + arch/xtensa/kernel/syscalls/syscall.tbl | 3 + fs/super.c | 1 + include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 3 + include/linux/security.h | 4 + include/linux/syscalls.h | 7 + include/uapi/asm-generic/unistd.h | 8 +- include/uapi/linux/landlock.h | 128 + kernel/sys_ni.c | 5 + samples/Kconfig | 7 + samples/Makefile | 1 + samples/landlock/.gitignore | 1 + samples/landlock/Makefile | 13 + samples/landlock/sandboxer.c | 238 ++ security/Kconfig | 11 +- security/Makefile | 2 + security/landlock/Kconfig | 21 + security/landlock/Makefile | 4 + security/landlock/common.h | 20 + security/landlock/cred.c | 46 + security/landlock/cred.h | 58 + security/landlock/fs.c | 627 ++++ security/landlock/fs.h | 56 + security/landlock/limits.h | 21 + security/landlock/object.c | 67 + security/landlock/object.h | 91 + security/landlock/ptrace.c | 120 + security/landlock/ptrace.h | 14 + security/landlock/ruleset.c | 473 +++ security/landlock/ruleset.h | 165 + security/landlock/setup.c | 40 + security/landlock/setup.h | 18 + security/landlock/syscalls.c | 444 +++ security/security.c | 51 +- security/selinux/hooks.c | 58 +- security/selinux/include/objsec.h | 6 + security/selinux/ss/services.c | 3 +- security/smack/smack.h | 6 + security/smack/smack_lsm.c | 35 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/landlock/.gitignore | 2 + tools/testing/selftests/landlock/Makefile | 24 + tools/testing/selftests/landlock/base_test.c | 219 ++ tools/testing/selftests/landlock/common.h | 169 ++ tools/testing/selftests/landlock/config | 6 + tools/testing/selftests/landlock/fs_test.c | 2664 +++++++++++++++++ .../testing/selftests/landlock/ptrace_test.c | 314 ++ tools/testing/selftests/landlock/true.c | 5 + 72 files changed, 6668 insertions(+), 77 deletions(-) create mode 100644 Documentation/security/landlock.rst create mode 100644 Documentation/userspace-api/landlock.rst create mode 100644 include/uapi/linux/landlock.h create mode 100644 samples/landlock/.gitignore create mode 100644 samples/landlock/Makefile create mode 100644 samples/landlock/sandboxer.c create mode 100644 security/landlock/Kconfig create mode 100644 security/landlock/Makefile create mode 100644 security/landlock/common.h create mode 100644 security/landlock/cred.c create mode 100644 security/landlock/cred.h create mode 100644 security/landlock/fs.c create mode 100644 security/landlock/fs.h create mode 100644 security/landlock/limits.h create mode 100644 security/landlock/object.c create mode 100644 security/landlock/object.h create mode 100644 security/landlock/ptrace.c create mode 100644 security/landlock/ptrace.h create mode 100644 security/landlock/ruleset.c create mode 100644 security/landlock/ruleset.h create mode 100644 security/landlock/setup.c create mode 100644 security/landlock/setup.h create mode 100644 security/landlock/syscalls.c create mode 100644 tools/testing/selftests/landlock/.gitignore create mode 100644 tools/testing/selftests/landlock/Makefile create mode 100644 tools/testing/selftests/landlock/base_test.c create mode 100644 tools/testing/selftests/landlock/common.h create mode 100644 tools/testing/selftests/landlock/config create mode 100644 tools/testing/selftests/landlock/fs_test.c create mode 100644 tools/testing/selftests/landlock/ptrace_test.c create mode 100644 tools/testing/selftests/landlock/true.c base-commit: 1048ba83fb1c00cd24172e23e8263972f6b5d9ac -- 2.30.0

4 years, 10 months

3
26
0 0

[RFC PATCH 00/13] Add futex2 syscalls

by André Almeida

Hi, This patch series introduces the futex2 syscalls. * What happened to the current futex()? For some years now, developers have been trying to add new features to futex, but maintainers have been reluctant to accept then, given the multiplexed interface full of legacy features and tricky to do big changes. Some problems that people tried to address with patchsets are: NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2]. NUMA, for instance, just doesn't fit the current API in a reasonable way. Considering that, it's not possible to merge new features into the current futex. ** The NUMA problem At the current implementation, all futex kernel side infrastructure is stored on a single node. Given that, all futex() calls issued by processors that aren't located on that node will have a memory access penalty when doing it. ** The 32bit sized futex problem Embedded systems or anything with memory constrains would benefit of using smaller sizes for the futex userspace integer. Also, a mutex implementation can be done using just three values, so 8 bits is enough for various scenarios. ** The wait on multiple problem The use case lies in the Wine implementation of the Windows NT interface WaitMultipleObjects. This Windows API function allows a thread to sleep waiting on the first of a set of event sources (mutexes, timers, signal, console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the consumer side is essential for good performance of those running over Wine. [0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/ [1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/ [2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.… * The solution As proposed by Peter Zijlstra and Florian Weimer[3], a new interface is required to solve this, which must be designed with those features in mind. futex2() is that interface. As opposed to the current multiplexed interface, the new one should have one syscall per operation. This will allow the maintainability of the API if it gets extended, and will help users with type checking of arguments. In particular, the new interface is extended to support the ability to wait on any of a list of futexes at a time, which could be seen as a vectored extension of the FUTEX_WAIT semantics. [3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-… * The interface The new interface can be seen in details in the following patches, but this is a high level summary of what the interface can do: - Supports wake/wait semantics, as in futex() - Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with individual flags for each address - Supports waiting for a vector of futexes, using a new syscall named futex_waitv() - Supports variable sized futexes (8bits, 16bits and 32bits) - Supports NUMA-awareness operations, where the user can specify on which memory node would like to operate * Implementation The internal implementation follows a similar design to the original futex. Given that we want to replicate the same external behavior of current futex, this should be somewhat expected. For some functions, like the init and the code to get a shared key, I literally copied code and comments from kernel/futex.c. I decided to do so instead of exposing the original function as a public function since in that way we can freely modify our implementation if required, without any impact on old futex. Also, the comments precisely describes the details and corner cases of the implementation. Each patch contains a brief description of implementation, but patch 6 "docs: locking: futex2: Add documentation" adds a more complete document about it. * The patchset This patchset can be also found at my git tree: https://gitlab.collabora.com/tonyk/linux/-/tree/futex2 - Patch 1: Implements wait/wake, and the basics foundations of futex2 - Patches 2-4: Implement the remaining features (shared, waitv, requeue). - Patch 5: Adds the x86_x32 ABI handling. I kept it in a separated patch since I'm not sure if x86_x32 is still a thing, or if it should return -ENOSYS. - Patch 6: Add a documentation file which details the interface and the internal implementation. - Patches 7-13: Selftests for all operations along with perf support for futex2. - Patch 14: While working on porting glibc for futex2, I found out that there's a futex_wake() call at the user thread exit path, if that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In order to make pthreads work with futex2, it was required to add this patch. Note that this is more a proof-of-concept of what we will need to do in future, rather than part of the interface and shouldn't be merged as it is. * Testing: This patchset provides selftests for each operation and their flags. Along with that, the following work was done: ** Stability To stress the interface in "real world scenarios": - glibc[4]: nptl's low level locking was modified to use futex2 API (except for robust and PI things). All relevant nptl/ tests passed. - Wine[5]: Proton/Wine was modified in order to use futex2() for the emulation of Windows NT sync mechanisms based on futex, called "fsync". Triple-A games with huge CPU's loads and tons of parallel jobs worked as expected when compared with the previous FUTEX_WAIT_MULTIPLE implementation at futex(). Some games issue 42k futex2() calls per second. - Full GNU/Linux distro: I installed the modified glibc in my host machine, so all pthread's programs would use futex2(). After tweaking systemd[6] to allow futex2() calls at seccomp, everything worked as expected (web browsers do some syscall sandboxing and need some configuration as well). - perf: The perf benchmarks tests can also be used to stress the interface, and they can be found in this patchset. ** Performance - For comparing futex() and futex2() performance, I used the artificial benchmarks implemented at perf (wake, wake-parallel, hash and requeue). The setup was 200 runs for each test and using 8, 80, 800, 8000 for the number of threads, Note that for this test, I'm not using patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained at "The patchset" section. - For the first three ones, I measured an average of 4% gain in performance. This is not a big step, but it shows that the new interface is at least comparable in performance with the current one. - For requeue, I measured an average of 21% decrease in performance compared to the original futex implementation. This is expected given the new design with individual flags. The performance trade-offs are explained at patch 4 ("futex2: Implement requeue operation"). [4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2 [5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13 [6] https://gitlab.collabora.com/tonyk/systemd * FAQ ** "Where's the code for NUMA and FUTEX_8/16?" The current code is already complex enough to take some time for review, so I believe it's better to split that work out to a future iteration of this patchset. Besides that, this RFC is the core part of the infrastructure, and the following features will not pose big design changes to it, the work will be more about wiring up the flags and modifying some functions. ** "And what's about FUTEX_64?" By supporting 64 bit futexes, the kernel structure for futex would need to have a 64 bit field for the value, and that could defeat one of the purposes of having different sized futexes in the first place: supporting smaller ones to decrease memory usage. This might be something that could be disabled for 32bit archs (and even for CONFIG_BASE_SMALL). Which use case would benefit for FUTEX_64? Does it worth the trade-offs? ** "Where's the PI/robust stuff?" As said by Peter Zijlstra at [3], all those new features are related to the "simple" futex interface, that doesn't use PI or robust. Do we want to have this complexity at futex2() and if so, should it be part of this patchset or can it be future work? Thanks, André André Almeida (13): futex2: Implement wait and wake functions futex2: Add support for shared futexes futex2: Implement vectorized wait futex2: Implement requeue operation futex2: Add compatibility entry point for x86_x32 ABI docs: locking: futex2: Add documentation selftests: futex2: Add wake/wait test selftests: futex2: Add timeout test selftests: futex2: Add wouldblock test selftests: futex2: Add waitv test selftests: futex2: Add requeue test perf bench: Add futex2 benchmark tests kernel: Enable waitpid() for futex2 Documentation/locking/futex2.rst | 198 +++ Documentation/locking/index.rst | 1 + MAINTAINERS | 2 +- arch/arm/tools/syscall.tbl | 4 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 4 + arch/x86/entry/syscalls/syscall_32.tbl | 4 + arch/x86/entry/syscalls/syscall_64.tbl | 4 + fs/inode.c | 1 + include/linux/compat.h | 23 + include/linux/fs.h | 1 + include/linux/syscalls.h | 18 + include/uapi/asm-generic/unistd.h | 14 +- include/uapi/linux/futex.h | 56 + init/Kconfig | 7 + kernel/Makefile | 1 + kernel/fork.c | 2 + kernel/futex2.c | 1255 +++++++++++++++++ kernel/sys_ni.c | 6 + tools/arch/x86/include/asm/unistd_64.h | 12 + tools/include/uapi/asm-generic/unistd.h | 11 +- .../arch/x86/entry/syscalls/syscall_64.tbl | 3 + tools/perf/bench/bench.h | 4 + tools/perf/bench/futex-hash.c | 24 +- tools/perf/bench/futex-requeue.c | 57 +- tools/perf/bench/futex-wake-parallel.c | 41 +- tools/perf/bench/futex-wake.c | 37 +- tools/perf/bench/futex.h | 47 + tools/perf/builtin-bench.c | 18 +- .../selftests/futex/functional/.gitignore | 3 + .../selftests/futex/functional/Makefile | 8 +- .../futex/functional/futex2_requeue.c | 164 +++ .../selftests/futex/functional/futex2_wait.c | 209 +++ .../selftests/futex/functional/futex2_waitv.c | 157 +++ .../futex/functional/futex_wait_timeout.c | 58 +- .../futex/functional/futex_wait_wouldblock.c | 33 +- .../testing/selftests/futex/functional/run.sh | 6 + .../selftests/futex/include/futex2test.h | 121 ++ 38 files changed, 2563 insertions(+), 53 deletions(-) create mode 100644 Documentation/locking/futex2.rst create mode 100644 kernel/futex2.c create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c create mode 100644 tools/testing/selftests/futex/include/futex2test.h -- 2.30.1

4 years, 10 months

6
28
0 0

[PATCH] selftests: timers: set-timer-lat: remove unneeded semicolon

by Yang Li

Eliminate the following coccicheck warning: ./tools/testing/selftests/timers/set-timer-lat.c:83:2-3: Unneeded semicolon ./tools/testing/selftests/timers/nsleep-lat.c:75:2-3: Unneeded semicolon ./tools/testing/selftests/timers/nanosleep.c:75:2-3: Unneeded semicolon ./tools/testing/selftests/timers/inconsistency-check.c:75:2-3: Unneeded semicolon ./tools/testing/selftests/timers/alarmtimer-suspend.c:82:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci(a)linux.alibaba.com> Signed-off-by: Yang Li <yang.lee(a)linux.alibaba.com> --- tools/testing/selftests/timers/alarmtimer-suspend.c | 2 +- tools/testing/selftests/timers/inconsistency-check.c | 2 +- tools/testing/selftests/timers/nanosleep.c | 2 +- tools/testing/selftests/timers/nsleep-lat.c | 2 +- tools/testing/selftests/timers/set-timer-lat.c | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/timers/alarmtimer-suspend.c b/tools/testing/selftests/timers/alarmtimer-suspend.c index 4da09db..54da4b08 100644 --- a/tools/testing/selftests/timers/alarmtimer-suspend.c +++ b/tools/testing/selftests/timers/alarmtimer-suspend.c @@ -79,7 +79,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } diff --git a/tools/testing/selftests/timers/inconsistency-check.c b/tools/testing/selftests/timers/inconsistency-check.c index 022d3ff..e6756d9 100644 --- a/tools/testing/selftests/timers/inconsistency-check.c +++ b/tools/testing/selftests/timers/inconsistency-check.c @@ -72,7 +72,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } diff --git a/tools/testing/selftests/timers/nanosleep.c b/tools/testing/selftests/timers/nanosleep.c index 71b5441..433a096 100644 --- a/tools/testing/selftests/timers/nanosleep.c +++ b/tools/testing/selftests/timers/nanosleep.c @@ -72,7 +72,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } diff --git a/tools/testing/selftests/timers/nsleep-lat.c b/tools/testing/selftests/timers/nsleep-lat.c index eb3e79e..a7ca982 100644 --- a/tools/testing/selftests/timers/nsleep-lat.c +++ b/tools/testing/selftests/timers/nsleep-lat.c @@ -72,7 +72,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c index 50da454..d60bbca 100644 --- a/tools/testing/selftests/timers/set-timer-lat.c +++ b/tools/testing/selftests/timers/set-timer-lat.c @@ -80,7 +80,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } -- 1.8.3.1

4 years, 10 months

1
0
0 0

[PATCH v3 0/2] kunit: fail tests on UBSAN errors

by Daniel Latypov

v1 by Uriel is here: [1]. Since it's been a while, I've dropped the Reviewed-By's. It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which hadn't been merged yet, so that caused some kerfuffle with applying them previously and the series was reverted. This revives the series but makes the kunit_fail_current_test() function take a format string and logs the file and line number of the failing code, addressing Alan Maguire's comments on the previous version. As a result, the patch that makes UBSAN errors was tweaked slightly to include an error message. v2 -> v3: Fix kunit_fail_current_test() so it works w/ CONFIG_KUNIT=m s/_/__ on the helper func to match others in test.c [1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja… Uriel Guajardo (2): kunit: support failure from dynamic analysis tools kunit: ubsan integration include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++ lib/kunit/test.c | 37 +++++++++++++++++++++++++++++++++---- lib/ubsan.c | 3 +++ 3 files changed, 66 insertions(+), 4 deletions(-) create mode 100644 include/kunit/test-bug.h base-commit: 1e0d27fce010b0a4a9e595506b6ede75934c31be -- 2.30.0.478.g8a0d178c01-goog

4 years, 10 months

5
10
0 0

epoll: different edge-triggered behavior bewteen pipe and socketpair

by fruggeri＠arista.com

pipe() and socketpair() have different behavior wrt edge-triggered read epoll, in that no event is generated when data is written into a non-empty pipe, but an event is generated if socketpair() is used instead. This simple modification of the epoll2 testlet from tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c (it just adds a second write) shows the different behavior. The testlet passes with pipe() but fails with socketpair() with 5.10. They both fail with 4.19. Is it fair to assume that 5.10 pipe's behavior is the correct one? Thanks, Francesco Ruggeri /* * t0 * | (ew) * e0 * | (et) * s0 */ TEST(epoll2) { int efd; int sfd[2]; struct epoll_event e; ASSERT_EQ(socketpair(AF_UNIX, SOCK_STREAM, 0, sfd), 0); //ASSERT_EQ(pipe(sfd), 0); efd = epoll_create(1); ASSERT_GE(efd, 0); e.events = EPOLLIN | EPOLLET; ASSERT_EQ(epoll_ctl(efd, EPOLL_CTL_ADD, sfd[0], &e), 0); ASSERT_EQ(write(sfd[1], "w", 1), 1); EXPECT_EQ(epoll_wait(efd, &e, 1, 0), 1); ASSERT_EQ(write(sfd[1], "w", 1), 1); EXPECT_EQ(epoll_wait(efd, &e, 1, 0), 0); close(efd); close(sfd[0]); close(sfd[1]); }

4 years, 10 months

1
0
0 0

[PATCH v3 0/5] Some optimizations related to sgx

by Tianjia Zhang

This is an optimization of a set of sgx-related codes, each of which is independent of the patch. Because the second and third patches have conflicting dependencies, these patches are put together. --- v3 changes: * split free_cnt count and spin lock optimization into two patches v2 changes: * review suggested changes Tianjia Zhang (5): selftests/x86: Simplify the code to get vdso base address in sgx x86/sgx: Optimize the locking range in sgx_sanitize_section() x86/sgx: Optimize the free_cnt count in sgx_epc_section x86/sgx: Allows ioctl PROVISION to execute before CREATE x86/sgx: Remove redundant if conditions in sgx_encl_create arch/x86/kernel/cpu/sgx/driver.c | 1 + arch/x86/kernel/cpu/sgx/ioctl.c | 9 +++++---- arch/x86/kernel/cpu/sgx/main.c | 13 +++++-------- tools/testing/selftests/sgx/main.c | 24 ++++-------------------- 4 files changed, 15 insertions(+), 32 deletions(-) -- 2.19.1.3.ge56e4f7

4 years, 10 months

4
17
0 0

[PATCH] selftests: kvm: add hardware_disable test

by Marc Orr

From: Ignacio Alvarado <ikalvarado(a)google.com> This test launches 512 VMs in serial and kills them after a random amount of time. The test was original written to exercise KVM user notifiers in the context of1650b4ebc99d: - KVM: Disable irq while unregistering user notifier - https://lore.kernel.org/kvm/CACXrx53vkO=HKfwWwk+fVpvxcNjPrYmtDZ10qWxFvVX_PT… Recently, this test piqued my interest because it proved useful to for AMD SNP in exercising the "in-use" pages, described in APM section 15.36.12, "Running SNP-Active Virtual Machines". To run the test, first compile: $ make "CPPFLAGS=-static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive" \ -C tools/testing/selftests/kvm/ Then, copy the test over to a machine with the kernel and run: $ ./hardware_disable_test Signed-off-by: Ignacio Alvarado <ikalvarado(a)google.com> Signed-off-by: Marc Orr <marcorr(a)google.com> --- tools/testing/selftests/kvm/.gitignore | 1 + tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/hardware_disable_test.c | 165 ++++++++++++++++++ 3 files changed, 167 insertions(+) create mode 100644 tools/testing/selftests/kvm/hardware_disable_test.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index ce8f4ad39684..d631e111441a 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -28,6 +28,7 @@ /demand_paging_test /dirty_log_test /dirty_log_perf_test +/hardware_disable_test /kvm_create_max_vcpus /set_memory_region_test /steal_time diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index fe41c6a0fa67..c1c403d878f6 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -62,6 +62,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test TEST_GEN_PROGS_x86_64 += demand_paging_test TEST_GEN_PROGS_x86_64 += dirty_log_test TEST_GEN_PROGS_x86_64 += dirty_log_perf_test +TEST_GEN_PROGS_x86_64 += hardware_disable_test TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus TEST_GEN_PROGS_x86_64 += set_memory_region_test TEST_GEN_PROGS_x86_64 += steal_time diff --git a/tools/testing/selftests/kvm/hardware_disable_test.c b/tools/testing/selftests/kvm/hardware_disable_test.c new file mode 100644 index 000000000000..2f2eeb8a1d86 --- /dev/null +++ b/tools/testing/selftests/kvm/hardware_disable_test.c @@ -0,0 +1,165 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This test is intended to reproduce a crash that happens when + * kvm_arch_hardware_disable is called and it attempts to unregister the user + * return notifiers. + */ + +#define _GNU_SOURCE + +#include <fcntl.h> +#include <pthread.h> +#include <semaphore.h> +#include <stdint.h> +#include <stdlib.h> +#include <unistd.h> +#include <sys/wait.h> + +#include <test_util.h> + +#include "kvm_util.h" + +#define VCPU_NUM 4 +#define SLEEPING_THREAD_NUM (1 << 4) +#define FORK_NUM (1ULL << 9) +#define DELAY_US_MAX 2000 +#define GUEST_CODE_PIO_PORT 4 + +sem_t *sem; + +/* Arguments for the pthreads */ +struct payload { + struct kvm_vm *vm; + uint32_t index; +}; + +static void guest_code(void) +{ + for (;;) + ; /* Some busy work */ + printf("Should not be reached.\n"); +} + +static void *run_vcpu(void *arg) +{ + struct payload *payload = (struct payload *)arg; + struct kvm_run *state = vcpu_state(payload->vm, payload->index); + + vcpu_run(payload->vm, payload->index); + + TEST_ASSERT(false, "%s: exited with reason %d: %s\n", + __func__, state->exit_reason, + exit_reason_str(state->exit_reason)); + pthread_exit(NULL); +} + +static void *sleeping_thread(void *arg) +{ + int fd; + + while (true) { + fd = open("/dev/null", O_RDWR); + close(fd); + } + TEST_ASSERT(false, "%s: exited\n", __func__); + pthread_exit(NULL); +} + +static inline void check_create_thread(pthread_t *thread, pthread_attr_t *attr, + void *(*f)(void *), void *arg) +{ + int r; + + r = pthread_create(thread, attr, f, arg); + TEST_ASSERT(r == 0, "%s: failed to create thread", __func__); +} + +static inline void check_set_affinity(pthread_t thread, cpu_set_t *cpu_set) +{ + int r; + + r = pthread_setaffinity_np(thread, sizeof(cpu_set_t), cpu_set); + TEST_ASSERT(r == 0, "%s: failed set affinity", __func__); +} + +static inline void check_join(pthread_t thread, void **retval) +{ + int r; + + r = pthread_join(thread, retval); + TEST_ASSERT(r == 0, "%s: failed to join thread", __func__); +} + +static void run_test(uint32_t run) +{ + struct kvm_vm *vm; + cpu_set_t cpu_set; + pthread_t threads[VCPU_NUM]; + pthread_t throw_away; + struct payload payloads[VCPU_NUM]; + void *b; + uint32_t i, j; + + CPU_ZERO(&cpu_set); + for (i = 0; i < VCPU_NUM; i++) + CPU_SET(i, &cpu_set); + + vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR); + kvm_vm_elf_load(vm, program_invocation_name, 0, 0); + vm_create_irqchip(vm); + + fprintf(stderr, "%s: [%d] start vcpus\n", __func__, run); + for (i = 0; i < VCPU_NUM; ++i) { + vm_vcpu_add_default(vm, i, guest_code); + payloads[i].vm = vm; + payloads[i].index = i; + + check_create_thread(&threads[i], NULL, run_vcpu, + (void *)&payloads[i]); + check_set_affinity(threads[i], &cpu_set); + + for (j = 0; j < SLEEPING_THREAD_NUM; ++j) { + check_create_thread(&throw_away, NULL, sleeping_thread, + (void *)NULL); + check_set_affinity(throw_away, &cpu_set); + } + } + fprintf(stderr, "%s: [%d] all threads launched\n", __func__, run); + sem_post(sem); + for (i = 0; i < VCPU_NUM; ++i) + check_join(threads[i], &b); + /* Should not be reached */ + TEST_ASSERT(false, "%s: [%d] child escaped the ninja\n", __func__, run); +} + +int main(int argc, char **argv) +{ + uint32_t i; + int s, r; + pid_t pid; + + sem = sem_open("vm_sem", O_CREAT | O_EXCL, 0644, 0); + sem_unlink("vm_sem"); + + for (i = 0; i < FORK_NUM; ++i) { + pid = fork(); + TEST_ASSERT(pid >= 0, "%s: unable to fork", __func__); + if (pid == 0) + run_test(i); /* This function always exits */ + + fprintf(stderr, "%s: [%d] waiting semaphore\n", __func__, i); + sem_wait(sem); + r = (rand() % DELAY_US_MAX) + 1; + fprintf(stderr, "%s: [%d] waiting %dus\n", __func__, i, r); + usleep(r); + r = waitpid(pid, &s, WNOHANG); + TEST_ASSERT(r != pid, + "%s: [%d] child exited unexpectedly status: [%d]", + __func__, i, s); + fprintf(stderr, "%s: [%d] killing child\n", __func__, i); + kill(pid, SIGKILL); + } + + sem_destroy(sem); + exit(0); +} -- 2.30.0.478.g8a0d178c01-goog

4 years, 10 months

2
1
0 0

[PATCH v11 00/14] prohibit pinning pages in ZONE_MOVABLE

by Pavel Tatashin

Changelog --------- v11 - Another build fix reported by robot on i386: moved is_pinnable_page() below set_page_section() in linux/mm.h v10 - Fixed !CONFIG_MMU compiler issues by adding is_zero_pfn() stub. v9 - Renamed gpf_to_alloc_flags() to gfp_to_alloc_flags_cma(); thanks Lecopzer Chen for noticing. - Fixed warning reported scripts/checkpatch.pl: "Logical continuations should be on the previous line" v8 - Added reviewed by's from John Hubbard - Fixed subjects for selftests patches - Moved zero page check inside is_pinnable_page() as requested by Jason Gunthorpe. v7 - Added reviewed-by's - Fixed a compile bug on non-mmu builds reported by robot v6 Small update, but I wanted to send it out quicker, as it removes a controversial patch and replaces it with something sane. - Removed forcing FOLL_WRITE for longterm gup, instead added a patch to skip zero pages during migration. - Added reviewed-by's and minor log changes. v5 - Added the following patches to the beginning of series, which are fixes to the other existing problems with CMA migration code: mm/gup: check every subpage of a compound page during isolation mm/gup: return an error on migration failure mm/gup: check for isolation errors also at the beginning of series mm/gup: do not allow zero page for pinned pages - remove .gfp_mask/.reclaim_idx changes from mm/vmscan.c - update movable zone header comment in patch 8 instead of patch 3, fix the comment - Added acked, sign-offs - Updated commit logs based on feedback - Addressed issues reported by Michal and Jason. - Remove: #define PINNABLE_MIGRATE_MAX 10 #define PINNABLE_ISOLATE_MAX 100 Instead: fail on the first migration failure, and retry isolation forever as their failures are transient. - In self-set addressed some of the comments from John Hubbard, updated commit logs, and added comments. Renamed gup->flags with gup->test_flags. v4 - Address page migration comments. New patch: mm/gup: limit number of gup migration failures, honor failures Implements the limiting number of retries for migration failures, and also check for isolation failures. Added a test case into gup_test to verify that pages never long-term pinned in a movable zone, and also added tests to fault both in kernel and in userland. v3 - Merged with linux-next, which contains clean-up patch from Jason, therefore this series is reduced by two patches which did the same thing. v2 - Addressed all review comments - Added Reviewed-by's. - Renamed PF_MEMALLOC_NOMOVABLE to PF_MEMALLOC_PIN - Added is_pinnable_page() to check if page can be longterm pinned - Fixed gup fast path by checking is_in_pinnable_zone() - rename cma_page_list to movable_page_list - add a admin-guide note about handling pinned pages in ZONE_MOVABLE, updated caveat about pinned pages from linux/mmzone.h - Move current_gfp_context() to fast-path --------- When page is pinned it cannot be moved and its physical address stays the same until pages is unpinned. This is useful functionality to allows userland to implementation DMA access. For example, it is used by vfio in vfio_pin_pages(). However, this functionality breaks memory hotplug/hotremove assumptions that pages in ZONE_MOVABLE can always be migrated. This patch series fixes this issue by forcing new allocations during page pinning to omit ZONE_MOVABLE, and also to migrate any existing pages from ZONE_MOVABLE during pinning. It uses the same scheme logic that is currently used by CMA, and extends the functionality for all allocations. For more information read the discussion [1] about this problem. [1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQS… Previous versions: v1 https://lore.kernel.org/lkml/20201202052330.474592-1-pasha.tatashin@soleen.… v2 https://lore.kernel.org/lkml/20201210004335.64634-1-pasha.tatashin@soleen.c… v3 https://lore.kernel.org/lkml/20201211202140.396852-1-pasha.tatashin@soleen.… v4 https://lore.kernel.org/lkml/20201217185243.3288048-1-pasha.tatashin@soleen… v5 https://lore.kernel.org/lkml/20210119043920.155044-1-pasha.tatashin@soleen.… v6 https://lore.kernel.org/lkml/20210120014333.222547-1-pasha.tatashin@soleen.… v7 https://lore.kernel.org/lkml/20210122033748.924330-1-pasha.tatashin@soleen.… v8 https://lore.kernel.org/lkml/20210125194751.1275316-1-pasha.tatashin@soleen… v9 https://lore.kernel.org/lkml/20210201153827.444374-1-pasha.tatashin@soleen.… v10 https://lore.kernel.org/lkml/20210211162427.618913-1-pasha.tatashin@soleen.… Pavel Tatashin (14): mm/gup: don't pin migrated cma pages in movable zone mm/gup: check every subpage of a compound page during isolation mm/gup: return an error on migration failure mm/gup: check for isolation errors mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN mm: apply per-task gfp constraints in fast path mm: honor PF_MEMALLOC_PIN for all movable pages mm/gup: do not migrate zero page mm/gup: migrate pinned pages out of movable zone memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning mm/gup: change index type to long as it counts pages mm/gup: longterm pin migration cleanup selftests/vm: gup_test: fix test flag selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages .../admin-guide/mm/memory-hotplug.rst | 9 + include/linux/migrate.h | 1 + include/linux/mm.h | 19 ++ include/linux/mmzone.h | 13 +- include/linux/pgtable.h | 12 ++ include/linux/sched.h | 2 +- include/linux/sched/mm.h | 27 +-- include/trace/events/migrate.h | 3 +- mm/gup.c | 174 ++++++++---------- mm/gup_test.c | 29 +-- mm/gup_test.h | 3 +- mm/hugetlb.c | 4 +- mm/page_alloc.c | 33 ++-- tools/testing/selftests/vm/gup_test.c | 36 +++- 14 files changed, 208 insertions(+), 157 deletions(-) -- 2.25.1

4 years, 10 months

1
14
0 0

[PATCH] testptp: Fix compile with musl libc

by Hauke Mehrtens

Musl libc does not define the glibc specific macro __GLIBC_PREREQ(), but it has the clock_adjtime() function. Assume that a libc implementation which does not define __GLIBC_PREREQ at all still implements clock_adjtime(). This fixes a build problem with musl libc because the __GLIBC_PREREQ() macro is missing. Fixes: 42e1358e103d ("ptp: In the testptp utility, use clock_adjtime from glibc when available") Signed-off-by: Hauke Mehrtens <hauke(a)hauke-m.de> --- tools/testing/selftests/ptp/testptp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/ptp/testptp.c b/tools/testing/selftests/ptp/testptp.c index f7911aaeb007..ecffe2c78543 100644 --- a/tools/testing/selftests/ptp/testptp.c +++ b/tools/testing/selftests/ptp/testptp.c @@ -38,6 +38,7 @@ #define NSEC_PER_SEC 1000000000LL /* clock_adjtime is not available in GLIBC < 2.14 */ +#ifdef __GLIBC_PREREQ #if !__GLIBC_PREREQ(2, 14) #include <sys/syscall.h> static int clock_adjtime(clockid_t id, struct timex *tx) @@ -45,6 +46,7 @@ static int clock_adjtime(clockid_t id, struct timex *tx) return syscall(__NR_clock_adjtime, id, tx); } #endif +#endif /* __GLIBC_PREREQ */ static void show_flag_test(int rq_index, unsigned int flags, int err) { -- 2.20.1

4 years, 10 months

2
3
0 0

[PATCH v4 0/5] Some optimizations related to sgx

by Tianjia Zhang

This is an optimization of a set of sgx-related codes, each of which is independent of the patch. Because the second and third patches have conflicting dependencies, these patches are put together. --- v4 changes: * Improvements suggested by review v3 changes: * split free_cnt count and spin lock optimization into two patches v2 changes: * review suggested changes Tianjia Zhang (5): selftests/x86: Use getauxval() to simplify the code in sgx x86/sgx: Reduce the locking range in sgx_sanitize_section() x86/sgx: Optimize the free_cnt count in sgx_epc_section x86/sgx: Allows ioctl PROVISION to execute before CREATE x86/sgx: Remove redundant if conditions in sgx_encl_create arch/x86/kernel/cpu/sgx/driver.c | 1 + arch/x86/kernel/cpu/sgx/ioctl.c | 8 ++++---- arch/x86/kernel/cpu/sgx/main.c | 13 +++++-------- tools/testing/selftests/sgx/main.c | 24 ++++-------------------- 4 files changed, 14 insertions(+), 32 deletions(-) -- 2.19.1.3.ge56e4f7

4 years, 11 months

3
16
0 0

[PATCH v10 00/14] prohibit pinning pages in ZONE_MOVABLE

by Pavel Tatashin

Changelog --------- v10 - Fixed !CONFIG_MMU compiler issues by adding is_zero_pfn() stub. v9 - Renamed gpf_to_alloc_flags() to gfp_to_alloc_flags_cma(); thanks Lecopzer Chen for noticing. - Fixed warning reported scripts/checkpatch.pl: "Logical continuations should be on the previous line" v8 - Added reviewed by's from John Hubbard - Fixed subjects for selftests patches - Moved zero page check inside is_pinnable_page() as requested by Jason Gunthorpe. v7 - Added reviewed-by's - Fixed a compile bug on non-mmu builds reported by robot v6 Small update, but I wanted to send it out quicker, as it removes a controversial patch and replaces it with something sane. - Removed forcing FOLL_WRITE for longterm gup, instead added a patch to skip zero pages during migration. - Added reviewed-by's and minor log changes. v5 - Added the following patches to the beginning of series, which are fixes to the other existing problems with CMA migration code: mm/gup: check every subpage of a compound page during isolation mm/gup: return an error on migration failure mm/gup: check for isolation errors also at the beginning of series mm/gup: do not allow zero page for pinned pages - remove .gfp_mask/.reclaim_idx changes from mm/vmscan.c - update movable zone header comment in patch 8 instead of patch 3, fix the comment - Added acked, sign-offs - Updated commit logs based on feedback - Addressed issues reported by Michal and Jason. - Remove: #define PINNABLE_MIGRATE_MAX 10 #define PINNABLE_ISOLATE_MAX 100 Instead: fail on the first migration failure, and retry isolation forever as their failures are transient. - In self-set addressed some of the comments from John Hubbard, updated commit logs, and added comments. Renamed gup->flags with gup->test_flags. v4 - Address page migration comments. New patch: mm/gup: limit number of gup migration failures, honor failures Implements the limiting number of retries for migration failures, and also check for isolation failures. Added a test case into gup_test to verify that pages never long-term pinned in a movable zone, and also added tests to fault both in kernel and in userland. v3 - Merged with linux-next, which contains clean-up patch from Jason, therefore this series is reduced by two patches which did the same thing. v2 - Addressed all review comments - Added Reviewed-by's. - Renamed PF_MEMALLOC_NOMOVABLE to PF_MEMALLOC_PIN - Added is_pinnable_page() to check if page can be longterm pinned - Fixed gup fast path by checking is_in_pinnable_zone() - rename cma_page_list to movable_page_list - add a admin-guide note about handling pinned pages in ZONE_MOVABLE, updated caveat about pinned pages from linux/mmzone.h - Move current_gfp_context() to fast-path --------- When page is pinned it cannot be moved and its physical address stays the same until pages is unpinned. This is useful functionality to allows userland to implementation DMA access. For example, it is used by vfio in vfio_pin_pages(). However, this functionality breaks memory hotplug/hotremove assumptions that pages in ZONE_MOVABLE can always be migrated. This patch series fixes this issue by forcing new allocations during page pinning to omit ZONE_MOVABLE, and also to migrate any existing pages from ZONE_MOVABLE during pinning. It uses the same scheme logic that is currently used by CMA, and extends the functionality for all allocations. For more information read the discussion [1] about this problem. [1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQS… Previous versions: v1 https://lore.kernel.org/lkml/20201202052330.474592-1-pasha.tatashin@soleen.… v2 https://lore.kernel.org/lkml/20201210004335.64634-1-pasha.tatashin@soleen.c… v3 https://lore.kernel.org/lkml/20201211202140.396852-1-pasha.tatashin@soleen.… v4 https://lore.kernel.org/lkml/20201217185243.3288048-1-pasha.tatashin@soleen… v5 https://lore.kernel.org/lkml/20210119043920.155044-1-pasha.tatashin@soleen.… v6 https://lore.kernel.org/lkml/20210120014333.222547-1-pasha.tatashin@soleen.… v7 https://lore.kernel.org/lkml/20210122033748.924330-1-pasha.tatashin@soleen.… v8 https://lore.kernel.org/lkml/20210125194751.1275316-1-pasha.tatashin@soleen… v9 https://lore.kernel.org/lkml/20210201153827.444374-1-pasha.tatashin@soleen.… Pavel Tatashin (14): mm/gup: don't pin migrated cma pages in movable zone mm/gup: check every subpage of a compound page during isolation mm/gup: return an error on migration failure mm/gup: check for isolation errors mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN mm: apply per-task gfp constraints in fast path mm: honor PF_MEMALLOC_PIN for all movable pages mm/gup: do not migrate zero page mm/gup: migrate pinned pages out of movable zone memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning mm/gup: change index type to long as it counts pages mm/gup: longterm pin migration cleanup selftests/vm: gup_test: fix test flag selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages .../admin-guide/mm/memory-hotplug.rst | 9 + include/linux/migrate.h | 1 + include/linux/mm.h | 19 ++ include/linux/mmzone.h | 13 +- include/linux/pgtable.h | 12 ++ include/linux/sched.h | 2 +- include/linux/sched/mm.h | 27 +-- include/trace/events/migrate.h | 3 +- mm/gup.c | 174 ++++++++---------- mm/gup_test.c | 29 +-- mm/gup_test.h | 3 +- mm/hugetlb.c | 4 +- mm/page_alloc.c | 33 ++-- tools/testing/selftests/vm/gup_test.c | 36 +++- 14 files changed, 208 insertions(+), 157 deletions(-) -- 2.25.1

4 years, 11 months

1
14
0 0

[PATCH v6 1/4] lib: vsprintf: scanf: Negative number must have field width > 1

by Richard Fitzgerald

If a signed number field starts with a '-' the field width must be > 1, or unlimited, to allow at least one digit after the '-'. This patch adds a check for this. If a signed field starts with '-' and field_width == 1 the scanf will quit. It is ok for a signed number field to have a field width of 1 if it starts with a digit. In that case the single digit can be converted. Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com> Reviewed-by: Petr Mladek <pmladek(a)suse.com> Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com> --- lib/vsprintf.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 3b53c73580c5..28bb26cd1f67 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -3434,8 +3434,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args) str = skip_spaces(str); digit = *str; - if (is_sign && digit == '-') + if (is_sign && digit == '-') { + if (field_width == 1) + break; + digit = *(str + 1); + } if (!digit || (base == 16 && !isxdigit(digit)) -- 2.20.1

4 years, 11 months

2
5
0 0

[PATCH v5 1/4] lib: vsprintf: scanf: Negative number must have field width > 1

by Richard Fitzgerald

If a signed number field starts with a '-' the field width must be > 1, or unlimited, to allow at least one digit after the '-'. This patch adds a check for this. If a signed field starts with '-' and field_width == 1 the scanf will quit. It is ok for a signed number field to have a field width of 1 if it starts with a digit. In that case the single digit can be converted. Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com> Reviewed-by: Petr Mladek <pmladek(a)suse.com> Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com> --- lib/vsprintf.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 3b53c73580c5..28bb26cd1f67 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -3434,8 +3434,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args) str = skip_spaces(str); digit = *str; - if (is_sign && digit == '-') + if (is_sign && digit == '-') { + if (field_width == 1) + break; + digit = *(str + 1); + } if (!digit || (base == 16 && !isxdigit(digit)) -- 2.20.1

4 years, 11 months

3
7
0 0

[PATCH] selftests/harness: pass variant to teardown

by Willem de Bruijn

From: Willem de Bruijn <willemb(a)google.com> FIXTURE_VARIANT data is passed to FIXTURE_SETUP and TEST_F as variant. In some cases, the variant will change the setup, such that expections also change on teardown. Also pass variant to FIXTURE_TEARDOWN. The new FIXTURE_TEARDOWN logic is identical to that in FIXTURE_SETUP, right above. Signed-off-by: Willem de Bruijn <willemb(a)google.com> --- For one use of this see tentative tools/testing/selftests/filesystems/selectpoll.c kselftest at https://github.com/wdebruij/linux-next-mirror/commit/12b4d183ac9140c1360637… --- tools/testing/selftests/kselftest_harness.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index f19804df244c..6a27e79278e8 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -283,7 +283,9 @@ #define FIXTURE_TEARDOWN(fixture_name) \ void fixture_name##_teardown( \ struct __test_metadata __attribute__((unused)) *_metadata, \ - FIXTURE_DATA(fixture_name) __attribute__((unused)) *self) + FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \ + const FIXTURE_VARIANT(fixture_name) \ + __attribute__((unused)) *variant) /** * FIXTURE_VARIANT(fixture_name) - Optionally called once per fixture @@ -298,9 +300,9 @@ * ... * }; * - * Defines type of constant parameters provided to FIXTURE_SETUP() and TEST_F() - * as *variant*. Variants allow the same tests to be run with different - * arguments. + * Defines type of constant parameters provided to FIXTURE_SETUP(), TEST_F() and + * FIXTURE_TEARDOWN as *variant*. Variants allow the same tests to be run with + * different arguments. */ #define FIXTURE_VARIANT(fixture_name) struct _fixture_variant_##fixture_name @@ -382,7 +384,7 @@ if (!_metadata->passed) \ return; \ fixture_name##_##test_name(_metadata, &self, variant->data); \ - fixture_name##_teardown(_metadata, &self); \ + fixture_name##_teardown(_metadata, &self, variant->data); \ } \ static struct __test_metadata \ _##fixture_name##_##test_name##_object = { \ -- 2.29.2.576.ga3fc446d84-goog

4 years, 11 months

3
4
0 0

[PATCH bpf-next] selftests/bpf: Simplify the calculation of variables

by Jiapeng Chong

Fix the following coccicheck warnings: ./tools/testing/selftests/bpf/xdpxceiver.c:954:28-30: WARNING !A || A && B is equivalent to !A || B. ./tools/testing/selftests/bpf/xdpxceiver.c:932:28-30: WARNING !A || A && B is equivalent to !A || B. ./tools/testing/selftests/bpf/xdpxceiver.c:909:28-30: WARNING !A || A && B is equivalent to !A || B. Reported-by: Abaci Robot <abaci(a)linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com> --- tools/testing/selftests/bpf/xdpxceiver.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c index 99ea6cf..f4a96d5 100644 --- a/tools/testing/selftests/bpf/xdpxceiver.c +++ b/tools/testing/selftests/bpf/xdpxceiver.c @@ -897,7 +897,7 @@ static void *worker_testapp_validate(void *arg) ksft_print_msg("Destroying socket\n"); } - if (!opt_bidi || (opt_bidi && bidi_pass)) { + if (!opt_bidi || bidi_pass) { xsk_socket__delete(ifobject->xsk->xsk); (void)xsk_umem__delete(ifobject->umem->umem); } @@ -922,7 +922,7 @@ static void testapp_validate(void) pthread_mutex_lock(&sync_mutex); /*Spawn RX thread */ - if (!opt_bidi || (opt_bidi && !bidi_pass)) { + if (!opt_bidi || !bidi_pass) { if (pthread_create(&t0, &attr, worker_testapp_validate, ifdict[1])) exit_with_error(errno); } else if (opt_bidi && bidi_pass) { @@ -942,7 +942,7 @@ static void testapp_validate(void) pthread_mutex_unlock(&sync_mutex); /*Spawn TX thread */ - if (!opt_bidi || (opt_bidi && !bidi_pass)) { + if (!opt_bidi || !bidi_pass) { if (pthread_create(&t1, &attr, worker_testapp_validate, ifdict[0])) exit_with_error(errno); } else if (opt_bidi && bidi_pass) { -- 1.8.3.1

4 years, 11 months

2
1
0 0

[PATCH bpf 0/4] Expose network namespace cookies to user space

by Lorenz Bauer

We're working on a user space control plane for the BPF sk_lookup hook [1]. The hook attaches to a network namespace and allows control over which socket receives a new connection / packet. Roughly, applications can give a socket to our user space component to participate in custom bind semantics. This creates an edge case where an application can provide us with a socket that lives in a different network namespace than our BPF sk_lookup program. We'd like to return an error in this case. Additionally, we have some user space state that is tied to the network namespace. We currently use the inode of the nsfs entry in a directory name, but this is suffers from inode reuse. I'm proposing to fix both of these issues by adding a new SO_NETNS_COOKIE socket option as well as a NS_GET_COOKIE ioctl. Using these we get a stable, unique identifier for a network namespace and check whether a socket belongs to the "correct" namespace. NS_GET_COOKIE could be renamed to NS_GET_NET_COOKIE. I kept the name generic because it seems like other namespace types could benefit from a cookie as well. I'm trying to land this via the bpf tree since this is where the netns cookie originated, please let me know if this isn't appropriate. 1: https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html Cc: bpf(a)vger.kernel.org Cc: linux-alpha(a)vger.kernel.org Cc: linux-api(a)vger.kernel.org Cc: linux-arch(a)vger.kernel.org Cc: linux-fsdevel(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: linux-mips(a)vger.kernel.org Cc: linux-parisc(a)vger.kernel.org Cc: netdev(a)vger.kernel.org Cc: sparclinux(a)vger.kernel.org Lorenz Bauer (4): net: add SO_NETNS_COOKIE socket option nsfs: add an ioctl to discover the network namespace cookie tools/testing: add test for NS_GET_COOKIE tools/testing: add a selftest for SO_NETNS_COOKIE arch/alpha/include/uapi/asm/socket.h | 2 + arch/mips/include/uapi/asm/socket.h | 2 + arch/parisc/include/uapi/asm/socket.h | 2 + arch/sparc/include/uapi/asm/socket.h | 2 + fs/nsfs.c | 9 +++ include/linux/sock_diag.h | 20 ++++++ include/net/net_namespace.h | 11 ++++ include/uapi/asm-generic/socket.h | 2 + include/uapi/linux/nsfs.h | 2 + net/core/filter.c | 9 ++- net/core/sock.c | 7 +++ tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 2 +- tools/testing/selftests/net/so_netns_cookie.c | 61 +++++++++++++++++++ tools/testing/selftests/nsfs/.gitignore | 1 + tools/testing/selftests/nsfs/Makefile | 2 +- tools/testing/selftests/nsfs/netns.c | 57 +++++++++++++++++ 17 files changed, 185 insertions(+), 7 deletions(-) create mode 100644 tools/testing/selftests/net/so_netns_cookie.c create mode 100644 tools/testing/selftests/nsfs/netns.c -- 2.27.0

4 years, 11 months

1
2
0 0

[RFC PATCH 0/2] Add a test for kvm page table code

by Yanan Wang

Hi, This test is added to serve as a performance tester and a bug reproducer for kvm page table code (GPA->HPA mappings), it gives guidance for the people trying to make some improvement for kvm. The following explains what we can exactly do through this test. And a RFC is sent for comments, thanks. The function guest_code() is designed to cover conditions where a single vcpu or multiple vcpus access guest pages within the same memory range, in three VM stages(before dirty-logging, during dirty-logging, after dirty-logging). Besides, the backing source memory type(ANONYMOUS/THP/HUGETLB) of the tested memory region can be specified by users, which means normal page mappings or block mappings can be chosen by users to be created in the test. If use of ANONYMOUS memory is specified, kvm will create page mappings for the tested memory region before dirty-logging, and update attributes of the page mappings from RO to RW during dirty-logging. If use of THP/HUGETLB memory is specified, kvm will create block mappings for the tested memory region before dirty-logging, and split the blcok mappings into page mappings during dirty-logging, and coalesce the page mappings back into block mappings after dirty-logging is stopped. So in summary, as a performance tester, this test can present the performance of kvm creating/updating normal page mappings, or the performance of kvm creating/splitting/recovering block mappings, through execution time. When we need to coalesce the page mappings back to block mappings after dirty logging is stopped, we have to firstly invalidate *all* the TLB entries for the page mappings right before installation of the block entry, because a TLB conflict abort error could occur if we can't invalidate the TLB entries fully. We have hit this TLB conflict twice on aarch64 software implementation and fixed it. As this test can imulate process from dirty-logging enabled to dirty-logging stopped of a VM with block mappings, so it can also reproduce this TLB conflict abort due to inadequate TLB invalidation when coalescing tables. Links about the TLB conflict abort: https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/ --- Here are some test examples of this test: platform: HiSilicon Kunpeng920 (aarch64, FWB not supported) host kernel: Linux mainline (1) Based on v5.11-rc6 cmdline: ./kvm_page_table_test -m 4 -t 0 -g 4K -s 1G -v 1 (1 vcpu, 1G memory, page mappings(granule 4K)) KVM_CREATE_MAPPINGS: 0.8196s 0.8260s 0.8258s 0.8169s 0.8190s KVM_UPDATE_MAPPINGS: 1.1930s 1.1949s 1.1940s 1.1934s 1.1946s cmdline: ./kvm_page_table_test -m 4 -t 0 -g 4K -s 1G -v 20 (20 vcpus, 1G memory, page mappings(granule 4K)) KVM_CREATE_MAPPINGS: 23.4028s 23.8015s 23.6702s 23.9437s 22.1646s KVM_UPDATE_MAPPINGS: 16.9550s 16.4734s 16.8300s 16.9621s 16.9402s cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 1 (1 vcpu, 20G memory, block mappings(granule 1G)) KVM_CREATE_MAPPINGS: 3.7040s 3.7053s 3.7047s 3.7061s 3.7068s KVM_ADJUST_MAPPINGS: 2.8264s 2.8266s 2.8272s 2.8259s 2.8283s cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20 (20 vcpus, 20G memory, block mappings(granule 1G)) KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s (2) I have post a patch series to improve efficiency of stage2 page table code, so test the performance changes. cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20 (20 vcpus, 20G memory, block mappings(granule 1G)) Before patch: KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s After patch: KVM_CREATE_MAPPINGS: 3.7022s 3.7031s 3.7028s 3.7012s 3.7024s Before patch: KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s After patch: KVM_ADJUST_MAPPINGS: 0.3008s 0.3004s 0.2974s 0.2917s 0.2900s cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40 (40 vcpus, 20G memory, block mappings(granule 1G)) Before patch: KVM_CREATE_MAPPINGS: 104.560s 104.556s 104.554s 104.556s 104.550s After patch: KVM_CREATE_MAPPINGS: 3.7011s 3.7103s 3.7005s 3.7024s 3.7106s Before patch: KVM_ADJUST_MAPPINGS: 103.931s 103.936s 103.927s 103.942s 103.927s After patch: KVM_ADJUST_MAPPINGS: 0.3541s 0.3694s 0.3656s 0.3693s 0.3687s --- Yanan Wang (2): KVM: selftests: Add a macro to get string of vm_mem_backing_src_type KVM: selftests: Add a test for kvm page table code tools/testing/selftests/kvm/Makefile | 3 + .../testing/selftests/kvm/include/kvm_util.h | 3 + .../selftests/kvm/kvm_page_table_test.c | 518 ++++++++++++++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 8 + 4 files changed, 532 insertions(+) create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c -- 2.23.0

4 years, 11 months

5
18
0 0

[PATCH] selftests/vDSO: fix ABI selftest on riscv

by Tobias Klauser

Only older versions of the RISC-V GCC toolchain define __riscv__. Check for __riscv as well, which is used by newer GCC toolchains. Also set VDSO_32BIT based on __riscv_xlen. Before (on riscv64): $ ./vdso_test_abi [vDSO kselftest] VDSO_VERSION: LINUX_4 Could not find __vdso_gettimeofday Could not find __vdso_clock_gettime Could not find __vdso_clock_getres clock_id: CLOCK_REALTIME [PASS] Could not find __vdso_clock_gettime Could not find __vdso_clock_getres clock_id: CLOCK_BOOTTIME [PASS] Could not find __vdso_clock_gettime Could not find __vdso_clock_getres clock_id: CLOCK_TAI [PASS] Could not find __vdso_clock_gettime Could not find __vdso_clock_getres clock_id: CLOCK_REALTIME_COARSE [PASS] Could not find __vdso_clock_gettime Could not find __vdso_clock_getres clock_id: CLOCK_MONOTONIC [PASS] Could not find __vdso_clock_gettime Could not find __vdso_clock_getres clock_id: CLOCK_MONOTONIC_RAW [PASS] Could not find __vdso_clock_gettime Could not find __vdso_clock_getres clock_id: CLOCK_MONOTONIC_COARSE [PASS] Could not find __vdso_time After (on riscv32): $ ./vdso_test_abi [vDSO kselftest] VDSO_VERSION: LINUX_4.15 The time is 1612449376.015086 The time is 1612449376.18340784 The resolution is 0 1 clock_id: CLOCK_REALTIME [PASS] The time is 774.842586182 The resolution is 0 1 clock_id: CLOCK_BOOTTIME [PASS] The time is 1612449376.22536565 The resolution is 0 1 clock_id: CLOCK_TAI [PASS] The time is 1612449376.20885172 The resolution is 0 4000000 clock_id: CLOCK_REALTIME_COARSE [PASS] The time is 774.845491269 The resolution is 0 1 clock_id: CLOCK_MONOTONIC [PASS] The time is 774.849534200 The resolution is 0 1 clock_id: CLOCK_MONOTONIC_RAW [PASS] The time is 774.842139684 The resolution is 0 4000000 clock_id: CLOCK_MONOTONIC_COARSE [PASS] Could not find __vdso_time Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch> --- tools/testing/selftests/vDSO/vdso_config.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/vDSO/vdso_config.h b/tools/testing/selftests/vDSO/vdso_config.h index 6a6fe8d4ff55..6188b16827d1 100644 --- a/tools/testing/selftests/vDSO/vdso_config.h +++ b/tools/testing/selftests/vDSO/vdso_config.h @@ -47,10 +47,12 @@ #elif defined(__x86_64__) #define VDSO_VERSION 0 #define VDSO_NAMES 1 -#elif defined(__riscv__) +#elif defined(__riscv__) || defined(__riscv) #define VDSO_VERSION 5 #define VDSO_NAMES 1 +#if __riscv_xlen == 32 #define VDSO_32BIT 1 +#endif #else /* nds32 */ #define VDSO_VERSION 4 #define VDSO_NAMES 1 -- 2.30.0

4 years, 11 months

4
6
0 0

[PATCH] selftests/seccomp: Accept any valid fd in user_notification_addfd

by Seth Forshee

This test expects fds to have specific values, which works fine when the test is run standalone. However, the kselftest runner consumes a couple of extra fds for redirection when running tests, so the test fails when run via kselftest. Change the test to pass on any valid fd number. Signed-off-by: Seth Forshee <seth.forshee(a)canonical.com> --- tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..9338df6f4ca8 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -4019,18 +4019,14 @@ TEST(user_notification_addfd) /* Verify we can set an arbitrary remote fd */ fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd); - /* - * The child has fds 0(stdin), 1(stdout), 2(stderr), 3(memfd), - * 4(listener), so the newly allocated fd should be 5. - */ - EXPECT_EQ(fd, 5); + EXPECT_GE(fd, 0); EXPECT_EQ(filecmp(getpid(), pid, memfd, fd), 0); /* Verify we can set an arbitrary remote fd with large size */ memset(&big, 0x0, sizeof(big)); big.addfd = addfd; fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_BIG, &big); - EXPECT_EQ(fd, 6); + EXPECT_GE(fd, 0); /* Verify we can set a specific remote fd */ addfd.newfd = 42; -- 2.29.2

4 years, 11 months

3
3
0 0

[PATCH v2 0/2] kunit: fail tests on UBSAN errors

by Daniel Latypov

v1 by Uriel is here: [1]. Since it's been a while, I've dropped the Reviewed-By's. It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which hadn't been merged yet, so that caused some kerfuffle with applying them previously and the series was reverted. This revives the series but makes the kunit_fail_current_test() function take a format string and logs the file and line number of the failing code, addressing Alan Maguire's comments on the previous version. As a result, the patch that makes UBSAN errors was tweaked slightly to include an error message. [1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja… Uriel Guajardo (2): kunit: support failure from dynamic analysis tools kunit: ubsan integration include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++ lib/kunit/test.c | 36 ++++++++++++++++++++++++++++++++---- lib/ubsan.c | 3 +++ 3 files changed, 65 insertions(+), 4 deletions(-) create mode 100644 include/kunit/test-bug.h base-commit: 1e0d27fce010b0a4a9e595506b6ede75934c31be -- 2.30.0.478.g8a0d178c01-goog

4 years, 11 months

2
6
0 0

Correct .gitignore in several places for kselftest outputs

by Erik Hollensbe

Sincerely hope I did this right; first time contributor, learning the patch workflow. I took the opportunity to fix two .gitignore files that were leaving stale worktree/index outputs after running `make kselftest` against recent mainline (76c057c84d286140c6c416c3b4ba832cd1d8984e). Thanks for your time!

4 years, 11 months

3
5
0 0

[PATCH] selftests/bpf: Simplify the calculation of variables

by Jiapeng Chong

Fix the following coccicheck warnings: ./tools/testing/selftests/bpf/xdpxceiver.c:954:28-30: WARNING !A || A && B is equivalent to !A || B. ./tools/testing/selftests/bpf/xdpxceiver.c:932:28-30: WARNING !A || A && B is equivalent to !A || B. ./tools/testing/selftests/bpf/xdpxceiver.c:909:28-30: WARNING !A || A && B is equivalent to !A || B. Reported-by: Abaci Robot <abaci(a)linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com> --- tools/testing/selftests/bpf/xdpxceiver.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c index 99ea6cf..f4a96d5 100644 --- a/tools/testing/selftests/bpf/xdpxceiver.c +++ b/tools/testing/selftests/bpf/xdpxceiver.c @@ -897,7 +897,7 @@ static void *worker_testapp_validate(void *arg) ksft_print_msg("Destroying socket\n"); } - if (!opt_bidi || (opt_bidi && bidi_pass)) { + if (!opt_bidi || bidi_pass) { xsk_socket__delete(ifobject->xsk->xsk); (void)xsk_umem__delete(ifobject->umem->umem); } @@ -922,7 +922,7 @@ static void testapp_validate(void) pthread_mutex_lock(&sync_mutex); /*Spawn RX thread */ - if (!opt_bidi || (opt_bidi && !bidi_pass)) { + if (!opt_bidi || !bidi_pass) { if (pthread_create(&t0, &attr, worker_testapp_validate, ifdict[1])) exit_with_error(errno); } else if (opt_bidi && bidi_pass) { @@ -942,7 +942,7 @@ static void testapp_validate(void) pthread_mutex_unlock(&sync_mutex); /*Spawn TX thread */ - if (!opt_bidi || (opt_bidi && !bidi_pass)) { + if (!opt_bidi || !bidi_pass) { if (pthread_create(&t1, &attr, worker_testapp_validate, ifdict[0])) exit_with_error(errno); } else if (opt_bidi && bidi_pass) { -- 1.8.3.1

4 years, 11 months

1
0
0 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror