January 2024 - Linux-kselftest-mirror

by Reinette Chatre

Hi Shuah,

Could you please consider Ilpo's resctrl selftest enhancements [1] for inclusion into kselftest's "next" branch in preparation for the next merge window?

Thank you very much.

Reinette

[1] https://lore.kernel.org/lkml/20231215150515.36983-1-ilpo.jarvinen@linux.int…

1 year, 4 months


[PATCH v3 0/7] Split a folio to any lower order folios

by Zi Yan

From: Zi Yan <ziy(a)nvidia.com>

Hi all,

File folio supports any order and people would like to support flexible orders for anonymous folio[1] too. Currently, split_huge_page() only splits a huge page to order-0 pages, but splitting to orders higher than 0 is also useful. This patchset adds support for splitting a huge page to any lower order pages and uses it during file folio truncate operations.

The patchset is on top of mm-everything-2023-03-27-21-20.

Changelog
===

Since v2
---
1. Fixed an issue in __split_page_owner() introduced during my rebase

Since v1
---
1. Changed split_page_memcg() and split_page_owner() parameter to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code

Details
===

* Patch 1 changes split_page_memcg() to use order instead of nr_pages
* Patch 2 changes split_page_owner() to use order instead of nr_pages
* Patch 3 and 4 add a new_order parameter to split_page_memcg() and split_page_owner() and prepare for upcoming changes.
* Patch 5 adds split_huge_page_to_list_to_order() to split a huge page to any lower order. The original split_huge_page_to_list() calls split_huge_page_to_list_to_order() with new_order = 0.
* Patch 6 uses split_huge_page_to_list_to_order() in large pagecache folio truncation instead of splitting the large folio all the way down to order-0.
* Patch 7 adds a test API to debugfs and test cases in split_huge_page_test selftests.

Comments and/or suggestions are welcome.

[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/

Zi Yan (7):
  mm/memcg: use order instead of nr in split_page_memcg()
  mm/page_owner: use order instead of nr in split_page_owner()
  mm: memcg: make memcg huge page split support any order split.
  mm: page_owner: add support for splitting to any order in split page_owner.
  mm: thp: split huge page to any lower order pages.
  mm: truncate: split huge page cache page to a non-zero order if possible.
  mm: huge_memory: enable debugfs to split huge pages to any order.

 include/linux/huge_mm.h                       |  10 +-
 include/linux/memcontrol.h                    |   4 +-
 include/linux/page_owner.h                    |  10 +-
 mm/huge_memory.c                              | 137 ++++++++---
 mm/memcontrol.c                               |  10 +-
 mm/page_alloc.c                               |   8 +-
 mm/page_owner.c                               |   8 +-
 mm/truncate.c                                 |  21 +-
 .../selftests/mm/split_huge_page_test.c       | 225 +++++++++++++++++-
 9 files changed, 365 insertions(+), 68 deletions(-)

--
2.39.2
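
The order arithmetic behind "splitting to any lower order" can be sketched in user space as follows. This is only an illustration of the relationship between orders and page counts, not kernel code: the helper names pages_in_order() and folios_after_split() are hypothetical and do not appear in the patchset.

```c
#include <assert.h>

/* Number of base pages covered by a folio of the given order: 2^order. */
static unsigned long pages_in_order(unsigned int order)
{
	return 1UL << order;
}

/*
 * Number of folios produced when one folio of old_order is split into
 * folios of new_order: 2^(old_order - new_order).
 * Hypothetical helper for illustration only; not a kernel function.
 */
static unsigned long folios_after_split(unsigned int old_order,
					unsigned int new_order)
{
	/* A split can only go to the same or a lower order. */
	assert(new_order <= old_order);
	return 1UL << (old_order - new_order);
}
```

Assuming 4 KiB base pages and a 2 MiB PMD-sized huge page (order 9), folios_after_split(9, 0) gives the 512 order-0 pages that a full split produces, while folios_after_split(9, 2) gives 128 folios of 4 pages each.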

1 year, 4 months


[RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions

by Waiman Long

This patch series is based on the RFC patch from Frederic [1]. Instead of offering RCU_NOCB as a separate option, it is now lumped into a root-only cpuset.cpus.isolation_full flag that will enable all the additional CPU isolation capabilities available for isolated partitions if set. RCU_NOCB is just the first one to this party. Additional dynamic CPU isolation capabilities will be added in the future.

The first 2 patches are adopted from Frederic with minor twists to fix merge conflicts and a compilation issue. The rest are for implementing the new cpuset.cpus.isolation_full interface, which is essentially a flag to globally enable or disable full CPU isolation on isolated partitions. On read, it also shows the CPU isolation capabilities that are currently enabled. RCU_NOCB requires that the rcu_nocbs option be present in the kernel boot command line. Without that, the rcu_nocb functionality cannot be enabled even if the isolation_full flag is set. So we allow users to check the isolation_full file to verify whether the desired CPU isolation capability is enabled or not.

Only sanity checking has been done so far. More testing, especially on the RCU side, will be needed.

[1] https://lore.kernel.org/lkml/20220525221055.1152307-1-frederic@kernel.org/

Frederic Weisbecker (2):
  rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload
  rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected cpuset caller

Waiman Long (6):
  rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state
  cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs
  cgroup/cpuset: Add cpuset.cpus.isolation_full
  cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs
  cgroup/cpuset: Document the new cpuset.cpus.isolation_full control file
  cgroup/cpuset: Update test_cpuset_prs.sh to handle cpuset.cpus.isolation_full

 Documentation/admin-guide/cgroup-v2.rst       |  24 ++
 include/linux/rcupdate.h                      |  15 +-
 kernel/cgroup/cpuset.c                        | 237 ++++++++++++++----
 kernel/rcu/rcutorture.c                       |   6 +-
 kernel/rcu/tree_nocb.h                        | 118 ++++++---
 .../selftests/cgroup/test_cpuset_prs.sh       |  23 +-
 6 files changed, 337 insertions(+), 86 deletions(-)

--
2.39.3

1 year, 4 months


[PATCH v2 0/7] Use TAP in some more x86 KVM selftests

by Thomas Huth

Here's a follow-up from my RFC series last year:

 https://lore.kernel.org/lkml/20221004093131.40392-1-thuth@redhat.com/T/

and from v1 earlier this year:

 https://lore.kernel.org/kvm/20230712075910.22480-1-thuth@redhat.com/

Basic idea of this series is now to use the kselftest_harness.h framework to get TAP output in the tests, so that it is easier for the user to see what is going on, and e.g. to be able to detect whether a certain test is part of the test binary or not (which is useful when tests get extended in the course of time).

v2:
- Dropped the "Rename the ASSERT_EQ macro" patch (already merged)
- Split the fixes in the sync_regs_test into separate patches (see the first two patches)
- Introduce the KVM_ONE_VCPU_TEST_SUITE() macro as suggested by Sean (see third patch) and use it in the following patches
- Add a new patch to convert vmx_pmu_caps_test.c, too

Thomas Huth (7):
  KVM: selftests: x86: sync_regs_test: Use vcpu_run() where appropriate
  KVM: selftests: x86: sync_regs_test: Get regs structure before modifying it
  KVM: selftests: Add a macro to define a test with one vcpu
  KVM: selftests: x86: Use TAP interface in the sync_regs test
  KVM: selftests: x86: Use TAP interface in the fix_hypercall test
  KVM: selftests: x86: Use TAP interface in the vmx_pmu_caps test
  KVM: selftests: x86: Use TAP interface in the userspace_msr_exit test

 .../selftests/kvm/include/kvm_test_harness.h  |  35 +++++
 .../selftests/kvm/x86_64/fix_hypercall_test.c |  27 ++--
 .../selftests/kvm/x86_64/sync_regs_test.c     | 121 +++++++++++++-----
 .../kvm/x86_64/userspace_msr_exit_test.c      |  19 +--
 .../selftests/kvm/x86_64/vmx_pmu_caps_test.c  |  50 ++------
 5 files changed, 160 insertions(+), 92 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/kvm_test_harness.h

--
2.41.0

1 year, 4 months


[PATCH] selftests: Add missing gitignore entries

by Bernd Edlinger

Prevent them from polluting git status after building selftests.

Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de>
---
 tools/testing/selftests/damon/.gitignore                       | 1 +
 tools/testing/selftests/thermal/intel/power_floor/.gitignore   | 2 ++
 tools/testing/selftests/thermal/intel/workload_hint/.gitignore | 2 ++
 tools/testing/selftests/uevent/.gitignore                      | 2 ++
 4 files changed, 7 insertions(+)
 create mode 100644 tools/testing/selftests/thermal/intel/power_floor/.gitignore
 create mode 100644 tools/testing/selftests/thermal/intel/workload_hint/.gitignore
 create mode 100644 tools/testing/selftests/uevent/.gitignore

diff --git a/tools/testing/selftests/damon/.gitignore b/tools/testing/selftests/damon/.gitignore
index c6c2965a6607..79b32e30fce3 100644
--- a/tools/testing/selftests/damon/.gitignore
+++ b/tools/testing/selftests/damon/.gitignore
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 huge_count_read_write
+access_memory
diff --git a/tools/testing/selftests/thermal/intel/power_floor/.gitignore b/tools/testing/selftests/thermal/intel/power_floor/.gitignore
new file mode 100644
index 000000000000..754810406b33
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/power_floor/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+power_floor_test
diff --git a/tools/testing/selftests/thermal/intel/workload_hint/.gitignore b/tools/testing/selftests/thermal/intel/workload_hint/.gitignore
new file mode 100644
index 000000000000..b5448c0576c9
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/workload_hint/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+workload_hint_test
diff --git a/tools/testing/selftests/uevent/.gitignore b/tools/testing/selftests/uevent/.gitignore
new file mode 100644
index 000000000000..15127939d872
--- /dev/null
+++ b/tools/testing/selftests/uevent/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+uevent_filtering
--
2.39.2

1 year, 4 months


[PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers

by David Finkel

Other mechanisms for querying the peak memory usage of either a process or v1 memory cgroup allow for resetting the high watermark. Restore parity with those mechanisms.

For example:
 - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets the high watermark.
 - Writing "5" to the clear_refs pseudo-file in a process's proc directory resets the peak RSS.

This change copies the cgroup v1 behavior so that any write to the memory.peak and memory.swap.peak pseudo-files resets the high watermark to the current usage.

This behavior is particularly useful for work scheduling systems that need to track memory usage of worker processes/cgroups per-work-item. Since memory can't be squeezed like CPU can (the OOM-killer has opinions), these systems need to track the peak memory usage to compute system/container fullness when binpacking workitems.

Signed-off-by: David Finkel <davidf(a)vimeo.com>
---
 Documentation/admin-guide/cgroup-v2.rst       | 20 +++---
 mm/memcontrol.c                               | 23 ++++++
 .../selftests/cgroup/test_memcontrol.c        | 72 ++++++++++++++++---
 3 files changed, 99 insertions(+), 16 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3f85254f3cef..95af0628dc44 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1305,11 +1305,13 @@ PAGE_SIZE multiple when read back.
 	reclaim induced by memory.reclaim.

   memory.peak
-	A read-only single value file which exists on non-root
-	cgroups.
+	A read-write single value file which exists on non-root cgroups.
+
+	The max memory usage recorded for the cgroup and its descendants since
+	either the creation of the cgroup or the most recent reset.

-	The max memory usage recorded for the cgroup and its
-	descendants since the creation of the cgroup.
+	Any non-empty write to this file resets it to the current memory usage.
+	All content written is completely ignored.

   memory.oom.group
 	A read-write single value file which exists on non-root
@@ -1626,11 +1628,13 @@ PAGE_SIZE multiple when read back.
 	Healthy workloads are not expected to reach this limit.

   memory.swap.peak
-	A read-only single value file which exists on non-root
-	cgroups.
+	A read-write single value file which exists on non-root cgroups.
+
+	The max swap usage recorded for the cgroup and its descendants since
+	the creation of the cgroup or the most recent reset.

-	The max swap usage recorded for the cgroup and its
-	descendants since the creation of the cgroup.
+	Any non-empty write to this file resets it to the current swap usage.
+	All content written is completely ignored.

   memory.swap.max
 	A read-write single value file which exists on non-root
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1c1061df9cd1..b04af158922d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -25,6 +25,7 @@
  * Copyright (C) 2020 Alibaba, Inc, Alex Shi
  */

+#include <linux/cgroup-defs.h>
 #include <linux/page_counter.h>
 #include <linux/memcontrol.h>
 #include <linux/cgroup.h>
@@ -6635,6 +6636,16 @@ static u64 memory_peak_read(struct cgroup_subsys_state *css,
 	return (u64)memcg->memory.watermark * PAGE_SIZE;
 }

+static ssize_t memory_peak_write(struct kernfs_open_file *of,
+				 char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	page_counter_reset_watermark(&memcg->memory);
+
+	return nbytes;
+}
+
 static int memory_min_show(struct seq_file *m, void *v)
 {
 	return seq_puts_memcg_tunable(m,
@@ -6947,6 +6958,7 @@ static struct cftype memory_files[] = {
 		.name = "peak",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = memory_peak_read,
+		.write = memory_peak_write,
 	},
 	{
 		.name = "min",
@@ -7917,6 +7929,16 @@ static u64 swap_peak_read(struct cgroup_subsys_state *css,
 	return (u64)memcg->swap.watermark * PAGE_SIZE;
 }

+static ssize_t swap_peak_write(struct kernfs_open_file *of,
+			       char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	page_counter_reset_watermark(&memcg->swap);
+
+	return nbytes;
+}
+
 static int swap_high_show(struct seq_file *m, void *v)
 {
 	return seq_puts_memcg_tunable(m,
@@ -7999,6 +8021,7 @@ static struct cftype swap_files[] = {
 		.name = "swap.peak",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = swap_peak_read,
+		.write = swap_peak_write,
 	},
 	{
 		.name = "swap.events",
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index c7c9572003a8..0326c317f1f2 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -161,12 +161,12 @@ static int alloc_pagecache_50M_check(const char *cgroup, void *arg)
 /*
  * This test create a memory cgroup, allocates
  * some anonymous memory and some pagecache
- * and check memory.current and some memory.stat values.
+ * and checks memory.current, memory.peak, and some memory.stat values.
  */
-static int test_memcg_current(const char *root)
+static int test_memcg_current_peak(const char *root)
 {
 	int ret = KSFT_FAIL;
-	long current;
+	long current, peak, peak_reset;
 	char *memcg;

 	memcg = cg_name(root, "memcg_test");
@@ -180,12 +180,32 @@ static int test_memcg_current(const char *root)
 	if (current != 0)
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak != 0)
+		goto cleanup;
+
 	if (cg_run(memcg, alloc_anon_50M_check, NULL))
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(50))
+		goto cleanup;
+
+	peak_reset = cg_write(memcg, "memory.peak", "\n");
+	if (peak_reset != 0)
+		goto cleanup;
+
+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak > MB(30))
+		goto cleanup;
+
 	if (cg_run(memcg, alloc_pagecache_50M_check, NULL))
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(50))
+		goto cleanup;
+
 	ret = KSFT_PASS;

 cleanup:
@@ -815,13 +835,14 @@ static int alloc_anon_50M_check_swap(const char *cgroup, void *arg)

 /*
  * This test checks that memory.swap.max limits the amount of
- * anonymous memory which can be swapped out.
+ * anonymous memory which can be swapped out. Additionally, it verifies that
+ * memory.swap.peak reflects the high watermark and can be reset.
  */
-static int test_memcg_swap_max(const char *root)
+static int test_memcg_swap_max_peak(const char *root)
 {
 	int ret = KSFT_FAIL;
 	char *memcg;
-	long max;
+	long max, peak;

 	if (!is_swap_enabled())
 		return KSFT_SKIP;
@@ -838,6 +859,12 @@ static int test_memcg_swap_max(const char *root)
 		goto cleanup;
 	}

+	if (cg_read_long(memcg, "memory.swap.peak"))
+		goto cleanup;
+
+	if (cg_read_long(memcg, "memory.peak"))
+		goto cleanup;
+
 	if (cg_read_strcmp(memcg, "memory.max", "max\n"))
 		goto cleanup;

@@ -860,6 +887,27 @@ static int test_memcg_swap_max(const char *root)
 	if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 1)
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(29))
+		goto cleanup;
+
+	peak = cg_read_long(memcg, "memory.swap.peak");
+	if (peak < MB(29))
+		goto cleanup;
+
+	if (cg_write(memcg, "memory.swap.peak", "\n"))
+		goto cleanup;
+
+	if (cg_read_long(memcg, "memory.swap.peak") > MB(10))
+		goto cleanup;
+
+
+	if (cg_write(memcg, "memory.peak", "\n"))
+		goto cleanup;
+
+	if (cg_read_long(memcg, "memory.peak"))
+		goto cleanup;
+
 	if (cg_run(memcg, alloc_anon_50M_check_swap, (void *)MB(30)))
 		goto cleanup;

@@ -867,6 +915,14 @@ static int test_memcg_swap_max(const char *root)
 	if (max <= 0)
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(29))
+		goto cleanup;
+
+	peak = cg_read_long(memcg, "memory.swap.peak");
+	if (peak < MB(19))
+		goto cleanup;
+
 	ret = KSFT_PASS;

 cleanup:
@@ -1293,7 +1349,7 @@ struct memcg_test {
 	const char *name;
 } tests[] = {
 	T(test_memcg_subtree_control),
-	T(test_memcg_current),
+	T(test_memcg_current_peak),
 	T(test_memcg_min),
 	T(test_memcg_low),
 	T(test_memcg_high),
@@ -1301,7 +1357,7 @@ struct memcg_test {
 	T(test_memcg_max),
 	T(test_memcg_reclaim),
 	T(test_memcg_oom_events),
-	T(test_memcg_swap_max),
+	T(test_memcg_swap_max_peak),
 	T(test_memcg_sock),
 	T(test_memcg_oom_group_leaf_events),
 	T(test_memcg_oom_group_parent_events),
--
2.39.2
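
From user space, the reset-then-measure pattern that the patch description has in mind for work schedulers might look roughly like the sketch below. It assumes the semantics proposed in this patch (any non-empty write resets the watermark and the written content is ignored); the behavior of any released kernel may differ. The helper names read_long_file() and reset_peak() are hypothetical, and the caller supplies the path to the cgroup's memory.peak or memory.swap.peak file.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a cgroup-style file. Returns -1 on error. */
static long read_long_file(const char *path)
{
	FILE *f;
	long val;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Request a watermark reset by writing a non-empty string.
 * Under the semantics proposed in the patch, the content itself is ignored.
 * Returns 0 on success, -1 on error.
 */
static int reset_peak(const char *path)
{
	FILE *f;
	int ret;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = (fputs("\n", f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

A scheduler following this approach would call reset_peak() on the worker cgroup's memory.peak file before starting a work item and read_long_file() on the same file once the item finishes, giving a per-item peak rather than a lifetime peak.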

1 year, 4 months


[PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow paths

by Marco Elver

To allow the compiler to inline the bpf_local_storage_lookup() fast-path, factor it out by making bpf_local_storage_lookup() a static inline function and move the slow-path to bpf_local_storage_lookup_slowpath().

Based on results from './benchs/run_bench_local_storage.sh' this produces improvements in throughput and latency in the majority of cases:

 | Hashmap Control
 | ===============
 | num keys: 10
 | hashmap (control) sequential get:
 |                              <before>                | <after>
 |   hits throughput:           13.895 ± 0.024 M ops/s  | 14.022 ± 0.095 M ops/s (+0.9%)
 |   hits latency:              71.968 ns/op            | 71.318 ns/op (-0.9%)
 |   important_hits throughput: 13.895 ± 0.024 M ops/s  | 14.022 ± 0.095 M ops/s (+0.9%)
 |
 | num keys: 1000
 | hashmap (control) sequential get:
 |                              <before>                | <after>
 |   hits throughput:           11.793 ± 0.018 M ops/s  | 11.645 ± 0.370 M ops/s (-1.3%)
 |   hits latency:              84.794 ns/op            | 85.874 ns/op (+1.3%)
 |   important_hits throughput: 11.793 ± 0.018 M ops/s  | 11.645 ± 0.370 M ops/s (-1.3%)
 |
 | num keys: 10000
 | hashmap (control) sequential get:
 |                              <before>                | <after>
 |   hits throughput:           7.113 ± 0.012 M ops/s   | 7.037 ± 0.051 M ops/s (-1.1%)
 |   hits latency:              140.581 ns/op           | 142.113 ns/op (+1.1%)
 |   important_hits throughput: 7.113 ± 0.012 M ops/s   | 7.037 ± 0.051 M ops/s (-1.1%)
 |
 | num keys: 100000
 | hashmap (control) sequential get:
 |                              <before>                | <after>
 |   hits throughput:           4.793 ± 0.034 M ops/s   | 4.990 ± 0.025 M ops/s (+4.1%)
 |   hits latency:              208.623 ns/op           | 200.401 ns/op (-3.9%)
 |   important_hits throughput: 4.793 ± 0.034 M ops/s   | 4.990 ± 0.025 M ops/s (+4.1%)
 |
 | num keys: 4194304
 | hashmap (control) sequential get:
 |                              <before>                | <after>
 |   hits throughput:           2.088 ± 0.008 M ops/s   | 2.962 ± 0.004 M ops/s (+41.9%)
 |   hits latency:              478.851 ns/op           | 337.648 ns/op (-29.5%)
 |   important_hits throughput: 2.088 ± 0.008 M ops/s   | 2.962 ± 0.004 M ops/s (+41.9%)
 |
 | Local Storage
 | =============
 | num_maps: 1
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           32.598 ± 0.008 M ops/s  | 38.480 ± 0.054 M ops/s (+18.0%)
 |   hits latency:              30.676 ns/op            | 25.988 ns/op (-15.3%)
 |   important_hits throughput: 32.598 ± 0.008 M ops/s  | 38.480 ± 0.054 M ops/s (+18.0%)
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           36.963 ± 0.045 M ops/s  | 43.847 ± 0.037 M ops/s (+18.6%)
 |   hits latency:              27.054 ns/op            | 22.807 ns/op (-15.7%)
 |   important_hits throughput: 36.963 ± 0.045 M ops/s  | 43.847 ± 0.037 M ops/s (+18.6%)
 |
 | num_maps: 10
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           32.078 ± 0.004 M ops/s  | 37.813 ± 0.020 M ops/s (+17.9%)
 |   hits latency:              31.174 ns/op            | 26.446 ns/op (-15.2%)
 |   important_hits throughput: 3.208 ± 0.000 M ops/s   | 3.781 ± 0.002 M ops/s (+17.9%)
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           34.564 ± 0.011 M ops/s  | 40.082 ± 0.037 M ops/s (+16.0%)
 |   hits latency:              28.932 ns/op            | 24.949 ns/op (-13.8%)
 |   important_hits throughput: 12.344 ± 0.004 M ops/s  | 14.315 ± 0.013 M ops/s (+16.0%)
 |
 | num_maps: 16
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           32.493 ± 0.023 M ops/s  | 38.147 ± 0.029 M ops/s (+17.4%)
 |   hits latency:              30.776 ns/op            | 26.215 ns/op (-14.8%)
 |   important_hits throughput: 2.031 ± 0.001 M ops/s   | 2.384 ± 0.002 M ops/s (+17.4%)
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           34.380 ± 0.521 M ops/s  | 41.605 ± 0.095 M ops/s (+21.0%)
 |   hits latency:              29.087 ns/op            | 24.035 ns/op (-17.4%)
 |   important_hits throughput: 10.939 ± 0.166 M ops/s  | 13.238 ± 0.030 M ops/s (+21.0%)
 |
 | num_maps: 17
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           28.748 ± 0.028 M ops/s  | 32.248 ± 0.080 M ops/s (+12.2%)
 |   hits latency:              34.785 ns/op            | 31.009 ns/op (-10.9%)
 |   important_hits throughput: 1.693 ± 0.002 M ops/s   | 1.899 ± 0.005 M ops/s (+12.2%)
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           31.313 ± 0.030 M ops/s  | 35.911 ± 0.020 M ops/s (+14.7%)
 |   hits latency:              31.936 ns/op            | 27.847 ns/op (-12.8%)
 |   important_hits throughput: 9.533 ± 0.009 M ops/s   | 10.933 ± 0.006 M ops/s (+14.7%)
 |
 | num_maps: 24
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           18.475 ± 0.027 M ops/s  | 19.000 ± 0.006 M ops/s (+2.8%)
 |   hits latency:              54.127 ns/op            | 52.632 ns/op (-2.8%)
 |   important_hits throughput: 0.770 ± 0.001 M ops/s   | 0.792 ± 0.000 M ops/s (+2.9%)
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           21.361 ± 0.028 M ops/s  | 22.388 ± 0.099 M ops/s (+4.8%)
 |   hits latency:              46.814 ns/op            | 44.667 ns/op (-4.6%)
 |   important_hits throughput: 6.009 ± 0.008 M ops/s   | 6.298 ± 0.028 M ops/s (+4.8%)
 |
 | num_maps: 32
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           14.220 ± 0.006 M ops/s  | 14.168 ± 0.020 M ops/s (-0.4%)
 |   hits latency:              70.323 ns/op            | 70.580 ns/op (+0.4%)
 |   important_hits throughput: 0.445 ± 0.000 M ops/s   | 0.443 ± 0.001 M ops/s (-0.4%)
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           17.250 ± 0.011 M ops/s  | 16.650 ± 0.021 M ops/s (-3.5%)
 |   hits latency:              57.971 ns/op            | 60.061 ns/op (+3.6%)
 |   important_hits throughput: 4.815 ± 0.003 M ops/s   | 4.647 ± 0.006 M ops/s (-3.5%)
 |
 | num_maps: 100
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           5.212 ± 0.012 M ops/s   | 5.878 ± 0.004 M ops/s (+12.8%)
 |   hits latency:              191.877 ns/op           | 170.116 ns/op (-11.3%)
 |   important_hits throughput: 0.052 ± 0.000 M ops/s   | 0.059 ± 0.000 M ops/s (+13.5%)
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           6.521 ± 0.053 M ops/s   | 7.086 ± 0.010 M ops/s (+8.7%)
 |   hits latency:              153.343 ns/op           | 141.116 ns/op (-8.0%)
 |   important_hits throughput: 1.703 ± 0.014 M ops/s   | 1.851 ± 0.003 M ops/s (+8.7%)
 |
 | num_maps: 1000
 | local_storage cache sequential get:
 |                              <before>                | <after>
 |   hits throughput:           0.357 ± 0.005 M ops/s   | 0.325 ± 0.005 M ops/s (-9.0%)
 |   hits latency:              2803.738 ns/op          | 3076.923 ns/op (+9.7%)
 |   important_hits throughput: 0.000 ± 0.000 M ops/s   | 0.000 ± 0.000 M ops/s
 | local_storage cache interleaved get:
 |                              <before>                | <after>
 |   hits throughput:           0.434 ± 0.007 M ops/s   | 0.447 ± 0.007 M ops/s (+3.0%)
 |   hits latency:              2306.539 ns/op          | 2237.687 ns/op (-3.0%)
 |   important_hits throughput: 0.109 ± 0.002 M ops/s   | 0.112 ± 0.002 M ops/s (+2.8%)

Signed-off-by: Marco Elver <elver(a)google.com>
---
 include/linux/bpf_local_storage.h               | 17 ++++++++++++++++-
 kernel/bpf/bpf_local_storage.c                  | 14 ++++----------
 .../selftests/bpf/progs/cgrp_ls_recursion.c     |  2 +-
 .../selftests/bpf/progs/task_ls_recursion.c     |  2 +-
 4 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 173ec7f43ed1..c8cecf7fff87 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -130,9 +130,24 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
 			    bool bpf_ma);

 struct bpf_local_storage_data *
+bpf_local_storage_lookup_slowpath(struct bpf_local_storage *local_storage,
+				  struct bpf_local_storage_map *smap,
+				  bool cacheit_lockit);
+static inline struct bpf_local_storage_data *
 bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
 			 struct bpf_local_storage_map *smap,
-			 bool cacheit_lockit);
+			 bool cacheit_lockit)
+{
+	struct bpf_local_storage_data *sdata;
+
+	/* Fast path (cache hit) */
+	sdata = rcu_dereference_check(local_storage->cache[smap->cache_idx],
+				      bpf_rcu_lock_held());
+	if (likely(sdata && rcu_access_pointer(sdata->smap) == smap))
+		return sdata;
+
+	return bpf_local_storage_lookup_slowpath(local_storage, smap, cacheit_lockit);
+}

 void bpf_local_storage_destroy(struct bpf_local_storage *local_storage);

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 146824cc9689..2ef782a1bd6f 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -415,20 +415,14 @@ void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
 }

 /* If cacheit_lockit is false, this lookup function is lockless */
-struct bpf_local_storage_data *
-bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
-			 struct bpf_local_storage_map *smap,
-			 bool cacheit_lockit)
+noinline struct bpf_local_storage_data *
+bpf_local_storage_lookup_slowpath(struct bpf_local_storage *local_storage,
+				  struct bpf_local_storage_map *smap,
+				  bool cacheit_lockit)
 {
 	struct bpf_local_storage_data *sdata;
 	struct bpf_local_storage_elem *selem;

-	/* Fast path (cache hit) */
-	sdata = rcu_dereference_check(local_storage->cache[smap->cache_idx],
-				      bpf_rcu_lock_held());
-	if (sdata && rcu_access_pointer(sdata->smap) == smap)
-		return sdata;
-
 	/* Slow path (cache miss) */
 	hlist_for_each_entry_rcu(selem, &local_storage->list, snode,
 				 rcu_read_lock_trace_held())
diff --git a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
index a043d8fefdac..9895087a9235 100644
--- a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
+++ b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
@@ -21,7 +21,7 @@ struct {
 	__type(value, long);
 } map_b SEC(".maps");

-SEC("fentry/bpf_local_storage_lookup")
+SEC("fentry/bpf_local_storage_lookup_slowpath")
 int BPF_PROG(on_lookup)
 {
 	struct task_struct *task = bpf_get_current_task_btf();
diff --git a/tools/testing/selftests/bpf/progs/task_ls_recursion.c b/tools/testing/selftests/bpf/progs/task_ls_recursion.c
index 4542dc683b44..d73b33a4c153 100644
--- a/tools/testing/selftests/bpf/progs/task_ls_recursion.c
+++ b/tools/testing/selftests/bpf/progs/task_ls_recursion.c
@@ -27,7 +27,7 @@ struct {
 	__type(value, long);
 } map_b SEC(".maps");

-SEC("fentry/bpf_local_storage_lookup")
+SEC("fentry/bpf_local_storage_lookup_slowpath")
 int BPF_PROG(on_lookup)
 {
 	struct task_struct *task = bpf_get_current_task_btf();
--
2.43.0.429.g432eaa2c6b-goog
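
The general pattern in this patch (an inlinable cache probe in front of an out-of-line search) can be illustrated outside the kernel with the simplified sketch below. It is not the BPF local storage code: the types struct entry and struct store, the slot count, and the slow_lookups counter are all made up for illustration, there is no RCU or locking, and the noinline attribute and __builtin_expect() assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 16

struct entry {
	int key;
	int value;
	struct entry *next;
};

struct store {
	struct entry *cache[CACHE_SLOTS]; /* one cached entry per slot */
	struct entry *list;               /* all entries, searched on a miss */
	unsigned long slow_lookups;       /* slow-path call count, for illustration */
};

/* Slow path (cache miss): walk the list; kept out of line so callers stay small. */
static __attribute__((noinline))
struct entry *lookup_slowpath(struct store *s, int key)
{
	struct entry *e;

	assert(s != NULL);
	s->slow_lookups++;
	for (e = s->list; e; e = e->next) {
		if (e->key == key) {
			/* Remember the hit so the next lookup takes the fast path. */
			s->cache[(unsigned int)key % CACHE_SLOTS] = e;
			return e;
		}
	}
	return NULL;
}

/* Fast path (cache hit): a single probe the compiler can inline into callers. */
static inline struct entry *lookup(struct store *s, int key)
{
	struct entry *e = s->cache[(unsigned int)key % CACHE_SLOTS];

	if (__builtin_expect(e && e->key == key, 1))
		return e;
	return lookup_slowpath(s, key);
}
```

Keeping the slow path noinline means each call site only pays for the cache probe and a rarely taken call, which is the code-size and branch-prediction trade-off the commit message describes.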

1 year, 4 months


[KTAP V2 PATCH v2] ktap_v2: add test metadata

by Rae Moar

Add specification for test metadata to the KTAP v2 spec.

KTAP v1 only specifies the output format of very basic test information: test result and test name. Any additional test information either gets added to general diagnostic data or is not included in the output at all.

The purpose of KTAP metadata is to create a framework to include and easily identify additional important test information in KTAP.

KTAP metadata could include any test information that is pertinent for user interaction before or after the running of the test. For example, the test file path or the test speed.

Since this includes a large variety of information, this specification will recognize notable types of KTAP metadata to ensure consistent format across test frameworks. See the full list of types in the specification.

Example of KTAP Metadata:

 KTAP version 2
 # ktap_test: main
 # ktap_arch: uml
 1..1
     KTAP version 2
     # ktap_test: suite_1
     # ktap_subsystem: example
     # ktap_test_file: lib/test.c
     1..2
     ok 1 test_1
     # ktap_test: test_2
     # ktap_speed: very_slow
     # custom_is_flaky: true
     ok 2 test_2
 ok 1 test_suite

The changes to the KTAP specification outline the format, location, and different types of metadata.

Here is a link to a version of the KUnit parser that is able to parse test metadata lines for KTAP version 2. Note this includes test metadata lines for the main level of KTAP.

Link: https://kunit-review.googlesource.com/c/linux/+/5889

Signed-off-by: Rae Moar <rmoar(a)google.com>
---
 Documentation/dev-tools/ktap.rst | 163 ++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 4 deletions(-)

diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..4480eaf5bbc3 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -17,19 +17,20 @@ KTAP test results describe a series of tests (which may be nested: i.e., test
 can have subtests), each of which can contain both diagnostic data -- e.g., log
 lines -- and a final result. The test structure and results are
 machine-readable, whereas the diagnostic data is unstructured and is there to
-aid human debugging.
+aid human debugging. One exception to this is test metadata lines - a type
+of diagnostic lines. Test metadata is used to identify important supplemental
+test information and can be machine-readable.

 KTAP output is built from four different types of lines:
 - Version lines
 - Plan lines
 - Test case result lines
-- Diagnostic lines
+- Diagnostic lines (including test metadata)

 In general, valid KTAP output should also form valid TAP output, but some
 information, in particular nested test results, may be lost. Also note that
 there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.

 Version lines
 -------------
@@ -166,6 +167,154 @@ even if they do not start with a "#": this is to capture any other useful
 kernel output which may help debug the test. It is nevertheless recommended
 that tests always prefix any diagnostic output they have with a "#" character.

+KTAP metadata lines
+-------------------
+
+KTAP metadata lines are a subset of diagnostic lines that are used to include
+and easily identify important supplemental test information in KTAP.
+
+.. code-block:: none
+
+	# <prefix>_<metadata type>: <metadata value>
+
+The <prefix> indicates where to find the specification for the type of
+metadata. The metadata types listed below use the prefix "ktap" (See Types of
+KTAP Metadata).
+
+Types that are instead specified by an individual test framework use the
+framework name as the prefix. For example, a metadata type documented by the
+kselftest specification would use the prefix "kselftest". Any metadata type
+that is not listed in a specification must use the prefix "custom". Note the
+prefix must not include spaces or the characters ":" or "_".
+
+The format of <metadata type> and <value> varies based on the type. See the
+individual specification. For "custom" types the <metadata type> can be any
+string excluding ":", spaces, or newline characters and the <value> can be any
+string.
+
+**Location:**
+
+The first KTAP metadata entry for a test must be "# ktap_test: <test name>",
+which acts as a header to associate metadata with the correct test.
+
+For test cases, the location of the metadata is between the prior test result
+line and the current test result line. For test suites, the location of the
+metadata is between the suite's version line and test plan line. See the
+example below.
+
+KTAP metadata for a test does not need to be contiguous. For example, a kernel
+warning or other diagnostic output could interrupt metadata lines. However, it
+is recommended to keep a test's metadata lines together when possible, as this
+improves readability.
+
+**Here is an example of using KTAP metadata:**
+
+::
+
+	KTAP version 2
+	# ktap_test: main
+	# ktap_arch: uml
+	1..1
+	  KTAP version 2
+	  # ktap_test: suite_1
+	  # ktap_subsystem: example
+	  # ktap_test_file: lib/test.c
+	  1..2
+	  ok 1 test_1
+	  # ktap_test: test_2
+	  # ktap_speed: very_slow
+	  # custom_is_flaky: true
+	  ok 2 test_2
+	# suite_1 passed
+	ok 1 suite_1
+
+In this example, the tests are running on UML. The test suite "suite_1" is part
+of the subsystem "example" and belongs to the file "lib/example_test.c". It has
+two subtests, "test_1" and "test_2". The subtest "test_2" has a speed of
+"very_slow" and has been marked with a custom KTAP metadata type called
+"custom_is_flaky" with the value of "true".
+
+**Types of KTAP Metadata:**
+
+This is the current list of KTAP metadata types recognized in this
+specification. Note that all of these metadata types are optional (except for
+ktap_test as the KTAP metadata header).
+
+- ``ktap_test``: Name of test (used as header of KTAP metadata). This should
+  match the test name printed in the test result line: "ok 1 [test_name]".
+
+- ``ktap_module``: Name of the module containing the test
+
+- ``ktap_subsystem``: Name of the subsystem being tested
+
+- ``ktap_start_time``: Time tests started in ISO8601 format
+
+  - Example: "# ktap_start_time: 2024-01-09T13:09:01.990000+00:00"
+
+- ``ktap_duration``: Time taken (in seconds) to execute the test
+
+  - Example: "ktap_duration: 10.154s"
+
+- ``ktap_speed``: Category of how fast test runs: "normal", "slow", or
+  "very_slow"
+
+- ``ktap_test_file``: Path to source file containing the test. This metadata
+  line can be repeated if the test is spread across multiple files.
+
+  - Example: "# ktap_test_file: lib/test.c"
+
+- ``ktap_generated_file``: Description of and path to file generated during
+  test execution. This could be a core dump, generated filesystem image, some
+  form of visual output (for graphics drivers), etc. This metadata line can be
+  repeated to attach multiple files to the test.
+
+  - Example: "# ktap_generated_file: Core dump: /var/lib/systemd/coredump/hello.core"
+
+- ``ktap_log_file``: Path to file containing kernel log test output
+
+  - Example: "# ktap_log_file: /sys/kernel/debugfs/kunit/example/results"
+
+- ``ktap_error_file``: Path to file containing context for test failure or
+  error. This could include the difference between optimal test output and
+  actual test output.
+
+  - Example: "# ktap_error_file: fs/results/example.out.bad"
+
+- ``ktap_results_url``: Link to webpage describing this test run and its
+  results
+
+  - Example: "# ktap_results_url: https://kcidb.kernelci.org/hello"
+
+- ``ktap_arch``: Architecture used during test run
+
+  - Example: "# ktap_arch: x86_64"
+
+- ``ktap_compiler``: Compiler used during test run
+
+  - Example: "# ktap_compiler: gcc (GCC) 10.1.1 20200507 (Red Hat 10.1.1-1)"
+
+- ``ktap_respository_url``: Link to git repository of the checked out code.
+
+  - Example: "# ktap_respository_url: https://github.com/torvalds/linux.git"
+
+- ``ktap_git_branch``: Name of git branch of checked out code
+
+  - Example: "# ktap_git_branch: kselftest/kunit"
+
+- ``ktap_kernel_version``: Version of Linux Kernel being used during test run
+
+  - Example: "# ktap_kernel_version: 6.7-rc1"
+
+- ``ktap_commit_hash``: The full git commit hash of the checked out base code.
+
+  - Example: "# ktap_commit_hash: 064725faf8ec2e6e36d51e22d3b86d2707f0f47f"
+
+**Other Metadata Types:**
+
+There can also be KTAP metadata that is not included in the recognized list
+above. This metadata must be prefixed with the test framework, ie. "kselftest",
+or with the prefix "custom". For example, "# custom_batch: 20".
+ Unknown lines ------------- @@ -206,6 +355,7 @@ An example of a test with two nested subtests: KTAP version 2 1..1 KTAP version 2 + # ktap_test: example 1..2 ok 1 test_1 not ok 2 test_2 @@ -219,6 +369,7 @@ An example format with multiple levels of nested testing: KTAP version 2 1..2 KTAP version 2 + # ktap_test: example_test_1 1..2 KTAP version 2 1..2 @@ -254,6 +405,7 @@ Example KTAP output KTAP version 2 1..1 KTAP version 2 + # ktap_test: main_test 1..3 KTAP version 2 1..1 @@ -261,11 +413,14 @@ Example KTAP output ok 1 test_1 ok 1 example_test_1 KTAP version 2 + # ktap_test: example_test_2 + # ktap_speed: slow 1..2 ok 1 test_1 # SKIP test_1 skipped ok 2 test_2 ok 2 example_test_2 KTAP version 2 + # ktap_test: example_test_3 1..3 ok 1 test_1 # test_2: FAIL base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5 -- 2.43.0.429.g432eaa2c6b-goog
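The "# <prefix>_<metadata type>: <metadata value>" layout and the "# ktap_test: <test name>" header rule proposed in the patch above can be read mechanically. The following is only a rough sketch of how a consumer might do that, not code from the patch or from any existing parser. The names `parse_metadata_line`, `collect_metadata`, `KNOWN_PREFIXES` and `EXAMPLE` are hypothetical, and the prefix allow-list is an assumption: it is there because an ordinary diagnostic such as "# test_2: FAIL" has the same shape as a metadata line.

```python
import re

# Assumed allow-list: "ktap" and "custom" come from the proposed text, and
# "kselftest" is the framework prefix it gives as an example.
KNOWN_PREFIXES = ("ktap", "kselftest", "custom")

# "# <prefix>_<metadata type>: <metadata value>"
# prefix: no spaces, ":" or "_"; type: no ":" or whitespace; value: anything.
METADATA_RE = re.compile(r"^\s*# ([^\s:_]+)_([^\s:]+): (.*)$")


def parse_metadata_line(line, prefixes=KNOWN_PREFIXES):
    """Return (prefix, type, value), or None if the line is not metadata."""
    match = METADATA_RE.match(line)
    if match is None or match.group(1) not in prefixes:
        return None
    return match.groups()


def collect_metadata(lines):
    """Group metadata under the most recent "# ktap_test: <name>" header."""
    by_test = {}
    current = None
    for line in lines:
        parsed = parse_metadata_line(line)
        if parsed is None:
            continue  # result lines, plans and plain diagnostics are skipped
        prefix, mtype, value = parsed
        if (prefix, mtype) == ("ktap", "test"):
            current = by_test.setdefault(value, {})
        elif current is not None:
            # Types such as ktap_test_file may repeat, so keep a list.
            current.setdefault(f"{prefix}_{mtype}", []).append(value)
    return by_test


# The example output from the proposed documentation.
EXAMPLE = """\
KTAP version 2
# ktap_test: main
# ktap_arch: uml
1..1
  KTAP version 2
  # ktap_test: suite_1
  # ktap_subsystem: example
  # ktap_test_file: lib/test.c
  1..2
  ok 1 test_1
  # ktap_test: test_2
  # ktap_speed: very_slow
  # custom_is_flaky: true
  ok 2 test_2
# suite_1 passed
ok 1 suite_1
"""

metadata = collect_metadata(EXAMPLE.splitlines())
```

Because the sketch keys on the test name alone, two tests with the same name in different suites would share one entry. A real consumer would probably track nesting depth as well.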

1 year, 4 months


[PATCH v8 0/4] Introduce mseal

by jeffxu＠chromium.org

From: Jeff Xu <jeffxu(a)chromium.org>

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory range against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW) and no-execute (NX) bits. Linux has supported NX since the release of kernel version 2.6.8 in August 2004 [1]. The memory permission feature improves the security stance on memory corruption bugs, as an attacker cannot simply write to arbitrary memory and point the code to it. The memory must be marked with the X bit, or else an exception will occur. Internally, the kernel maintains the memory permissions in a data structure called VMA (vm_area_struct). mseal() additionally protects the VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a corrupted pointer is passed to a memory management system. For example, such an attacker primitive can break control-flow integrity guarantees since read-only memory that is supposed to be trusted can become writable or .text pages can get remapped. Memory sealing can automatically be applied by the runtime loader to seal .text and .rodata pages and applications can additionally seal security critical data at runtime. A similar feature already exists in the XNU kernel with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4]. Also, Chrome wants to adopt this feature for their CFI work [2] and this patchset has been designed to be compatible with the Chrome use case.

Two system calls are involved in sealing the map: mmap() and mseal().

The new mseal() is a syscall on 64-bit CPUs, with the following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size, via munmap() and mremap(). These can leave an empty space, which can then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location, via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific risks to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous memory, when users don't have write permission to the memory. Those behaviors can alter region contents by discarding pages, effectively a memset(0) for anonymous memory.

In addition, mmap() has two related changes.

The PROT_SEAL bit in the prot field of mmap(). When present, it marks the map as sealed from creation.

The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks the map as sealable. A map created without MAP_SEALABLE will not support sealing, i.e. mseal() will fail.

Applications that don't care about sealing can expect their behavior to be unchanged. Those that need sealing support opt in by adding MAP_SEALABLE in mmap().

The idea that inspired this patch comes from Stephen Röttger’s work in V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this API.

Indeed, the Chrome browser has very specific requirements for sealing, which are distinct from those of most applications. For example, in the case of libc, sealing is only applied to read-only (RO) or read-execute (RX) memory segments (such as .text and .RELRO) to prevent them from becoming writable, and the lifetime of those mappings is tied to the lifetime of the process.

Chrome wants to seal two large address space reservations that are managed by different allocators. The memory is mapped RW- and RWX respectively but write access to it is restricted using pkeys (or in the future ARM permission overlay extensions). The lifetime of those mappings is not tied to the lifetime of the process. Therefore, while the memory is sealed, the allocators still need to free or discard the unused memory, for example with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a security risk. For example, if a jump instruction crosses a page boundary and the second page gets discarded, the target bytes are overwritten with zeros and the control flow changes. Checking write permission before the discard operation allows us to control when the operation is valid. In this case, the madvise will only succeed if the executing thread has PKEY write permissions and PKRU changes are protected in software by control-flow integrity.

Although the initial version of this patch series is targeting the Chrome browser as its first user, it became evident during upstream discussions that we would also want to ensure that the patch set eventually is a complete solution for memory sealing and compatible with other use cases. The specific scenario currently in mind is glibc's use case of loading and sealing ELF executables. To this end, Stephen is working on a change to glibc to add sealing support to the dynamic linker, which will seal all non-writable segments at startup. Once this work is completed, all applications will be able to automatically benefit from these new protections.

In closing, I would like to formally acknowledge the valuable contributions received during the RFC process, which were instrumental in shaping this patch:

Jann Horn: raising awareness and providing valuable insights on the destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Pedro Falcato: suggesting sealing in the mmap().
Theo de Raadt: sharing the experiences and insight gained from implementing mimmutable() in OpenBSD.

Change history:
===============
V8:
- perf optimization in mmap. (Liam R. Howlett)
- add one testcase (test_seal_zero_address)
- Update mseal.rst to add note for MAP_SEALABLE.

V7:
- fix index.rst (Randy Dunlap)
- fix arm build (Randy Dunlap)
- return EPERM for blocked operations (Theo de Raadt)
https://lore.kernel.org/linux-mm/20240122152905.2220849-2-jeffxu@chromium.o…

V6:
- Drop RFC from subject, Given Linus's general approval.
- Adjust syscall number for mseal (main Jan.11/2024)
- Code style fix (Matthew Wilcox)
- selftest: use ksft macros (Muhammad Usama Anjum)
- Document fix. (Randy Dunlap)
https://lore.kernel.org/all/20240111234142.2944934-1-jeffxu@chromium.org/

V5:
- fix build issue in mseal-Wire-up-mseal-syscall (Suggested by Linus Torvalds, and Greg KH)
- updates on selftest.
https://lore.kernel.org/lkml/20240109154547.1839886-1-jeffxu@chromium.org/#r

V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
https://lore.kernel.org/all/20240104185138.169307-1-jeffxu@chromium.org/

V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent destructive operations of madvise. (Suggested by Jann Horn and Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…

v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/

v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/

----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/

Jeff Xu (4):
  mseal: Wire up mseal syscall
  mseal: add mseal syscall selftest
  mm/mseal memory sealing
  mseal:add documentation

 Documentation/userspace-api/index.rst       |    1 +
 Documentation/userspace-api/mseal.rst       |  215 ++
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
 arch/arm/tools/syscall.tbl                  |    1 +
 arch/arm64/include/asm/unistd.h             |    2 +-
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
 arch/s390/kernel/syscalls/syscall.tbl       |    1 +
 arch/sh/kernel/syscalls/syscall.tbl         |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
 include/linux/syscalls.h                    |    1 +
 include/uapi/asm-generic/mman-common.h      |    8 +
 include/uapi/asm-generic/unistd.h           |    5 +-
 kernel/sys_ni.c                             |    1 +
 mm/Makefile                                 |    4 +
 mm/internal.h                               |   48 +
 mm/madvise.c                                |   12 +
 mm/mmap.c                                   |   35 +-
 mm/mprotect.c                               |   10 +
 mm/mremap.c                                 |   31 +
 mm/mseal.c                                  |  343 ++++
 tools/testing/selftests/mm/.gitignore       |    1 +
 tools/testing/selftests/mm/Makefile         |    1 +
 tools/testing/selftests/mm/mseal_test.c     | 2024 +++++++++++++++++++
 33 files changed, 2756 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/userspace-api/mseal.rst
 create mode 100644 mm/mseal.c
 create mode 100644 tools/testing/selftests/mm/mseal_test.c

--
2.43.0.429.g432eaa2c6b-goog
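The cover letter describes MADV_DONTNEED on anonymous memory as "effectively a memset(0)", which is why sealing restricts it. The sketch below only illustrates that existing behavior on an unsealed mapping and does not call mseal(): the syscall, PROT_SEAL and MAP_SEALABLE exist only in this patchset and are not assumed to be available. It assumes Linux and Python 3.8 or later (for `mmap.madvise`). The function name `discard_zeroes_anonymous_page` and the four marker bytes are purely illustrative.

```python
import mmap


def discard_zeroes_anonymous_page():
    """Return the first bytes of a page before and after MADV_DONTNEED."""
    # One private anonymous page, readable and writable by default.
    region = mmap.mmap(-1, mmap.PAGESIZE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        region[:4] = b"\xeb\xfe\x90\x90"  # stand-in for bytes a program relies on
        before = bytes(region[:4])
        # Discard the page; the next access gets a fresh zero-filled page.
        region.madvise(mmap.MADV_DONTNEED)
        after = bytes(region[:4])
    finally:
        region.close()
    return before, after


before, after = discard_zeroes_anonymous_page()
```

On a mapping sealed as proposed here, the same madvise() call would be expected to fail with EPERM when the caller lacks write permission to the memory, instead of silently zeroing the contents.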

1 year, 4 months


[PATCH bpf-next v4 0/3] Annotate kfuncs in .BTF_ids section

by Daniel Xu

=== Description ===

This is a bpf-treewide change that annotates all kfuncs as such inside .BTF_ids. This annotation eventually allows us to automatically generate kfunc prototypes from bpftool.

We store this metadata inside a yet-unused flags field inside struct btf_id_set8 (thanks Kumar!). pahole will be taught where to look.

More details about the full chain of events are available in commit 3's description.

The accompanying pahole and bpftool changes can be viewed here on these "frozen" branches [0][1].

[0]: https://github.com/danobi/pahole/tree/kfunc_btf-v3-mailed
[1]: https://github.com/danobi/linux/tree/kfunc_bpftool-mailed

=== Changelog ===

Changes from v3:
* Rebase to bpf-next and add missing annotation on new kfunc

Changes from v2:
* Only WARN() for vmlinux kfuncs

Changes from v1:
* Move WARN_ON() up a call level
* Also return error when kfunc set is not properly tagged
* Use BTF_KFUNCS_START/END instead of flags
* Rename BTF_SET8_KFUNC to BTF_SET8_KFUNCS

Daniel Xu (3):
  bpf: btf: Support flags for BTF_SET8 sets
  bpf: btf: Add BTF_KFUNCS_START/END macro pair
  bpf: treewide: Annotate BPF kfuncs in BTF

 Documentation/bpf/kfuncs.rst                |  8 +++----
 drivers/hid/bpf/hid_bpf_dispatch.c          |  8 +++----
 fs/verity/measure.c                         |  4 ++--
 include/linux/btf_ids.h                     | 21 +++++++++++++++----
 kernel/bpf/btf.c                            |  8 +++++++
 kernel/bpf/cpumask.c                        |  4 ++--
 kernel/bpf/helpers.c                        |  8 +++----
 kernel/bpf/map_iter.c                       |  4 ++--
 kernel/cgroup/rstat.c                       |  4 ++--
 kernel/trace/bpf_trace.c                    |  8 +++----
 net/bpf/test_run.c                          |  8 +++----
 net/core/filter.c                           | 20 +++++++++---------
 net/core/xdp.c                              |  4 ++--
 net/ipv4/bpf_tcp_ca.c                       |  4 ++--
 net/ipv4/fou_bpf.c                          |  4 ++--
 net/ipv4/tcp_bbr.c                          |  4 ++--
 net/ipv4/tcp_cubic.c                        |  4 ++--
 net/ipv4/tcp_dctcp.c                        |  4 ++--
 net/netfilter/nf_conntrack_bpf.c            |  4 ++--
 net/netfilter/nf_nat_bpf.c                  |  4 ++--
 net/xfrm/xfrm_interface_bpf.c               |  4 ++--
 net/xfrm/xfrm_state_bpf.c                   |  4 ++--
 .../selftests/bpf/bpf_testmod/bpf_testmod.c |  8 +++----
 23 files changed, 87 insertions(+), 66 deletions(-)

--
2.42.1

1 year, 4 months

