October 2023 - Linux-kselftest-mirror

[RFC bpf-next 0/6] Add bpf_xdp_get_xfrm_state() kfunc

by Daniel Xu

This patchset adds a kfunc helper, bpf_xdp_get_xfrm_state(), that wraps xfrm_state_lookup(). The intent is to support software RSS (via XDP) for the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed on (hopefully) reproducible AWS testbeds indicate that single tunnel pcpu ipsec can reach line rate on 100G ENA nics.

More details about that will be presented at netdev next week [1].

Antony did the initial stable bpf helper - I later ported it to unstable kfuncs. So for the series, please apply a Co-developed-by for Antony, provided he acks and signs off on this.

[0]: https://datatracker.ietf.org/doc/html/draft-ietf-ipsecme-multi-sa-performan…
[1]: https://netdevconf.info/0x17/sessions/workshop/security-workshop.html

Daniel Xu (6):
  bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
  bpf: selftests: test_tunnel: Use ping -6 over ping6
  bpf: selftests: test_tunnel: Mount bpffs if necessary
  bpf: selftests: test_tunnel: Use vmlinux.h declarations
  bpf: selftests: test_tunnel: Disable CO-RE relocations
  bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()

 include/net/xfrm.h | 9 ++
 net/xfrm/Makefile | 1 +
 net/xfrm/xfrm_policy.c | 2 +
 net/xfrm/xfrm_state_bpf.c | 105 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_tracing_net.h | 1 +
 .../selftests/bpf/progs/test_tunnel_kern.c | 95 +++++++++-------
 tools/testing/selftests/bpf/test_tunnel.sh | 43 ++++---
 7 files changed, 202 insertions(+), 54 deletions(-)
 create mode 100644 net/xfrm/xfrm_state_bpf.c

--
2.42.0

[PATCH bpf-next v3 2/2] selftests/bpf: Add malloc failure checks in bpf_iter

by Yuran Pereira

Since some malloc calls in bpf_iter may at times fail, this patch adds the appropriate fail checks, and ensures that any previously allocated resource is appropriately destroyed before returning the function.

Signed-off-by: Yuran Pereira <yuran.pereira(a)hotmail.com>
---
 tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 123a3502b8f0..1e02d1ba1c18 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -698,7 +698,7 @@ static void test_overflow(bool test_e2big_overflow, bool ret1)
 		goto free_link;
 
 	buf = malloc(expected_read_len);
-	if (!buf)
+	if (!ASSERT_OK_PTR(buf, "malloc"))
 		goto close_iter;
 
 	/* do read */
@@ -868,6 +868,8 @@ static void test_bpf_percpu_hash_map(void)
 
 	skel->rodata->num_cpus = bpf_num_possible_cpus();
 	val = malloc(8 * bpf_num_possible_cpus());
+	if (!ASSERT_OK_PTR(val, "malloc"))
+		goto out;
 
 	err = bpf_iter_bpf_percpu_hash_map__load(skel);
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_percpu_hash_map__load"))
@@ -1044,6 +1046,8 @@ static void test_bpf_percpu_array_map(void)
 
 	skel->rodata->num_cpus = bpf_num_possible_cpus();
 	val = malloc(8 * bpf_num_possible_cpus());
+	if (!ASSERT_OK_PTR(val, "malloc"))
+		goto out;
 
 	err = bpf_iter_bpf_percpu_array_map__load(skel);
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_percpu_array_map__load"))
--
2.25.1

[PATCH v2 00/20] Permission Overlay Extension

by Joey Gouly

Hello everyone,

This series implements the Permission Overlay Extension introduced in 2022 VMSA enhancements [1]. It is based on v6.6-rc3.

Changes since v1[2]:
	# Added Kconfig option
	# Added KVM support
	# Move VM_PKEY* defines into arch/
	# Add isb() for POR_EL0 context switch
	# Added hwcap test, get-reg-list-test, signal frame handling test

ptrace support is missing; I will add that for v3.

The Permission Overlay Extension allows permissions on memory regions to be constrained. This can be done from userspace (EL0) without a system call or TLB invalidation.

POE is used to implement the Memory Protection Keys [3] Linux syscall.

The first few patches add the basic framework, then the PKEYS interface is implemented, and then the selftests are made to work on arm64.

There was discussion about what the 'default' protection key value should be. I used disallow-all (apart from pkey 0), which matches what x86 does.

I have tested the modified protection_keys test on x86_64 [5], but not PPC.

I haven't build tested the x86/ppc changes; I will work on getting at least an x86 build environment working.

Thanks,
Joey

[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processor…
[2] https://lore.kernel.org/linux-arm-kernel/20230927140123.5283-1-joey.gouly@a…
[3] Documentation/core-api/protection-keys.rst
[4] https://lore.kernel.org/linux-arm-kernel/20230919092850.1940729-7-mark.rutl…
[5] test_ptrace_modifies_pkru asserts for me on a Ubuntu 5.4 kernel, but does so before my changes as well

Joey Gouly (24):
  arm64/sysreg: add system register POR_EL{0,1}
  arm64/sysreg: update CPACR_EL1 register
  arm64: cpufeature: add Permission Overlay Extension cpucap
  arm64: disable trapping of POR_EL0 to EL2
  arm64: context switch POR_EL0 register
  KVM: arm64: Save/restore POE registers
  arm64: enable the Permission Overlay Extension for EL0
  arm64: add POIndex defines
  arm64: define VM_PKEY_BIT* for arm64
  arm64: mask out POIndex when modifying a PTE
  arm64: enable ARCH_HAS_PKEYS on arm64
  arm64: handle PKEY/POE faults
  arm64: stop using generic mm_hooks.h
  arm64: implement PKEYS support
  arm64: add POE signal support
  arm64: enable PKEY support for CPUs with S1POE
  arm64: enable POE and PIE to coexist
  kselftest/arm64: move get_header()
  selftests: mm: move fpregs printing
  selftests: mm: make protection_keys test work on arm64
  kselftest/arm64: add HWCAP test for FEAT_S1POE
  kselftest/arm64: parse POE_MAGIC in a signal frame
  kselftest/arm64: Add test case for POR_EL0 signal frame records
  KVM: selftests: get-reg-list: add Permission Overlay registers

 Documentation/arch/arm64/elf_hwcaps.rst | 3 +
 arch/arm64/Kconfig | 18 +++
 arch/arm64/include/asm/cpufeature.h | 6 +
 arch/arm64/include/asm/el2_setup.h | 10 +-
 arch/arm64/include/asm/hwcap.h | 1 +
 arch/arm64/include/asm/kvm_arm.h | 4 +-
 arch/arm64/include/asm/kvm_host.h | 4 +
 arch/arm64/include/asm/mman.h | 8 +-
 arch/arm64/include/asm/mmu.h | 2 +
 arch/arm64/include/asm/mmu_context.h | 51 ++++++-
 arch/arm64/include/asm/page.h | 10 ++
 arch/arm64/include/asm/pgtable-hwdef.h | 10 ++
 arch/arm64/include/asm/pgtable-prot.h | 8 +-
 arch/arm64/include/asm/pgtable.h | 26 +++-
 arch/arm64/include/asm/pkeys.h | 110 ++++++++++++++
 arch/arm64/include/asm/por.h | 33 +++++
 arch/arm64/include/asm/processor.h | 1 +
 arch/arm64/include/asm/sysreg.h | 16 ++
 arch/arm64/include/asm/traps.h | 1 +
 arch/arm64/include/uapi/asm/hwcap.h | 1 +
 arch/arm64/include/uapi/asm/sigcontext.h | 7 +
 arch/arm64/kernel/cpufeature.c | 23 +++
 arch/arm64/kernel/cpuinfo.c | 1 +
 arch/arm64/kernel/process.c | 19 +++
 arch/arm64/kernel/signal.c | 51 +++++++
 arch/arm64/kernel/traps.c | 12 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 10 ++
 arch/arm64/kvm/sys_regs.c | 2 +
 arch/arm64/mm/fault.c | 44 +++++-
 arch/arm64/mm/mmap.c | 9 ++
 arch/arm64/mm/mmu.c | 40 +++++
 arch/arm64/tools/cpucaps | 1 +
 arch/arm64/tools/sysreg | 15 +-
 arch/powerpc/include/asm/page.h | 11 ++
 arch/x86/include/asm/page.h | 10 ++
 fs/proc/task_mmu.c | 2 +
 include/linux/mm.h | 13 --
 tools/testing/selftests/arm64/abi/hwcap.c | 13 ++
 .../testing/selftests/arm64/signal/.gitignore | 1 +
 .../arm64/signal/testcases/poe_siginfo.c | 86 +++++++++++
 .../arm64/signal/testcases/testcases.c | 27 +---
 .../arm64/signal/testcases/testcases.h | 28 +++-
 .../selftests/kvm/aarch64/get-reg-list.c | 14 ++
 tools/testing/selftests/mm/Makefile | 2 +-
 tools/testing/selftests/mm/pkey-arm64.h | 138 ++++++++++++++++++
 tools/testing/selftests/mm/pkey-helpers.h | 8 +
 tools/testing/selftests/mm/pkey-powerpc.h | 3 +
 tools/testing/selftests/mm/pkey-x86.h | 4 +
 tools/testing/selftests/mm/protection_keys.c | 29 ++--
 49 files changed, 880 insertions(+), 66 deletions(-)
 create mode 100644 arch/arm64/include/asm/pkeys.h
 create mode 100644 arch/arm64/include/asm/por.h
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
 create mode 100644 tools/testing/selftests/mm/pkey-arm64.h

--
2.25.1

[PATCH RFC RFT 0/5] fork: Support shadow stacks in clone3()

by Mark Brown

The kernel has recently added support for shadow stacks, currently x86 only using their CET feature, but both arm64 and RISC-V have equivalent features (GCS and Zisslpcfi respectively). I am actively working on GCS[1]. With shadow stacks the hardware maintains an additional stack containing only the return addresses for branch instructions which is not generally writeable by userspace and ensures that any returns are to the recorded addresses. This provides some protection against ROP attacks and makes it easier to collect call stacks. These shadow stacks are allocated in the address space of the userspace process.

Our API for shadow stacks does not currently offer userspace any flexibility for managing the allocation of shadow stacks for newly created threads. Instead the kernel allocates a new shadow stack with the same size as the normal stack whenever a thread is created with the feature enabled. The stacks allocated in this way are freed by the kernel when the thread exits or shadow stacks are disabled for the thread. This lack of flexibility and control isn't ideal: in the vast majority of cases the shadow stack will be over-allocated, and the implicit allocation and deallocation is not consistent with other interfaces. As far as I can tell the interface is done in this manner mainly because the shadow stack patches were in development since before clone3() was implemented.

Since clone3() is readily extensible let's add support for specifying a shadow stack when creating a new thread or process in a similar manner to how the normal stack is specified, keeping the current implicit allocation behaviour if one is not specified either with clone3() or through the use of clone(). When the shadow stack is specified explicitly the kernel will not free it. The inconsistency with implicitly allocated shadow stacks is a bit awkward but that's existing ABI so we can't change it.

The memory provided must have been allocated for use as a shadow stack; the expectation is that this will be done using the map_shadow_stack() syscall. I opted not to add validation for this in clone3() since it will be enforced by hardware anyway.

Please note that the x86 portions of this code are build tested only. I don't appear to have a system that can run CET available to me; I have done testing with an integration into my pending work for GCS. There is some possibility that the arm64 implementation may require the use of clone3() and explicit userspace allocation of shadow stacks; this is still under discussion.

A new architecture feature Kconfig option for shadow stacks is added here. This was suggested as part of the review comments for the arm64 GCS series, and since we need to detect if shadow stacks are supported it seemed sensible to roll it in here.

The selftest portions of this depend on 34dce23f7e40 ("selftests/clone3: Report descriptive test names") in -next[2].

[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
[2] https://lore.kernel.org/r/20231018-kselftest-clone3-output-v1-1-12b7c50ea2c…

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (5):
      mm: Introduce ARCH_HAS_USER_SHADOW_STACK
      fork: Add shadow stack support to clone3()
      selftests/clone3: Factor more of main loop into test_clone3()
      selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
      kselftest/clone3: Test shadow stack support

 arch/x86/Kconfig | 1 +
 arch/x86/include/asm/shstk.h | 11 +-
 arch/x86/kernel/process.c | 2 +-
 arch/x86/kernel/shstk.c | 36 ++++-
 fs/proc/task_mmu.c | 2 +-
 include/linux/mm.h | 2 +-
 include/linux/sched/task.h | 2 +
 include/uapi/linux/sched.h | 17 +-
 kernel/fork.c | 40 ++++-
 mm/Kconfig | 6 +
 tools/testing/selftests/clone3/clone3.c | 180 +++++++++++++++++-----
 tools/testing/selftests/clone3/clone3_selftests.h | 5 +
 12 files changed, 247 insertions(+), 57 deletions(-)
---
base-commit: 80ab9b52e8d4add7735abdfb935877354b69edb6
change-id: 20231019-clone3-shadow-stack-15d40d2bf536

Best regards,
--
Mark Brown <broonie(a)kernel.org>

[PATCH v4 0/5] workload-specific and memory pressure-driven zswap writeback

by Nhat Pham

Changelog:
v4:
   * Rename list_lru_add to list_lru_add_obj and __list_lru_add to list_lru_add (patch 1) (suggested by Johannes Weiner and Yosry Ahmed)
   * Some cleanups on the memcg aware LRU patch (patch 2) (suggested by Yosry Ahmed)
   * Use event interface for the new per-cgroup writeback counters. (patch 3) (suggested by Yosry Ahmed)
   * Abstract zswap's lruvec states and handling into zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
   * Add a patch to export per-cgroup zswap writeback counters
   * Add a patch to update zswap's kselftest
   * Separate the new list_lru functions into its own prep patch
   * Do not start from the top of the hierarchy when encountering a memcg that is not online for the global limit zswap writeback (patch 2) (suggested by Yosry Ahmed)
   * Do not remove the swap entry from list_lru in __read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
   * Removed a redundant zswap pool getting (patch 2) (reported by Ryan Roberts)
   * Use atomic for the nr_zswap_protected (instead of lruvec's lock) (patch 5) (suggested by Yosry Ahmed)
   * Remove the per-cgroup zswap shrinker knob (patch 5) (suggested by Yosry Ahmed)
v2:
   * Fix loongarch compiler errors
   * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to perform workload-specific shrinking - a memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs. This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking:

   https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

   But this solution leaves a lot to be desired, as we still do not have an avenue for a memcg to free up its own memory locked up in the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages).

This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e., memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis.

As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improve the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds.

Domenico Cerasuolo (3):
  zswap: make shrinking memcg-aware
  mm: memcg: add per-memcg zswap writeback stat
  selftests: cgroup: update per-memcg zswap writeback selftest

Nhat Pham (2):
  list_lru: allows explicit memcg and NUMA node selection
  zswap: shrinks zswap pool based on memory pressure

 Documentation/admin-guide/mm/zswap.rst | 7 +
 drivers/android/binder_alloc.c | 5 +-
 fs/dcache.c | 8 +-
 fs/gfs2/quota.c | 6 +-
 fs/inode.c | 4 +-
 fs/nfs/nfs42xattr.c | 8 +-
 fs/nfsd/filecache.c | 4 +-
 fs/xfs/xfs_buf.c | 6 +-
 fs/xfs/xfs_dquot.c | 2 +-
 fs/xfs/xfs_qm.c | 2 +-
 include/linux/list_lru.h | 46 ++-
 include/linux/memcontrol.h | 5 +
 include/linux/mmzone.h | 2 +
 include/linux/vm_event_item.h | 1 +
 include/linux/zswap.h | 25 +-
 mm/list_lru.c | 48 ++-
 mm/memcontrol.c | 1 +
 mm/mmzone.c | 1 +
 mm/swap.h | 3 +-
 mm/swap_state.c | 25 +-
 mm/vmstat.c | 1 +
 mm/workingset.c | 4 +-
 mm/zswap.c | 365 ++++++++++++++++----
 tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
 24 files changed, 526 insertions(+), 127 deletions(-)

--
2.34.1

[PATCH net] selftests: pmtu.sh: fix result checking

by Hangbin Liu

In the PMTU test, when all previous tests are skipped and the new test passes, the exit code is set to 0. However, the current check mistakenly treats this as an assignment, causing the check to pass every time.

Consequently, regardless of how many tests have failed, if the latest test passes, the PMTU test will report a pass.

Fixes: 2a9d3716b810 ("selftests: pmtu.sh: improve the test result processing")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
 tools/testing/selftests/net/pmtu.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index f838dd370f6a..b9648da4c371 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -2048,7 +2048,7 @@ run_test() {
 		case $ret in
 			0)
 				all_skipped=false
-				[ $exitcode=$ksft_skip ] && exitcode=0
+				[ $exitcode = $ksft_skip ] && exitcode=0
 			;;
 			$ksft_skip)
 				[ $all_skipped = true ] && exitcode=$ksft_skip
--
2.41.0
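
The effect of the missing spaces in the pmtu.sh check can be reproduced in isolation. Without spaces around `=`, `[` receives a single non-empty word (for example `1=4`), and a one-argument test is true whenever that word is non-empty; strictly speaking no assignment happens, but the outcome is the same as the patch describes. The sketch below is a minimal illustration, not code from pmtu.sh: the value of `exitcode` is made up, and `ksft_skip=4` is assumed to match the kselftest skip code.

```shell
#!/bin/sh
# Illustrative values only: a previous failure (1) and the kselftest skip code (4).
ksft_skip=4
exitcode=1

# Buggy form: one word "1=4" is passed to [, which is true because it is non-empty.
if [ $exitcode=$ksft_skip ]; then buggy=true; else buggy=false; fi

# Fixed form: "=" is a separate argument, so a real string comparison is made.
if [ "$exitcode" = "$ksft_skip" ]; then fixed=true; else fixed=false; fi

echo "buggy=$buggy fixed=$fixed"
```

With the buggy form the branch is taken even though `exitcode` records a failure, which is how a later passing test could reset the overall result to 0.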

[PATCH] Lower the ptrace permissions so that the memfd_secret test program runs without an issue.

by Itaru Kitayama

---
On Ubuntu and probably other distros, ptrace permissions are tightened a bit by default; i.e., /proc/sys/kernel/yama/ptrace_scope is set to 1. This causes memfd_secret's ptrace attach test to fail with a permission error. Set it to 0 prior to running the program.

Signed-off-by: Itaru Kitayama <itaru.kitayama(a)linux.dev>
---
 tools/testing/selftests/mm/run_vmtests.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 3e2bc818d566..7d31718ce834 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -303,6 +303,7 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
 # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
 CATEGORY="madv_populate" run_test ./madv_populate
 
+echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
 CATEGORY="memfd_secret" run_test ./memfd_secret
 
 # KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100

---
base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa
change-id: 20231030-selftest-c75b1b460817

Best regards,
--
Itaru Kitayama <itaru.kitayama(a)linux.dev>
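
The patch above leaves the ptrace setting at 0 after the test. One possible variation, shown only as a sketch and not something the patch does, is to save the old value and restore it once the test has run. The helper name `with_zero` is hypothetical, and the demonstration uses a scratch file in place of /proc/sys/kernel/yama/ptrace_scope, since writing the real file requires root.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_vmtests.sh): run a command with the
# given sysctl-style file temporarily set to 0, then restore the old value.
with_zero() {
    file=$1
    shift
    old=$(cat "$file") || return 1
    echo 0 >"$file" || return 1
    "$@"
    rc=$?
    echo "$old" >"$file"
    return "$rc"
}

# Demonstration on a scratch file standing in for ptrace_scope.
scratch=$(mktemp)
echo 1 >"$scratch"
seen=$(with_zero "$scratch" cat "$scratch")
after=$(cat "$scratch")
echo "during=$seen after=$after"
rm -f "$scratch"
```

In a real run the wrapped command would be the memfd_secret test and the file would be the Yama sysctl, with the script running as root.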

[PATCH AUTOSEL 6.1 35/39] netfilter: nf_tables: audit log object reset once per table

by Sasha Levin

From: Phil Sutter <phil(a)nwl.cc> [ Upstream commit 1baf0152f7707c6c7e4ea815dcc1f431c0e603f9 ] When resetting multiple objects at once (via dump request), emit a log message per table (or filled skb) and resurrect the 'entries' parameter to contain the number of objects being logged for. To test the skb exhaustion path, perform some bulk counter and quota adds in the kselftest. Signed-off-by: Phil Sutter <phil(a)nwl.cc> Reviewed-by: Richard Guy Briggs <rgb(a)redhat.com> Acked-by: Paul Moore <paul(a)paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw(a)strlen.de> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- net/netfilter/nf_tables_api.c | 50 +++++++++++-------- .../testing/selftests/netfilter/nft_audit.sh | 46 +++++++++++++++++ 2 files changed, 74 insertions(+), 22 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 5e3dbe2652dbd..5c783199b4999 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7324,6 +7324,16 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net, return -1; } +static void audit_log_obj_reset(const struct nft_table *table, + unsigned int base_seq, unsigned int nentries) +{ + char *buf = kasprintf(GFP_ATOMIC, "%s:%u", table->name, base_seq); + + audit_log_nfcfg(buf, table->family, nentries, + AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); + kfree(buf); +} + struct nft_obj_filter { char *table; u32 type; @@ -7338,8 +7348,10 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; struct nftables_pernet *nft_net; + unsigned int entries = 0; struct nft_object *obj; bool reset = false; + int rc = 0; if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) reset = true; @@ -7352,6 +7364,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) if (family != NFPROTO_UNSPEC && family != table->family) continue; + entries = 0; 
list_for_each_entry_rcu(obj, &table->objects, list) { if (!nft_is_active(net, obj)) goto cont; @@ -7367,34 +7380,27 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) filter->type != NFT_OBJECT_UNSPEC && obj->ops->type->type != filter->type) goto cont; - if (reset) { - char *buf = kasprintf(GFP_ATOMIC, - "%s:%u", - table->name, - nft_net->base_seq); - - audit_log_nfcfg(buf, - family, - obj->handle, - AUDIT_NFT_OP_OBJ_RESET, - GFP_ATOMIC); - kfree(buf); - } - if (nf_tables_fill_obj_info(skb, net, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NFT_MSG_NEWOBJ, - NLM_F_MULTI | NLM_F_APPEND, - table->family, table, - obj, reset) < 0) - goto done; + rc = nf_tables_fill_obj_info(skb, net, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NFT_MSG_NEWOBJ, + NLM_F_MULTI | NLM_F_APPEND, + table->family, table, + obj, reset); + if (rc < 0) + break; + entries++; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } + if (reset && entries) + audit_log_obj_reset(table, nft_net->base_seq, entries); + if (rc < 0) + break; } -done: rcu_read_unlock(); cb->args[0] = idx; @@ -7499,7 +7505,7 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, audit_log_nfcfg(buf, family, - obj->handle, + 1, AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); kfree(buf); diff --git a/tools/testing/selftests/netfilter/nft_audit.sh b/tools/testing/selftests/netfilter/nft_audit.sh index bb34329e02a7f..e94a80859bbdb 100755 --- a/tools/testing/selftests/netfilter/nft_audit.sh +++ b/tools/testing/selftests/netfilter/nft_audit.sh @@ -93,6 +93,12 @@ do_test 'nft add counter t1 c1' \ do_test 'nft add counter t2 c1; add counter t2 c2' \ 'table=t2 family=2 entries=2 op=nft_register_obj' +for ((i = 3; i <= 500; i++)); do + echo "add counter t2 c$i" +done >$rulefile +do_test "nft -f $rulefile" \ +'table=t2 family=2 entries=498 op=nft_register_obj' + # adding/updating quotas do_test 'nft add quota t1 q1 { 10 bytes }' \ @@ -101,6 +107,12 @@ do_test 'nft add 
quota t1 q1 { 10 bytes }' \ do_test 'nft add quota t2 q1 { 10 bytes }; add quota t2 q2 { 10 bytes }' \ 'table=t2 family=2 entries=2 op=nft_register_obj' +for ((i = 3; i <= 500; i++)); do + echo "add quota t2 q$i { 10 bytes }" +done >$rulefile +do_test "nft -f $rulefile" \ +'table=t2 family=2 entries=498 op=nft_register_obj' + # changing the quota value triggers obj update path do_test 'nft add quota t1 q1 { 20 bytes }' \ 'table=t1 family=2 entries=1 op=nft_register_obj' @@ -150,6 +162,40 @@ done do_test 'nft reset set t1 s' \ 'table=t1 family=2 entries=3 op=nft_reset_setelem' +# resetting counters + +do_test 'nft reset counter t1 c1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset counters t1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset counters t2' \ +'table=t2 family=2 entries=342 op=nft_reset_obj +table=t2 family=2 entries=158 op=nft_reset_obj' + +do_test 'nft reset counters' \ +'table=t1 family=2 entries=1 op=nft_reset_obj +table=t2 family=2 entries=341 op=nft_reset_obj +table=t2 family=2 entries=159 op=nft_reset_obj' + +# resetting quotas + +do_test 'nft reset quota t1 q1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset quotas t1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset quotas t2' \ +'table=t2 family=2 entries=315 op=nft_reset_obj +table=t2 family=2 entries=185 op=nft_reset_obj' + +do_test 'nft reset quotas' \ +'table=t1 family=2 entries=1 op=nft_reset_obj +table=t2 family=2 entries=314 op=nft_reset_obj +table=t2 family=2 entries=186 op=nft_reset_obj' + # deleting rules readarray -t handles < <(nft -a list chain t1 c1 | \ -- 2.42.0
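
The entry counts expected by the nft_audit.sh additions above can be checked with a little arithmetic. The sketch below regenerates the bulk counter rule file, using a POSIX while loop in place of the bash-only `for ((...))` loop from the patch, and confirms that each pair of expected reset log lines for table t2 adds up to the full object count. That the two lines per table correspond to two filled skbs is an interpretation based on the commit message, not something this sketch verifies.

```shell
#!/bin/sh
# Regenerate the bulk rule file the way the test does (c3 through c500).
rulefile=$(mktemp)
i=3
while [ "$i" -le 500 ]; do
    echo "add counter t2 c$i"
    i=$((i + 1))
done >"$rulefile"

added=$(wc -l <"$rulefile" | tr -d ' ')
rm -f "$rulefile"

# c1 and c2 were added by an earlier test step, so t2 holds added + 2 counters.
total=$((added + 2))

# Expected reset logs for t2: 342 + 158 when only t2 is dumped,
# 341 + 159 when t1 is dumped first.
split_a=$((342 + 158))
split_b=$((341 + 159))
echo "added=$added total=$total split_a=$split_a split_b=$split_b"
```

The quota expectations follow the same pattern: 315 + 185 and 314 + 186 both equal 500.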

[PATCH AUTOSEL 6.5 47/52] netfilter: nf_tables: audit log object reset once per table

by Sasha Levin

From: Phil Sutter <phil(a)nwl.cc> [ Upstream commit 1baf0152f7707c6c7e4ea815dcc1f431c0e603f9 ] When resetting multiple objects at once (via dump request), emit a log message per table (or filled skb) and resurrect the 'entries' parameter to contain the number of objects being logged for. To test the skb exhaustion path, perform some bulk counter and quota adds in the kselftest. Signed-off-by: Phil Sutter <phil(a)nwl.cc> Reviewed-by: Richard Guy Briggs <rgb(a)redhat.com> Acked-by: Paul Moore <paul(a)paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw(a)strlen.de> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- net/netfilter/nf_tables_api.c | 50 +++++++++++-------- .../testing/selftests/netfilter/nft_audit.sh | 46 +++++++++++++++++ 2 files changed, 74 insertions(+), 22 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index be5869366c7d3..bddf68f364fb5 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7612,6 +7612,16 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net, return -1; } +static void audit_log_obj_reset(const struct nft_table *table, + unsigned int base_seq, unsigned int nentries) +{ + char *buf = kasprintf(GFP_ATOMIC, "%s:%u", table->name, base_seq); + + audit_log_nfcfg(buf, table->family, nentries, + AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); + kfree(buf); +} + struct nft_obj_filter { char *table; u32 type; @@ -7626,8 +7636,10 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; struct nftables_pernet *nft_net; + unsigned int entries = 0; struct nft_object *obj; bool reset = false; + int rc = 0; if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) reset = true; @@ -7640,6 +7652,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) if (family != NFPROTO_UNSPEC && family != table->family) continue; + entries = 0; 
list_for_each_entry_rcu(obj, &table->objects, list) { if (!nft_is_active(net, obj)) goto cont; @@ -7655,34 +7668,27 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) filter->type != NFT_OBJECT_UNSPEC && obj->ops->type->type != filter->type) goto cont; - if (reset) { - char *buf = kasprintf(GFP_ATOMIC, - "%s:%u", - table->name, - nft_net->base_seq); - - audit_log_nfcfg(buf, - family, - obj->handle, - AUDIT_NFT_OP_OBJ_RESET, - GFP_ATOMIC); - kfree(buf); - } - if (nf_tables_fill_obj_info(skb, net, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NFT_MSG_NEWOBJ, - NLM_F_MULTI | NLM_F_APPEND, - table->family, table, - obj, reset) < 0) - goto done; + rc = nf_tables_fill_obj_info(skb, net, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NFT_MSG_NEWOBJ, + NLM_F_MULTI | NLM_F_APPEND, + table->family, table, + obj, reset); + if (rc < 0) + break; + entries++; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } + if (reset && entries) + audit_log_obj_reset(table, nft_net->base_seq, entries); + if (rc < 0) + break; } -done: rcu_read_unlock(); cb->args[0] = idx; @@ -7787,7 +7793,7 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, audit_log_nfcfg(buf, family, - obj->handle, + 1, AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); kfree(buf); diff --git a/tools/testing/selftests/netfilter/nft_audit.sh b/tools/testing/selftests/netfilter/nft_audit.sh index bb34329e02a7f..e94a80859bbdb 100755 --- a/tools/testing/selftests/netfilter/nft_audit.sh +++ b/tools/testing/selftests/netfilter/nft_audit.sh @@ -93,6 +93,12 @@ do_test 'nft add counter t1 c1' \ do_test 'nft add counter t2 c1; add counter t2 c2' \ 'table=t2 family=2 entries=2 op=nft_register_obj' +for ((i = 3; i <= 500; i++)); do + echo "add counter t2 c$i" +done >$rulefile +do_test "nft -f $rulefile" \ +'table=t2 family=2 entries=498 op=nft_register_obj' + # adding/updating quotas do_test 'nft add quota t1 q1 { 10 bytes }' \ @@ -101,6 +107,12 @@ do_test 'nft add 
quota t1 q1 { 10 bytes }' \ do_test 'nft add quota t2 q1 { 10 bytes }; add quota t2 q2 { 10 bytes }' \ 'table=t2 family=2 entries=2 op=nft_register_obj' +for ((i = 3; i <= 500; i++)); do + echo "add quota t2 q$i { 10 bytes }" +done >$rulefile +do_test "nft -f $rulefile" \ +'table=t2 family=2 entries=498 op=nft_register_obj' + # changing the quota value triggers obj update path do_test 'nft add quota t1 q1 { 20 bytes }' \ 'table=t1 family=2 entries=1 op=nft_register_obj' @@ -150,6 +162,40 @@ done do_test 'nft reset set t1 s' \ 'table=t1 family=2 entries=3 op=nft_reset_setelem' +# resetting counters + +do_test 'nft reset counter t1 c1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset counters t1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset counters t2' \ +'table=t2 family=2 entries=342 op=nft_reset_obj +table=t2 family=2 entries=158 op=nft_reset_obj' + +do_test 'nft reset counters' \ +'table=t1 family=2 entries=1 op=nft_reset_obj +table=t2 family=2 entries=341 op=nft_reset_obj +table=t2 family=2 entries=159 op=nft_reset_obj' + +# resetting quotas + +do_test 'nft reset quota t1 q1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset quotas t1' \ +'table=t1 family=2 entries=1 op=nft_reset_obj' + +do_test 'nft reset quotas t2' \ +'table=t2 family=2 entries=315 op=nft_reset_obj +table=t2 family=2 entries=185 op=nft_reset_obj' + +do_test 'nft reset quotas' \ +'table=t1 family=2 entries=1 op=nft_reset_obj +table=t2 family=2 entries=314 op=nft_reset_obj +table=t2 family=2 entries=186 op=nft_reset_obj' + # deleting rules readarray -t handles < <(nft -a list chain t1 c1 | \ -- 2.42.0

[PATCH] selftests/cgroup: Fix awk usage in test_cpuset_prs.sh that may cause error

by Juntong Deng

According to the awk manual, the -e option does not need to be specified in front of 'program' (unless you need to mix program-file).

The redundant -e option can cause an error when users use awk tools other than gawk (for example, mawk does not support the -e option).

Error Example:
awk: not an option: -e
Cgroup v2 mount point not found!

Signed-off-by: Juntong Deng <juntong.deng(a)outlook.com>
---
 tools/testing/selftests/cgroup/test_cpuset_prs.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 4afb132e4e4f..6820653e8432 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -20,7 +20,7 @@ skip_test() {
 WAIT_INOTIFY=$(cd $(dirname $0); pwd)/wait_inotify
 
 # Find cgroup v2 mount point
-CGROUP2=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
+CGROUP2=$(mount -t cgroup2 | head -1 | awk '{print $3}')
 [[ -n "$CGROUP2" ]] || skip_test "Cgroup v2 mount point not found!"
 
 CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//")
--
2.39.2
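
The portable form from the patch above can be tried without a cgroup2 mount by feeding awk a sample line. The sample below is an assumed example of what `mount -t cgroup2` prints on a typical system; the real output may differ. Passing the program text as the first non-option argument works in gawk and mawk alike.

```shell
#!/bin/sh
# Assumed sample of `mount -t cgroup2` output, used in place of the real command.
sample='cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)'

# No -e: the program is simply the first non-option argument to awk.
mountpoint=$(printf '%s\n' "$sample" | head -1 | awk '{print $3}')
echo "$mountpoint"
```

The third whitespace-separated field is the mount point, which is what the test script stores in CGROUP2.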
