June 2024 - Linux-kselftest-mirror

[PATCH net-next v6 00/10] net: openvswitch: Add sample multicasting.

by Adrian Moreno

** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic shares netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
rendering this solution unfit for highly loaded production systems.

Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidate would be
to use psample. However, its current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.

Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.

--
v5 -> v6:
- Renamed emit_sample -> psample
- Addressed unused variable and conditional compilation of function.

v4 -> v5:
- Rebased.
- Removed leftover enum value and wrapped some long lines in
  selftests.

v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.

v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
  consume skb if emit_sample is the last action, same as we do with
  userspace.
- Split ovs-dpctl.py features in independent patches.

v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
  "sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in
  emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.

rfc_v2 -> v1:
- Accommodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
  handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.

rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.

Adrian Moreno (10):
  net: psample: add user cookie
  net: sched: act_sample: add action cookie to sample
  net: psample: skip packet copy if no listeners
  net: psample: allow using rate as probability
  net: openvswitch: add psample action
  net: openvswitch: store sampling probability in cb.
  selftests: openvswitch: add psample action
  selftests: openvswitch: add userspace parsing
  selftests: openvswitch: parse trunc action
  selftests: openvswitch: add psample test

 Documentation/netlink/specs/ovs_flow.yaml          |  17 ++
 include/net/psample.h                              |   5 +-
 include/uapi/linux/openvswitch.h                   |  31 +-
 include/uapi/linux/psample.h                       |  11 +-
 net/openvswitch/Kconfig                            |   1 +
 net/openvswitch/actions.c                          |  65 ++++-
 net/openvswitch/datapath.h                         |   3 +
 net/openvswitch/flow_netlink.c                     |  32 ++-
 net/openvswitch/vport.c                            |   1 +
 net/psample/psample.c                              |  16 +-
 net/sched/act_sample.c                             |  12 +
 .../selftests/net/openvswitch/openvswitch.sh       | 115 +++++++-
 .../selftests/net/openvswitch/ovs-dpctl.py         | 272 +++++++++++++++++-
 13 files changed, 565 insertions(+), 16 deletions(-)

--
2.45.2


[PATCH v6 0/4] Userspace controls soft-offline pages

by Jiaqi Yan

Correctable memory errors are very common on servers with a large
amount of memory, and are corrected by ECC, but with two
pain points to users:
1. Correction usually happens on the fly and adds latency overhead
2. A not-fully-proved theory states that excessive correctable memory
   errors can develop into uncorrectable memory errors.

Soft offline is the kernel's additional solution for memory pages
having (excessive) corrected memory errors. An impacted page is migrated
to a healthy page if it is in use, then the original page is discarded
for any future use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in the case of a 1G HugeTLB page.
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when a later mmap of hugepages returns MAP_FAILED due to lack of
hugepages. In the case of a transparent hugepage, it will be split into
4K pages as well; userspace will stop enjoying the transparent
performance.

In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly, and the kernel had better
not do it under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and
   CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
   PFN and when the counter for a PFN reaches threshold.
In both cases, userspace has no control of the soft offline performed
by the kernel's memory failure recovery.

This patch series gives userspace control over soft-offlining any page:
the kernel only soft offlines a raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to it. The interface to userspace is a
new sysctl called enable_soft_offline under /proc/sys/vm. By default
enable_soft_offline is 1 to preserve existing behavior in the kernel.

Changelog

v5 => v6:
* incorporate feedback from Miaohe Lin <linmiaohe(a)huawei.com>
* add a ':' in soft offline log.
* close hugetlbfs file descriptor in selftest.
* no need to "return" after ksft_exit_fail_msg.

v4 => v5:
* incorporate feedback from Muhammad Usama Anjum
  <usama.anjum(a)collabora.com>
* refactor selftest to use what is available in kselftest.h

v3 => v4:
* incorporate feedback from Miaohe Lin <linmiaohe(a)huawei.com>,
  Andrew Morton <akpm(a)linux-foundation.org>, and
  Oscar Salvador <osalvador(a)suse.de>.
* insert a refactor commit to unify soft offline's logs to follow
  "Soft offline: 0x${pfn}: ${message}" format.
* some rewords in document: fail => will not perform.
* v4 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3"),
  akpm/mm-stable.

v2 => v3:
* incorporate feedback from Miaohe Lin <linmiaohe(a)huawei.com>,
  Lance Yang <ioworker0(a)gmail.com>, Oscar Salvador <osalvador(a)suse.de>,
  and David Rientjes <rientjes(a)google.com>.
* release potential refcount if enable_soft_offline is 0.
* soft_offline_page() returns EOPNOTSUPP if enable_soft_offline is 0.
* refactor hugetlb-soft-offline.c, for example, introduce
  test_soft_offline_common to reduce repeated code.
* rewrite enable_soft_offline's documentation, adds more details about
  the cost of soft-offline for transparent and hugetlb hugepages, and
  components that are impacted when enable_soft_offline becomes 0.
* fix typos in commit messages.
* v3 is still based on commit 83a7eefedc9b ("Linux 6.10-rc3").

v1 => v2:
* incorporate feedback from both Miaohe Lin <linmiaohe(a)huawei.com> and
  Jane Chu <jane.chu(a)oracle.com>.
* make the switch to control all pages, instead of HugeTLB specific.
* change the API from
  /sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
  to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").

Jiaqi Yan (4):
  mm/memory-failure: refactor log format in soft offline code
  mm/memory-failure: userspace controls soft-offlining pages
  selftest/mm: test enable_soft_offline behaviors
  docs: mm: add enable_soft_offline sysctl

 Documentation/admin-guide/sysctl/vm.rst       |  32 +++
 mm/memory-failure.c                           |  38 ++-
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-soft-offline.c       | 228 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   4 +
 6 files changed, 296 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c

--
2.45.2.741.gdbec12cfda-goog
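
For readers who want to check the switch from userspace, here is a minimal
sketch, not part of the series. It assumes only what the cover letter states:
the sysctl lives at /proc/sys/vm/enable_soft_offline, 1 preserves the
existing behavior and 0 turns soft offline off. The helper names
parse_enable_soft_offline() and soft_offline_enabled() are hypothetical, and
treating any nonzero value as "enabled" is an assumption of this sketch.

```python
from pathlib import Path

# Path taken from the cover letter; only present on kernels with the series.
SYSCTL = Path("/proc/sys/vm/enable_soft_offline")


def parse_enable_soft_offline(text):
    """Interpret the sysctl contents: nonzero means soft offline is allowed.

    Assumption of this sketch: any nonzero value counts as enabled.
    """
    return int(text.strip()) != 0


def soft_offline_enabled(path=SYSCTL):
    """Return True/False, or None when the kernel does not expose the sysctl."""
    try:
        return parse_enable_soft_offline(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    state = soft_offline_enabled()
    if state is None:
        print("enable_soft_offline is not available on this kernel")
    else:
        print("soft offline enabled:", state)
```

Returning None for a missing file keeps "old kernel" distinguishable from
"explicitly disabled", which matters for a selftest that should skip rather
than fail on kernels without the sysctl.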


[PATCH v2 0/4] Introduce user namespace capabilities

by Jonathan Calmels

This patch series introduces a new user namespace capability set, as
well as some plumbing around it (i.e. sysctl, secbit, lsm support).

First patch goes over the motivations for this as well as prior art.

In summary, while user namespaces are a great success today in that they
avoid running a lot of code as root, they also expand the attack surface
of the kernel substantially, which is often abused by attackers.
Methods exist to limit the creation of such namespaces [1]; however,
application developers often need to assume that user namespaces are
available for various tasks such as sandboxing. Thus, instead of
restricting the creation of user namespaces, we offer ways for userspace
to limit the capabilities granted to them.

Why a new capability set and not something specific to the userns (e.g.
ioctl_ns)?

1. We can't really expect userspace to patch every single callsite
   and opt in to this new security mechanism.

2. We don't necessarily want policies enforced at said callsites.
   For example a service like systemd-machined or a PAM session needs
   to be able to place restrictions on any namespace spawned under it.

3. We would need to come up with inheritance rules, querying
   capabilities, etc. At this point we're just reinventing capability
   sets.

4. We can easily define interactions between capability sets, thus
   helping with adoption (patch 2 is an example of this)

Some examples of how this could be leveraged in userspace:

- Prevent user from getting CAP_NET_ADMIN in user namespaces under SSH:

    echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
    echo "!cap_net_admin $USER" >> /etc/security/capability.conf
    capsh --secbits=$((1 << 8)) -- -c /usr/sbin/sshd

- Prevent containers from ever getting CAP_DAC_OVERRIDE:

    systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
                -p SecureBits=userns-strict-caps \
                /usr/bin/dockerd
    systemd-run -p UserNSCapabilities=~CAP_DAC_OVERRIDE \
                /usr/bin/incusd

- Kernel could be vulnerable to CAP_SYS_RAWIO exploits, prevent it:

    sysctl -w cap_bound_userns_mask=0x1fffffdffff

- Drop CAP_SYS_ADMIN for this shell and all the user namespaces below it:

    bwrap --unshare-user --cap-drop CAP_SYS_ADMIN /bin/sh

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

---
Changes since v1:
- Add documentation
- Change commit wording
- Cleanup various aspects of the code based on feedback
- Add new CAP_SYS_CONTROL capability for sysctl check
- Add BPF-LSM support for modifying userns capabilities
---

Jonathan Calmels (4):
  capabilities: Add user namespace capabilities
  capabilities: Add securebit to restrict userns caps
  capabilities: Add sysctl to mask off userns caps
  bpf,lsm: Allow editing capabilities in BPF-LSM hooks

 Documentation/filesystems/proc.rst            |  1 +
 Documentation/security/credentials.rst        |  6 ++
 fs/proc/array.c                               |  9 +++
 include/linux/cred.h                          |  3 +
 include/linux/lsm_hook_defs.h                 |  2 +-
 include/linux/securebits.h                    |  1 +
 include/linux/security.h                      |  4 +-
 include/linux/user_namespace.h                |  7 ++
 include/uapi/linux/capability.h               |  6 +-
 include/uapi/linux/prctl.h                    |  7 ++
 include/uapi/linux/securebits.h               | 11 ++-
 kernel/bpf/bpf_lsm.c                          | 55 +++++++++++++
 kernel/cred.c                                 |  3 +
 kernel/sysctl.c                               | 10 +++
 kernel/umh.c                                  | 15 ++++
 kernel/user_namespace.c                       | 80 +++++++++++++++++--
 security/apparmor/lsm.c                       |  2 +-
 security/commoncap.c                          | 62 +++++++++++++-
 security/keys/process_keys.c                  |  3 +
 security/security.c                           |  6 +-
 security/selinux/hooks.c                      |  2 +-
 security/selinux/include/classmap.h           |  5 +-
 .../selftests/bpf/prog_tests/deny_namespace.c | 12 ++-
 .../selftests/bpf/progs/test_deny_namespace.c |  7 +-
 24 files changed, 291 insertions(+), 28 deletions(-)

--
2.45.2
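
The cap_bound_userns_mask value in the CAP_SYS_RAWIO example can be derived
by hand, and the arithmetic is sketched below as a reading aid. CAP_SYS_RAWIO
is capability number 17 in include/uapi/linux/capability.h. NUM_CAPS = 41 is
an assumption inferred from the width of the mask quoted in the cover letter,
and userns_mask_without() is a hypothetical helper, not something the series
provides.

```python
# Capability number from include/uapi/linux/capability.h.
CAP_SYS_RAWIO = 17

# Assumption: 41 capability bits, inferred from the width of the
# 0x1fffffdffff mask quoted in the cover letter.
NUM_CAPS = 41


def userns_mask_without(*caps, num_caps=NUM_CAPS):
    """Return a full capability mask with the given capability bits cleared."""
    mask = (1 << num_caps) - 1
    for cap in caps:
        mask &= ~(1 << cap)
    return mask


if __name__ == "__main__":
    # Reproduces the value passed to sysctl in the cover letter.
    print(hex(userns_mask_without(CAP_SYS_RAWIO)))  # 0x1fffffdffff
```

Building the mask from capability numbers rather than typing the hex literal
makes it harder to clear the wrong bit when several capabilities are masked
at once.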


[PATCH v1] selftests/harness: Fix tests timeout and race condition

by Mickaël Salaün

We cannot use CLONE_VFORK because we also need to wait for the timeout
signal.

Restore tests timeout by using the original fork() call in __run_test()
but also in __TEST_F_IMPL(). Also fix a race condition when waiting for
the test child process.

Because test metadata are shared between test processes, only the
parent process must set the test PID (child). Otherwise, t->pid may be
set to zero, leading to inconsistent error cases:

  # RUN layout1.rule_on_mountpoint ...
  # rule_on_mountpoint: Test ended in some other way [127]
  # OK layout1.rule_on_mountpoint
  ok 20 layout1.rule_on_mountpoint

As safeguards, initialize the "status" variable with a valid exit code,
and handle unknown test exits as errors.

The use of fork() introduces a new race condition in landlock/fs_test.c
which seems to be specific to hostfs bind mounts, but I haven't found
the root cause and it's difficult to trigger. I'll try to fix it with
another patch.

Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Günther Noack <gnoack(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Will Drewry <wad(a)chromium.org>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/r/9341d4db-5e21-418c-bf9e-9ae2da7877e1@sirena.org.uk
Fixes: a86f18903db9 ("selftests/harness: Fix interleaved scheduling leading to race conditions")
Fixes: 24cf65a62266 ("selftests/harness: Share _metadata between forked processes")
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
Link: https://lore.kernel.org/r/20240621180605.834676-1-mic@digikod.net
---
 tools/testing/selftests/kselftest_harness.h | 43 ++++++++++++---------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index b634969cbb6f..40723a6a083f 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -66,8 +66,6 @@
 #include <sys/wait.h>
 #include <unistd.h>
 #include <setjmp.h>
-#include <syscall.h>
-#include <linux/sched.h>
 
 #include "kselftest.h"
 
@@ -82,17 +80,6 @@
 # define TH_LOG_ENABLED 1
 #endif
 
-/* Wait for the child process to end but without sharing memory mapping. */
-static inline pid_t clone3_vfork(void)
-{
-	struct clone_args args = {
-		.flags = CLONE_VFORK,
-		.exit_signal = SIGCHLD,
-	};
-
-	return syscall(__NR_clone3, &args, sizeof(args));
-}
-
 /**
  * TH_LOG()
  *
@@ -437,7 +424,7 @@ static inline pid_t clone3_vfork(void)
 		} \
 		if (setjmp(_metadata->env) == 0) { \
 			/* _metadata and potentially self are shared with all forks. */ \
-			child = clone3_vfork(); \
+			child = fork(); \
 			if (child == 0) { \
 				fixture_name##_setup(_metadata, self, variant->data); \
 				/* Let setup failure terminate early. */ \
@@ -1016,7 +1003,14 @@ void __wait_for_test(struct __test_metadata *t)
 		.sa_flags = SA_SIGINFO,
 	};
 	struct sigaction saved_action;
-	int status;
+	/*
+	 * Sets status so that WIFEXITED(status) returns true and
+	 * WEXITSTATUS(status) returns KSFT_FAIL. This safe default value
+	 * should never be evaluated because of the waitpid(2) check and
+	 * SIGALRM handling.
+	 */
+	int status = KSFT_FAIL << 8;
+	int child;
 
 	if (sigaction(SIGALRM, &action, &saved_action)) {
 		t->exit_code = KSFT_FAIL;
@@ -1028,7 +1022,15 @@ void __wait_for_test(struct __test_metadata *t)
 	__active_test = t;
 	t->timed_out = false;
 	alarm(t->timeout);
-	waitpid(t->pid, &status, 0);
+	child = waitpid(t->pid, &status, 0);
+	if (child == -1 && errno != EINTR) {
+		t->exit_code = KSFT_FAIL;
+		fprintf(TH_LOG_STREAM,
+			"# %s: Failed to wait for PID %d (errno: %d)\n",
+			t->name, t->pid, errno);
+		return;
+	}
+
 	alarm(0);
 	if (sigaction(SIGALRM, &saved_action, NULL)) {
 		t->exit_code = KSFT_FAIL;
@@ -1083,6 +1085,7 @@ void __wait_for_test(struct __test_metadata *t)
 				WTERMSIG(status));
 		}
 	} else {
+		t->exit_code = KSFT_FAIL;
 		fprintf(TH_LOG_STREAM,
 			"# %s: Test ended in some other way [%u]\n",
 			t->name,
@@ -1218,6 +1221,7 @@ void __run_test(struct __fixture_metadata *f,
 	struct __test_xfail *xfail;
 	char test_name[1024];
 	const char *diagnostic;
+	int child;
 
 	/* reset test struct */
 	t->exit_code = KSFT_PASS;
@@ -1236,15 +1240,16 @@ void __run_test(struct __fixture_metadata *f,
 	fflush(stdout);
 	fflush(stderr);
 
-	t->pid = clone3_vfork();
-	if (t->pid < 0) {
+	child = fork();
+	if (child < 0) {
 		ksft_print_msg("ERROR SPAWNING TEST CHILD\n");
 		t->exit_code = KSFT_FAIL;
-	} else if (t->pid == 0) {
+	} else if (child == 0) {
 		setpgrp();
 		t->fn(t, variant);
 		_exit(t->exit_code);
 	} else {
+		t->pid = child;
 		__wait_for_test(t);
 	}
 	ksft_print_msg(" %4s %s\n",

base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
--
2.45.2
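
The "safe default" for status relies on how Linux encodes wait statuses: the
exit code sits in bits 8 to 15, and the low seven bits are zero for a normal
exit. The sketch below illustrates that encoding with Python's os module
instead of the C macros. It is a simplified model, not the harness logic:
exit_code_from_status() is a hypothetical helper, and the real
__wait_for_test() additionally handles timeouts and expected signals.
KSFT_PASS and KSFT_FAIL use the values from kselftest.h.

```python
import os

# Values from tools/testing/selftests/kselftest.h.
KSFT_PASS = 0
KSFT_FAIL = 1

# Same default as the patch: decodes as "exited normally with KSFT_FAIL".
SAFE_DEFAULT_STATUS = KSFT_FAIL << 8


def exit_code_from_status(status):
    """Map a wait status to a kselftest exit code (simplified model).

    A normal exit reports the child's exit code; a child killed by a
    signal, or ending in any other way, is treated as a failure.
    """
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return KSFT_FAIL


if __name__ == "__main__":
    print(exit_code_from_status(SAFE_DEFAULT_STATUS))
    print(exit_code_from_status(0))
    print(exit_code_from_status(9))  # status of a child killed by signal 9
```

With this default, a status that is never written by waitpid() can only be
read as a failure, never as an accidental pass.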


[PATCH net-next 00/12] selftest: Clean-up and stabilize mirroring tests

by Petr Machata

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.

As a result, the selftests are noisy. mirror_test() accommodated this
noisiness by giving the counters an allowance of several packets. But that
only works up to a point, and on busy systems won't be always enough.

In this patch set, clean up and stabilize the mirroring selftests. The
original intention was to port the tests over to UDP, but the logic of ICMP
ends up being so entangled in the mirroring selftests that the changes feel
overly invasive. Instead, ICMP is kept, but where possible, we match on
ICMP message type, thus filtering out hits by other ICMP messages.

Where this is not practical (where the counter tap is put on a device that
carries encapsulated packets), switch the counter condition to _at least_
X observed packets. This is less robust, but barely so -- probably the only
scenario that this would not catch is something like erroneous packet
duplication, which would hopefully get caught by the numerous other tests
in this extensive suite.

- Patches #1 to #3 clean up parameters at various helpers.

- Patches #4 to #6 stabilize the mirroring selftests as described above.

- Mirroring tests currently allow testing SW datapath even on HW
  netdevices by trapping traffic to the SW datapath. This complicates the
  tests a bit without a good reason: to test SW datapath, just run the
  selftests on the veth topology. Thus in patch #7, drop support for this
  dual SW/HW testing.

- At this point, some cleanups were either made possible by the previous
  patches, or were always possible. In patches #8 to #11, realize these
  cleanups.

- In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS.

Petr Machata (12):
  selftests: libs: Expand "$@" where possible
  selftests: mirror: Drop direction argument from several functions
  selftests: lib: tc_rule_stats_get(): Move default to argument
    definition
  selftests: mirror_gre_lag_lacp: Check counters at tunnel
  selftests: mirror: do_test_span_dir_ips(): Install accurate taps
  selftests: mirror: mirror_test(): Allow exact count of packets
  selftests: mirror: Drop dual SW/HW testing
  selftests: mlxsw: mirror_gre: Simplify
  selftests: mirror_gre_lag_lacp: Drop unnecessary code
  selftests: libs: Drop slow_path_trap_install()/_uninstall()
  selftests: libs: Drop unused functions
  selftests: mlxsw: mirror_gre: Obey TESTS

 .../selftests/drivers/net/mlxsw/mirror_gre.sh | 71 ++++++---------
 .../drivers/net/mlxsw/mirror_gre_scale.sh     | 18 +---
 tools/testing/selftests/net/forwarding/lib.sh | 83 +++++++++++------
 .../selftests/net/forwarding/mirror_gre.sh    | 45 +++-------
 .../net/forwarding/mirror_gre_bound.sh        | 23 +----
 .../net/forwarding/mirror_gre_bridge_1d.sh    | 21 +----
 .../forwarding/mirror_gre_bridge_1d_vlan.sh   | 21 +----
 .../net/forwarding/mirror_gre_bridge_1q.sh    | 21 +----
 .../forwarding/mirror_gre_bridge_1q_lag.sh    | 29 ++----
 .../net/forwarding/mirror_gre_changes.sh      | 73 ++++++---------
 .../net/forwarding/mirror_gre_flower.sh       | 43 ++++-----
 .../net/forwarding/mirror_gre_lag_lacp.sh     | 65 ++++++--------
 .../net/forwarding/mirror_gre_lib.sh          | 90 ++++++++++++++-----
 .../net/forwarding/mirror_gre_neigh.sh        | 39 +++-----
 .../selftests/net/forwarding/mirror_gre_nh.sh | 35 ++------
 .../net/forwarding/mirror_gre_vlan.sh         | 21 +----
 .../forwarding/mirror_gre_vlan_bridge_1q.sh   | 69 ++++++--------
 .../selftests/net/forwarding/mirror_lib.sh    | 79 +++++++++++-----
 .../selftests/net/forwarding/mirror_vlan.sh   | 43 +++------
 tools/testing/selftests/net/lib.sh            |  4 +-
 20 files changed, 355 insertions(+), 538 deletions(-)

--
2.45.0


[PATCH v6 4/4] KVM: riscv: selftests: Add Svade and Svadu Extension to get-reg-list test

by Yong-Xuan Wang

Update the get-reg-list test to check that the Svade and Svadu
extensions are available to the guest OS.

Signed-off-by: Yong-Xuan Wang <yongxuan.wang(a)sifive.com>
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
---
 tools/testing/selftests/kvm/riscv/get-reg-list.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
index 222198dd6d04..1d32351ad55e 100644
--- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
+++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
@@ -45,6 +45,8 @@ bool filter_reg(__u64 reg)
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSAIA:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSCOFPMF:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSTC:
+	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVADE:
+	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVADU:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVINVAL:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVNAPOT:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVPBMT:
@@ -411,6 +413,8 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
 		KVM_ISA_EXT_ARR(SSAIA),
 		KVM_ISA_EXT_ARR(SSCOFPMF),
 		KVM_ISA_EXT_ARR(SSTC),
+		KVM_ISA_EXT_ARR(SVADE),
+		KVM_ISA_EXT_ARR(SVADU),
 		KVM_ISA_EXT_ARR(SVINVAL),
 		KVM_ISA_EXT_ARR(SVNAPOT),
 		KVM_ISA_EXT_ARR(SVPBMT),
@@ -935,6 +939,8 @@ KVM_ISA_EXT_SIMPLE_CONFIG(h, H);
 KVM_ISA_EXT_SUBLIST_CONFIG(smstateen, SMSTATEEN);
 KVM_ISA_EXT_SIMPLE_CONFIG(sscofpmf, SSCOFPMF);
 KVM_ISA_EXT_SIMPLE_CONFIG(sstc, SSTC);
+KVM_ISA_EXT_SIMPLE_CONFIG(svade, SVADE);
+KVM_ISA_EXT_SIMPLE_CONFIG(svadu, SVADU);
 KVM_ISA_EXT_SIMPLE_CONFIG(svinval, SVINVAL);
 KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
 KVM_ISA_EXT_SIMPLE_CONFIG(svpbmt, SVPBMT);
@@ -991,6 +997,8 @@ struct vcpu_reg_list *vcpu_configs[] = {
 	&config_smstateen,
 	&config_sscofpmf,
 	&config_sstc,
+	&config_svade,
+	&config_svadu,
 	&config_svinval,
 	&config_svnapot,
 	&config_svpbmt,
--
2.17.1


Testing Quality Call notes - 2024-06-27

by Laura Nao

Hello,

KernelCI is hosting a bi-weekly call on Thursday to discuss improvements
to existing upstream tests, the development of new tests to increase
kernel testing coverage, and the enablement of these tests in KernelCI.
In recent months, we at Collabora have focused on various kernel areas,
assessing the tests already available upstream and contributing patches
to make them easily runnable in CIs.

Below is a list of the tests we've been working on and their latest
status updates, as discussed in the last meeting held on 2024-06-27:

*USB/PCI devices kselftest*

- Upstream test to detect unprobed devices on discoverable buses:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
- Kernel patches to allow running the test on more platforms on KernelCI
  were merged:
  https://lore.kernel.org/all/20240613-kselftest-discoverable-probe-mt8195-kc…
- Waiting for KernelCI PRs to be merged:
  https://github.com/kernelci/kernelci-core/pull/2577 and
  https://github.com/kernelci/kernelci-pipeline/pull/642

*Error log test*

- Proposing new kselftest to report device log errors:
  https://lore.kernel.org/all/20240423-dev-err-log-selftest-v1-0-690c1741d68b…
- Currently fixing test failures in KernelCI

*Suspend/resume in cpufreq kselftest*

- Enabling suspend/resume test within the cpufreq kselftest in KernelCI
- Parameter support for running subtests in a kselftest was merged:
  https://github.com/Linaro/test-definitions/pull/511
- Added rtcwake support in the test to enable automated resume, currently
  testing/debugging solution

*Boot time test*

- Investigating possibility of adding new test upstream to measure the
  kernel boot time and detect regressions
- Currently looking into boot tracing with ftrace events and kprobes
  (see: https://www.kernel.org/doc/html/latest/trace/boottime-trace.html)
- Idea for potential kselftest: insert explicit tracepoints in strategic
  places, let the user configure which times to measure. The test could
  provide a bootconfig file and a fragment to enable the required configs.
  This could be an alternative to using external tools (e.g. grabserial w/
  early serial port init).
- Need a list of functions to track in order to measure key metrics (e.g.
  device tree overhead, probe overhead, module load overhead)
- Identify key drivers that need to be loaded early, for potentially
  supporting a two-phase boot: (1) time-critical, and (2) rest of the system

*Other interesting updates*

- Flaky serial on sc7180 was recently fixed:
  https://github.com/kernelci/kernelci-project/issues/380 and
  https://lore.kernel.org/all/20240610222515.3023730-1-dianders@chromium.org/…

*Strategy for test enablement in KernelCI*

- Guidance on test quality: KernelCI should set the standard for test
  quality, providing guidance on which tests to enable from various
  projects (e.g., kselftest, LTP). By doing so, KernelCI can serve as a
  model for other CI systems.
- Develop mechanisms to automatically detect which tests should run on a
  specific platform
  - Embed metadata in the tests themselves to facilitate the test selection
    process
  - Leverage device tree info to determine the appropriate tests for each
    platform

Please reply to this thread if you'd like to join the call or discuss any
of the topics further. We look forward to collaborating with the community
to improve upstream tests and expand coverage to more areas of interest
within the kernel.

Best regards,

Laura Nao


[PATCH v2 0/2] selftests/resctrl: SNC kernel support discovery

by Maciej Wieczor-Retman

Changes v2:
- Removed patches 2 and 3 since now this part will be supported by the
  kernel.

Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.

SNC support in the kernel is currently in review [1]. With SNC enabled
and kernel support in place all the tests will function normally. There
might be a problem when SNC is enabled but the system is still using an
older kernel version without SNC support. Currently the only message
displayed in that situation is a guess that SNC might be enabled and is
causing issues. That message is also displayed whenever the test fails
on an Intel platform.

Add a mechanism to discover kernel support for SNC which will add more
meaning and certainty to the error message.

Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also run with and without kernel support for
SNC.

Series applies cleanly on kselftest/next.

[1] https://lore.kernel.org/all/20240503203325.21512-1-tony.luck@intel.com/

Previous versions of this series:
[v1] https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-retman@inte…

Maciej Wieczor-Retman (2):
  selftests/resctrl: Adjust effective L3 cache size with SNC enabled
  selftests/resctrl: Adjust SNC support messages

 tools/testing/selftests/resctrl/cat_test.c  |   2 +-
 tools/testing/selftests/resctrl/cmt_test.c  |   6 +-
 tools/testing/selftests/resctrl/mba_test.c  |   2 +
 tools/testing/selftests/resctrl/mbm_test.c  |   4 +-
 tools/testing/selftests/resctrl/resctrl.h   |   8 +-
 tools/testing/selftests/resctrl/resctrlfs.c | 131 +++++++++++++++++++-
 6 files changed, 144 insertions(+), 9 deletions(-)

--
2.45.0


[PATCH v3 0/2] Allow userspace to change ID_AA64PFR1_EL1

by Shaoqin Huang

Allow userspace to change the guest-visible value of the register with
some severe limitations:

- No changes to features not virtualized by KVM (MPAM_frac, RAS_frac,
  SME, RNDP_trap).

- No changes to features (CSV2_frac, NMI, MTE_frac, GCS, THE, MTEX,
  DF2, PFAR) which haven't been added into ftr_id_aa64pfr1[].
  This is because the struct arm64_ftr_bits definition for each feature
  in ftr_id_aa64pfr1[] is used by arm64_check_features. If a feature is
  not present in ftr_id_aa64pfr1[], the for loop won't check whether
  the new_val is safe for that feature.

As for the question of why those fields can't be hidden depending on the
VM configuration: I couldn't find a related VM configuration, so maybe
we should add a new one?

I'm not sure I'm right, so if there are any problems please point them
out and I will fix them.

Also add the selftest for it.

Changelog:
----------
v2 -> v3:
  * Give more description about why only part of the fields can be
    writable.
  * Updated the writable mask by referring to the latest ARM spec.

v1 -> v2:
  * Tackling the full register instead of single field.
  * Changing the patch title and commit message.

RFCv1 -> v1:
  * Fix the compilation error.
  * Delete the machine specific information and make the description
    more general.

RFCv1: https://lore.kernel.org/all/20240612023553.127813-1-shahuang@redhat.com/
v1: https://lore.kernel.org/all/20240617075131.1006173-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20240618063808.1040085-1-shahuang@redhat.com/

Shaoqin Huang (2):
  KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1
  KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1

 arch/arm64/kvm/sys_regs.c                         | 4 +++-
 tools/testing/selftests/kvm/aarch64/set_id_regs.c | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

--
2.40.1


selftests/ftrace kprobe_eventname test fails on s390x QEMU (KVM/Linux)

by Yunseong Kim

Hi all,

On my s390x system, the kprobe_eventname selftest always fails
because of rcu_sched stalls.

My environment is QEMU Ubuntu 24.04 KVM Machine

Linux version 6.8.0-36-generic (buildd@bos01-s390x-012) (s390x-linux-gnu-gcc-13 (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #36
1 configured CPUs, Linux is running under KVM in 64-bit mode

qemu-system-s390x -no-reboot -name auto-inst-test -nographic -m 4096 \
  -drive file=disk-image.qcow2,format=qcow2,cache=none,if=virtio \
  -netdev user,id=enc0,hostfwd=tcp::10000-:22 \
  -device virtio-net-ccw,netdev=enc0 \
  -qmp tcp:localhost:4444,server,nowait

This failure can always be reproduced with this kselftest script:

# tools/testing/selftests/ftrace/ftracetest tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc

I investigated which line triggers the stall and found this one:

for i in `seq 0 255`; do
  echo p $FUNCTION_FORK+${i} >> kprobe_events || continue
done

cat kprobe_events >> $testlog

echo 1 > events/kprobes/enable # <<< This line makes "rcu_sched detected stalls" log and stall the system.

[ 7825.578940] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:

After this line, the test doesn't go any further.

This test was added in the patch below.
("selftests/ftrace: Add new test case which adds multiple consecutive probes in a function")
Link: https://lore.kernel.org/linux-trace-kernel/20230428163842.95118-2-akanksha@…

I've attached a link to a compressed version of the vmcore, vmlinux and
config files that I dumped from my environment.

https://drive.google.com/file/d/1O2bCKrRbyJ-yP4zTz_sAd_qM80nHnCGr/view?usp=…

I used QEMU QMP to dump the vmcore.
$ telnet localhost 4444 {"execute": "qmp_capabilities"} {"execute":"dump-guest-memory","arguments": {"paging":false,"protocol":"file:/home/paran/vmcore1.img"}} rcu: INFO: rcu_sched detected stalls on CPUs/tasks:s: rcu: (detected by 0, t=6002 jiffies, g=24353, q=1 ncpus=1)1) rcu: All QSes seen, last rcu_sched kthread activity 6002 (4294978930-4294972928), jiffies_till_next_fqs=1, root ->qsmask 0x0x0 rcu: rcu_sched kthread starved for 6002 jiffies! g24353 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0=0 rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.r. rcu: RCU grace-period kthread stack dump:p: task:rcu_sched state:R running task stack:0 pid:16 tgid:16 ppid:2 flags:0x0000000000 Call Trace:e: __schedule+0x346/0x8b8 8 schedule+0x36/0x148 8 schedule_timeout+0x8e/0x148 8 rcu_gp_fqs_loop+0x444/0x548 8 rcu_gp_kthread+0x146/0x198 8 kthread+0x124/0x128 8 __ret_from_fork+0x40/0x58 8 ret_from_fork+0xa/0x30 0 rcu: Stack dump where RCU GP kthread last ran:n: CPU: 0 PID: 1077 Comm: ftracetest Not tainted 6.8.0-36-generic #36-Ubuntu Hardware name: QEMU 8561 QEMU (KVM/Linux) Krnl PSW : 0704f00180000000 0000000000121d32 kprobe_exceptions_notify (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/kprobes.c:519 (discriminator 1)) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3:3 Krnl GPRS: 0000000000000000 0000000000000000 0000000000008001 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000001956720 ffffffffffffffff 0000000000121c98 0000000001958d40 00000380000cfbd8 000003ff938dbc78 00000380000cfab8 0000000000121d1c 00000380000cf980 Krnl Code: 0000000000121d26: 9103b008 Code starting with the faulting instruction =========================================== 8(%r11),3 0000000000121d2a: a7840004 8,0000000000121d32 #0000000000121d2e: ad03f0a0 stosm 160(%r15),3 >0000000000121d32: b9140022 lgfr %r2,%r2 0000000000121d36: ebbff0a80004 %r11,%r15,168(%r15) 0000000000121d3c: a7190000 lghi %r1,0 0000000000121d40: a7390000 lghi 
%r3,0 0000000000121d44: a7490000 lghi %r4,0 Call Trace: kprobe_exceptions_notify (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/kprobes.c:519 (discriminator 1)) kprobe_exceptions_notify (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/kprobes.c:504 (discriminator 1)) notifier_call_chain (/build/linux-3nCxw2/linux-6.8.0/kernel/notifier.c:93) notify_die (/build/linux-3nCxw2/linux-6.8.0/kernel/notifier.c:597) do_per_trap (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/traps.c:75 (discriminator 1)) __do_pgm_check (/build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/irqflags.h:47 (discriminator 1) /build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/irqflags.h:52 (discriminator 1) /build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/traps.c:356 (discriminator 1)) pgm_check_handler (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/entry.S:383) kernel_clone (/build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/uaccess.h:119 (discriminator 1) /build/linux-3nCxw2/linux-6.8.0/kernel/fork.c:2927 (discriminator 1)) __do_sys_clone (/build/linux-3nCxw2/linux-6.8.0/kernel/fork.c:3055) __s390x_sys_clone (/build/linux-3nCxw2/linux-6.8.0/kernel/fork.c:3027) __do_syscall (/build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/ptrace.h:195 (discriminator 3) /build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/ptrace.h:200 (discriminator 3) /build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/syscall.c:145 (discriminator 3) /build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/syscall.c:168 (discriminator 3)) system_call (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/entry.S:309) Last Breaking-Event-Address: 0xfdf5045050 ?rcu: INFO: rcu_sched detected stalls on CPUs/tasks:s: @rcu: (detected by 0, t=24007 jiffies, g=24353, q=1 ncpus=1) rcu: All QSes seen, last rcu_sched kthread activity 24007 (4294996935-4294972928), jiffies_till_next_fqs=1, root ->qsmask 0x0x0 rcu: rcu_sched kthread starved for 24007 jiffies! 
g24353 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0=0 rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.r. rcu: RCU grace-period kthread stack dump:p: task:rcu_sched state:R running task stack:0 pid:16 tgid:16 ppid:2 flags:0x0000000000 Call Trace:e: __schedule+0x346/0x8b8 8 schedule+0x36/0x148 8 schedule_timeout+0x8e/0x148 8 rcu_gp_fqs_loop+0x444/0x548 8 rcu_gp_kthread+0x146/0x198 8 kthread+0x124/0x128 8 __ret_from_fork+0x40/0x58 8 ret_from_fork+0xa/0x30 0 rcu: Stack dump where RCU GP kthread last ran:n: CPU: 0 PID: 1077 Comm: ftracetest Not tainted 6.8.0-36-generic #36-Ubuntu Hardware name: QEMU 8561 QEMU (KVM/Linux) Krnl PSW : 0704d00180000000 0000000000ebe0b2 __do_pgm_check (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/traps.c:353) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3:3 Krnl GPRS: 0704c00180000000 0000000000000000 00000380000cfb97 0000000000000000 0000000000000000 0704c00180000000 0000000000000000 0000000000000000 0704c00180000000 00000000001a8388 0000000000000000 00000380000cfbd8 000003ff938dbc78 0000000000ed1c6c 0000000000ebe024 00000380000cfaf0 Krnl Code: 0000000000ebe0a4: a504bfff Code starting with the faulting instruction =========================================== nihh %r0,49151 0000000000ebe0a8: e300f0a80024 %r0,168(%r15) #0000000000ebe0ae: 8000f0a8 168(%r15) >0000000000ebe0b2: 5850b0a0 %r5,160(%r11) 0000000000ebe0b6: c05b0000007f nilf %r5,127 0000000000ebe0bc: a7840012 8,0000000000ebe0e0 0000000000ebe0c0: b91600e5 llgfr %r14,%r5 0000000000ebe0c4: c0400051121e larl %r4,00000000018e0500 Call Trace: __do_pgm_check (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/traps.c:353) __do_pgm_check (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/traps.c:318) pgm_check_handler (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/entry.S:383) kernel_clone (/build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/uaccess.h:119 (discriminator 1) /build/linux-3nCxw2/linux-6.8.0/kernel/fork.c:2927 (discriminator 
1)) __do_sys_clone (/build/linux-3nCxw2/linux-6.8.0/kernel/fork.c:3055) __s390x_sys_clone (/build/linux-3nCxw2/linux-6.8.0/kernel/fork.c:3027) __do_syscall (/build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/ptrace.h:195 (discriminator 3) /build/linux-3nCxw2/linux-6.8.0/arch/s390/include/asm/ptrace.h:200 (discriminator 3) /build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/syscall.c:145 (discriminator 3) /build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/syscall.c:168 (discriminator 3)) system_call (/build/linux-3nCxw2/linux-6.8.0/arch/s390/kernel/entry.S:309) Last Breaking-Event-Address: 0x4404c0018000000000 It's not easy for me to resolve this issue. If advice or guidance can be provided on how to resolve this issue, I'll try sending a patch! Warm regards, Yunseong Kim
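The "kthread activity" figures in the two stall reports can be cross-checked with a little arithmetic. The following is a minimal sketch in Python. The helper name starved_jiffies() is hypothetical, and it assumes the parenthesised pair is "current jiffies minus jiffies at last kthread activity", which is what the numbers suggest. It does not convert to seconds, because the kernel's HZ value is not stated in the report.

```python
import re

# Matches "kthread activity N (A-B)" inside an RCU stall report line.
ACTIVITY_RE = re.compile(r"kthread activity (\d+) \((\d+)-(\d+)\)")


def starved_jiffies(line):
    """Return (reported, now, last) from a stall line, or None if absent."""
    m = ACTIVITY_RE.search(line)
    if m is None:
        return None
    reported, now, last = (int(g) for g in m.groups())
    return reported, now, last


# Excerpts copied from the two reports in the message.
first = "last rcu_sched kthread activity 6002 (4294978930-4294972928)"
second = "last rcu_sched kthread activity 24007 (4294996935-4294972928)"

for line in (first, second):
    reported, now, last = starved_jiffies(line)
    # The reported figure should equal the difference of the pair.
    print(reported, now - last, last)
```

Both reports are internally consistent, and the second number of the pair is identical in both (4294972928). Under the assumption above, that means the grace-period kthread did not run at all between the first and second report.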

