From: Jeff Xu <jeffxu(a)chromium.org>
This patch is a follow up to the comments [1] on test code during
mseal discussion. This is style only change to the selftest code, not to
test code logic.
Please apply on top of mseal v10 patch [2]
[1]
https://lore.kernel.org/all/e1744539-a843-468a-9101-ce7a08669394@collabora.…
[2]
https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/
Thanks
Jeff Xu (1):
selftest mm/mseal: style change
tools/testing/selftests/mm/mseal_test.c | 124 +++++++++++++++++-------
tools/testing/selftests/mm/seal_elf.c | 3 -
2 files changed, 91 insertions(+), 36 deletions(-)
--
2.44.0.683.g7961c838ac-goog
This series introduces the selftests/arm directory, which tests 32 and 64-bit
kernel compatibility with 32-bit ELFs running on the Aarch platform.
The need for this bucket of tests is that 32 bit applications built on legacy
ARM architecture must not break on the new Aarch64 platforms and the 64-bit
kernel. The kernel must emulate the data structures, system calls and the
registers according to Aarch32, when running a 32-bit process; this directory
fills that testing requirement.
One may find similarity between this directory and selftests/arm64; it is
advisable to refer to that since a lot has been copied from there itself.
The mm directory includes a test for checking 4GB limit of the virtual
address space of a process.
The signal directory contains two tests, following a common theme: mangle
with arm_cpsr, dumped by the kernel to user space while invoking the signal
handler; kernel must spot this illegal attempt and terminate the program by
SEGV.
The elf directory includes a test for checking the 32-bit status of the ELF.
The series has been tested on 6.9.0-rc2, on Aarch64 platform. Testing remains
to be done on Aaarch32.
Dev Jain (4):
selftests/arm: Add mm test
selftests/arm: Add signal tests
selftests/arm: Add elf test
selftests: Add build infrastructure along with README
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/arm/Makefile | 57 ++++
tools/testing/selftests/arm/README | 31 +++
tools/testing/selftests/arm/elf/Makefile | 6 +
tools/testing/selftests/arm/elf/parse_elf.c | 75 +++++
tools/testing/selftests/arm/mm/Makefile | 6 +
tools/testing/selftests/arm/mm/compat_va.c | 94 +++++++
tools/testing/selftests/arm/signal/Makefile | 30 ++
.../selftests/arm/signal/test_signals.c | 27 ++
.../selftests/arm/signal/test_signals.h | 74 +++++
.../selftests/arm/signal/test_signals_utils.c | 257 ++++++++++++++++++
.../selftests/arm/signal/test_signals_utils.h | 128 +++++++++
.../signal/testcases/mangle_cpsr_aif_bits.c | 33 +++
.../mangle_cpsr_invalid_compat_toggle.c | 29 ++
14 files changed, 848 insertions(+)
create mode 100644 tools/testing/selftests/arm/Makefile
create mode 100644 tools/testing/selftests/arm/README
create mode 100644 tools/testing/selftests/arm/elf/Makefile
create mode 100644 tools/testing/selftests/arm/elf/parse_elf.c
create mode 100644 tools/testing/selftests/arm/mm/Makefile
create mode 100644 tools/testing/selftests/arm/mm/compat_va.c
create mode 100644 tools/testing/selftests/arm/signal/Makefile
create mode 100644 tools/testing/selftests/arm/signal/test_signals.c
create mode 100644 tools/testing/selftests/arm/signal/test_signals.h
create mode 100644 tools/testing/selftests/arm/signal/test_signals_utils.c
create mode 100644 tools/testing/selftests/arm/signal/test_signals_utils.h
create mode 100644 tools/testing/selftests/arm/signal/testcases/mangle_cpsr_aif_bits.c
create mode 100644 tools/testing/selftests/arm/signal/testcases/mangle_cpsr_invalid_compat_toggle.c
--
2.39.2
Hi!
Implement support for tests which require access to a remote system /
endpoint which can generate traffic.
This series concludes the "groundwork" for upstream driver tests.
I wanted to support the three models which came up in discussions:
- SW testing with netdevsim
- "local" testing with two ports on the same system in a loopback
- "remote" testing via SSH
so there is a tiny bit of an abstraction which wraps up how "remote"
commands are executed. Otherwise hopefully there's nothing surprising.
I'm only adding a ping test. I had a bigger one written but I was
worried we'll get into discussing the details of the test itself
and how I chose to hack up netdevsim, instead of the test infra...
So that test will be a follow up :)
v2:
- rename endpoint -> remote
- use 2001:db8:: v6 prefix
- add a note about persistent SSH connections
- add the kernel config
v1: https://lore.kernel.org/all/20240412233705.1066444-1-kuba@kernel.org
Jakub Kicinski (6):
selftests: drv-net: add stdout to the command failed exception
selftests: drv-net: add config for netdevsim
selftests: drv-net: define endpoint structures
selftests: drv-net: factor out parsing of the env
selftests: drv-net: construct environment for running tests which
require an endpoint
selftests: drv-net: add a trivial ping test
tools/testing/selftests/drivers/net/Makefile | 5 +-
.../testing/selftests/drivers/net/README.rst | 33 ++++
tools/testing/selftests/drivers/net/config | 2 +
.../selftests/drivers/net/lib/py/__init__.py | 1 +
.../selftests/drivers/net/lib/py/env.py | 141 +++++++++++++++---
.../selftests/drivers/net/lib/py/remote.py | 13 ++
.../drivers/net/lib/py/remote_netns.py | 15 ++
.../drivers/net/lib/py/remote_ssh.py | 34 +++++
tools/testing/selftests/drivers/net/ping.py | 32 ++++
.../testing/selftests/net/lib/py/__init__.py | 1 +
tools/testing/selftests/net/lib/py/netns.py | 31 ++++
tools/testing/selftests/net/lib/py/utils.py | 22 +--
12 files changed, 301 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/config
create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_netns.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_ssh.py
create mode 100755 tools/testing/selftests/drivers/net/ping.py
create mode 100644 tools/testing/selftests/net/lib/py/netns.py
--
2.44.0
Let the compilers (clang) know that this function would just call
exit() and would never return. It is needed to avoid false positive
static analysis errors. All similar functions calling exit()
unconditionally have been marked as __noreturn.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
This patch has been suggested [1] and tested on top of the following
patches:
- f07041728422 ("selftests: add ksft_exit_fail_perror()") which is
in kselftest tree already
- ("kselftest: Mark functions that unconditionally call exit() as
__noreturn") would appear in tip/timers/urgent
[1] https://lore.kernel.org/all/8254ab4d-9cb6-402e-80dd-d9ec70d77de5@linuxfound…
---
tools/testing/selftests/kselftest.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 050c5fd018400..b43a7a7ca4b40 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -372,7 +372,7 @@ static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
exit(KSFT_FAIL);
}
-static inline void ksft_exit_fail_perror(const char *msg)
+static inline __noreturn void ksft_exit_fail_perror(const char *msg)
{
#ifndef NOLIBC
ksft_exit_fail_msg("%s: %s (%d)\n", msg, strerror(errno), errno);
--
2.39.2
Currently, the migration worker delays 1-10 us, assuming that one
KVM_RUN iteration only takes a few microseconds. But if C-state exit
latencies are large enough, for example, hundreds or even thousands
of microseconds on server CPUs, it may happen that it's not able to
bring the target CPU out of C-state before the migration worker starts
to migrate it to the next CPU.
If the system workload is light, most CPUs could be at a certain level
of C-state, which may result in less successful migrations and fail the
migration/KVM_RUN ratio sanity check.
This patch adds a command line option to skip the sanity check in
this case.
Additionally, seems it's reasonable to randomize the length of usleep(),
other than delay in a fixed pattern.
V2:
- removed the busy loop implementation
- add the new "-s" option
Signed-off-by: Zide Chen <zide.chen(a)intel.com>
---
tools/testing/selftests/kvm/rseq_test.c | 37 +++++++++++++++++++++++--
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 28f97fb52044..515cfa32a925 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -150,7 +150,7 @@ static void *migration_worker(void *__rseq_tid)
* Use usleep() for simplicity and to avoid unnecessary kernel
* dependencies.
*/
- usleep((i % 10) + 1);
+ usleep((rand() % 10) + 1);
}
done = true;
return NULL;
@@ -186,12 +186,35 @@ static void calc_min_max_cpu(void)
"Only one usable CPU, task migration not possible");
}
+static void usage(const char *name)
+{
+ puts("");
+ printf("usage: %s [-h] [-s]\n", name);
+ printf(" -s: skip the sanity check for successful KVM_RUN.\n");
+ puts("");
+ exit(0);
+}
+
int main(int argc, char *argv[])
{
int r, i, snapshot;
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
u32 cpu, rseq_cpu;
+ bool skip_sanity_check = false;
+ int opt;
+
+ while ((opt = getopt(argc, argv, "sh")) != -1) {
+ switch (opt) {
+ case 's':
+ skip_sanity_check = true;
+ break;
+ case 'h':
+ default:
+ usage(argv[0]);
+ break;
+ }
+ }
r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
@@ -199,6 +222,8 @@ int main(int argc, char *argv[])
calc_min_max_cpu();
+ srand(time(NULL));
+
r = rseq_register_current_thread();
TEST_ASSERT(!r, "rseq_register_current_thread failed, errno = %d (%s)",
errno, strerror(errno));
@@ -254,9 +279,15 @@ int main(int argc, char *argv[])
* getcpu() to stabilize. A 2:1 migration:KVM_RUN ratio is a fairly
* conservative ratio on x86-64, which can do _more_ KVM_RUNs than
* migrations given the 1us+ delay in the migration task.
+ *
+ * Another reason why it may have small migration:KVM_RUN ratio is that,
+ * on systems with large C-state exit latency, it may happen quite often
+ * that the scheduler is not able to wake up the target CPU before the
+ * vCPU thread is scheduled to another CPU.
*/
- TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
- "Only performed %d KVM_RUNs, task stalled too much?", i);
+ TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
+ "Only performed %d KVM_RUNs, task stalled too much? "
+ "Try to turn off C-states or run it with the -s option", i);
pthread_join(migration_thread, NULL);
--
2.34.1
This is a loose follow-up to the Kernel CI patchset posted recently. It
contains various fixes that were supposed to be part of said patchset, but
didn't fit due to its size. The latter 4 patches were written independently
of the CI effort, but again didn't fit in their intended patchsets.
- Patch #1 unifies code of two very similar looking functions, busywait()
and slowwait().
- Patch #2 adds sanity checks around the setting of NETIFS, which carries
list of interfaces to run on.
- Patch #3 changes bail_on_lldpad() to SKIP instead of FAILing.
- Patches #4 to #7 fix issues in selftests.
- Patches #8 to #10 add topology diagrams to several selftests.
This should have been part of the mlxsw leg of NH group stats patches,
but again, it did not fit in due to size.
Danielle Ratson (1):
selftests: mlxsw: ethtool_lanes: Wait for lanes parameter dump
explicitly
Petr Machata (9):
selftests: net: Unify code of busywait() and slowwait()
selftests: forwarding: lib.sh: Validate NETIFS
selftests: forwarding: bail_on_lldpad() should SKIP
selftests: drivers: hw: Fix ethtool_rmon
selftests: drivers: hw: ethtool.sh: Adjust output
selftests: drivers: hw: Include tc_common.sh in hw_stats_l3
selftests: forwarding: router_mpath_nh: Add a diagram
selftests: forwarding: router_mpath_nh_res: Add a diagram
selftests: forwarding: router_nh: Add a diagram
.../selftests/drivers/net/hw/ethtool.sh | 15 +++---
.../selftests/drivers/net/hw/ethtool_rmon.sh | 1 +
.../selftests/drivers/net/hw/hw_stats_l3.sh | 1 +
.../drivers/net/hw/hw_stats_l3_gre.sh | 1 +
.../drivers/net/mlxsw/ethtool_lanes.sh | 14 +++---
tools/testing/selftests/net/forwarding/lib.sh | 49 +++++++++----------
.../net/forwarding/router_mpath_nh.sh | 35 +++++++++++++
.../net/forwarding/router_mpath_nh_res.sh | 35 +++++++++++++
.../selftests/net/forwarding/router_nh.sh | 14 ++++++
tools/testing/selftests/net/lib.sh | 16 ++++--
10 files changed, 137 insertions(+), 44 deletions(-)
--
2.43.0
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
PATCH 2,3,14 : FW_READ_HI function implementation
PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
PATCH 16-17: Generic improvements for kvm selftests
PATCH 18-22: KVM selftests for SBI PMU extension
The series is based on v6.9-rc1 and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v6
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v5->v6:
1. Added a patch for command line option for the sbi pmu tests.
2. Removed redundant prints and restructure the code little bit.
3. Added a patch for computing the sbi minor version correctly.
4. Addressed all other comments on v5.
Changes from v4->v5:
1. Moved sbi related definitions to its own header file from processor.h
2. Added few helper functions for selftests.
3. Improved firmware counter read and RV32 start/stop functions.
4. Converted all the shifting operations to use BIT macro
5. Addressed all other comments on v4.
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (24):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
drivers/perf: riscv: Use BIT macro for shifting operations
RISC-V: Add SBI PMU snapshot definitions
RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
RISC-V: Use the minor version mask while computing sbi version
drivers/perf: riscv: Implement SBI PMU snapshot function
drivers/perf: riscv: Fix counter mask iteration for RV32
RISC-V: KVM: Fix the initial sample period value
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
RISC-V: KVM: Improve firmware counter read function
KVM: riscv: selftests: Move sbi definitions to its own header file
KVM: riscv: selftests: Add helper functions for extension checks
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
KVM: riscv: selftests: Add commandline option for SBI PMU test
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
arch/riscv/include/asm/sbi.h | 38 +-
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/paravirt.c | 6 +-
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 15 +-
arch/riscv/kvm/vcpu_onereg.c | 6 +
arch/riscv/kvm/vcpu_pmu.c | 260 ++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 272 ++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 49 +-
.../testing/selftests/kvm/include/riscv/sbi.h | 141 ++++
.../selftests/kvm/include/riscv/ucall.h | 1 +
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
.../selftests/kvm/riscv/get-reg-list.c | 4 +
.../selftests/kvm/riscv/sbi_pmu_test.c | 685 ++++++++++++++++++
tools/testing/selftests/kvm/steal_time.c | 4 +-
23 files changed, 1437 insertions(+), 114 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
--
2.34.1
Hi!
Implement support for tests which require access to a remote system /
endpoint which can generate traffic.
This series concludes the "groundwork" for upstream driver tests.
I wanted to support the three models which came up in discussions:
- SW testing with netdevsim
- "local" testing with two ports on the same system in a loopback
- "remote" testing via SSH
so there is a tiny bit of an abstraction which wraps up how "remote"
commands are executed. Otherwise hopefully there's nothing surprising.
I'm only adding a ping test. I had a bigger one written but I was
worried we'll get into discussing the details of the test itself
and how I chose to hack up netdevsim, instead of the test infra...
So that test will be a follow up :)
---
TBH, this series is on top of the one I posted in the morning:
https://lore.kernel.org/all/20240412141436.828666-1-kuba@kernel.org/
but it applies cleanly, and all it needs is the ifindex definition
in netdevsim. Testing with real HW works fine even without the other
series.
Jakub Kicinski (5):
selftests: drv-net: define endpoint structures
selftests: drv-net: add stdout to the command failed exception
selftests: drv-net: factor out parsing of the env
selftests: drv-net: construct environment for running tests which
require an endpoint
selftests: drv-net: add a trivial ping test
tools/testing/selftests/drivers/net/Makefile | 4 +-
.../testing/selftests/drivers/net/README.rst | 31 ++++
.../selftests/drivers/net/lib/py/__init__.py | 1 +
.../selftests/drivers/net/lib/py/endpoint.py | 13 ++
.../selftests/drivers/net/lib/py/env.py | 136 +++++++++++++++---
.../selftests/drivers/net/lib/py/ep_netns.py | 15 ++
.../selftests/drivers/net/lib/py/ep_ssh.py | 34 +++++
tools/testing/selftests/drivers/net/ping.py | 32 +++++
.../testing/selftests/net/lib/py/__init__.py | 1 +
tools/testing/selftests/net/lib/py/netns.py | 31 ++++
tools/testing/selftests/net/lib/py/utils.py | 22 +--
11 files changed, 291 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/lib/py/endpoint.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/ep_netns.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/ep_ssh.py
create mode 100755 tools/testing/selftests/drivers/net/ping.py
create mode 100644 tools/testing/selftests/net/lib/py/netns.py
--
2.44.0
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.9-rc5.
This kselftest fixes update for Linux 6.9-rc5 consists of a fix to
kselftest harness to prevent infinite loop triggered in an assert
in FIXTURE_TEARDOWN and a fix to a problem seen in being able to stop
subsystem-enable tests when sched events are being traced.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 224fe424c356cb5c8f451eca4127f32099a6f764:
selftests: dmabuf-heap: add config file for the test (2024-03-29 13:57:14 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.9-rc5
for you to fetch changes up to 72d7cb5c190befbb095bae7737e71560ec0fcaa6:
selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN (2024-04-04 10:50:53 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.9-rc5
This kselftest fixes update for Linux 6.9-rc5 consists of a fix to
kselftest harness to prevent infinite loop triggered in an assert
in FIXTURE_TEARDOWN and a fix to a problem seen in being able to stop
subsystem-enable tests when sched events are being traced.
----------------------------------------------------------------
Shengyu Li (1):
selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN
Yuanhe Shu (1):
selftests/ftrace: Limit length in subsystem-enable tests
tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc | 6 +++---
tools/testing/selftests/kselftest_harness.h | 5 ++++-
2 files changed, 7 insertions(+), 4 deletions(-)
----------------------------------------------------------------
Add a basic test for page pool netlink reporting.
v2:
- pass args as *args (patch 3)
- improve the test and add busy wait helper (patch 6)
v1: https://lore.kernel.org/all/20240411012815.174400-1-kuba@kernel.org/
Jakub Kicinski (6):
net: netdevsim: add some fake page pool use
tools: ynl: don't return None for dumps
selftests: net: print report check location in python tests
selftests: net: print full exception on failure
selftests: net: support use of NetdevSimDev under "with" in python
selftests: net: exercise page pool reporting via netlink
drivers/net/netdevsim/netdev.c | 93 ++++++++++++++++++++++
drivers/net/netdevsim/netdevsim.h | 4 +
tools/net/ynl/lib/ynl.py | 4 +-
tools/testing/selftests/net/lib/py/ksft.py | 41 +++++++---
tools/testing/selftests/net/lib/py/nsim.py | 16 +++-
tools/testing/selftests/net/nl_netdev.py | 76 +++++++++++++++++-
6 files changed, 218 insertions(+), 16 deletions(-)
--
2.44.0
Add FAULT_INJECTION_DEBUG_FS and FAILSLAB configurations which are
needed by iommufd_fail_nth test.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
While building and running these tests on x86, defconfig had these
configs enabled. But ARM64's defconfig doesn't enable these configs.
Hence the config options are being added explicitly in this patch.
---
tools/testing/selftests/iommu/config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/iommu/config b/tools/testing/selftests/iommu/config
index 110d73917615d..02a2a1b267c1e 100644
--- a/tools/testing/selftests/iommu/config
+++ b/tools/testing/selftests/iommu/config
@@ -1,3 +1,5 @@
CONFIG_IOMMUFD=y
+CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_FAULT_INJECTION=y
CONFIG_IOMMUFD_TEST=y
+CONFIG_FAILSLAB=y
--
2.39.2
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Integrity detection and protection has long been a desirable feature, to
reach a large user base and mitigate the risk of flaws in the software
and attacks.
However, while solutions exist, they struggle to reach the large user
base, due to requiring higher than desired constraints on performance,
flexibility and configurability, that only security conscious people are
willing to accept.
This is where the new digest_cache LSM comes into play, it offers
additional support for new and existing integrity solutions, to make
them faster and easier to deploy.
The full documentation with the motivation and the solution details can be
found in patch 14.
The IMA integration patch set will be introduced separately. Also a PoC
based on the current version of IPE can be provided.
v3:
- Rewrite documentation, and remove the installation instructions since
they are now included in the README of digest-cache-tools
- Add digest cache event notifier
- Drop digest_cache_was_reset(), and send instead to asynchronous
notifications
- Fix digest_cache LSM Kconfig style issues (suggested by Randy Dunlap)
- Propagate digest cache reset to directory entries
- Destroy per directory entry mutex
- Introduce RESET_USER bit, to clear the dig_user pointer on
set/removexattr
- Replace 'file content' with 'file data' (suggested by Mimi)
- Introduce per digest cache mutex and replace verif_data_lock spinlock
- Track changes of security.digest_list xattr
- Stop tracking file_open and use file_release instead also for file writes
- Add error messages in digest_cache_create()
- Load/unload testing kernel module automatically during execution of test
- Add tests for digest cache event notifier
- Add test for ftruncate()
- Remove DIGEST_CACHE_RESET_PREFETCH_BUF command in test and clear the
buffer on read instead
v2:
- Include the TLV parser in this patch set (from user asymmetric keys and
signatures)
- Move from IMA and make an independent LSM
- Remove IMA-specific stuff from this patch set
- Add per algorithm hash table
- Expect all digest lists to be in the same directory and allow changing
the default directory
- Support digest lookup on directories, when there is no
security.digest_list xattr
- Add seq num to digest list file name, to impose ordering on directory
iteration
- Add a new data type DIGEST_LIST_ENTRY_DATA for the nested data in the
tlv digest list format
- Add the concept of verification data attached to digest caches
- Add the reset mechanism to track changes on digest lists and directory
containing the digest lists
- Add kernel selftests
v1:
- Add documentation in Documentation/security/integrity-digest-cache.rst
- Pass the mask of IMA actions to digest_cache_alloc()
- Add a reference count to the digest cache
- Remove the path parameter from digest_cache_get(), and rely on the
reference count to avoid the digest cache disappearing while being used
- Rename the dentry_to_check parameter of digest_cache_get() to dentry
- Rename digest_cache_get() to digest_cache_new() and add
digest_cache_get() to set the digest cache in the iint of the inode for
which the digest cache was requested
- Add dig_owner and dig_user to the iint, to distinguish from which inode
the digest cache was created from, and which is using it; consequently it
makes the digest cache usable to measure/appraise other digest caches
(support not yet enabled)
- Add dig_owner_mutex and dig_user_mutex to serialize accesses to dig_owner
and dig_user until they are initialized
- Enforce strong synchronization and make the contenders wait until
dig_owner and dig_user are assigned to the iint the first time
- Move checking IMA actions on the digest list earlier, and fail if no
action were performed (digest cache not usable)
- Remove digest_cache_put(), not needed anymore with the introduction of
the reference count
- Fail immediately in digest_cache_lookup() if the digest algorithm is
not set in the digest cache
- Use 64 bit mask for IMA actions on the digest list instead of 8 bit
- Return NULL in the inline version of digest_cache_get()
- Use list_add_tail() instead of list_add() in the iterator
- Copy the digest list path to a separate buffer in digest_cache_iter_dir()
- Use digest list parsers verified with Frama-C
- Explicitly disable (for now) the possibility in the IMA policy to use the
digest cache to measure/appraise other digest lists
- Replace exit(<value>) with return <value> in manage_digest_lists.c
Roberto Sassu (14):
lib: Add TLV parser
security: Introduce the digest_cache LSM
digest_cache: Add securityfs interface
digest_cache: Add hash tables and operations
digest_cache: Populate the digest cache from a digest list
digest_cache: Parse tlv digest lists
digest_cache: Parse rpm digest lists
digest_cache: Add management of verification data
digest_cache: Add support for directories
digest cache: Prefetch digest lists if requested
digest_cache: Reset digest cache on file/directory change
digest_cache: Notify digest cache events
selftests/digest_cache: Add selftests for digest_cache LSM
docs: Add documentation of the digest_cache LSM
Documentation/security/digest_cache.rst | 763 ++++++++++++++++
Documentation/security/index.rst | 1 +
MAINTAINERS | 16 +
include/linux/digest_cache.h | 117 +++
include/linux/kernel_read_file.h | 1 +
include/linux/tlv_parser.h | 28 +
include/uapi/linux/lsm.h | 1 +
include/uapi/linux/tlv_digest_list.h | 72 ++
include/uapi/linux/tlv_parser.h | 59 ++
include/uapi/linux/xattr.h | 6 +
lib/Kconfig | 3 +
lib/Makefile | 3 +
lib/tlv_parser.c | 214 +++++
lib/tlv_parser.h | 17 +
security/Kconfig | 11 +-
security/Makefile | 1 +
security/digest_cache/Kconfig | 33 +
security/digest_cache/Makefile | 11 +
security/digest_cache/dir.c | 252 ++++++
security/digest_cache/htable.c | 268 ++++++
security/digest_cache/internal.h | 290 +++++++
security/digest_cache/main.c | 570 ++++++++++++
security/digest_cache/modsig.c | 66 ++
security/digest_cache/notifier.c | 135 +++
security/digest_cache/parsers/parsers.h | 15 +
security/digest_cache/parsers/rpm.c | 223 +++++
security/digest_cache/parsers/tlv.c | 299 +++++++
security/digest_cache/populate.c | 163 ++++
security/digest_cache/reset.c | 235 +++++
security/digest_cache/secfs.c | 87 ++
security/digest_cache/verif.c | 119 +++
security/security.c | 3 +-
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/digest_cache/.gitignore | 3 +
tools/testing/selftests/digest_cache/Makefile | 24 +
.../testing/selftests/digest_cache/all_test.c | 815 ++++++++++++++++++
tools/testing/selftests/digest_cache/common.c | 78 ++
tools/testing/selftests/digest_cache/common.h | 135 +++
.../selftests/digest_cache/common_user.c | 47 +
.../selftests/digest_cache/common_user.h | 17 +
tools/testing/selftests/digest_cache/config | 1 +
.../selftests/digest_cache/generators.c | 248 ++++++
.../selftests/digest_cache/generators.h | 19 +
.../selftests/digest_cache/testmod/Makefile | 16 +
.../selftests/digest_cache/testmod/kern.c | 564 ++++++++++++
.../selftests/lsm/lsm_list_modules_test.c | 3 +
46 files changed, 6047 insertions(+), 6 deletions(-)
create mode 100644 Documentation/security/digest_cache.rst
create mode 100644 include/linux/digest_cache.h
create mode 100644 include/linux/tlv_parser.h
create mode 100644 include/uapi/linux/tlv_digest_list.h
create mode 100644 include/uapi/linux/tlv_parser.h
create mode 100644 lib/tlv_parser.c
create mode 100644 lib/tlv_parser.h
create mode 100644 security/digest_cache/Kconfig
create mode 100644 security/digest_cache/Makefile
create mode 100644 security/digest_cache/dir.c
create mode 100644 security/digest_cache/htable.c
create mode 100644 security/digest_cache/internal.h
create mode 100644 security/digest_cache/main.c
create mode 100644 security/digest_cache/modsig.c
create mode 100644 security/digest_cache/notifier.c
create mode 100644 security/digest_cache/parsers/parsers.h
create mode 100644 security/digest_cache/parsers/rpm.c
create mode 100644 security/digest_cache/parsers/tlv.c
create mode 100644 security/digest_cache/populate.c
create mode 100644 security/digest_cache/reset.c
create mode 100644 security/digest_cache/secfs.c
create mode 100644 security/digest_cache/verif.c
create mode 100644 tools/testing/selftests/digest_cache/.gitignore
create mode 100644 tools/testing/selftests/digest_cache/Makefile
create mode 100644 tools/testing/selftests/digest_cache/all_test.c
create mode 100644 tools/testing/selftests/digest_cache/common.c
create mode 100644 tools/testing/selftests/digest_cache/common.h
create mode 100644 tools/testing/selftests/digest_cache/common_user.c
create mode 100644 tools/testing/selftests/digest_cache/common_user.h
create mode 100644 tools/testing/selftests/digest_cache/config
create mode 100644 tools/testing/selftests/digest_cache/generators.c
create mode 100644 tools/testing/selftests/digest_cache/generators.h
create mode 100644 tools/testing/selftests/digest_cache/testmod/Makefile
create mode 100644 tools/testing/selftests/digest_cache/testmod/kern.c
--
2.34.1
By default, HLT instruction executed by guest is intercepted by hypervisor.
However, KVM_CAP_X86_DISABLE_EXITS capability can be used to not intercept
HLT by setting KVM_X86_DISABLE_EXITS_HLT.
By default, vms are created with in-kernel APIC support in KVM selftests.
VM needs to be created without in-kernel APIC support for this test, so
that HLT will exit to userspace. To do so, __vm_create() is modified to not
call KVM_CREATE_IRQCHIP ioctl while creating vm.
Add a test case to test KVM_X86_DISABLE_EXITS_HLT functionality.
Patch 1, 2 -> Preparatory patches to add the KVM_X86_DISABLE_EXITS_HLT test
case
Patch 3 -> Adds a test case for KVM_X86_DISABLE_EXITS_HLT
Testing done:
Tested KVM_X86_DISABLE_EXITS_HLT test case on AMD and Intel machines.
v1 -> v2
- Extended @shape to allow creation of VM without in-kernel APIC support
(Andrew Jones)
- Changed the test case based on Andrew's comments.
- Few more changes to the test case to pass the address of the flag on
which guest waits to execute HLT.
Manali Shukla (3):
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Extend @shape to allow creation of VM without
in-kernel APIC
KVM: selftests: Add a test case for KVM_X86_DISABLE_EXITS_HLT
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 17 ++-
.../selftests/kvm/include/x86_64/processor.h | 17 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 4 +-
.../kvm/x86_64/halt_disable_exit_test.c | 120 ++++++++++++++++++
6 files changed, 158 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/halt_disable_exit_test.c
base-commit: e9da6f08edb0bd4c621165496778d77a222e1174
--
2.34.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- address comments of Martin and Eduard in v2. (thanks)
- move "int type" to the first argument of start_server_addr and
connect_to_addr.
- add start_server_addr_opts.
- using "sockaddr_storage" instead of "sockaddr".
- move start_server_setsockopt patches out of this series.
v2:
- update patch 6 only, fix errors reported by CI.
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (9):
selftests/bpf: Update arguments of connect_to_addr
selftests/bpf: Add start_server_addr* helpers
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/network_helpers.c | 37 ++++++++--
tools/testing/selftests/bpf/network_helpers.h | 6 +-
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/empty_skb.c | 2 +
.../bpf/prog_tests/ip_check_defrag.c | 2 +
.../selftests/bpf/prog_tests/sk_assign.c | 59 ++-------------
.../selftests/bpf/prog_tests/sock_addr.c | 6 +-
.../selftests/bpf/prog_tests/tc_redirect.c | 2 +-
.../selftests/bpf/prog_tests/test_tunnel.c | 4 +
.../selftests/bpf/prog_tests/xdp_metadata.c | 16 ++++
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++-----------------
12 files changed, 81 insertions(+), 168 deletions(-)
--
2.40.1
Hello,
kernel test robot noticed "kunit.VCAP_API_DebugFS_Testsuite.vcap_api_show_admin_raw_test.fail" on:
commit: 93533996100c60ea6d4342c454752c0eb1e4b6b1 ("kunit: Handle test faults")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master 9ed46da14b9b9b2ad4edb3b0c545b6dbe5c00d39]
in testcase: kunit
version:
with following parameters:
group: group-03
compiler: gcc-13
test machine: 16 threads 1 sockets Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz (Broadwell-DE) with 48G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang(a)intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202404151340.5b152d96-lkp@intel.com
[ 206.153880][ T2978] # vcap_api_show_admin_raw_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_debugfs_kunit.c:377
[ 206.153880][ T2978] Expected test_expected == test_pr_buffer[0], but
[ 206.153880][ T2978] test_expected == " addr: 786, X6 rule, keysets: VCAP_KFS_MAC_ETYPE
[ 206.153880][ T2978] "
[ 206.153880][ T2978] test_pr_buffer[0] == ""
[ 206.159902][ T1] not ok 2 vcap_api_show_admin_raw_test
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240415/202404151340.5b152d96-lkp@…
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
From: Dmitry Vyukov <dvyukov(a)google.com>
POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
thread of a thread group for signal delivery. However, this has a
significant downside: it requires waking up a potentially idle thread.
Instead, prefer to deliver signals to the current thread (in the same
thread group) if SIGEV_THREAD_ID is not set by the user. This does not
change guaranteed semantics, since POSIX process CPU time timers have
never guaranteed that signal delivery is to a specific thread (without
SIGEV_THREAD_ID set).
The effect is that we no longer wake up potentially idle threads, and
the kernel is no longer biased towards delivering the timer signal to
any particular thread (which better distributes the timer signals esp.
when multiple timers fire concurrently).
Signed-off-by: Dmitry Vyukov <dvyukov(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Marco Elver <elver(a)google.com>
---
v6:
- Split test from this patch.
- Update wording on what this patch aims to improve.
v5:
- Rebased onto v6.2.
v4:
- Restructured checks in send_sigqueue() as suggested.
v3:
- Switched to the completely different implementation (much simpler)
based on the Oleg's idea.
RFC v2:
- Added additional Cc as Thomas asked.
---
kernel/signal.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 8cb28f1df294..605445fa27d4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
/*
* Now find a thread we can wake up to take the signal off the queue.
*
- * If the main thread wants the signal, it gets first crack.
- * Probably the least surprising to the average bear.
+ * Try the suggested task first (may or may not be the main thread).
*/
if (wants_signal(sig, p))
t = p;
@@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
ret = -1;
rcu_read_lock();
+ /*
+ * This function is used by POSIX timers to deliver a timer signal.
+ * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
+ * set), the signal must be delivered to the specific thread (queues
+ * into t->pending).
+ *
+ * Where type is not PIDTYPE_PID, signals must just be delivered to the
+ * current process. In this case, prefer to deliver to current if it is
+ * in the same thread group as the target, as it avoids unnecessarily
+ * waking up a potentially idle task.
+ */
t = pid_task(pid, type);
- if (!t || !likely(lock_task_sighand(t, &flags)))
+ if (!t)
+ goto ret;
+ if (type != PIDTYPE_PID && same_thread_group(t, current))
+ t = current;
+ if (!likely(lock_task_sighand(t, &flags)))
goto ret;
ret = 1; /* the signal is ignored */
@@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
q->info.si_overrun = 0;
signalfd_notify(t, sig);
+ /*
+ * If the type is not PIDTYPE_PID, we just use shared_pending, which
+ * won't guarantee that the specified task will receive the signal, but
+ * is sufficient if t==current in the common case.
+ */
pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
--
2.40.0.rc1.284.g88254d51c5-goog
KUnit's try-catch infrastructure now uses vfork_done, which is always
set to a valid completion when a kthread is created, but which is set to
NULL once the thread terminates. This creates a race condition, where
the kthread exits before we can wait on it.
Keep a copy of vfork_done, which is taken before we wake_up_process()
and so valid, and wait on that instead.
Fixes: 4de2a8e4cca4 ("kunit: Handle test faults")
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.o…
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Acked-by: Mickaël Salaün <mic(a)digikod.net>
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/kunit/try-catch.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index fa687278ccc9..6bbe0025b079 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -63,6 +63,7 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
{
struct kunit *test = try_catch->test;
struct task_struct *task_struct;
+ struct completion *task_done;
int exit_code, time_remaining;
try_catch->context = context;
@@ -75,13 +76,16 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
return;
}
get_task_struct(task_struct);
- wake_up_process(task_struct);
/*
* As for a vfork(2), task_struct->vfork_done (pointing to the
* underlying kthread->exited) can be used to wait for the end of a
- * kernel thread.
+ * kernel thread. It is set to NULL when the thread exits, so we
+ * keep a copy here.
*/
- time_remaining = wait_for_completion_timeout(task_struct->vfork_done,
+ task_done = task_struct->vfork_done;
+ wake_up_process(task_struct);
+
+ time_remaining = wait_for_completion_timeout(task_done,
kunit_test_timeout());
if (time_remaining == 0) {
try_catch->try_result = -ETIMEDOUT;
--
2.44.0.683.g7961c838ac-goog
Currently, the migration worker delays 1-10 us, assuming that one
KVM_RUN iteration only takes a few microseconds. But if C-state exit
latencies are large enough, for example, hundreds or even thousands
of microseconds on server CPUs, it may happen that it's not able to
bring the target CPU out of C-state before the migration worker starts
to migrate it to the next CPU.
If the system workload is light, most CPUs could be at a certain level
of C-state, and the vCPU thread may waste milliseconds before it can
actually migrate to a new CPU.
Thus, the tests may be inefficient in such systems, and in some cases
it may fail the migration/KVM_RUN ratio sanity check.
Since we are not able to turn off the cpuidle sub-system in run time,
this patch creates an idle thread on every CPU to prevent them from
entering C-states.
Additionally, seems it's reasonable to randomize the length of usleep(),
other than delay in a fixed pattern.
Signed-off-by: Zide Chen <zide.chen(a)intel.com>
---
tools/testing/selftests/kvm/rseq_test.c | 76 ++++++++++++++++++++++---
1 file changed, 69 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 28f97fb52044..d6e8b851d29e 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -11,6 +11,7 @@
#include <syscall.h>
#include <sys/ioctl.h>
#include <sys/sysinfo.h>
+#include <sys/resource.h>
#include <asm/barrier.h>
#include <linux/atomic.h>
#include <linux/rseq.h>
@@ -29,9 +30,10 @@
#define NR_TASK_MIGRATIONS 100000
static pthread_t migration_thread;
+static pthread_t *idle_threads;
static cpu_set_t possible_mask;
-static int min_cpu, max_cpu;
-static bool done;
+static int min_cpu, max_cpu, nproc;
+static volatile bool done;
static atomic_t seq_cnt;
@@ -150,7 +152,7 @@ static void *migration_worker(void *__rseq_tid)
* Use usleep() for simplicity and to avoid unnecessary kernel
* dependencies.
*/
- usleep((i % 10) + 1);
+ usleep((rand() % 10) + 1);
}
done = true;
return NULL;
@@ -158,7 +160,7 @@ static void *migration_worker(void *__rseq_tid)
static void calc_min_max_cpu(void)
{
- int i, cnt, nproc;
+ int i, cnt;
TEST_REQUIRE(CPU_COUNT(&possible_mask) >= 2);
@@ -186,6 +188,61 @@ static void calc_min_max_cpu(void)
"Only one usable CPU, task migration not possible");
}
+static void *idle_thread_fn(void *__idle_cpu)
+{
+ int r, cpu = (int)(unsigned long)__idle_cpu;
+ cpu_set_t allowed_mask;
+
+ CPU_ZERO(&allowed_mask);
+ CPU_SET(cpu, &allowed_mask);
+
+ r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
+ TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
+ errno, strerror(errno));
+
+ /* lowest priority, trying to prevent it from entering C-states */
+ r = setpriority(PRIO_PROCESS, 0, 19);
+ TEST_ASSERT(!r, "setpriority failed, errno = %d (%s)",
+ errno, strerror(errno));
+
+ while(!done);
+
+ return NULL;
+}
+
+static void spawn_threads(void)
+{
+ int cpu;
+
+ /* Run a dummy thread on every CPU */
+ for (cpu = min_cpu; cpu <= max_cpu; cpu++) {
+ if (!CPU_ISSET(cpu, &possible_mask))
+ continue;
+
+ pthread_create(&idle_threads[cpu], NULL, idle_thread_fn,
+ (void *)(unsigned long)cpu);
+ }
+
+ pthread_create(&migration_thread, NULL, migration_worker,
+ (void *)(unsigned long)syscall(SYS_gettid));
+}
+
+static void join_threads(void)
+{
+ int cpu;
+
+ pthread_join(migration_thread, NULL);
+
+ for (cpu = min_cpu; cpu <= max_cpu; cpu++) {
+ if (!CPU_ISSET(cpu, &possible_mask))
+ continue;
+
+ pthread_join(idle_threads[cpu], NULL);
+ }
+
+ free(idle_threads);
+}
+
int main(int argc, char *argv[])
{
int r, i, snapshot;
@@ -199,6 +256,12 @@ int main(int argc, char *argv[])
calc_min_max_cpu();
+ srand(time(NULL));
+
+ idle_threads = malloc(sizeof(pthread_t) * nproc);
+ TEST_ASSERT(idle_threads, "malloc failed, errno = %d (%s)", errno,
+ strerror(errno));
+
r = rseq_register_current_thread();
TEST_ASSERT(!r, "rseq_register_current_thread failed, errno = %d (%s)",
errno, strerror(errno));
@@ -210,8 +273,7 @@ int main(int argc, char *argv[])
*/
vm = vm_create_with_one_vcpu(&vcpu, guest_code);
- pthread_create(&migration_thread, NULL, migration_worker,
- (void *)(unsigned long)syscall(SYS_gettid));
+ spawn_threads();
for (i = 0; !done; i++) {
vcpu_run(vcpu);
@@ -258,7 +320,7 @@ int main(int argc, char *argv[])
TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
"Only performed %d KVM_RUNs, task stalled too much?", i);
- pthread_join(migration_thread, NULL);
+ join_threads();
kvm_vm_free(vm);
--
2.34.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- update patch 6 only, fix errors reported by CI.
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (14):
selftests/bpf: Add start_server_addr helper
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
selftests/bpf: Add function pointer for __start_server
selftests/bpf: Add start_server_setsockopt helper
selftests/bpf: Use start_server_setsockopt in sockopt_inherit
selftests/bpf: Use connect_to_fd in sockopt_inherit
selftests/bpf: Use start_server_* in test_tcp_check_syncookie
selftests/bpf: Use connect_to_addr in test_tcp_check_syncookie
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/network_helpers.c | 50 ++++++++----
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/sk_assign.c | 53 +------------
.../bpf/prog_tests/sockopt_inherit.c | 64 ++++------------
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++----------------
.../bpf/test_tcp_check_syncookie_user.c | 76 +++----------------
8 files changed, 83 insertions(+), 280 deletions(-)
--
2.40.1
The series consists of two parts:
- pids.events rework (originally v2, patches 1-6,
- migration charging, patches 7-9.
The changes are independent in principle, I stacked them for (my)
convenience and because they both deserve RFC:
1) Changed semantics of v2 pids.events
- similar change was proposed for memory.swap.events:max [1]
2) Migration charging is obsolete concept
How are the new events supposed to be useful?
- pids.events.local:max
- tells that cgroup's limit is hit (too tight?)
- pids.events.local:max.imposed
- tells that cgroup's workload was restricted (generalization of
'cgroup: fork rejected by pids controller in %s' message)
- pids.events:*
- "only" directs top-down search to cgroups of interest
The migration charging is motivated by apparenty surprising
pids.current > pids.max
because supervised processes are forked in supervisor's cgroup (more
details in commit cgroup/pids: Enforce pids.max on task migrations too)
Changes from v2 (https://lore.kernel.org/r/20200205134426.10570-1-mkoutny@suse.com)
- implemented pids.events.local (Tejun)
- added migration charging
[1] https://lore.kernel.org/r/20230202155626.1829121-1-hannes@cmpxchg.org/
Michal Koutný (9):
cgroup/pids: Remove superfluous zeroing
cgroup/pids: Separate semantics of pids.events related to pids.max
cgroup/pids: Make event counters hierarchical
cgroup/pids: Add pids.events.local
selftests: cgroup: Lexicographic order in Makefile
selftests: cgroup: Add basic tests for pids controller
cgroup/pids: Replace uncharge/charge pair with a single function
cgroup/pids: Enforce pids.max on task migrations
selftests: cgroup: Add tests pids controller
Documentation/admin-guide/cgroup-v1/pids.rst | 3 +-
Documentation/admin-guide/cgroup-v2.rst | 22 +-
include/linux/cgroup-defs.h | 7 +-
kernel/cgroup/cgroup.c | 16 +-
kernel/cgroup/pids.c | 206 +++++++++----
tools/testing/selftests/cgroup/Makefile | 25 +-
tools/testing/selftests/cgroup/test_pids.c | 302 +++++++++++++++++++
7 files changed, 514 insertions(+), 67 deletions(-)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c
base-commit: 026e680b0a08a62b1d948e5a8ca78700bfac0e6e
--
2.44.0
After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement
check_timer_distribution()"), clang warns:
tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here
401 | return major > min_major || (major == min_major && minor >= min_minor);
| ^~~~~
tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning
395 | unsigned int major, minor;
| ^
| = 0
This is a false positive because if uname() fails, ksft_exit_fail_msg()
will be called, which unconditionally calls exit(), a noreturn function.
However, clang does not know that ksft_exit_fail_msg() will call exit()
at the point in the pipeline that the warning is emitted because
inlining has not occurred, so it assumes control flow will resume
normally after ksft_exit_fail_msg() is called.
Make it clear to clang that all of the functions that call exit()
unconditionally in kselftest.h are noreturn transitively by marking them
explicitly with '__attribute__((__noreturn__))', which clears up the
warning above and any future warnings that may appear for the same
reason.
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: John Stultz <jstultz(a)google.com>
Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
I have based this change on timers/urgent, as the commit that introduces
this particular warning is there and it is marked for stable, even
though this appears to be a generic kselftest issue. I think it makes
the most sense for this change to go via timers/urgent with Shuah's ack.
While __noreturn with a return type other than 'void' does not make much
sense semantically, there are many places that these functions are used
as the return value for other functions such as main(), so I did not
change the return type of these functions from 'int' to 'void' to
minimize the necessary changes for a backport (it is an existing issue
anyways).
I see there is another instance of this problem that will need to be
addressed in -next, introduced by commit f07041728422 ("selftests: add
ksft_exit_fail_perror()").
---
tools/testing/selftests/kselftest.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 973b18e156b2..0591974b57e0 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -80,6 +80,9 @@
#define KSFT_XPASS 3
#define KSFT_SKIP 4
+#ifndef __noreturn
+#define __noreturn __attribute__((__noreturn__))
+#endif
#define __printf(a, b) __attribute__((format(printf, a, b)))
/* counters */
@@ -300,13 +303,13 @@ void ksft_test_result_code(int exit_code, const char *test_name,
va_end(args);
}
-static inline int ksft_exit_pass(void)
+static inline __noreturn int ksft_exit_pass(void)
{
ksft_print_cnts();
exit(KSFT_PASS);
}
-static inline int ksft_exit_fail(void)
+static inline __noreturn int ksft_exit_fail(void)
{
ksft_print_cnts();
exit(KSFT_FAIL);
@@ -333,7 +336,7 @@ static inline int ksft_exit_fail(void)
ksft_cnt.ksft_xfail + \
ksft_cnt.ksft_xskip)
-static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
@@ -348,19 +351,19 @@ static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
exit(KSFT_FAIL);
}
-static inline int ksft_exit_xfail(void)
+static inline __noreturn int ksft_exit_xfail(void)
{
ksft_print_cnts();
exit(KSFT_XFAIL);
}
-static inline int ksft_exit_xpass(void)
+static inline __noreturn int ksft_exit_xpass(void)
{
ksft_print_cnts();
exit(KSFT_XPASS);
}
-static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
---
base-commit: 076361362122a6d8a4c45f172ced5576b2d4a50d
change-id: 20240411-mark-kselftest-exit-funcs-noreturn-17d8ff729a7a
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v5 -> v6:
- Write inner_network_offset in vxlan, geneve and ipsec
- Ignore is_atomic when DF=0
- v5:
https://lore.kernel.org/all/20240408141720.98832-1-richardbgobert@gmail.com/
v4 -> v5:
- Add 1st commit - flush id checks in udp_gro_receive segment which can be
backported by itself
- Add TCP measurements for the 5th commit
- Add flush id tests to ensure flush id logic is preserved in GRO
- Simplify gro_inet_flush by removing a branch
- v4:
https://lore.kernel.org/all/20240325182543.87683-1-richardbgobert@gmail.com/
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/all/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.com/
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (6):
net: gro: add flush check in udp_gro_receive_segment
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
selftests/net: add flush id selftests
drivers/net/geneve.c | 8 +-
drivers/net/vxlan/vxlan_core.c | 12 +-
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 93 ++++++++++++--
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +-
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 7 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 54 +-------
net/ipv4/fou_core.c | 9 +-
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 22 +---
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 31 +++--
net/ipv6/ip6_offload.c | 41 +++---
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 +-
tools/testing/selftests/net/gro.c | 144 ++++++++++++++++++++++
tools/testing/selftests/net/udpgro_fwd.sh | 10 +-
25 files changed, 338 insertions(+), 161 deletions(-)
--
2.36.1
Add a basic test for page pool netlink reporting.
Jakub Kicinski (6):
net: netdevsim: add some fake page pool use
tools: ynl: don't return None for dumps
selftests: net: print report check location in python tests
selftests: net: print full exception on failure
selftests: net: support use of NetdevSimDev under "with" in python
selftests: net: exercise page pool reporting via netlink
drivers/net/netdevsim/netdev.c | 93 ++++++++++++++++++++++
drivers/net/netdevsim/netdevsim.h | 4 +
tools/net/ynl/lib/ynl.py | 4 +-
tools/testing/selftests/net/lib/py/ksft.py | 29 ++++---
tools/testing/selftests/net/lib/py/nsim.py | 22 ++++-
tools/testing/selftests/net/nl_netdev.py | 79 +++++++++++++++++-
6 files changed, 215 insertions(+), 16 deletions(-)
--
2.44.0
New version of the sleepable bpf_timer code.
I'm posting this as this is the result of the previous review, so we can
have a baseline to compare to.
The plan is now to introduce a new user API struct bpf_wq, as the timer
API working on softIRQ seems to be quite far away from a wq.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v6:
- Use of a workqueue to clean up sleepable timers
- integrated Kumar's patch instead of mine
- Link to v5: https://lore.kernel.org/r/20240322-hid-bpf-sleepable-v5-0-179c7b59eaaa@kern…
Changes in v5:
- took various reviews into account
- rewrote the tests to be separated to not have a uggly include
- Link to v4: https://lore.kernel.org/r/20240315-hid-bpf-sleepable-v4-0-5658f2540564@kern…
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (5):
bpf/helpers: introduce sleepable bpf_timers
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
Kumar Kartikeya Dwivedi (1):
bpf: Add support for KF_ARG_PTR_TO_TIMER
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 13 ++
kernel/bpf/helpers.c | 202 ++++++++++++++++---
kernel/bpf/verifier.c | 98 +++++++++-
tools/include/uapi/linux/bpf.h | 20 +-
tools/testing/selftests/bpf/bpf_experimental.h | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 34 ++++
.../testing/selftests/bpf/progs/timer_sleepable.c | 213 +++++++++++++++++++++
10 files changed, 553 insertions(+), 39 deletions(-)
---
base-commit: 61df575632d6b39213f47810c441bddbd87c3606
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
This patch series introduces a new char misc driver, /dev/ntsync, which is used
to implement Windows NT synchronization primitives.
== Background ==
The Wine project emulates the Windows API in user space. One particular part of
that API, namely the NT synchronization primitives, have historically been
implemented via RPC to a dedicated "kernel" process. However, more recent
applications use these APIs more strenuously, and the overhead of RPC has become
a bottleneck.
The NT synchronization APIs are too complex to implement on top of existing
primitives without sacrificing correctness. Certain operations, such as
NtPulseEvent() or the "wait-for-all" mode of NtWaitForMultipleObjects(), require
direct control over the underlying wait queue, and implementing a wait queue
sufficiently robust for Wine in user space is not possible. This proposed
driver, therefore, implements the problematic interfaces directly in the Linux
kernel.
This driver was presented at Linux Plumbers Conference 2023. For those further
interested in the history of synchronization in Wine and past attempts to solve
this problem in user space, a recording of the presentation can be viewed here:
https://www.youtube.com/watch?v=NjU4nyWyhU8
== Performance ==
The gain in performance varies wildly depending on the application in question
and the user's hardware. For some games NT synchronization is not a bottleneck
and no change can be observed, but for others frame rate improvements of 50 to
150 percent are not atypical. The following table lists frame rate measurements
from a variety of games on a variety of hardware, taken by users Dmitry
Skvortsov, FuzzyQuils, OnMars, and myself:
Game Upstream ntsync improvement
===========================================================================
Anger Foot 69 99 43%
Call of Juarez 99.8 224.1 125%
Dirt 3 110.6 860.7 678%
Forza Horizon 5 108 160 48%
Lara Croft: Temple of Osiris 141 326 131%
Metro 2033 164.4 199.2 21%
Resident Evil 2 26 77 196%
The Crew 26 51 96%
Tiny Tina's Wonderlands 130 360 177%
Total War Saga: Troy 109 146 34%
===========================================================================
== Patches ==
The intended semantics of the patches are broadly intended to match those of the
corresponding Windows functions. For those not already familiar with the Windows
functions (or their undocumented behaviour), patch 31/31 provides a detailed
specification, and individual patches also include a brief description of the
API they are implementing.
The patches making use of this driver in Wine can be retrieved or browsed here:
https://repo.or.cz/wine/zf.git/shortlog/refs/heads/ntsync5
== Implementation ==
Some aspects of the implementation may deserve particular comment:
* In the interest of performance, each object is governed only by a single
spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
objects be changed as a single atomic operation. In order to achieve this, we
first take a device-wide lock ("wait_all_lock") any time we are going to lock
more than one object at a time.
The maximum number of objects that can be used in a vectored wait, and
therefore the maximum that can be locked simultaneously, is 64. This number is
NT's own limit.
The acquisition of multiple spinlocks will degrade performance. This is a
conscious choice, however. Wait-for-all is known to be a very rare operation
in practice, especially with counts that approach the maximum, and it is the
intent of the ntsync driver to optimize wait-for-any at the expense of
wait-for-all as much as possible.
* NT mutexes are tied to their threads on an OS level, and the kernel includes
builtin support for "robust" mutexes. In order to keep the ntsync driver
self-contained and avoid touching more code than necessary, it does not hook
into task exit nor use pids.
Instead, the user space emulator is expected to manage thread IDs and pass
them as an argument to any relevant functions; this is the "owner" field of
ntsync_wait_args and ntsync_mutex_args.
When the emulator detects that a thread dies, it should therefore call
NTSYNC_IOC_MUTEX_KILL on any open mutexes.
* ntsync is module-capable mostly because there was nothing preventing it, and
because it aided development. It is not a hard requirement, though.
== Previous versions ==
Changes from v2:
* Check the result of fget() for NULL.
* Squash patch 31 (introducing the NTSYNC_WAIT_REALTIME flag) into patch 4, per
Arnd Bergmann.
* Use atomic_try_cmpxchg() instead of atomic_cmpxchg(), per off-list review from
Uros Bizjak.
* Link to v2: https://lore.kernel.org/lkml/20240219223833.95710-1-zfigura@codeweavers.com/
* Link to v1: https://lore.kernel.org/lkml/20240214233645.9273-1-zfigura@codeweavers.com/
* Link to RFC v2: https://lore.kernel.org/lkml/20240131021356.10322-1-zfigura@codeweavers.com/
* Link to RFC v1: https://lore.kernel.org/lkml/20240124004028.16826-1-zfigura@codeweavers.com/
Elizabeth Figura (30):
ntsync: Introduce the ntsync driver and character device.
ntsync: Introduce NTSYNC_IOC_CREATE_SEM.
ntsync: Introduce NTSYNC_IOC_SEM_POST.
ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.
ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.
ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.
ntsync: Introduce NTSYNC_IOC_EVENT_SET.
ntsync: Introduce NTSYNC_IOC_EVENT_RESET.
ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.
ntsync: Introduce NTSYNC_IOC_SEM_READ.
ntsync: Introduce NTSYNC_IOC_MUTEX_READ.
ntsync: Introduce NTSYNC_IOC_EVENT_READ.
ntsync: Introduce alertable waits.
selftests: ntsync: Add some tests for semaphore state.
selftests: ntsync: Add some tests for mutex state.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for manual-reset event state.
selftests: ntsync: Add some tests for auto-reset event state.
selftests: ntsync: Add some tests for wakeup signaling with events.
selftests: ntsync: Add tests for alertable waits.
selftests: ntsync: Add some tests for wakeup signaling via alerts.
selftests: ntsync: Add a stress test for contended waits.
maintainers: Add an entry for ntsync.
docs: ntsync: Add documentation for the ntsync uAPI.
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 2 +
Documentation/userspace-api/ntsync.rst | 399 +++++
MAINTAINERS | 9 +
drivers/misc/Kconfig | 11 +
drivers/misc/Makefile | 1 +
drivers/misc/ntsync.c | 1166 ++++++++++++++
include/uapi/linux/ntsync.h | 62 +
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/drivers/ntsync/Makefile | 8 +
tools/testing/selftests/drivers/ntsync/config | 1 +
.../testing/selftests/drivers/ntsync/ntsync.c | 1407 +++++++++++++++++
12 files changed, 3068 insertions(+)
create mode 100644 Documentation/userspace-api/ntsync.rst
create mode 100644 drivers/misc/ntsync.c
create mode 100644 include/uapi/linux/ntsync.h
create mode 100644 tools/testing/selftests/drivers/ntsync/Makefile
create mode 100644 tools/testing/selftests/drivers/ntsync/config
create mode 100644 tools/testing/selftests/drivers/ntsync/ntsync.c
base-commit: 4cece764965020c22cff7665b18a012006359095
--
2.43.0
When logging an error from calling waitpid() on the child we print a
misleading error message saying that the error we report was returned by
the chilld. Fix this to say the error is from waitpid().
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 3c9bf0cd82a8..eb108727c35c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -95,7 +95,7 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
getpid(), pid);
if (waitpid(-1, &status, __WALL) < 0) {
- ksft_print_msg("Child returned %s\n", strerror(errno));
+ ksft_print_msg("waitpid() returned %s\n", strerror(errno));
return -errno;
}
if (WEXITSTATUS(status))
---
base-commit: 8cb4a9a82b21623dbb4b3051dd30d98356cf95bc
change-id: 20240405-kselftest-clone3-waitpid-68c4833cf5ff
Best regards,
--
Mark Brown <broonie(a)kernel.org>
When the child exits during the clone3() selftest we use WEXITSTATUS() to
get the exit status from the process without first checking WIFEXITED() to
see if the result will be valid. This can lead to incorrect results, for
example if the child exits due to signal. Add a WIFEXTED() check and report
any non-standard exit as a failure, using EXIT_FAILURE as the exit status
for call_clone3() since we otherwise report 0 or negative errnos.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 3c9bf0cd82a8..0e0e5dfa97c6 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -98,6 +98,11 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
ksft_print_msg("Child returned %s\n", strerror(errno));
return -errno;
}
+ if (!WIFEXITED(status)) {
+ ksft_print_msg("Child did not exit normally, status 0x%x\n",
+ status);
+ return EXIT_FAILURE;
+ }
if (WEXITSTATUS(status))
return WEXITSTATUS(status);
---
base-commit: 39cd87c4eb2b893354f3b850f916353f2658ae6f
change-id: 20240405-kselftest-clone3-signal-1edb1a3f5473
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v5:
- address Martin's comments for v4 (thanks).
- update patch 2, use 'return err' instead of 'return -1/0'.
- drop patch 3 in v4.
v4:
- fix a bug in v3, it should be 'if (err)', not 'if (!err)'.
- move "selftests/bpf: Use log_err in network_helpers" out of this
series.
v3:
- add two more patches.
- use log_err instead of ASSERT in v3.
- let send_recv_data return int as Martin suggested.
v2:
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (2):
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
tools/testing/selftests/bpf/network_helpers.c | 96 +++++++++++++++++++
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +-------------
3 files changed, 98 insertions(+), 70 deletions(-)
--
2.40.1
These patches from Geliang add support for the "last time" field in
MPTCP Info, and verify that the counters look valid.
Patch 1 adds these counters: last_data_sent, last_data_recv and
last_ack_recv. They are available in the MPTCP Info, so exposed via
getsockopt(MPTCP_INFO) and the Netlink Diag interface.
Patch 2 adds a test in diag.sh MPTCP selftest, to check that the
counters have moved by at least 250ms, after having waited twice that
time.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- Only patch 1/2 has been modified following Eric's suggestion, see the
individual changelog for more details.
- Link to v1: https://lore.kernel.org/r/20240405-upstream-net-next-20240405-mptcp-last-ti…
---
Geliang Tang (2):
mptcp: add last time fields in mptcp_info
selftests: mptcp: test last time mptcp_info
include/uapi/linux/mptcp.h | 4 +++
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 7 ++++
net/mptcp/protocol.h | 3 ++
net/mptcp/sockopt.c | 16 +++++++---
tools/testing/selftests/net/mptcp/diag.sh | 53 +++++++++++++++++++++++++++++++
6 files changed, 79 insertions(+), 5 deletions(-)
---
base-commit: 2ecd487b670fcbb1ad4893fff1af4aafdecb6023
change-id: 20240405-upstream-net-next-20240405-mptcp-last-time-info-9b03618e08f1
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This patch series adds support to freeze the task cgroup hierarchy
that is on a default cgroup v2 without going through kernfs interface.
For some cases we want to freeze the cgroup of a task based on some
signals, doing so from bpf is better than user space which could be
too late.
Planned users of this feature are: tetragon and systemd when freezing
a cgroup hierarchy that could be a K8s pod, container, system service
or a user session.
Patch 1: cgroup: add cgroup_freeze_no_kn() to freeze a cgroup from bpf
Patch 2: bpf: add bpf_task_freeze_cgroup() to freeze the cgroup of a task
Patch 3: selftests/bpf: add selftest for bpf_task_freeze_cgroup
include/linux/cgroup.h | 2 ++
kernel/bpf/helpers.c | 31 ++++
kernel/cgroup/cgroup.c | 67 ++++++++
tools/testing/selftests/bpf/prog_tests/task_freeze_cgroup.c | 165 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/test_task_freeze_cgroup.c | 110 ++++++++++++++
5 files changed, 375 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_freeze_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_freeze_cgroup.c
--
2.34.1
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
PATCH 2,3,14 : FW_READ_HI function implementation
PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
PATCH 16-17: Generic improvements for kvm selftests
PATCH 18-22: KVM selftests for SBI PMU extension
The series is based on v6.9-rc1 and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v5
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v4->v5:
1. Moved sbi related definitions to its own header file from processor.h
2. Added few helper functions for selftests.
3. Improved firmware counter read and RV32 start/stop functions.
4. Converted all the shifting operations to use BIT macro
5. Addressed all other comments on v4.
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (22):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
drivers/perf: riscv: Use BIT macro for shifting operations
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
drivers/perf: riscv: Fix counter mask iteration for RV32
RISC-V: KVM: Fix the initial sample period value
RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
RISC-V: KVM: Improve firmware counter read function
KVM: riscv: selftests: Move sbi definitions to its own header file
KVM: riscv: selftests: Add helper functions for extension checks
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
arch/riscv/include/asm/sbi.h | 34 +-
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/paravirt.c | 6 +-
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 15 +-
arch/riscv/kvm/vcpu_onereg.c | 5 +
arch/riscv/kvm/vcpu_pmu.c | 260 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 264 +++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 49 +-
.../testing/selftests/kvm/include/riscv/sbi.h | 141 +++++
.../selftests/kvm/include/riscv/ucall.h | 1 +
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
.../selftests/kvm/riscv/get-reg-list.c | 4 +
.../selftests/kvm/riscv/sbi_pmu_test.c | 581 ++++++++++++++++++
tools/testing/selftests/kvm/steal_time.c | 4 +-
23 files changed, 1322 insertions(+), 112 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
--
2.34.1
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,6,7 : Generic cleanups/improvements.
PATCH 2,3,10 : FW_READ_HI function implementation
PATCH 4-5: Add PMU snapshot feature in sbi pmu driver
PATCH 6-7: KVM implementation for snapshot and sampling in kvm guests
PATCH 11-15: KVM selftests for SBI PMU extension
The series is based on kvm-next and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v4
The series is based on kvm-riscv/queue branch + fixes suggested on the following
series
https://patchwork.kernel.org/project/kvm/cover/cover.1705916069.git.haibo1.…
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (15):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/errata_list.h | 2 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 14 +-
arch/riscv/include/asm/sbi.h | 12 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 14 +-
arch/riscv/kvm/vcpu_onereg.c | 9 +-
arch/riscv/kvm/vcpu_pmu.c | 247 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 15 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 229 ++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 92 +++
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 4 +
tools/testing/selftests/kvm/riscv/sbi_pmu.c | 588 ++++++++++++++++++
18 files changed, 1212 insertions(+), 45 deletions(-)
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
--
2.34.1