This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,6,7 : Generic cleanups/improvements.
PATCH 2,3,10 : FW_READ_HI function implementation
PATCH 4-5: Add PMU snapshot feature in sbi pmu driver
PATCH 6-7: KVM implementation for snapshot and sampling in kvm guests
PATCH 11-15: KVM selftests for SBI PMU extension
The series is based on kvm-next and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v4
The series is based on kvm-riscv/queue branch + fixes suggested on the following
series
https://patchwork.kernel.org/project/kvm/cover/cover.1705916069.git.haibo1.…
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (15):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/errata_list.h | 2 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 14 +-
arch/riscv/include/asm/sbi.h | 12 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 14 +-
arch/riscv/kvm/vcpu_onereg.c | 9 +-
arch/riscv/kvm/vcpu_pmu.c | 247 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 15 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 229 ++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 92 +++
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 4 +
tools/testing/selftests/kvm/riscv/sbi_pmu.c | 588 ++++++++++++++++++
18 files changed, 1212 insertions(+), 45 deletions(-)
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
--
2.34.1
Currently, the sud_test expects the emulated syscall to return the
emulated syscall number. This assumption only works on architectures
were the syscall calling convention use the same register for syscall
number/syscall return value. This is not the case for RISC-V and thus
the return value must be also emulated using the provided ucontext.
Signed-off-by: Clément Léger <cleger(a)rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Acked-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
Changes in V2:
- Changes comment to be more explicit
- Use A7 syscall arg rather than hardcoding MAGIC_SYSCALL_1
---
.../selftests/syscall_user_dispatch/sud_test.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_test.c b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
index b5d592d4099e..d975a6767329 100644
--- a/tools/testing/selftests/syscall_user_dispatch/sud_test.c
+++ b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
@@ -158,6 +158,20 @@ static void handle_sigsys(int sig, siginfo_t *info, void *ucontext)
/* In preparation for sigreturn. */
SYSCALL_DISPATCH_OFF(glob_sel);
+
+ /*
+ * The tests for argument handling assume that `syscall(x) == x`. This
+ * is a NOP on x86 because the syscall number is passed in %rax, which
+ * happens to also be the function ABI return register. Other
+ * architectures may need to swizzle the arguments around.
+ */
+#if defined(__riscv)
+/* REG_A7 is not defined in libc headers */
+# define REG_A7 (REG_A0 + 7)
+
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A0] =
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A7];
+#endif
}
TEST(dispatch_and_return)
--
2.43.0
Cleaning up after tests is implemented separately for individual tests
and called at the end of each test execution. Since these functions are
very similar and a more generalized test framework was introduced a
function pointer in the resctrl_test struct can be used to reduce the
amount of function calls.
These functions are also all called in the ctrl-c handler because the
handler isn't aware which test is currently running. Since the handler
is implemented with a sigaction no function parameters can be passed
there but information about what test is currently running can be passed
with a global variable.
Series applies cleanly on top of kselftests/next.
Changelog v5:
- Rebase onto kselftests/next.
- Add Reinette's reviewed-by tag.
Changelog v4:
- Check current_test pointer and reset it in signal unregistering.
- Move cleanup call to test_cleanup function.
Changelog v3:
- Make current_test static.
- Add callback NULL check to the ctrl-c handler.
Changelog v2:
- Make current_test a const pointer limited in scope to resctrl_val
file.
- Remove tests_cleanup from resctrl.h.
- Cleanup 'goto out' path and labels in individual test functions.
Older versions of this series:
[v1] https://lore.kernel.org/all/cover.1708434017.git.maciej.wieczor-retman@inte…
[v2] https://lore.kernel.org/all/cover.1708596015.git.maciej.wieczor-retman@inte…
[v3] https://lore.kernel.org/all/cover.1708599491.git.maciej.wieczor-retman@inte…
[v4] https://lore.kernel.org/all/cover.1708949785.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (3):
selftests/resctrl: Add cleanup function to test framework
selftests/resctrl: Simplify cleanup in ctrl-c handler
selftests/resctrl: Move cleanups out of individual tests
tools/testing/selftests/resctrl/cat_test.c | 8 +++-----
tools/testing/selftests/resctrl/cmt_test.c | 4 ++--
tools/testing/selftests/resctrl/mba_test.c | 8 +++-----
tools/testing/selftests/resctrl/mbm_test.c | 8 +++-----
tools/testing/selftests/resctrl/resctrl.h | 9 +++------
.../testing/selftests/resctrl/resctrl_tests.c | 20 +++++++------------
tools/testing/selftests/resctrl/resctrl_val.c | 8 ++++++--
7 files changed, 27 insertions(+), 38 deletions(-)
--
2.43.2
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first create the helper function which can be used
for the vpmu related tests. Then, it implement the test.
Changelog:
----------
v4->v5:
- Rebased to v6.8-rc6.
- Refactor the helper function, make it fine-grained and easy to be used.
- Namimg improvements.
- Use the kvm_device_attr_set() helper.
- Make the test descriptor array readable and clean.
- Delete the patch which moves the pmu related helper to vpmu.h.
- Remove the kvm_supports_pmu_event_filter() function since nobody will run
this on a old kernel.
v3->v4:
- Rebased to the v6.8-rc2.
v2->v3:
- Check the pmceid in guest code instead of pmu event count since different
hardware may have different event count result, check pmceid makes it stable
on different platform. [Eric]
- Some typo fixed and commit message improved.
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add the check if kvm has attr KVM_ARM_VCPU_PMU_V3_FILTER.
- Add if host pmu support the test event throught pmceid0.
- Split the test_invalid_filter() to another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231129072712.2667337-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240116060129.55473-1-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240202025659.5065-1-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: selftests: aarch64: Add helper function for the vpmu vcpu
creation
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/aarch64/pmu_event_filter_test.c | 325 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 33 +-
.../selftests/kvm/include/aarch64/vpmu.h | 29 ++
4 files changed, 362 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
--
2.40.1
PASID (Process Address Space ID) is a PCIe extension to tag the DMA
transactions out of a physical device, and most modern IOMMU hardware
have supported PASID granular address translation. So a PASID-capable
device can be attached to multiple hwpts (a.k.a. domains), each attachment
is tagged with a pasid.
This series first adds a missing iommu API to replace domain for a pasid,
then adds iommufd APIs for device drivers to attach/replace/detach pasid
to/from hwpt per userspace's request, and adds selftest to validate the
iommufd APIs.
pasid attach/replace is mandatory on Intel VT-d given the PASID table
locates in the physical address space hence must be managed by the kernel,
both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD
which allow configuring the PASID/CD table either in host physical address
space or nested on top of an GPA address space. This series only add VT-d
support as the minimal requirement.
Complete code can be found in below link:
https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Change log:
v1:
- Implemnet iommu_replace_device_pasid() to fall back to the original domain
if this replacement failed (Kevin)
- Add check in do_attach() to check corressponding attach_fn per the pasid value.
rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Kevin Tian (1):
iommufd: Support attach/replace hwpt per pasid
Lu Baolu (2):
iommu: Introduce a replace API for device pasid
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (5):
iommufd: replace attach_fn with a structure
iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add coverage for iommufd pasid attach/detach
drivers/iommu/intel/nested.c | 47 +++++
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 82 ++++++--
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 50 +++--
drivers/iommu/iommufd/iommufd_private.h | 23 +++
drivers/iommu/iommufd/iommufd_test.h | 24 +++
drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++
drivers/iommu/iommufd/selftest.c | 176 ++++++++++++++++--
include/linux/iommufd.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 172 +++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
13 files changed, 785 insertions(+), 42 deletions(-)
create mode 100644 drivers/iommu/iommufd/pasid.c
--
2.34.1
On riscv, mmap currently returns an address from the largest address
space that can fit entirely inside of the hint address. This makes it
such that the hint address is almost never returned. This patch raises
the mappable area up to and including the hint address. This allows mmap
to often return the hint address, which allows a performance improvement
over searching for a valid address as well as making the behavior more
similar to other architectures.
Note that a previous patch introduced stronger semantics compared to
other architectures for riscv mmap. On riscv, mmap will not use bits in
the upper bits of the virtual address depending on the hint address. On
other architectures, a random address is returned in the address space
requested. On all architectures the hint address will be returned if it
is available. This allows riscv applications to configure how many bits
in the virtual address should be left empty. This has the two benefits
of being able to request address spaces that are smaller than the
default and doesn't require the application to know the page table
layout of riscv.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v2:
- Add back forgotten "mmap_end = STACK_TOP_MAX"
- Link to v1: https://lore.kernel.org/r/20240129-use_mmap_hint_address-v1-0-4c74da813ba1@…
---
Charlie Jenkins (3):
riscv: mm: Use hint address in mmap if available
selftests: riscv: Generalize mm selftests
docs: riscv: Define behavior of mmap
Documentation/arch/riscv/vm-layout.rst | 16 ++--
arch/riscv/include/asm/processor.h | 22 +++---
tools/testing/selftests/riscv/mm/mmap_bottomup.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_default.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_test.h | 93 +++++++++++++-----------
5 files changed, 67 insertions(+), 104 deletions(-)
---
base-commit: 556e2d17cae620d549c5474b1ece053430cd50bc
change-id: 20240119-use_mmap_hint_address-f9f4b1b6f5f1
--
- Charlie