This series creates a new PMU scheme on ARM, a partitioned PMU that allows reserving a subset of counters for more direct guest access, significantly reducing overhead. More details, including performance benchmarks, can be read in the v1 cover letter linked below.
An overview of what this series accomplishes was presented at KVM Forum 2025. Slides [1] and video [2] are linked below.
The long duration between v4 and v5 is due to time spent on this project being monopolized preparing this feature for internal production. As a result, there are too many improvements to fully list here, but I will cover the notable ones.
v5:
* Rebase onto v6.18-rc7. This required pulling some reorganization patches from Anish and Sean that were dependencies from previous versions based on kvm/queue but never made it to upstream.
* Ensure FGTs (fine-grained traps) are correctly programmed at vCPU load using kvm_vcpu_load_fgt() and helpers introduced by Oliver Upton.
* Cleanly separate concerns of whether the partitioned PMU is enabled for the guest and whether FGT should be enabled. This allows that the capability can be VM-scoped while the implementation detail of whether FGT and context switching are in effect can remain vCPU-scoped.
* Shrink the uAPI change. Instead of a cap and corresponding ioctl, the feature can be controlled by just a cap with an argument. The cap is now also VM-scoped and enforces ordering that it should be decided before vCPUs are created. Whether the cap is enabled is now tracked by the new flag KVM_ARCH_ARM_PARTITIONED_PMU_ENABLED.
* Improve log messages when partitioning in the PMUv3 driver.
* Introduce a global variable armv8pmu_hpmn_max in the PMUv3 driver so KVM code can read if a value was set before the PMU is probed. This is needed to properly test if we have the capability before vCPUs are created.
* Make it possible for a VMM to filter the HPMN0 feature bit.
* Fix event filter problems with PMEVTYPER handling in writethrough_pmevtyper() and kvm_pmu_apply_event_filter() by using kvm_pmu_event_mask() in the right spots. And if an event is filtered, write the physical register with the appropriate exclude bits set but keep the virtual register exactly what the guest wrote.
* Fix register access problems with the PMU register fast path handler by lifting some static PMU access checks from sys_regs.c to use them in the fast path too and make bit masking more strict for better ARM compliance.
* Fix the readability and logic of programming the MDCR_EL2 register when entering the guest. Make sure to set the HPME bit to allow host counters to count guest events. Set TPM and TPMCR by default and clear them if partitioning is enabled rather than the previous inverted logic of leaving them clear and setting them if partitioning is not enabled. Make the HPMN field computation more clear.
* As part of lazy context switching, do a load when the guest is switching to physical access to ensure any previous writes that only reached the virtual registers reach the physical ones as well and are not clobbered by the next vcpu_put().
* Other fixes and improvements that are too small to mention or left out from my personal notes.
v4: https://lore.kernel.org/kvmarm/20250714225917.1396543-1-coltonlewis@google.c...
v3: https://lore.kernel.org/kvm/20250626200459.1153955-1-coltonlewis@google.com/
v2: https://lore.kernel.org/kvm/20250620221326.1261128-1-coltonlewis@google.com/
v1: https://lore.kernel.org/kvm/20250602192702.2125115-1-coltonlewis@google.com/
[1] https://gitlab.com/qemu-project/kvm-forum/-/raw/main/_attachments/2025/Optim... [2] https://www.youtube.com/watch?v=YRzZ8jMIA6M&list=PLW3ep1uCIRfxwmllXTOA2t...
Anish Ghulati (1): KVM: arm64: Move arm_{psci,hypercalls}.h to an internal KVM path
Colton Lewis (20): arm64: cpufeature: Add cpucap for HPMN0 KVM: arm64: Reorganize PMU functions perf: arm_pmuv3: Introduce method to partition the PMU perf: arm_pmuv3: Generalize counter bitmasks perf: arm_pmuv3: Keep out of guest counter partition KVM: arm64: Set up FGT for Partitioned PMU KVM: arm64: Writethrough trapped PMEVTYPER register KVM: arm64: Use physical PMSELR for PMXEVTYPER if partitioned KVM: arm64: Writethrough trapped PMOVS register KVM: arm64: Write fast path PMU register handlers KVM: arm64: Setup MDCR_EL2 to handle a partitioned PMU KVM: arm64: Account for partitioning in PMCR_EL0 access KVM: arm64: Context swap Partitioned PMU guest registers KVM: arm64: Enforce PMU event filter at vcpu_load() KVM: arm64: Implement lazy PMU context swaps perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters KVM: arm64: Inject recorded guest interrupts KVM: arm64: Add KVM_CAP to partition the PMU KVM: selftests: Add find_bit to KVM library KVM: arm64: selftests: Add test case for partitioned PMU
Marc Zyngier (1): KVM: arm64: Reorganize PMU includes
Sean Christopherson (2): KVM: arm64: Include KVM headers to get forward declarations KVM: arm64: Move ARM specific headers in include/kvm to arch directory
Documentation/virt/kvm/api.rst | 24 + arch/arm/include/asm/arm_pmuv3.h | 28 + arch/arm64/include/asm/arm_pmuv3.h | 61 +- .../arm64/include/asm/kvm_arch_timer.h | 2 + arch/arm64/include/asm/kvm_host.h | 24 +- .../arm64/include/asm/kvm_pmu.h | 142 ++++ arch/arm64/include/asm/kvm_types.h | 7 +- .../arm64/include/asm/kvm_vgic.h | 0 arch/arm64/kernel/cpufeature.c | 8 + arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/arch_timer.c | 5 +- arch/arm64/kvm/arm.c | 23 +- {include => arch/arm64}/kvm/arm_hypercalls.h | 0 {include => arch/arm64}/kvm/arm_psci.h | 0 arch/arm64/kvm/config.c | 34 +- arch/arm64/kvm/debug.c | 31 +- arch/arm64/kvm/guest.c | 2 +- arch/arm64/kvm/handle_exit.c | 2 +- arch/arm64/kvm/hyp/Makefile | 6 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 211 ++++- arch/arm64/kvm/hyp/nvhe/switch.c | 4 +- arch/arm64/kvm/hyp/vhe/switch.c | 4 +- arch/arm64/kvm/hypercalls.c | 4 +- arch/arm64/kvm/pmu-direct.c | 464 +++++++++++ arch/arm64/kvm/pmu-emul.c | 678 +--------------- arch/arm64/kvm/pmu.c | 726 ++++++++++++++++++ arch/arm64/kvm/psci.c | 4 +- arch/arm64/kvm/pvtime.c | 2 +- arch/arm64/kvm/reset.c | 3 +- arch/arm64/kvm/sys_regs.c | 110 +-- arch/arm64/kvm/trace_arm.h | 2 +- arch/arm64/kvm/trng.c | 2 +- arch/arm64/kvm/vgic/vgic-debug.c | 2 +- arch/arm64/kvm/vgic/vgic-init.c | 2 +- arch/arm64/kvm/vgic/vgic-irqfd.c | 2 +- arch/arm64/kvm/vgic/vgic-kvm-device.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio-v2.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio-v3.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio.c | 4 +- arch/arm64/kvm/vgic/vgic-v2.c | 2 +- arch/arm64/kvm/vgic/vgic-v3-nested.c | 3 +- arch/arm64/kvm/vgic/vgic-v3.c | 2 +- arch/arm64/kvm/vgic/vgic-v5.c | 2 +- arch/arm64/tools/cpucaps | 1 + arch/arm64/tools/sysreg | 6 +- drivers/perf/arm_pmuv3.c | 137 +++- include/linux/perf/arm_pmu.h | 1 + include/linux/perf/arm_pmuv3.h | 14 +- include/uapi/linux/kvm.h | 1 + tools/include/uapi/linux/kvm.h | 1 + tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/arm64/vpmu_counter_access.c | 77 +- tools/testing/selftests/kvm/lib/find_bit.c | 1 + 53 files changed, 2049 insertions(+), 831 deletions(-) rename include/kvm/arm_arch_timer.h => arch/arm64/include/asm/kvm_arch_timer.h (98%) rename include/kvm/arm_pmu.h => arch/arm64/include/asm/kvm_pmu.h (61%) rename include/kvm/arm_vgic.h => arch/arm64/include/asm/kvm_vgic.h (100%) rename {include => arch/arm64}/kvm/arm_hypercalls.h (100%) rename {include => arch/arm64}/kvm/arm_psci.h (100%) create mode 100644 arch/arm64/kvm/pmu-direct.c create mode 100644 tools/testing/selftests/kvm/lib/find_bit.c
base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d -- 2.52.0.239.gd5f0c6e74e-goog
Add a capability for FEAT_HPMN0, whether MDCR_EL2.HPMN can specify 0 counters reserved for the guest.
This required changing HPMN0 to an UnsignedEnum in tools/sysreg because otherwise not all the appropriate macros are generated to add it to arm64_cpu_capabilities_arm64_features.
Acked-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/kernel/cpufeature.c | 8 ++++++++ arch/arm64/kvm/sys_regs.c | 3 ++- arch/arm64/tools/cpucaps | 1 + arch/arm64/tools/sysreg | 6 +++--- 4 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index e25b0f84a22da..ceddc55eb30a0 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -555,6 +555,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = { };
static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = { + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_HPMN0_SHIFT, 4, 0), S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0), @@ -2898,6 +2899,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, ARM64_CPUID_FIELDS(ID_AA64MMFR0_EL1, FGT, FGT2) }, + { + .desc = "HPMN0", + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .capability = ARM64_HAS_HPMN0, + .matches = has_cpuid_feature, + ARM64_CPUID_FIELDS(ID_AA64DFR0_EL1, HPMN0, IMP) + }, #ifdef CONFIG_ARM64_SME { .desc = "Scalable Matrix Extension", diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index ec3fbe0b8d525..c636840b1f6f9 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -3214,7 +3214,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_AA64DFR0_EL1_DoubleLock_MASK | ID_AA64DFR0_EL1_WRPs_MASK | ID_AA64DFR0_EL1_PMUVer_MASK | - ID_AA64DFR0_EL1_DebugVer_MASK), + ID_AA64DFR0_EL1_DebugVer_MASK | + ID_AA64DFR0_EL1_HPMN0_MASK), ID_SANITISED(ID_AA64DFR1_EL1), ID_UNALLOCATED(5,2), ID_UNALLOCATED(5,3), diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps index 1b32c1232d28d..8efa6a437515d 100644 --- a/arch/arm64/tools/cpucaps +++ b/arch/arm64/tools/cpucaps @@ -41,6 +41,7 @@ HAS_GICV5_LEGACY HAS_GIC_PRIO_MASKING HAS_GIC_PRIO_RELAXED_SYNC HAS_HCR_NV1 +HAS_HPMN0 HAS_HCX HAS_LDAPR HAS_LPA2 diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 1c6cdf9d54bba..24d20138ea664 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -1666,9 +1666,9 @@ EndEnum EndSysreg
Sysreg ID_AA64DFR0_EL1 3 0 0 5 0 -Enum 63:60 HPMN0 - 0b0000 UNPREDICTABLE - 0b0001 DEF +UnsignedEnum 63:60 HPMN0 + 0b0000 NI + 0b0001 IMP EndEnum UnsignedEnum 59:56 ExtTrcBuff 0b0000 NI
From: Anish Ghulati aghulati@google.com
Move arm_hypercalls.h and arm_psci.h into arch/arm64/kvm now that KVM no longer supports 32-bit ARM, i.e. now that there's no reason to make the hypercall and PSCI APIs "public".
Signed-off-by: Anish Ghulati aghulati@google.com [sean: squash into one patch, write changelog] Signed-off-by: Sean Christopherson seanjc@google.com Message-ID: 20250611001042.170501-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com --- arch/arm64/kvm/arm.c | 5 +++-- {include => arch/arm64}/kvm/arm_hypercalls.h | 0 {include => arch/arm64}/kvm/arm_psci.h | 0 arch/arm64/kvm/guest.c | 2 +- arch/arm64/kvm/handle_exit.c | 2 +- arch/arm64/kvm/hyp/Makefile | 6 +++--- arch/arm64/kvm/hyp/include/hyp/switch.h | 4 ++-- arch/arm64/kvm/hyp/nvhe/switch.c | 4 ++-- arch/arm64/kvm/hyp/vhe/switch.c | 4 ++-- arch/arm64/kvm/hypercalls.c | 4 ++-- arch/arm64/kvm/psci.c | 4 ++-- arch/arm64/kvm/pvtime.c | 2 +- arch/arm64/kvm/trng.c | 2 +- 13 files changed, 20 insertions(+), 19 deletions(-) rename {include => arch/arm64}/kvm/arm_hypercalls.h (100%) rename {include => arch/arm64}/kvm/arm_psci.h (100%)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 052bf0d4d0b03..d1750d6058dfd 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -41,9 +41,10 @@ #include <asm/kvm_ptrauth.h> #include <asm/sections.h>
-#include <kvm/arm_hypercalls.h> #include <kvm/arm_pmu.h> -#include <kvm/arm_psci.h> + +#include "arm_hypercalls.h" +#include "arm_psci.h"
#include "sys_regs.h"
diff --git a/include/kvm/arm_hypercalls.h b/arch/arm64/kvm/arm_hypercalls.h similarity index 100% rename from include/kvm/arm_hypercalls.h rename to arch/arm64/kvm/arm_hypercalls.h diff --git a/include/kvm/arm_psci.h b/arch/arm64/kvm/arm_psci.h similarity index 100% rename from include/kvm/arm_psci.h rename to arch/arm64/kvm/arm_psci.h diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 1c87699fd886e..863b351ae1221 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -18,7 +18,6 @@ #include <linux/string.h> #include <linux/vmalloc.h> #include <linux/fs.h> -#include <kvm/arm_hypercalls.h> #include <asm/cputype.h> #include <linux/uaccess.h> #include <asm/fpsimd.h> @@ -27,6 +26,7 @@ #include <asm/kvm_nested.h> #include <asm/sigcontext.h>
+#include "arm_hypercalls.h" #include "trace.h"
const struct _kvm_stats_desc kvm_vm_stats_desc[] = { diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c index cc7d5d1709cb8..66740520f2166 100644 --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -22,7 +22,7 @@ #include <asm/stacktrace/nvhe.h> #include <asm/traps.h>
-#include <kvm/arm_hypercalls.h> +#include "arm_hypercalls.h"
#define CREATE_TRACE_POINTS #include "trace_handle_exit.h" diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile index d61e44642f980..b1a4884446c69 100644 --- a/arch/arm64/kvm/hyp/Makefile +++ b/arch/arm64/kvm/hyp/Makefile @@ -3,8 +3,8 @@ # Makefile for Kernel-based Virtual Machine module, HYP part #
-incdir := $(src)/include -subdir-asflags-y := -I$(incdir) -subdir-ccflags-y := -I$(incdir) +hyp_includes := -I$(src)/include -I$(srctree)/arch/arm64/kvm +subdir-asflags-y := $(hyp_includes) +subdir-ccflags-y := $(hyp_includes)
obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index c5d5e5b86eaf0..6e8050f260f34 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -16,8 +16,6 @@ #include <linux/jump_label.h> #include <uapi/linux/psci.h>
-#include <kvm/arm_psci.h> - #include <asm/barrier.h> #include <asm/cpufeature.h> #include <asm/extable.h> @@ -32,6 +30,8 @@ #include <asm/processor.h> #include <asm/traps.h>
+#include "arm_psci.h" + struct kvm_exception_table_entry { int insn, fixup; }; diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index d3b9ec8a7c283..5d626308952ac 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -13,8 +13,6 @@ #include <linux/jump_label.h> #include <uapi/linux/psci.h>
-#include <kvm/arm_psci.h> - #include <asm/barrier.h> #include <asm/cpufeature.h> #include <asm/kprobes.h> @@ -28,6 +26,8 @@
#include <nvhe/mem_protect.h>
+#include "arm_psci.h" + /* Non-VHE specific context */ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data); DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt); diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index 9984c492305a8..0039e501a3cb7 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -13,8 +13,6 @@ #include <linux/percpu.h> #include <uapi/linux/psci.h>
-#include <kvm/arm_psci.h> - #include <asm/barrier.h> #include <asm/cpufeature.h> #include <asm/kprobes.h> @@ -28,6 +26,8 @@ #include <asm/thread_info.h> #include <asm/vectors.h>
+#include "arm_psci.h" + /* VHE specific context */ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data); DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt); diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 58c5fe7d75727..05331389081f8 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -6,8 +6,8 @@
#include <asm/kvm_emulate.h>
-#include <kvm/arm_hypercalls.h> -#include <kvm/arm_psci.h> +#include "arm_hypercalls.h" +#include "arm_psci.h"
#define KVM_ARM_SMCCC_STD_FEATURES \ GENMASK(KVM_REG_ARM_STD_BMAP_BIT_COUNT - 1, 0) diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c index 3b5dbe9a0a0ea..0566b59074978 100644 --- a/arch/arm64/kvm/psci.c +++ b/arch/arm64/kvm/psci.c @@ -13,8 +13,8 @@ #include <asm/cputype.h> #include <asm/kvm_emulate.h>
-#include <kvm/arm_psci.h> -#include <kvm/arm_hypercalls.h> +#include "arm_hypercalls.h" +#include "arm_psci.h"
/* * This is an implementation of the Power State Coordination Interface diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c index 4ceabaa4c30bd..b07d250d223c0 100644 --- a/arch/arm64/kvm/pvtime.c +++ b/arch/arm64/kvm/pvtime.c @@ -8,7 +8,7 @@ #include <asm/kvm_mmu.h> #include <asm/pvclock-abi.h>
-#include <kvm/arm_hypercalls.h> +#include "arm_hypercalls.h"
void kvm_update_stolen_time(struct kvm_vcpu *vcpu) { diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c index 99bdd7103c9c1..b5dc0f09797a3 100644 --- a/arch/arm64/kvm/trng.c +++ b/arch/arm64/kvm/trng.c @@ -6,7 +6,7 @@
#include <asm/kvm_emulate.h>
-#include <kvm/arm_hypercalls.h> +#include "arm_hypercalls.h"
#define ARM_SMCCC_TRNG_VERSION_1_0 0x10000UL
From: Sean Christopherson seanjc@google.com
Include include/uapi/linux/kvm.h and include/linux/kvm_types.h in ARM's public arm_arch_timer.h and arm_pmu.h headers to get forward declarations of things like "struct kvm_vcpu" and "struct kvm_device_attr", which are referenced but never declared (neither file includes *any* KVM headers).
The missing includes don't currently cause problems because of the order of includes in parent files, but that order is largely arbitrary and is subject to change, e.g. a future commit will move the ARM specific headers to arch/arm64/include/asm and reorder parent includes to maintain alphabetic ordering.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Sean Christopherson seanjc@google.com Message-ID: 20250611001042.170501-3-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com --- include/kvm/arm_arch_timer.h | 2 ++ include/kvm/arm_pmu.h | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h index 7310841f45121..d55359e67c22c 100644 --- a/include/kvm/arm_arch_timer.h +++ b/include/kvm/arm_arch_timer.h @@ -7,6 +7,8 @@ #ifndef __ASM_ARM_KVM_ARCH_TIMER_H #define __ASM_ARM_KVM_ARCH_TIMER_H
+#include <linux/kvm.h> +#include <linux/kvm_types.h> #include <linux/clocksource.h> #include <linux/hrtimer.h>
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h index 96754b51b4116..baf028d19dfc9 100644 --- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -7,6 +7,8 @@ #ifndef __ASM_ARM_KVM_PMU_H #define __ASM_ARM_KVM_PMU_H
+#include <linux/kvm.h> +#include <linux/kvm_types.h> #include <linux/perf_event.h> #include <linux/perf/arm_pmuv3.h>
From: Marc Zyngier maz@kernel.org
Including *all* of asm/kvm_host.h in asm/arm_pmuv3.h is a bad idea because that is much more than arm_pmuv3.h logically needs and creates a circular dependency that makes it easy to introduce compiler errors when editing this code.
asm/kvm_host.h includes asm/kvm_pmu.h includes perf/arm_pmuv3.h includes asm/arm_pmuv3.h includes asm/kvm_host.h
Reorganize the PMU includes to be more sane. In particular:
* Remove the circular dependency by removing the kvm_host.h include from asm/arm_pmuv3.h since 99% of it isn't needed.
* Move the remaining tiny bit of KVM/PMU interface from kvm_host.h into kvm_pmu.h
* Conditionally on ARM64, include the more targeted kvm_pmu.h directly in the arm_pmuv3.c driver.
Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/arm_pmuv3.h | 2 -- arch/arm64/include/asm/kvm_host.h | 14 -------------- arch/arm64/include/asm/kvm_pmu.h | 15 +++++++++++++++ drivers/perf/arm_pmuv3.c | 5 +++++ 4 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 8a777dec8d88a..cf2b2212e00a2 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -6,8 +6,6 @@ #ifndef __ASM_PMUV3_H #define __ASM_PMUV3_H
-#include <asm/kvm_host.h> - #include <asm/cpufeature.h> #include <asm/sysreg.h>
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 7f19702eac2b9..c7e52aaf469dc 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -1410,25 +1410,11 @@ void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
-static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr) -{ - return (!has_vhe() && attr->exclude_host); -} - #ifdef CONFIG_KVM -void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr); -void kvm_clr_pmu_events(u64 clr); -bool kvm_set_pmuserenr(u64 val); void kvm_enable_trbe(void); void kvm_disable_trbe(void); void kvm_tracing_set_el1_configuration(u64 trfcr_while_in_guest); #else -static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {} -static inline void kvm_clr_pmu_events(u64 clr) {} -static inline bool kvm_set_pmuserenr(u64 val) -{ - return false; -} static inline void kvm_enable_trbe(void) {} static inline void kvm_disable_trbe(void) {} static inline void kvm_tracing_set_el1_configuration(u64 trfcr_while_in_guest) {} diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index baf028d19dfc9..ad3247b468388 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -11,9 +11,15 @@ #include <linux/kvm_types.h> #include <linux/perf_event.h> #include <linux/perf/arm_pmuv3.h> +#include <linux/perf/arm_pmu.h>
#define KVM_ARMV8_PMU_MAX_COUNTERS 32
+#define kvm_pmu_counter_deferred(attr) \ + ({ \ + !has_vhe() && (attr)->exclude_host; \ + }) + #if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM) struct kvm_pmc { u8 idx; /* index into the pmu->pmc array */ @@ -68,6 +74,9 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu);
struct kvm_pmu_events *kvm_get_pmu_events(void); +void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr); +void kvm_clr_pmu_events(u64 clr); +bool kvm_set_pmuserenr(u64 val); void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); void kvm_vcpu_pmu_resync_el0(void); @@ -161,6 +170,12 @@ static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
#define kvm_vcpu_has_pmu(vcpu) ({ false; }) static inline void kvm_pmu_update_vcpu_events(struct kvm_vcpu *vcpu) {} +static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {} +static inline void kvm_clr_pmu_events(u64 clr) {} +static inline bool kvm_set_pmuserenr(u64 val) +{ + return false; +} static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {} static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {} static inline void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) {} diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 69c5cc8f56067..513122388b9da 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -9,6 +9,11 @@ */
#include <asm/irq_regs.h> + +#if defined(CONFIG_ARM64) +#include <asm/kvm_pmu.h> +#endif + #include <asm/perf_event.h> #include <asm/virt.h>
From: Sean Christopherson seanjc@google.com
Move kvm/arm_{arch_timer,pmu,vgic}.h to arch/arm64/include/asm and drop the "arm" prefix from all file names. Now that KVM no longer supports 32-bit ARM, there is no reason to expose ARM specific headers to other architectures beyond arm64.
Cc: Colton Lewis coltonlewis@google.com Signed-off-by: Sean Christopherson seanjc@google.com Message-ID: 20250611001042.170501-4-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com [Colton: applied header change to vgic-v5.c] Signed-off-by: Colton Lewis coltonlewis@google.com --- .../arm64/include/asm/kvm_arch_timer.h | 0 arch/arm64/include/asm/kvm_host.h | 7 +++---- include/kvm/arm_pmu.h => arch/arm64/include/asm/kvm_pmu.h | 0 .../kvm/arm_vgic.h => arch/arm64/include/asm/kvm_vgic.h | 0 arch/arm64/kvm/arch_timer.c | 5 ++--- arch/arm64/kvm/arm.c | 3 +-- arch/arm64/kvm/pmu-emul.c | 4 ++-- arch/arm64/kvm/reset.c | 3 +-- arch/arm64/kvm/trace_arm.h | 2 +- arch/arm64/kvm/vgic/vgic-debug.c | 2 +- arch/arm64/kvm/vgic/vgic-init.c | 2 +- arch/arm64/kvm/vgic/vgic-irqfd.c | 2 +- arch/arm64/kvm/vgic/vgic-kvm-device.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio-v2.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio-v3.c | 2 +- arch/arm64/kvm/vgic/vgic-mmio.c | 4 ++-- arch/arm64/kvm/vgic/vgic-v2.c | 2 +- arch/arm64/kvm/vgic/vgic-v3-nested.c | 3 +-- arch/arm64/kvm/vgic/vgic-v3.c | 2 +- arch/arm64/kvm/vgic/vgic-v5.c | 2 +- 20 files changed, 22 insertions(+), 27 deletions(-) rename include/kvm/arm_arch_timer.h => arch/arm64/include/asm/kvm_arch_timer.h (100%) rename include/kvm/arm_pmu.h => arch/arm64/include/asm/kvm_pmu.h (100%) rename include/kvm/arm_vgic.h => arch/arm64/include/asm/kvm_vgic.h (100%)
diff --git a/include/kvm/arm_arch_timer.h b/arch/arm64/include/asm/kvm_arch_timer.h similarity index 100% rename from include/kvm/arm_arch_timer.h rename to arch/arm64/include/asm/kvm_arch_timer.h diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 64302c438355c..7f19702eac2b9 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -26,17 +26,16 @@ #include <asm/daifflags.h> #include <asm/fpsimd.h> #include <asm/kvm.h> +#include <asm/kvm_arch_timer.h> #include <asm/kvm_asm.h> +#include <asm/kvm_pmu.h> +#include <asm/kvm_vgic.h> #include <asm/vncr_mapping.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
#define KVM_HALT_POLL_NS_DEFAULT 500000
-#include <kvm/arm_vgic.h> -#include <kvm/arm_arch_timer.h> -#include <kvm/arm_pmu.h> - #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
#define KVM_VCPU_MAX_FEATURES 9 diff --git a/include/kvm/arm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h similarity index 100% rename from include/kvm/arm_pmu.h rename to arch/arm64/include/asm/kvm_pmu.h diff --git a/include/kvm/arm_vgic.h b/arch/arm64/include/asm/kvm_vgic.h similarity index 100% rename from include/kvm/arm_vgic.h rename to arch/arm64/include/asm/kvm_vgic.h diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c index 3f675875abea2..ce62a12cf0e5c 100644 --- a/arch/arm64/kvm/arch_timer.c +++ b/arch/arm64/kvm/arch_timer.c @@ -14,12 +14,11 @@
#include <clocksource/arm_arch_timer.h> #include <asm/arch_timer.h> +#include <asm/kvm_arch_timer.h> #include <asm/kvm_emulate.h> #include <asm/kvm_hyp.h> #include <asm/kvm_nested.h> - -#include <kvm/arm_vgic.h> -#include <kvm/arm_arch_timer.h> +#include <asm/kvm_vgic.h>
#include "trace.h"
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index d1750d6058dfd..43e92f35f56ab 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -38,11 +38,10 @@ #include <asm/kvm_mmu.h> #include <asm/kvm_nested.h> #include <asm/kvm_pkvm.h> +#include <asm/kvm_pmu.h> #include <asm/kvm_ptrauth.h> #include <asm/sections.h>
-#include <kvm/arm_pmu.h> - #include "arm_hypercalls.h" #include "arm_psci.h"
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index b03dbda7f1ab9..dcdd80ffd49d5 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -12,8 +12,8 @@ #include <linux/perf/arm_pmu.h> #include <linux/uaccess.h> #include <asm/kvm_emulate.h> -#include <kvm/arm_pmu.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_pmu.h> +#include <asm/kvm_vgic.h>
#define PERF_ATTR_CFG1_COUNTER_64BIT BIT(0)
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 959532422d3a3..bae3676387419 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -17,12 +17,11 @@ #include <linux/string.h> #include <linux/types.h>
-#include <kvm/arm_arch_timer.h> - #include <asm/cpufeature.h> #include <asm/cputype.h> #include <asm/fpsimd.h> #include <asm/ptrace.h> +#include <asm/kvm_arch_timer.h> #include <asm/kvm_arm.h> #include <asm/kvm_asm.h> #include <asm/kvm_emulate.h> diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h index 9c60f6465c787..8fc8178e21a70 100644 --- a/arch/arm64/kvm/trace_arm.h +++ b/arch/arm64/kvm/trace_arm.h @@ -3,7 +3,7 @@ #define _TRACE_ARM_ARM64_KVM_H
#include <asm/kvm_emulate.h> -#include <kvm/arm_arch_timer.h> +#include <asm/kvm_arch_timer.h> #include <linux/tracepoint.h>
#undef TRACE_SYSTEM diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c index bb92853d1fd3a..a67e5e5f44871 100644 --- a/arch/arm64/kvm/vgic/vgic-debug.c +++ b/arch/arm64/kvm/vgic/vgic-debug.c @@ -9,7 +9,7 @@ #include <linux/interrupt.h> #include <linux/kvm_host.h> #include <linux/seq_file.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_vgic.h> #include <asm/kvm_mmu.h> #include "vgic.h"
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c index da62edbc1205a..39ead7ec1b43a 100644 --- a/arch/arm64/kvm/vgic/vgic-init.c +++ b/arch/arm64/kvm/vgic/vgic-init.c @@ -7,7 +7,7 @@ #include <linux/interrupt.h> #include <linux/cpu.h> #include <linux/kvm_host.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_vgic.h> #include <asm/kvm_emulate.h> #include <asm/kvm_mmu.h> #include "vgic.h" diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c b/arch/arm64/kvm/vgic/vgic-irqfd.c index c314c016659ab..b73401c34f298 100644 --- a/arch/arm64/kvm/vgic/vgic-irqfd.c +++ b/arch/arm64/kvm/vgic/vgic-irqfd.c @@ -6,7 +6,7 @@ #include <linux/kvm.h> #include <linux/kvm_host.h> #include <trace/events/kvm.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_vgic.h> #include "vgic.h"
/* diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c index 3d1a776b716d7..39d96b52f773d 100644 --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c @@ -7,7 +7,7 @@ */ #include <linux/irqchip/arm-gic-v3.h> #include <linux/kvm_host.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_vgic.h> #include <linux/uaccess.h> #include <asm/kvm_mmu.h> #include <asm/cputype.h> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v2.c b/arch/arm64/kvm/vgic/vgic-mmio-v2.c index f25fccb1f8e63..d00c8a74fad63 100644 --- a/arch/arm64/kvm/vgic/vgic-mmio-v2.c +++ b/arch/arm64/kvm/vgic/vgic-mmio-v2.c @@ -9,7 +9,7 @@ #include <linux/nospec.h>
#include <kvm/iodev.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_vgic.h>
#include "vgic.h" #include "vgic-mmio.h" diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c index 70d50c77e5dc7..5191ad3b74b7e 100644 --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c @@ -9,11 +9,11 @@ #include <linux/kvm_host.h> #include <linux/interrupt.h> #include <kvm/iodev.h> -#include <kvm/arm_vgic.h>
#include <asm/kvm_emulate.h> #include <asm/kvm_arm.h> #include <asm/kvm_mmu.h> +#include <asm/kvm_vgic.h>
#include "vgic.h" #include "vgic-mmio.h" diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c index a573b1f0c6cbe..45876b5ef9fc8 100644 --- a/arch/arm64/kvm/vgic/vgic-mmio.c +++ b/arch/arm64/kvm/vgic/vgic-mmio.c @@ -10,8 +10,8 @@ #include <linux/kvm.h> #include <linux/kvm_host.h> #include <kvm/iodev.h> -#include <kvm/arm_arch_timer.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_arch_timer.h> +#include <asm/kvm_vgic.h>
#include "vgic.h" #include "vgic-mmio.h" diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c index 381673f03c395..780afb7aad06e 100644 --- a/arch/arm64/kvm/vgic/vgic-v2.c +++ b/arch/arm64/kvm/vgic/vgic-v2.c @@ -6,7 +6,7 @@ #include <linux/irqchip/arm-gic.h> #include <linux/kvm.h> #include <linux/kvm_host.h> -#include <kvm/arm_vgic.h> +#include <asm/kvm_vgic.h> #include <asm/kvm_mmu.h>
#include "vgic.h" diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c index 7f1259b49c505..f3f21d8fa8335 100644 --- a/arch/arm64/kvm/vgic/vgic-v3-nested.c +++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c @@ -7,11 +7,10 @@ #include <linux/io.h> #include <linux/uaccess.h>
-#include <kvm/arm_vgic.h> - #include <asm/kvm_arm.h> #include <asm/kvm_emulate.h> #include <asm/kvm_nested.h> +#include <asm/kvm_vgic.h>
#include "vgic.h"
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index 2f75ef14d3399..f345501016e2c 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -7,10 +7,10 @@ #include <linux/kvm.h> #include <linux/kvm_host.h> #include <linux/string_choices.h> -#include <kvm/arm_vgic.h> #include <asm/kvm_hyp.h> #include <asm/kvm_mmu.h> #include <asm/kvm_asm.h> +#include <asm/kvm_vgic.h>
#include "vgic.h"
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c index 2d3811f4e1174..601d7b376deef 100644 --- a/arch/arm64/kvm/vgic/vgic-v5.c +++ b/arch/arm64/kvm/vgic/vgic-v5.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only
-#include <kvm/arm_vgic.h> +#include <asm/kvm_vgic.h> #include <linux/irqchip/arm-vgic-info.h>
#include "vgic.h"
Because ARM hardware is not yet capable of direct interrupt injection into guests, guest counters will still trigger interrupts that need to be handled by the host PMU interrupt handler. Clear the overflow flags in hardware to handle the interrupt as normal, but record which guest overflow flags were set in the virtual overflow register for later injecting the interrupt into the guest.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm/include/asm/arm_pmuv3.h | 6 ++++++ arch/arm64/include/asm/kvm_pmu.h | 2 ++ arch/arm64/kvm/pmu-direct.c | 17 +++++++++++++++++ drivers/perf/arm_pmuv3.c | 9 +++++++++ 4 files changed, 34 insertions(+)
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h index 3ea5741d213d8..485d2f08ac113 100644 --- a/arch/arm/include/asm/arm_pmuv3.h +++ b/arch/arm/include/asm/arm_pmuv3.h @@ -180,6 +180,11 @@ static inline void write_pmintenset(u32 val) write_sysreg(val, PMINTENSET); }
+static inline u32 read_pmintenset(void) +{ + return read_sysreg(PMINTENSET); +} + static inline void write_pmintenclr(u32 val) { write_sysreg(val, PMINTENCLR); @@ -249,6 +254,7 @@ static inline u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu) return ~0; }
+static inline void kvm_pmu_handle_guest_irq(u64 govf) {}
/* PMU Version in DFR Register */ #define ARMV8_PMU_DFR_VER_NI 0 diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 43aa334dce517..e4cbab0fd09cf 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -101,6 +101,7 @@ u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu); u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu); void kvm_pmu_host_counters_enable(void); void kvm_pmu_host_counters_disable(void); +void kvm_pmu_handle_guest_irq(u64 govf);
u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu); u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu); @@ -322,6 +323,7 @@ static inline u64 kvm_pmu_guest_counter_mask(void *pmu)
static inline void kvm_pmu_host_counters_enable(void) {} static inline void kvm_pmu_host_counters_disable(void) {} +static inline void kvm_pmu_handle_guest_irq(u64 govf) {}
#endif
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index c5767e2ebc651..76d8ed24c8646 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -396,3 +396,20 @@ void kvm_pmu_put(struct kvm_vcpu *vcpu) val = read_pmintenset(); __vcpu_assign_sys_reg(vcpu, PMINTENSET_EL1, val & mask); } + +/** + * kvm_pmu_handle_guest_irq() - Record IRQs in guest counters + * @govf: Bitmask of guest overflowed counters + * + * Record IRQs from overflows in guest-reserved counters in the VCPU + * register for the guest to clear later. + */ +void kvm_pmu_handle_guest_irq(u64 govf) +{ + struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); + + if (!vcpu) + return; + + __vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, govf); +} diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 2bed99ba992d7..3c1a69f88b284 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -783,6 +783,8 @@ static u64 armv8pmu_getreset_flags(void)
/* Write to clear flags */ value &= ARMV8_PMU_CNT_MASK_ALL; + /* Only reset interrupt enabled counters. */ + value &= read_pmintenset(); write_pmovsclr(value);
return value; @@ -904,6 +906,7 @@ static void read_branch_records(struct pmu_hw_events *cpuc, static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) { u64 pmovsr; + u64 govf; struct perf_sample_data data; struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events); struct pt_regs *regs; @@ -961,6 +964,12 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) */ perf_event_overflow(event, &data, regs); } + + govf = pmovsr & kvm_pmu_guest_counter_mask(cpu_pmu); + + if (kvm_pmu_is_partitioned(cpu_pmu) && govf) + kvm_pmu_handle_guest_irq(govf); + armv8pmu_start(cpu_pmu);
return IRQ_HANDLED;
For PMUv3, the register field MDCR_EL2.HPMN partitiones the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Create module parameter reserved_host_counters to reserve a number of counters for the host. This number is set at boot because the perf subsystem assumes the number of counters will not change after the PMU is probed.
Introduce the function armv8pmu_partition() to modify the PMU driver's cntr_mask of available counters to exclude the counters being reserved for the guest and record reserved_guest_counters as the maximum allowable value for HPMN.
Due to the difficulty this feature would create for the driver running in nVHE mode, partitioning is only allowed in VHE mode. In order to support a partitioning on nVHE we'd need to explicitly disable guest counters on every exit and reset HPMN to place all counters in the first range.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm/include/asm/arm_pmuv3.h | 4 ++ arch/arm64/include/asm/arm_pmuv3.h | 5 ++ arch/arm64/include/asm/kvm_pmu.h | 8 +++ arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/pmu-direct.c | 22 +++++++++ drivers/perf/arm_pmuv3.c | 78 +++++++++++++++++++++++++++++- include/linux/perf/arm_pmu.h | 1 + 7 files changed, 117 insertions(+), 3 deletions(-) create mode 100644 arch/arm64/kvm/pmu-direct.c
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h index 2ec0e5e83fc98..636b1aab9e8d2 100644 --- a/arch/arm/include/asm/arm_pmuv3.h +++ b/arch/arm/include/asm/arm_pmuv3.h @@ -221,6 +221,10 @@ static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr) return false; }
+static inline bool kvm_pmu_partition_supported(void) +{ + return false; +} static inline bool kvm_set_pmuserenr(u64 val) { return false; diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index cf2b2212e00a2..27c4d6d47da31 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -171,6 +171,11 @@ static inline bool pmuv3_implemented(int pmuver) pmuver == ID_AA64DFR0_EL1_PMUVer_NI); }
+static inline bool is_pmuv3p1(int pmuver) +{ + return pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P1; +} + static inline bool is_pmuv3p4(int pmuver) { return pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P4; diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 6c961e8778047..63bff75e4f8dd 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -45,7 +45,10 @@ struct arm_pmu_entry { struct arm_pmu *arm_pmu; };
+extern int armv8pmu_hpmn_max; + bool kvm_supports_guest_pmuv3(void); +bool kvm_pmu_partition_supported(void); #define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS) u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx); void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val); @@ -115,6 +118,11 @@ static inline bool kvm_supports_guest_pmuv3(void) return false; }
+static inline bool kvm_pmu_partition_supported(void) +{ + return false; +} + #define kvm_arm_pmu_irq_initialized(v) (false) static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 3ebc0570345cc..baf0f296c0e53 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -26,7 +26,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \ vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \ vgic/vgic-v5.o
-kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o +kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu-direct.o pmu.o kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c new file mode 100644 index 0000000000000..0d38265b6f290 --- /dev/null +++ b/arch/arm64/kvm/pmu-direct.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2025 Google LLC + * Author: Colton Lewis coltonlewis@google.com + */ + +#include <linux/kvm_host.h> + +#include <asm/kvm_pmu.h> + +/** + * kvm_pmu_partition_supported() - Determine if partitioning is possible + * + * Partitioning is only supported in VHE mode with PMUv3 + * + * Return: True if partitioning is possible, false otherwise + */ +bool kvm_pmu_partition_supported(void) +{ + return has_vhe() && + system_supports_pmuv3(); +} diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 513122388b9da..379d1877a61ba 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -42,6 +42,13 @@ #define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_ACCESS 0xEC #define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_MISS 0xED
+static int reserved_host_counters __read_mostly = -1; +int armv8pmu_hpmn_max = -1; + +module_param(reserved_host_counters, int, 0); +MODULE_PARM_DESC(reserved_host_counters, + "PMU Partition: -1 = No partition; +N = Reserve N counters for the host"); + /* * ARMv8 Architectural defined events, not all of these may * be supported on any given implementation. Unsupported events will @@ -532,6 +539,11 @@ static void armv8pmu_pmcr_write(u64 val) write_pmcr(val); }
+static u64 armv8pmu_pmcr_n_read(void) +{ + return FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()); +} + static int armv8pmu_has_overflowed(u64 pmovsr) { return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK); @@ -1299,6 +1311,61 @@ struct armv8pmu_probe_info { bool present; };
+/** + * armv8pmu_reservation_is_valid() - Determine if reservation is allowed + * @host_counters: Number of host counters to reserve + * + * Determine if the number of host counters in the argument is an + * allowed reservation, 0 to NR_COUNTERS inclusive. + * + * Return: True if reservation allowed, false otherwise + */ +static bool armv8pmu_reservation_is_valid(int host_counters) +{ + return host_counters >= 0 && + host_counters <= armv8pmu_pmcr_n_read(); +} + +/** + * armv8pmu_partition() - Partition the PMU + * @pmu: Pointer to pmu being partitioned + * @host_counters: Number of host counters to reserve + * + * Partition the given PMU by taking a number of host counters to + * reserve and, if it is a valid reservation, recording the + * corresponding HPMN value in the hpmn_max field of the PMU and + * clearing the guest-reserved counters from the counter mask. + * + * Return: 0 on success, -ERROR otherwise + */ +static int armv8pmu_partition(struct arm_pmu *pmu, int host_counters) +{ + u8 nr_counters; + u8 hpmn; + + if (!armv8pmu_reservation_is_valid(host_counters)) { + pr_err("PMU partition reservation of %d host counters is not valid", host_counters); + return -EINVAL; + } + + nr_counters = armv8pmu_pmcr_n_read(); + hpmn = nr_counters - host_counters; + + pmu->hpmn_max = hpmn; + armv8pmu_hpmn_max = hpmn; + + bitmap_clear(pmu->cntr_mask, 0, hpmn); + bitmap_set(pmu->cntr_mask, hpmn, host_counters); + clear_bit(ARMV8_PMU_CYCLE_IDX, pmu->cntr_mask); + + if (pmuv3_has_icntr()) + clear_bit(ARMV8_PMU_INSTR_IDX, pmu->cntr_mask); + + pr_info("Partitioned PMU with %d host counters -> %u guest counters", host_counters, hpmn); + + return 0; +} + static void __armv8pmu_probe_pmu(void *info) { struct armv8pmu_probe_info *probe = info; @@ -1313,10 +1380,10 @@ static void __armv8pmu_probe_pmu(void *info)
cpu_pmu->pmuver = pmuver; probe->present = true; + cpu_pmu->hpmn_max = -1;
/* Read the nb of CNTx counters supported from PMNC */ - bitmap_set(cpu_pmu->cntr_mask, - 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read())); + bitmap_set(cpu_pmu->cntr_mask, 0, armv8pmu_pmcr_n_read());
/* Add the CPU cycles counter */ set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask); @@ -1325,6 +1392,13 @@ static void __armv8pmu_probe_pmu(void *info) if (pmuv3_has_icntr()) set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask);
+ if (reserved_host_counters >= 0) { + if (kvm_pmu_partition_supported()) + armv8pmu_partition(cpu_pmu, reserved_host_counters); + else + pr_err("PMU partition is not supported"); + } + pmceid[0] = pmceid_raw[0] = read_pmceid0(); pmceid[1] = pmceid_raw[1] = read_pmceid1();
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 93c9a26492fcf..69071e887f98f 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -128,6 +128,7 @@ struct arm_pmu {
/* Only to be used by ACPI probing code */ unsigned long acpi_cpuid; + int hpmn_max; /* MDCR_EL2.HPMN: counter partition pivot */ };
#define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
The OVSR bitmasks are valid for enable and interrupt registers as well as overflow registers. Generalize the names.
Acked-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Colton Lewis coltonlewis@google.com --- drivers/perf/arm_pmuv3.c | 4 ++-- include/linux/perf/arm_pmuv3.h | 14 +++++++------- 2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 379d1877a61ba..3e6eb4be4ac43 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -546,7 +546,7 @@ static u64 armv8pmu_pmcr_n_read(void)
static int armv8pmu_has_overflowed(u64 pmovsr) { - return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK); + return !!(pmovsr & ARMV8_PMU_CNT_MASK_ALL); }
static int armv8pmu_counter_has_overflowed(u64 pmnc, int idx) @@ -782,7 +782,7 @@ static u64 armv8pmu_getreset_flags(void) value = read_pmovsclr();
/* Write to clear flags */ - value &= ARMV8_PMU_OVERFLOWED_MASK; + value &= ARMV8_PMU_CNT_MASK_ALL; write_pmovsclr(value);
return value; diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index d698efba28a27..fd2a34b4a64d1 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -224,14 +224,14 @@ ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
/* - * PMOVSR: counters overflow flag status reg + * Counter bitmask layouts for overflow, enable, and interrupts */ -#define ARMV8_PMU_OVSR_P GENMASK(30, 0) -#define ARMV8_PMU_OVSR_C BIT(31) -#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */ -/* Mask for writable bits is both P and C fields */ -#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \ - ARMV8_PMU_OVSR_F) +#define ARMV8_PMU_CNT_MASK_P GENMASK(30, 0) +#define ARMV8_PMU_CNT_MASK_C BIT(31) +#define ARMV8_PMU_CNT_MASK_F BIT_ULL(32) /* arm64 only */ +#define ARMV8_PMU_CNT_MASK_ALL (ARMV8_PMU_CNT_MASK_P | \ + ARMV8_PMU_CNT_MASK_C | \ + ARMV8_PMU_CNT_MASK_F)
/* * PMXEVTYPER: Event selection reg
Make sure reads and writes to PMCR_EL0 conform to additional constraints imposed when the PMU is partitioned.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/kvm/pmu.c | 2 +- arch/arm64/kvm/sys_regs.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 1fd012f8ff4a9..48b39f096fa12 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -877,7 +877,7 @@ u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu) u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) { u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0); - u64 n = vcpu->kvm->arch.nr_pmu_counters; + u64 n = kvm_pmu_guest_num_counters(vcpu);
if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu)) n = FIELD_GET(MDCR_EL2_HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2)); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 70104087b6c7b..f2ae761625a66 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1360,7 +1360,7 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, */ if (!kvm_vm_has_ran_once(kvm) && !vcpu_has_nv(vcpu) && - new_n <= kvm_arm_pmu_get_max_counters(kvm)) + new_n <= kvm_pmu_hpmn(vcpu)) kvm->arch.nr_pmu_counters = new_n;
mutex_unlock(&kvm->arch.config_lock);
We may want a partitioned PMU but not have FEAT_FGT to untrap the specific registers that would normally be untrapped. Add a handler for those registers in the fast path so we can still get a performance boost from partitioning.
The idea is to handle traps for all the PMU registers quickly by writing through to the hardware as possible instead of hooking into the emulated vPMU as the standard handlers in sys_regs.c do. Since context switching will not happen without FGT, make sure to write both physical and virtual registers so they stay in sync.
To assist with that, fill of the gaps in arm_pmuv3.h for helper functions to read and write PMU registers and lift some of the access checking functions from sys_regs.c to pmu.c
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/arm_pmuv3.h | 37 ++++- arch/arm64/include/asm/kvm_pmu.h | 25 +++ arch/arm64/kvm/hyp/include/hyp/switch.h | 201 ++++++++++++++++++++++++ arch/arm64/kvm/pmu.c | 40 +++++ arch/arm64/kvm/sys_regs.c | 41 +---- 5 files changed, 303 insertions(+), 41 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 3e25c0313263c..41ec6730ebc62 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -39,6 +39,16 @@ static inline unsigned long read_pmevtypern(int n) return 0; }
+static inline void write_pmxevcntr(u64 val) +{ + write_sysreg(val, pmxevcntr_el0); +} + +static inline u64 read_pmxevcntr(void) +{ + return read_sysreg(pmxevcntr_el0); +} + static inline unsigned long read_pmmir(void) { return read_cpuid(PMMIR_EL1); @@ -105,21 +115,41 @@ static inline void write_pmcntenset(u64 val) write_sysreg(val, pmcntenset_el0); }
+static inline u64 read_pmcntenset(void) +{ + return read_sysreg(pmcntenset_el0); +} + static inline void write_pmcntenclr(u64 val) { write_sysreg(val, pmcntenclr_el0); }
+static inline u64 read_pmcntenclr(void) +{ + return read_sysreg(pmcntenclr_el0); +} + static inline void write_pmintenset(u64 val) { write_sysreg(val, pmintenset_el1); }
+static inline u64 read_pmintenset(void) +{ + return read_sysreg(pmintenset_el1); +} + static inline void write_pmintenclr(u64 val) { write_sysreg(val, pmintenclr_el1); }
+static inline u64 read_pmintenclr(void) +{ + return read_sysreg(pmintenclr_el1); +} + static inline void write_pmccfiltr(u64 val) { write_sysreg(val, pmccfiltr_el0); @@ -160,11 +190,16 @@ static inline u64 read_pmovsclr(void) return read_sysreg(pmovsclr_el0); }
-static inline void write_pmuserenr(u32 val) +static inline void write_pmuserenr(u64 val) { write_sysreg(val, pmuserenr_el0); }
+static inline u64 read_pmuserenr(void) +{ + return read_sysreg(pmuserenr_el0); +} + static inline void write_pmuacr(u64 val) { write_sysreg_s(val, SYS_PMUACR_EL1); diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 7297a697a4a62..60b8a48cad456 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -83,6 +83,11 @@ struct kvm_pmu_events *kvm_get_pmu_events(void); void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr); void kvm_clr_pmu_events(u64 clr); bool kvm_set_pmuserenr(u64 val); +bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags); +bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu); +bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu); +bool pmu_access_cycle_counter_el0_disabled(struct kvm_vcpu *vcpu); +bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx); void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); void kvm_vcpu_pmu_resync_el0(void); @@ -226,6 +231,26 @@ static inline bool kvm_set_pmuserenr(u64 val) { return false; } +static inline bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags) +{ + return false; +} +static inline bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu) +{ + return false; +} +static inline bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu) +{ + return false; +} +static inline bool pmu_access_cycle_counter_el0_disabled(struct kvm_vcpu *vcpu) +{ + return false; +} +static inline bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx) +{ + return false; +} static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {} static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {} static inline void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) {} diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 6e8050f260f34..40bd00df6c58f 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -24,12 +24,14 @@ #include <asm/kvm_emulate.h> #include <asm/kvm_hyp.h> #include <asm/kvm_mmu.h> +#include <asm/kvm_pmu.h> #include <asm/kvm_nested.h> #include <asm/fpsimd.h> #include <asm/debug-monitors.h> #include <asm/processor.h> #include <asm/traps.h>
+#include <../../sys_regs.h> #include "arm_psci.h"
struct kvm_exception_table_entry { @@ -768,6 +770,202 @@ static bool handle_ampere1_tcr(struct kvm_vcpu *vcpu) return true; }
+/** + * handle_pmu_reg() - Handle fast access to most PMU regs + * @vcpu: Ponter to kvm_vcpu struct + * @p: System register parameters (read/write, Op0, Op1, CRm, CRn, Op2) + * @reg: VCPU register identifier + * @rt: Target general register + * @val: Value to write + * @readfn: Sysreg read function + * @writefn: Sysreg write function + * + * Handle fast access to most PMU regs. Writethrough to the physical + * register. This function is a wrapper for the simplest case, but + * sadly there aren't many of those. + * + * Always return true. The boolean makes usage more consistent with + * similar functions. + * + * Return: True + */ +static bool handle_pmu_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p, + enum vcpu_sysreg reg, u8 rt, u64 val, + u64 (*readfn)(void), void (*writefn)(u64)) +{ + if (p->is_write) { + __vcpu_assign_sys_reg(vcpu, reg, val); + writefn(val); + } else { + vcpu_set_reg(vcpu, rt, readfn()); + } + + return true; +} + +/** + * kvm_hyp_handle_pmu_regs() - Fast handler for PMU registers + * @vcpu: Pointer to vcpu struct + * + * This handler immediately writes through certain PMU registers when + * we have a partitioned PMU (that is, MDCR_EL2.HPMN is set to reserve + * a range of counters for the guest) but the machine does not have + * FEAT_FGT to selectively untrap the registers we want. + * + * Return: True if the exception was successfully handled, false otherwise + */ +static bool kvm_hyp_handle_pmu_regs(struct kvm_vcpu *vcpu) +{ + struct sys_reg_params p; + u64 esr; + u32 sysreg; + u8 rt; + u64 val; + u64 mask; + u8 idx; + bool ret; + + if (!kvm_vcpu_pmu_is_partitioned(vcpu) + || pmu_access_el0_disabled(vcpu)) + return false; + + esr = kvm_vcpu_get_esr(vcpu); + p = esr_sys64_to_params(esr); + sysreg = esr_sys64_to_sysreg(esr); + rt = kvm_vcpu_sys_get_rt(vcpu); + val = vcpu_get_reg(vcpu, rt); + + switch (sysreg) { + case SYS_PMCR_EL0: + mask = ARMV8_PMU_PMCR_MASK; + + if (p.is_write) { + write_pmcr(val); + mask &= ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C); + __vcpu_assign_sys_reg(vcpu, PMCR_EL0, val & mask); + } else { + val = u64_replace_bits( + read_pmcr(), + vcpu->kvm->arch.nr_pmu_counters, + ARMV8_PMU_PMCR_N); + vcpu_set_reg(vcpu, rt, val); + } + + ret = true; + break; + case SYS_PMUSERENR_EL0: + mask = ARMV8_PMU_USERENR_MASK; + ret = handle_pmu_reg(vcpu, &p, PMUSERENR_EL0, rt, val & mask, + &read_pmuserenr, &write_pmuserenr); + break; + case SYS_PMSELR_EL0: + mask = PMSELR_EL0_SEL_MASK; + + if (pmu_access_event_counter_el0_disabled(vcpu)) + return false; + + ret = handle_pmu_reg(vcpu, &p, PMSELR_EL0, rt, val & mask, + &read_pmselr, &write_pmselr); + break; + case SYS_PMINTENCLR_EL1: + mask = kvm_pmu_accessible_counter_mask(vcpu); + + if (p.is_write) { + __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, &=, ~(val & mask)); + write_pmintenclr(val); + } else { + val = read_pmintenclr(); + vcpu_set_reg(vcpu, rt, val & mask); + } + ret = true; + + break; + case SYS_PMINTENSET_EL1: + mask = kvm_pmu_accessible_counter_mask(vcpu); + + if (p.is_write) { + __vcpu_rmw_sys_reg(vcpu, PMINTENSET_EL1, |=, val & mask); + write_pmintenset(val); + } else { + val = read_pmintenset(); + vcpu_set_reg(vcpu, rt, val & mask); + } + + ret = true; + break; + case SYS_PMCNTENCLR_EL0: + mask = kvm_pmu_accessible_counter_mask(vcpu); + + if (p.is_write) { + __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, &=, ~(val & mask)); + write_pmcntenclr(val); + } else { + val = read_pmcntenclr(); + vcpu_set_reg(vcpu, rt, val & mask); + } + + ret = true; + break; + case SYS_PMCNTENSET_EL0: + mask = kvm_pmu_accessible_counter_mask(vcpu); + + if (p.is_write) { + __vcpu_rmw_sys_reg(vcpu, PMCNTENSET_EL0, |=, val & mask); + write_pmcntenset(val); + } else { + val = read_pmcntenset(); + vcpu_set_reg(vcpu, rt, val & mask); + } + + ret = true; + break; + case SYS_PMCCNTR_EL0: + if (pmu_access_cycle_counter_el0_disabled(vcpu)) + return false; + + ret = handle_pmu_reg(vcpu, &p, PMCCNTR_EL0, rt, val, + &read_pmccntr, &write_pmccntr); + break; + case SYS_PMXEVCNTR_EL0: + idx = FIELD_GET(PMSELR_EL0_SEL, read_pmselr()); + + if (pmu_access_event_counter_el0_disabled(vcpu)) + return false; + + if (!pmu_counter_idx_valid(vcpu, idx)) + return false; + + ret = handle_pmu_reg(vcpu, &p, PMEVCNTR0_EL0 + idx, rt, val, + &read_pmxevcntr, &write_pmxevcntr); + break; + case SYS_PMEVCNTRn_EL0(0) ... SYS_PMEVCNTRn_EL0(30): + idx = ((p.CRm & 3) << 3) | (p.Op2 & 7); + + if (pmu_access_event_counter_el0_disabled(vcpu)) + return false; + + if (!pmu_counter_idx_valid(vcpu, idx)) + return false; + + if (p.is_write) { + write_pmevcntrn(idx, val); + __vcpu_assign_sys_reg(vcpu, PMEVCNTR0_EL0 + idx, val); + } else { + vcpu_set_reg(vcpu, rt, read_pmevcntrn(idx)); + } + + ret = true; + break; + default: + ret = false; + } + + if (ret) + __kvm_skip_instr(vcpu); + + return ret; +} + static inline bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) { if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) && @@ -785,6 +983,9 @@ static inline bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) if (kvm_handle_cntxct(vcpu)) return true;
+ if (kvm_hyp_handle_pmu_regs(vcpu)) + return true; + return false; }
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 79b7ea037153a..1fd012f8ff4a9 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -884,3 +884,43 @@ u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu)
return u64_replace_bits(pmcr, n, ARMV8_PMU_PMCR_N); } + +bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags) +{ + u64 reg = __vcpu_sys_reg(vcpu, PMUSERENR_EL0); + bool enabled = (reg & flags) || vcpu_mode_priv(vcpu); + + if (!enabled) + kvm_inject_undefined(vcpu); + + return !enabled; +} + +bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu) +{ + return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_EN); +} + +bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx) +{ + u64 pmcr, val; + + pmcr = kvm_vcpu_read_pmcr(vcpu); + val = FIELD_GET(ARMV8_PMU_PMCR_N, pmcr); + if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX) { + kvm_inject_undefined(vcpu); + return false; + } + + return true; +} + +bool pmu_access_cycle_counter_el0_disabled(struct kvm_vcpu *vcpu) +{ + return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_EN); +} + +bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu) +{ + return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_EN); +} diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index bee892db9ca8b..70104087b6c7b 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -29,6 +29,7 @@ #include <asm/kvm_hyp.h> #include <asm/kvm_mmu.h> #include <asm/kvm_nested.h> +#include <asm/kvm_pmu.h> #include <asm/perf_event.h> #include <asm/sysreg.h>
@@ -970,37 +971,11 @@ static u64 reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r) return __vcpu_sys_reg(vcpu, r->reg); }
-static bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags) -{ - u64 reg = __vcpu_sys_reg(vcpu, PMUSERENR_EL0); - bool enabled = (reg & flags) || vcpu_mode_priv(vcpu); - - if (!enabled) - kvm_inject_undefined(vcpu); - - return !enabled; -} - -static bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu) -{ - return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_EN); -} - static bool pmu_write_swinc_el0_disabled(struct kvm_vcpu *vcpu) { return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_SW | ARMV8_PMU_USERENR_EN); }
-static bool pmu_access_cycle_counter_el0_disabled(struct kvm_vcpu *vcpu) -{ - return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_EN); -} - -static bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu) -{ - return check_pmu_access_disabled(vcpu, ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_EN); -} - static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *r) { @@ -1067,20 +1042,6 @@ static bool access_pmceid(struct kvm_vcpu *vcpu, struct sys_reg_params *p, return true; }
-static bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx) -{ - u64 pmcr, val; - - pmcr = kvm_vcpu_read_pmcr(vcpu); - val = FIELD_GET(ARMV8_PMU_PMCR_N, pmcr); - if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX) { - kvm_inject_undefined(vcpu); - return false; - } - - return true; -} - static int get_pmu_evcntr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, u64 *val) {
In order to gain the best performance benefit from partitioning the PMU, utilize fine grain traps (FEAT_FGT and FEAT_FGT2) to avoid trapping common PMU register accesses by the guest to remove that overhead.
Untrapped: * PMCR_EL0 * PMUSERENR_EL0 * PMSELR_EL0 * PMCCNTR_EL0 * PMCNTEN_EL0 * PMINTEN_EL1 * PMEVCNTRn_EL0
These are safe to untrap because writing MDCR_EL2.HPMN as this series will do limits the effect of writes to any of these registers to the partition of counters 0..HPMN-1. Reads from these registers will not leak information from between guests as all these registers are context swapped by a later patch in this series. Reads from these registers also do not leak any information about the host's hardware beyond what is promised by PMUv3.
Trapped: * PMOVS_EL0 * PMEVTYPERn_EL0 * PMCCFILTR_EL0 * PMICNTR_EL0 * PMICFILTR_EL0 * PMCEIDn_EL0 * PMMIR_EL1
PMOVS remains trapped so KVM can track overflow IRQs that will need to be injected into the guest.
PMICNTR and PMIFILTR remain trapped because KVM is not handling them yet.
PMEVTYPERn remains trapped so KVM can limit which events guests can count, such as disallowing counting at EL2. PMCCFILTR and PMCIFILTR are special cases of the same.
PMCEIDn and PMMIR remain trapped because they can leak information specific to the host hardware implementation.
NOTE: This patch temporarily forces kvm_vcpu_pmu_is_partitioned() to be false to prevent partial feature activation for easier debugging.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/kvm_pmu.h | 33 ++++++++++++++++++++++ arch/arm64/kvm/config.c | 34 ++++++++++++++++++++-- arch/arm64/kvm/pmu-direct.c | 48 ++++++++++++++++++++++++++++++++ 3 files changed, 112 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 8887f39c25e60..7297a697a4a62 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -96,6 +96,23 @@ u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu); void kvm_pmu_host_counters_enable(void); void kvm_pmu_host_counters_disable(void);
+#if !defined(__KVM_NVHE_HYPERVISOR__) +bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu); +bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu); +#else +static inline bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu) +{ + return false; +} + +static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu) +{ + return false; +} +#endif +u64 kvm_pmu_fgt_bits(void); +u64 kvm_pmu_fgt2_bits(void); + /* * Updates the vcpu's view of the pmu events for this cpu. * Must be called before every vcpu run after disabling interrupts, to ensure @@ -135,6 +152,22 @@ static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, { return 0; } +static inline bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu) +{ + return false; +} +static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu) +{ + return false; +} +static inline u64 kvm_pmu_fgt_bits(void) +{ + return 0; +} +static inline u64 kvm_pmu_fgt2_bits(void) +{ + return 0; +} static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) {} static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, diff --git a/arch/arm64/kvm/config.c b/arch/arm64/kvm/config.c index 24bb3f36e9d59..064dc6aa06f76 100644 --- a/arch/arm64/kvm/config.c +++ b/arch/arm64/kvm/config.c @@ -6,6 +6,7 @@
#include <linux/kvm_host.h> #include <asm/kvm_emulate.h> +#include <asm/kvm_pmu.h> #include <asm/kvm_nested.h> #include <asm/sysreg.h>
@@ -1489,12 +1490,39 @@ static void __compute_hfgwtr(struct kvm_vcpu *vcpu) *vcpu_fgt(vcpu, HFGWTR_EL2) |= HFGWTR_EL2_TCR_EL1; }
+static void __compute_hdfgrtr(struct kvm_vcpu *vcpu) +{ + __compute_fgt(vcpu, HDFGRTR_EL2); + + if (kvm_vcpu_pmu_use_fgt(vcpu)) + *vcpu_fgt(vcpu, HDFGRTR_EL2) |= kvm_pmu_fgt_bits(); +} + static void __compute_hdfgwtr(struct kvm_vcpu *vcpu) { __compute_fgt(vcpu, HDFGWTR_EL2);
if (is_hyp_ctxt(vcpu)) *vcpu_fgt(vcpu, HDFGWTR_EL2) |= HDFGWTR_EL2_MDSCR_EL1; + + if (kvm_vcpu_pmu_use_fgt(vcpu)) + *vcpu_fgt(vcpu, HDFGWTR_EL2) |= kvm_pmu_fgt_bits(); +} + +static void __compute_hdfgrtr2(struct kvm_vcpu *vcpu) +{ + __compute_fgt(vcpu, HDFGRTR2_EL2); + + if (kvm_vcpu_pmu_use_fgt(vcpu)) + *vcpu_fgt(vcpu, HDFGRTR2_EL2) |= kvm_pmu_fgt2_bits(); +} + +static void __compute_hdfgwtr2(struct kvm_vcpu *vcpu) +{ + __compute_fgt(vcpu, HDFGWTR2_EL2); + + if (kvm_vcpu_pmu_use_fgt(vcpu)) + *vcpu_fgt(vcpu, HDFGWTR2_EL2) |= kvm_pmu_fgt2_bits(); }
void kvm_vcpu_load_fgt(struct kvm_vcpu *vcpu) @@ -1505,7 +1533,7 @@ void kvm_vcpu_load_fgt(struct kvm_vcpu *vcpu) __compute_fgt(vcpu, HFGRTR_EL2); __compute_hfgwtr(vcpu); __compute_fgt(vcpu, HFGITR_EL2); - __compute_fgt(vcpu, HDFGRTR_EL2); + __compute_hdfgrtr(vcpu); __compute_hdfgwtr(vcpu); __compute_fgt(vcpu, HAFGRTR_EL2);
@@ -1515,6 +1543,6 @@ void kvm_vcpu_load_fgt(struct kvm_vcpu *vcpu) __compute_fgt(vcpu, HFGRTR2_EL2); __compute_fgt(vcpu, HFGWTR2_EL2); __compute_fgt(vcpu, HFGITR2_EL2); - __compute_fgt(vcpu, HDFGRTR2_EL2); - __compute_fgt(vcpu, HDFGWTR2_EL2); + __compute_hdfgrtr2(vcpu); + __compute_hdfgwtr2(vcpu); } diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index d5de7fdd059f4..4dd160c878862 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -43,6 +43,54 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu) pmu->hpmn_max <= *host_data_ptr(nr_event_counters); }
+/** + * kvm_vcpu_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU + * @vcpu: Pointer to kvm_vcpu struct + * + * Determine if given VCPU has a partitioned PMU by extracting that + * field and passing it to :c:func:`kvm_pmu_is_partitioned` + * + * Return: True if the VCPU PMU is partitioned, false otherwise + */ +bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu) +{ + return kvm_pmu_is_partitioned(vcpu->kvm->arch.arm_pmu) && + false; +} + +/** + * kvm_vcpu_pmu_use_fgt() - Determine if we can use FGT + * @vcpu: Pointer to struct kvm_vcpu + * + * Determine if we can use FGT for direct access to registers. We can + * if capabilities permit the number of guest counters requested. + * + * Return: True if we can use FGT, false otherwise + */ +bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu) +{ + u8 hpmn = vcpu->kvm->arch.nr_pmu_counters; + + return kvm_vcpu_pmu_is_partitioned(vcpu) && + cpus_have_final_cap(ARM64_HAS_FGT) && + (hpmn != 0 || cpus_have_final_cap(ARM64_HAS_HPMN0)); +} + +u64 kvm_pmu_fgt_bits(void) +{ + return HDFGRTR_EL2_PMOVS + | HDFGRTR_EL2_PMCCFILTR_EL0 + | HDFGRTR_EL2_PMEVTYPERn_EL0 + | HDFGRTR_EL2_PMCEIDn_EL0 + | HDFGRTR_EL2_PMMIR_EL1; +} + +u64 kvm_pmu_fgt2_bits(void) +{ + return HDFGRTR2_EL2_nPMICFILTR_EL0 + | HDFGRTR2_EL2_nPMICNTR_EL0; +} + /** * kvm_pmu_host_counter_mask() - Compute bitmask of host-reserved counters * @pmu: Pointer to arm_pmu struct
Setup MDCR_EL2 to handle a partitioned PMU. That means calculate an appropriate value for HPMN instead of the default maximum setting the host allows (which implies no partition) so hardware enforces that a guest will only see the counters in the guest partition.
Setting HPMN to a non default value means the global enable bit for the host counters is now MDCR_EL2.HPME instead of the usual PMCR_EL0.E. Enable the HPME bit to allow the host to count guest events. Since HPME only has an effect when HPMN is set which we only do for the guest, it is correct to enable it unconditionally here.
Unset the TPM and TPMCR bits, which trap all PMU accesses, if FGT (fine grain trapping) is being used.
If available, set the filtering bits HPMD and HCCD to be extra sure nothing in the guest counts at EL2.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/kvm_pmu.h | 11 ++++++ arch/arm64/kvm/debug.c | 29 ++++++++++++-- arch/arm64/kvm/pmu-direct.c | 65 ++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 60b8a48cad456..8b634112eded2 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -101,6 +101,9 @@ u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu); void kvm_pmu_host_counters_enable(void); void kvm_pmu_host_counters_disable(void);
+u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu); +u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu); + #if !defined(__KVM_NVHE_HYPERVISOR__) bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu); bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu); @@ -173,6 +176,14 @@ static inline u64 kvm_pmu_fgt2_bits(void) { return 0; } +static inline u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu) +{ + return 0; +} +static inline u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) +{ + return 0; +} static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) {} static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 3ad6b7c6e4ba7..0ab89c91e19cb 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -36,20 +36,43 @@ static int cpu_has_spe(u64 dfr0) */ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) { + int hpmn = kvm_pmu_hpmn(vcpu); + preempt_disable();
/* * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK * to disable guest access to the profiling and trace buffers */ - vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, - *host_data_ptr(nr_event_counters)); + + vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn); vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMS | MDCR_EL2_TTRF | MDCR_EL2_TPMCR | MDCR_EL2_TDRA | - MDCR_EL2_TDOSA); + MDCR_EL2_TDOSA | + MDCR_EL2_HPME); + + if (kvm_vcpu_pmu_is_partitioned(vcpu)) { + /* + * Filtering these should be redundant because we trap + * all the TYPER and FILTR registers anyway and ensure + * they filter EL2, but set the bits if they are here. + */ + if (is_pmuv3p1(read_pmuver())) + vcpu->arch.mdcr_el2 |= MDCR_EL2_HPMD; + if (is_pmuv3p5(read_pmuver())) + vcpu->arch.mdcr_el2 |= MDCR_EL2_HCCD; + + /* + * Take out the coarse grain traps if we are using + * fine grain traps. + */ + if (kvm_vcpu_pmu_use_fgt(vcpu)) + vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_TPM | MDCR_EL2_TPMCR); + + }
/* Is the VM being debugged by userspace? */ if (vcpu->guest_debug) diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 4dd160c878862..7fb4fb5c22e2a 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -154,3 +154,68 @@ void kvm_pmu_host_counters_disable(void) mdcr &= ~MDCR_EL2_HPME; write_sysreg(mdcr, mdcr_el2); } + +/** + * kvm_pmu_guest_num_counters() - Number of counters to show to guest + * @vcpu: Pointer to struct kvm_vcpu + * + * Calculate the number of counters to show to the guest via + * PMCR_EL0.N, making sure to respect the maximum the host allows, + * which is hpmn_max if partitioned and host_max otherwise. + * + * Return: Valid value for PMCR_EL0.N + */ +u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu) +{ + u8 nr_cnt = vcpu->kvm->arch.nr_pmu_counters; + int hpmn_max = armv8pmu_hpmn_max; + u8 host_max = *host_data_ptr(nr_event_counters); + + if (vcpu->kvm->arch.arm_pmu) + hpmn_max = vcpu->kvm->arch.arm_pmu->hpmn_max; + + if (kvm_vcpu_pmu_is_partitioned(vcpu)) { + if (nr_cnt <= hpmn_max && nr_cnt <= host_max) + return nr_cnt; + if (hpmn_max <= host_max) + return hpmn_max; + } + + if (nr_cnt <= host_max) + return nr_cnt; + + return host_max; +} + +/** + * kvm_pmu_hpmn() - Calculate HPMN field value + * @vcpu: Pointer to struct kvm_vcpu + * + * Calculate the appropriate value to set for MDCR_EL2.HPMN. If + * partitioned, this is the number of counters set for the guest if + * supported, falling back to hpmn_max if needed. If we are not + * partitioned or can't set the implied HPMN value, fall back to the + * host value. + * + * Return: A valid HPMN value + */ +u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) +{ + u8 nr_guest_cnt = kvm_pmu_guest_num_counters(vcpu); + int nr_guest_cnt_max = armv8pmu_hpmn_max; + u8 nr_host_cnt_max = *host_data_ptr(nr_event_counters); + + if (vcpu->kvm->arch.arm_pmu) + nr_guest_cnt_max = vcpu->kvm->arch.arm_pmu->hpmn_max; + + if (kvm_vcpu_pmu_is_partitioned(vcpu)) { + if (cpus_have_final_cap(ARM64_HAS_HPMN0)) + return nr_guest_cnt; + else if (nr_guest_cnt > 0) + return nr_guest_cnt; + else if (nr_guest_cnt_max > 0) + return nr_guest_cnt_max; + } + + return nr_host_cnt_max; +}
Because PMXEVTYPER is trapped and PMSELR is not, it is not appropriate to use the virtual PMSELR register when it could be outdated and lead to an invalid write. Use the physical register when partitioned.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/arm_pmuv3.h | 7 ++++++- arch/arm64/kvm/sys_regs.c | 9 +++++++-- 2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 27c4d6d47da31..60600f04b5902 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -70,11 +70,16 @@ static inline u64 read_pmcr(void) return read_sysreg(pmcr_el0); }
-static inline void write_pmselr(u32 val) +static inline void write_pmselr(u64 val) { write_sysreg(val, pmselr_el0); }
+static inline u64 read_pmselr(void) +{ + return read_sysreg(pmselr_el0); +} + static inline void write_pmccntr(u64 val) { write_sysreg(val, pmccntr_el0); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 0c9596325519b..2e6d907fa8af2 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1199,14 +1199,19 @@ static bool writethrough_pmevtyper(struct kvm_vcpu *vcpu, struct sys_reg_params static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *r) { - u64 idx, reg; + u64 idx, reg, pmselr;
if (pmu_access_el0_disabled(vcpu)) return false;
if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) { /* PMXEVTYPER_EL0 */ - idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0)); + if (kvm_vcpu_pmu_is_partitioned(vcpu)) + pmselr = read_pmselr(); + else + pmselr = __vcpu_sys_reg(vcpu, PMSELR_EL0); + + idx = SYS_FIELD_GET(PMSELR_EL0, SEL, pmselr); reg = PMEVTYPER0_EL0 + idx; } else if (r->CRn == 14 && (r->CRm & 12) == 12) { idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
A lot of functions in pmu-emul.c aren't specific to the emulated PMU implementation. Move them to the more appropriate pmu.c file where shared PMU functions should live.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/kvm_pmu.h | 3 + arch/arm64/kvm/pmu-emul.c | 672 +----------------------------- arch/arm64/kvm/pmu.c | 675 +++++++++++++++++++++++++++++++ 3 files changed, 679 insertions(+), 671 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index ad3247b468388..6c961e8778047 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -51,13 +51,16 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx); void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val); void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, u64 select_idx, u64 val); u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu); +u64 kvm_pmu_hyp_counter_mask(struct kvm_vcpu *vcpu); u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu); +u32 kvm_pmu_event_mask(struct kvm *kvm); u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1); void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu); void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu); void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val); void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu); void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu); +bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu); bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu); void kvm_pmu_update_run(struct kvm_vcpu *vcpu); void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val); diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index dcdd80ffd49d5..bcaa9f7a8ca28 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -17,19 +17,10 @@
#define PERF_ATTR_CFG1_COUNTER_64BIT BIT(0)
-static LIST_HEAD(arm_pmus); -static DEFINE_MUTEX(arm_pmus_lock); - static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc); static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc); static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc);
-bool kvm_supports_guest_pmuv3(void) -{ - guard(mutex)(&arm_pmus_lock); - return !list_empty(&arm_pmus); -} - static struct kvm_vcpu *kvm_pmc_to_vcpu(const struct kvm_pmc *pmc) { return container_of(pmc, struct kvm_vcpu, arch.pmu.pmc[pmc->idx]); @@ -40,46 +31,6 @@ static struct kvm_pmc *kvm_vcpu_idx_to_pmc(struct kvm_vcpu *vcpu, int cnt_idx) return &vcpu->arch.pmu.pmc[cnt_idx]; }
-static u32 __kvm_pmu_event_mask(unsigned int pmuver) -{ - switch (pmuver) { - case ID_AA64DFR0_EL1_PMUVer_IMP: - return GENMASK(9, 0); - case ID_AA64DFR0_EL1_PMUVer_V3P1: - case ID_AA64DFR0_EL1_PMUVer_V3P4: - case ID_AA64DFR0_EL1_PMUVer_V3P5: - case ID_AA64DFR0_EL1_PMUVer_V3P7: - return GENMASK(15, 0); - default: /* Shouldn't be here, just for sanity */ - WARN_ONCE(1, "Unknown PMU version %d\n", pmuver); - return 0; - } -} - -static u32 kvm_pmu_event_mask(struct kvm *kvm) -{ - u64 dfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1); - u8 pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, dfr0); - - return __kvm_pmu_event_mask(pmuver); -} - -u64 kvm_pmu_evtyper_mask(struct kvm *kvm) -{ - u64 mask = ARMV8_PMU_EXCLUDE_EL1 | ARMV8_PMU_EXCLUDE_EL0 | - kvm_pmu_event_mask(kvm); - - if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL2, IMP)) - mask |= ARMV8_PMU_INCLUDE_EL2; - - if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL3, IMP)) - mask |= ARMV8_PMU_EXCLUDE_NS_EL0 | - ARMV8_PMU_EXCLUDE_NS_EL1 | - ARMV8_PMU_EXCLUDE_EL3; - - return mask; -} - /** * kvm_pmc_is_64bit - determine if counter is 64bit * @pmc: counter context @@ -272,59 +223,6 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) irq_work_sync(&vcpu->arch.pmu.overflow_work); }
-static u64 kvm_pmu_hyp_counter_mask(struct kvm_vcpu *vcpu) -{ - unsigned int hpmn, n; - - if (!vcpu_has_nv(vcpu)) - return 0; - - hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2)); - n = vcpu->kvm->arch.nr_pmu_counters; - - /* - * Programming HPMN to a value greater than PMCR_EL0.N is - * CONSTRAINED UNPREDICTABLE. Make the implementation choice that an - * UNKNOWN number of counters (in our case, zero) are reserved for EL2. - */ - if (hpmn >= n) - return 0; - - /* - * Programming HPMN=0 is CONSTRAINED UNPREDICTABLE if FEAT_HPMN0 isn't - * implemented. Since KVM's ability to emulate HPMN=0 does not directly - * depend on hardware (all PMU registers are trapped), make the - * implementation choice that all counters are included in the second - * range reserved for EL2/EL3. - */ - return GENMASK(n - 1, hpmn); -} - -bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx) -{ - return kvm_pmu_hyp_counter_mask(vcpu) & BIT(idx); -} - -u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu) -{ - u64 mask = kvm_pmu_implemented_counter_mask(vcpu); - - if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu)) - return mask; - - return mask & ~kvm_pmu_hyp_counter_mask(vcpu); -} - -u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu) -{ - u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu)); - - if (val == 0) - return BIT(ARMV8_PMU_CYCLE_IDX); - else - return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX); -} - static void kvm_pmc_enable_perf_event(struct kvm_pmc *pmc) { if (!pmc->perf_event) { @@ -370,7 +268,7 @@ void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val) * counter where the values of the global enable control, PMOVSSET_EL0[n], and * PMINTENSET_EL1[n] are all 1. */ -static bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu) +bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu) { u64 reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
@@ -393,24 +291,6 @@ static bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu) return reg; }
-static void kvm_pmu_update_state(struct kvm_vcpu *vcpu) -{ - struct kvm_pmu *pmu = &vcpu->arch.pmu; - bool overflow; - - overflow = kvm_pmu_overflow_status(vcpu); - if (pmu->irq_level == overflow) - return; - - pmu->irq_level = overflow; - - if (likely(irqchip_in_kernel(vcpu->kvm))) { - int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu, - pmu->irq_num, overflow, pmu); - WARN_ON(ret); - } -} - bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu) { struct kvm_pmu *pmu = &vcpu->arch.pmu; @@ -436,43 +316,6 @@ void kvm_pmu_update_run(struct kvm_vcpu *vcpu) regs->device_irq_level |= KVM_ARM_DEV_PMU; }
-/** - * kvm_pmu_flush_hwstate - flush pmu state to cpu - * @vcpu: The vcpu pointer - * - * Check if the PMU has overflowed while we were running in the host, and inject - * an interrupt if that was the case. - */ -void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) -{ - kvm_pmu_update_state(vcpu); -} - -/** - * kvm_pmu_sync_hwstate - sync pmu state from cpu - * @vcpu: The vcpu pointer - * - * Check if the PMU has overflowed while we were running in the guest, and - * inject an interrupt if that was the case. - */ -void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) -{ - kvm_pmu_update_state(vcpu); -} - -/* - * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding - * to the event. - * This is why we need a callback to do it once outside of the NMI context. - */ -static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work) -{ - struct kvm_vcpu *vcpu; - - vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work); - kvm_vcpu_kick(vcpu); -} - /* * Perform an increment on any of the counters described in @mask, * generating the overflow if required, and propagate it as a chained @@ -784,132 +627,6 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, kvm_pmu_create_perf_event(pmc); }
-void kvm_host_pmu_init(struct arm_pmu *pmu) -{ - struct arm_pmu_entry *entry; - - /* - * Check the sanitised PMU version for the system, as KVM does not - * support implementations where PMUv3 exists on a subset of CPUs. - */ - if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit())) - return; - - guard(mutex)(&arm_pmus_lock); - - entry = kmalloc(sizeof(*entry), GFP_KERNEL); - if (!entry) - return; - - entry->arm_pmu = pmu; - list_add_tail(&entry->entry, &arm_pmus); -} - -static struct arm_pmu *kvm_pmu_probe_armpmu(void) -{ - struct arm_pmu_entry *entry; - struct arm_pmu *pmu; - int cpu; - - guard(mutex)(&arm_pmus_lock); - - /* - * It is safe to use a stale cpu to iterate the list of PMUs so long as - * the same value is used for the entirety of the loop. Given this, and - * the fact that no percpu data is used for the lookup there is no need - * to disable preemption. - * - * It is still necessary to get a valid cpu, though, to probe for the - * default PMU instance as userspace is not required to specify a PMU - * type. In order to uphold the preexisting behavior KVM selects the - * PMU instance for the core during vcpu init. A dependent use - * case would be a user with disdain of all things big.LITTLE that - * affines the VMM to a particular cluster of cores. - * - * In any case, userspace should just do the sane thing and use the UAPI - * to select a PMU type directly. But, be wary of the baggage being - * carried here. - */ - cpu = raw_smp_processor_id(); - list_for_each_entry(entry, &arm_pmus, entry) { - pmu = entry->arm_pmu; - - if (cpumask_test_cpu(cpu, &pmu->supported_cpus)) - return pmu; - } - - return NULL; -} - -static u64 __compute_pmceid(struct arm_pmu *pmu, bool pmceid1) -{ - u32 hi[2], lo[2]; - - bitmap_to_arr32(lo, pmu->pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS); - bitmap_to_arr32(hi, pmu->pmceid_ext_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS); - - return ((u64)hi[pmceid1] << 32) | lo[pmceid1]; -} - -static u64 compute_pmceid0(struct arm_pmu *pmu) -{ - u64 val = __compute_pmceid(pmu, 0); - - /* always support SW_INCR */ - val |= BIT(ARMV8_PMUV3_PERFCTR_SW_INCR); - /* always support CHAIN */ - val |= BIT(ARMV8_PMUV3_PERFCTR_CHAIN); - return val; -} - -static u64 compute_pmceid1(struct arm_pmu *pmu) -{ - u64 val = __compute_pmceid(pmu, 1); - - /* - * Don't advertise STALL_SLOT*, as PMMIR_EL0 is handled - * as RAZ - */ - val &= ~(BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT - 32) | - BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_FRONTEND - 32) | - BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_BACKEND - 32)); - return val; -} - -u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) -{ - struct arm_pmu *cpu_pmu = vcpu->kvm->arch.arm_pmu; - unsigned long *bmap = vcpu->kvm->arch.pmu_filter; - u64 val, mask = 0; - int base, i, nr_events; - - if (!pmceid1) { - val = compute_pmceid0(cpu_pmu); - base = 0; - } else { - val = compute_pmceid1(cpu_pmu); - base = 32; - } - - if (!bmap) - return val; - - nr_events = kvm_pmu_event_mask(vcpu->kvm) + 1; - - for (i = 0; i < 32; i += 8) { - u64 byte; - - byte = bitmap_get_value8(bmap, base + i); - mask |= byte << i; - if (nr_events >= (0x4000 + base + 32)) { - byte = bitmap_get_value8(bmap, 0x4000 + base + i); - mask |= byte << (32 + i); - } - } - - return val & mask; -} - void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) { u64 mask = kvm_pmu_implemented_counter_mask(vcpu); @@ -921,393 +638,6 @@ void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) kvm_pmu_reprogram_counter_mask(vcpu, mask); }
-int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) -{ - if (!vcpu->arch.pmu.created) - return -EINVAL; - - /* - * A valid interrupt configuration for the PMU is either to have a - * properly configured interrupt number and using an in-kernel - * irqchip, or to not have an in-kernel GIC and not set an IRQ. - */ - if (irqchip_in_kernel(vcpu->kvm)) { - int irq = vcpu->arch.pmu.irq_num; - /* - * If we are using an in-kernel vgic, at this point we know - * the vgic will be initialized, so we can check the PMU irq - * number against the dimensions of the vgic and make sure - * it's valid. - */ - if (!irq_is_ppi(irq) && !vgic_valid_spi(vcpu->kvm, irq)) - return -EINVAL; - } else if (kvm_arm_pmu_irq_initialized(vcpu)) { - return -EINVAL; - } - - return 0; -} - -static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) -{ - if (irqchip_in_kernel(vcpu->kvm)) { - int ret; - - /* - * If using the PMU with an in-kernel virtual GIC - * implementation, we require the GIC to be already - * initialized when initializing the PMU. - */ - if (!vgic_initialized(vcpu->kvm)) - return -ENODEV; - - if (!kvm_arm_pmu_irq_initialized(vcpu)) - return -ENXIO; - - ret = kvm_vgic_set_owner(vcpu, vcpu->arch.pmu.irq_num, - &vcpu->arch.pmu); - if (ret) - return ret; - } - - init_irq_work(&vcpu->arch.pmu.overflow_work, - kvm_pmu_perf_overflow_notify_vcpu); - - vcpu->arch.pmu.created = true; - return 0; -} - -/* - * For one VM the interrupt type must be same for each vcpu. - * As a PPI, the interrupt number is the same for all vcpus, - * while as an SPI it must be a separate number per vcpu. - */ -static bool pmu_irq_is_valid(struct kvm *kvm, int irq) -{ - unsigned long i; - struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) { - if (!kvm_arm_pmu_irq_initialized(vcpu)) - continue; - - if (irq_is_ppi(irq)) { - if (vcpu->arch.pmu.irq_num != irq) - return false; - } else { - if (vcpu->arch.pmu.irq_num == irq) - return false; - } - } - - return true; -} - -/** - * kvm_arm_pmu_get_max_counters - Return the max number of PMU counters. - * @kvm: The kvm pointer - */ -u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) -{ - struct arm_pmu *arm_pmu = kvm->arch.arm_pmu; - - /* - * PMUv3 requires that all event counters are capable of counting any - * event, though the same may not be true of non-PMUv3 hardware. - */ - if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS)) - return 1; - - /* - * The arm_pmu->cntr_mask considers the fixed counter(s) as well. - * Ignore those and return only the general-purpose counters. - */ - return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS); -} - -static void kvm_arm_set_nr_counters(struct kvm *kvm, unsigned int nr) -{ - kvm->arch.nr_pmu_counters = nr; - - /* Reset MDCR_EL2.HPMN behind the vcpus' back... */ - if (test_bit(KVM_ARM_VCPU_HAS_EL2, kvm->arch.vcpu_features)) { - struct kvm_vcpu *vcpu; - unsigned long i; - - kvm_for_each_vcpu(i, vcpu, kvm) { - u64 val = __vcpu_sys_reg(vcpu, MDCR_EL2); - val &= ~MDCR_EL2_HPMN; - val |= FIELD_PREP(MDCR_EL2_HPMN, kvm->arch.nr_pmu_counters); - __vcpu_assign_sys_reg(vcpu, MDCR_EL2, val); - } - } -} - -static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) -{ - lockdep_assert_held(&kvm->arch.config_lock); - - kvm->arch.arm_pmu = arm_pmu; - kvm_arm_set_nr_counters(kvm, kvm_arm_pmu_get_max_counters(kvm)); -} - -/** - * kvm_arm_set_default_pmu - No PMU set, get the default one. - * @kvm: The kvm pointer - * - * The observant among you will notice that the supported_cpus - * mask does not get updated for the default PMU even though it - * is quite possible the selected instance supports only a - * subset of cores in the system. This is intentional, and - * upholds the preexisting behavior on heterogeneous systems - * where vCPUs can be scheduled on any core but the guest - * counters could stop working. - */ -int kvm_arm_set_default_pmu(struct kvm *kvm) -{ - struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu(); - - if (!arm_pmu) - return -ENODEV; - - kvm_arm_set_pmu(kvm, arm_pmu); - return 0; -} - -static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id) -{ - struct kvm *kvm = vcpu->kvm; - struct arm_pmu_entry *entry; - struct arm_pmu *arm_pmu; - int ret = -ENXIO; - - lockdep_assert_held(&kvm->arch.config_lock); - mutex_lock(&arm_pmus_lock); - - list_for_each_entry(entry, &arm_pmus, entry) { - arm_pmu = entry->arm_pmu; - if (arm_pmu->pmu.type == pmu_id) { - if (kvm_vm_has_ran_once(kvm) || - (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) { - ret = -EBUSY; - break; - } - - kvm_arm_set_pmu(kvm, arm_pmu); - cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus); - ret = 0; - break; - } - } - - mutex_unlock(&arm_pmus_lock); - return ret; -} - -static int kvm_arm_pmu_v3_set_nr_counters(struct kvm_vcpu *vcpu, unsigned int n) -{ - struct kvm *kvm = vcpu->kvm; - - if (!kvm->arch.arm_pmu) - return -EINVAL; - - if (n > kvm_arm_pmu_get_max_counters(kvm)) - return -EINVAL; - - kvm_arm_set_nr_counters(kvm, n); - return 0; -} - -int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) -{ - struct kvm *kvm = vcpu->kvm; - - lockdep_assert_held(&kvm->arch.config_lock); - - if (!kvm_vcpu_has_pmu(vcpu)) - return -ENODEV; - - if (vcpu->arch.pmu.created) - return -EBUSY; - - switch (attr->attr) { - case KVM_ARM_VCPU_PMU_V3_IRQ: { - int __user *uaddr = (int __user *)(long)attr->addr; - int irq; - - if (!irqchip_in_kernel(kvm)) - return -EINVAL; - - if (get_user(irq, uaddr)) - return -EFAULT; - - /* The PMU overflow interrupt can be a PPI or a valid SPI. */ - if (!(irq_is_ppi(irq) || irq_is_spi(irq))) - return -EINVAL; - - if (!pmu_irq_is_valid(kvm, irq)) - return -EINVAL; - - if (kvm_arm_pmu_irq_initialized(vcpu)) - return -EBUSY; - - kvm_debug("Set kvm ARM PMU irq: %d\n", irq); - vcpu->arch.pmu.irq_num = irq; - return 0; - } - case KVM_ARM_VCPU_PMU_V3_FILTER: { - u8 pmuver = kvm_arm_pmu_get_pmuver_limit(); - struct kvm_pmu_event_filter __user *uaddr; - struct kvm_pmu_event_filter filter; - int nr_events; - - /* - * Allow userspace to specify an event filter for the entire - * event range supported by PMUVer of the hardware, rather - * than the guest's PMUVer for KVM backward compatibility. - */ - nr_events = __kvm_pmu_event_mask(pmuver) + 1; - - uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr; - - if (copy_from_user(&filter, uaddr, sizeof(filter))) - return -EFAULT; - - if (((u32)filter.base_event + filter.nevents) > nr_events || - (filter.action != KVM_PMU_EVENT_ALLOW && - filter.action != KVM_PMU_EVENT_DENY)) - return -EINVAL; - - if (kvm_vm_has_ran_once(kvm)) - return -EBUSY; - - if (!kvm->arch.pmu_filter) { - kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); - if (!kvm->arch.pmu_filter) - return -ENOMEM; - - /* - * The default depends on the first applied filter. - * If it allows events, the default is to deny. - * Conversely, if the first filter denies a set of - * events, the default is to allow. - */ - if (filter.action == KVM_PMU_EVENT_ALLOW) - bitmap_zero(kvm->arch.pmu_filter, nr_events); - else - bitmap_fill(kvm->arch.pmu_filter, nr_events); - } - - if (filter.action == KVM_PMU_EVENT_ALLOW) - bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents); - else - bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents); - - return 0; - } - case KVM_ARM_VCPU_PMU_V3_SET_PMU: { - int __user *uaddr = (int __user *)(long)attr->addr; - int pmu_id; - - if (get_user(pmu_id, uaddr)) - return -EFAULT; - - return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id); - } - case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: { - unsigned int __user *uaddr = (unsigned int __user *)(long)attr->addr; - unsigned int n; - - if (get_user(n, uaddr)) - return -EFAULT; - - return kvm_arm_pmu_v3_set_nr_counters(vcpu, n); - } - case KVM_ARM_VCPU_PMU_V3_INIT: - return kvm_arm_pmu_v3_init(vcpu); - } - - return -ENXIO; -} - -int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) -{ - switch (attr->attr) { - case KVM_ARM_VCPU_PMU_V3_IRQ: { - int __user *uaddr = (int __user *)(long)attr->addr; - int irq; - - if (!irqchip_in_kernel(vcpu->kvm)) - return -EINVAL; - - if (!kvm_vcpu_has_pmu(vcpu)) - return -ENODEV; - - if (!kvm_arm_pmu_irq_initialized(vcpu)) - return -ENXIO; - - irq = vcpu->arch.pmu.irq_num; - return put_user(irq, uaddr); - } - } - - return -ENXIO; -} - -int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) -{ - switch (attr->attr) { - case KVM_ARM_VCPU_PMU_V3_IRQ: - case KVM_ARM_VCPU_PMU_V3_INIT: - case KVM_ARM_VCPU_PMU_V3_FILTER: - case KVM_ARM_VCPU_PMU_V3_SET_PMU: - case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: - if (kvm_vcpu_has_pmu(vcpu)) - return 0; - } - - return -ENXIO; -} - -u8 kvm_arm_pmu_get_pmuver_limit(void) -{ - unsigned int pmuver; - - pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, - read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1)); - - /* - * Spoof a barebones PMUv3 implementation if the system supports IMPDEF - * traps of the PMUv3 sysregs - */ - if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS)) - return ID_AA64DFR0_EL1_PMUVer_IMP; - - /* - * Otherwise, treat IMPLEMENTATION DEFINED functionality as - * unimplemented - */ - if (pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF) - return 0; - - return min(pmuver, ID_AA64DFR0_EL1_PMUVer_V3P5); -} - -/** - * kvm_vcpu_read_pmcr - Read PMCR_EL0 register for the vCPU - * @vcpu: The vcpu pointer - */ -u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) -{ - u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0); - u64 n = vcpu->kvm->arch.nr_pmu_counters; - - if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu)) - n = FIELD_GET(MDCR_EL2_HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2)); - - return u64_replace_bits(pmcr, n, ARMV8_PMU_PMCR_N); -} - void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu) { bool reprogrammed = false; diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 6b48a3d16d0d5..79b7ea037153a 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -8,8 +8,21 @@ #include <linux/perf/arm_pmu.h> #include <linux/perf/arm_pmuv3.h>
+#include <asm/kvm_emulate.h> +#include <asm/kvm_pmu.h> + +static LIST_HEAD(arm_pmus); +static DEFINE_MUTEX(arm_pmus_lock); static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
+#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS) + +bool kvm_supports_guest_pmuv3(void) +{ + guard(mutex)(&arm_pmus_lock); + return !list_empty(&arm_pmus); +} + /* * Given the perf event attributes and system type, determine * if we are going to need to switch counters at guest entry/exit. @@ -209,3 +222,665 @@ void kvm_vcpu_pmu_resync_el0(void)
kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu); } + +void kvm_host_pmu_init(struct arm_pmu *pmu) +{ + struct arm_pmu_entry *entry; + + /* + * Check the sanitised PMU version for the system, as KVM does not + * support implementations where PMUv3 exists on a subset of CPUs. + */ + if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit())) + return; + + guard(mutex)(&arm_pmus_lock); + + entry = kmalloc(sizeof(*entry), GFP_KERNEL); + if (!entry) + return; + + entry->arm_pmu = pmu; + list_add_tail(&entry->entry, &arm_pmus); +} + +static struct arm_pmu *kvm_pmu_probe_armpmu(void) +{ + struct arm_pmu_entry *entry; + struct arm_pmu *pmu; + int cpu; + + guard(mutex)(&arm_pmus_lock); + + /* + * It is safe to use a stale cpu to iterate the list of PMUs so long as + * the same value is used for the entirety of the loop. Given this, and + * the fact that no percpu data is used for the lookup there is no need + * to disable preemption. + * + * It is still necessary to get a valid cpu, though, to probe for the + * default PMU instance as userspace is not required to specify a PMU + * type. In order to uphold the preexisting behavior KVM selects the + * PMU instance for the core during vcpu init. A dependent use + * case would be a user with disdain of all things big.LITTLE that + * affines the VMM to a particular cluster of cores. + * + * In any case, userspace should just do the sane thing and use the UAPI + * to select a PMU type directly. But, be wary of the baggage being + * carried here. + */ + cpu = raw_smp_processor_id(); + list_for_each_entry(entry, &arm_pmus, entry) { + pmu = entry->arm_pmu; + + if (cpumask_test_cpu(cpu, &pmu->supported_cpus)) + return pmu; + } + + return NULL; +} + +static u64 __compute_pmceid(struct arm_pmu *pmu, bool pmceid1) +{ + u32 hi[2], lo[2]; + + bitmap_to_arr32(lo, pmu->pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS); + bitmap_to_arr32(hi, pmu->pmceid_ext_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS); + + return ((u64)hi[pmceid1] << 32) | lo[pmceid1]; +} + +static u64 compute_pmceid0(struct arm_pmu *pmu) +{ + u64 val = __compute_pmceid(pmu, 0); + + /* always support SW_INCR */ + val |= BIT(ARMV8_PMUV3_PERFCTR_SW_INCR); + /* always support CHAIN */ + val |= BIT(ARMV8_PMUV3_PERFCTR_CHAIN); + return val; +} + +static u64 compute_pmceid1(struct arm_pmu *pmu) +{ + u64 val = __compute_pmceid(pmu, 1); + + /* + * Don't advertise STALL_SLOT*, as PMMIR_EL0 is handled + * as RAZ + */ + val &= ~(BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT - 32) | + BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_FRONTEND - 32) | + BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT_BACKEND - 32)); + return val; +} + +u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) +{ + struct arm_pmu *cpu_pmu = vcpu->kvm->arch.arm_pmu; + unsigned long *bmap = vcpu->kvm->arch.pmu_filter; + u64 val, mask = 0; + int base, i, nr_events; + + if (!pmceid1) { + val = compute_pmceid0(cpu_pmu); + base = 0; + } else { + val = compute_pmceid1(cpu_pmu); + base = 32; + } + + if (!bmap) + return val; + + nr_events = kvm_pmu_event_mask(vcpu->kvm) + 1; + + for (i = 0; i < 32; i += 8) { + u64 byte; + + byte = bitmap_get_value8(bmap, base + i); + mask |= byte << i; + if (nr_events >= (0x4000 + base + 32)) { + byte = bitmap_get_value8(bmap, 0x4000 + base + i); + mask |= byte << (32 + i); + } + } + + return val & mask; +} + +/* + * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding + * to the event. + * This is why we need a callback to do it once outside of the NMI context. + */ +static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work) +{ + struct kvm_vcpu *vcpu; + + vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work); + kvm_vcpu_kick(vcpu); +} + +static u32 __kvm_pmu_event_mask(unsigned int pmuver) +{ + switch (pmuver) { + case ID_AA64DFR0_EL1_PMUVer_IMP: + return GENMASK(9, 0); + case ID_AA64DFR0_EL1_PMUVer_V3P1: + case ID_AA64DFR0_EL1_PMUVer_V3P4: + case ID_AA64DFR0_EL1_PMUVer_V3P5: + case ID_AA64DFR0_EL1_PMUVer_V3P7: + return GENMASK(15, 0); + default: /* Shouldn't be here, just for sanity */ + WARN_ONCE(1, "Unknown PMU version %d\n", pmuver); + return 0; + } +} + +u32 kvm_pmu_event_mask(struct kvm *kvm) +{ + u64 dfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1); + u8 pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, dfr0); + + return __kvm_pmu_event_mask(pmuver); +} + +u64 kvm_pmu_evtyper_mask(struct kvm *kvm) +{ + u64 mask = ARMV8_PMU_EXCLUDE_EL1 | ARMV8_PMU_EXCLUDE_EL0 | + kvm_pmu_event_mask(kvm); + + if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL2, IMP)) + mask |= ARMV8_PMU_INCLUDE_EL2; + + if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL3, IMP)) + mask |= ARMV8_PMU_EXCLUDE_NS_EL0 | + ARMV8_PMU_EXCLUDE_NS_EL1 | + ARMV8_PMU_EXCLUDE_EL3; + + return mask; +} + +static void kvm_pmu_update_state(struct kvm_vcpu *vcpu) +{ + struct kvm_pmu *pmu = &vcpu->arch.pmu; + bool overflow; + + overflow = kvm_pmu_overflow_status(vcpu); + if (pmu->irq_level == overflow) + return; + + pmu->irq_level = overflow; + + if (likely(irqchip_in_kernel(vcpu->kvm))) { + int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu, + pmu->irq_num, overflow, pmu); + WARN_ON(ret); + } +} + +/** + * kvm_pmu_flush_hwstate - flush pmu state to cpu + * @vcpu: The vcpu pointer + * + * Check if the PMU has overflowed while we were running in the host, and inject + * an interrupt if that was the case. + */ +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) +{ + kvm_pmu_update_state(vcpu); +} + +/** + * kvm_pmu_sync_hwstate - sync pmu state from cpu + * @vcpu: The vcpu pointer + * + * Check if the PMU has overflowed while we were running in the guest, and + * inject an interrupt if that was the case. + */ +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) +{ + kvm_pmu_update_state(vcpu); +} + +int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) +{ + if (!vcpu->arch.pmu.created) + return -EINVAL; + + /* + * A valid interrupt configuration for the PMU is either to have a + * properly configured interrupt number and using an in-kernel + * irqchip, or to not have an in-kernel GIC and not set an IRQ. + */ + if (irqchip_in_kernel(vcpu->kvm)) { + int irq = vcpu->arch.pmu.irq_num; + /* + * If we are using an in-kernel vgic, at this point we know + * the vgic will be initialized, so we can check the PMU irq + * number against the dimensions of the vgic and make sure + * it's valid. + */ + if (!irq_is_ppi(irq) && !vgic_valid_spi(vcpu->kvm, irq)) + return -EINVAL; + } else if (kvm_arm_pmu_irq_initialized(vcpu)) { + return -EINVAL; + } + + return 0; +} + +static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) +{ + if (irqchip_in_kernel(vcpu->kvm)) { + int ret; + + /* + * If using the PMU with an in-kernel virtual GIC + * implementation, we require the GIC to be already + * initialized when initializing the PMU. + */ + if (!vgic_initialized(vcpu->kvm)) + return -ENODEV; + + if (!kvm_arm_pmu_irq_initialized(vcpu)) + return -ENXIO; + + ret = kvm_vgic_set_owner(vcpu, vcpu->arch.pmu.irq_num, + &vcpu->arch.pmu); + if (ret) + return ret; + } + + init_irq_work(&vcpu->arch.pmu.overflow_work, + kvm_pmu_perf_overflow_notify_vcpu); + + vcpu->arch.pmu.created = true; + return 0; +} + +/* + * For one VM the interrupt type must be same for each vcpu. + * As a PPI, the interrupt number is the same for all vcpus, + * while as an SPI it must be a separate number per vcpu. + */ +static bool pmu_irq_is_valid(struct kvm *kvm, int irq) +{ + unsigned long i; + struct kvm_vcpu *vcpu; + + kvm_for_each_vcpu(i, vcpu, kvm) { + if (!kvm_arm_pmu_irq_initialized(vcpu)) + continue; + + if (irq_is_ppi(irq)) { + if (vcpu->arch.pmu.irq_num != irq) + return false; + } else { + if (vcpu->arch.pmu.irq_num == irq) + return false; + } + } + + return true; +} + +/** + * kvm_arm_pmu_get_max_counters - Return the max number of PMU counters. + * @kvm: The kvm pointer + */ +u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) +{ + struct arm_pmu *arm_pmu = kvm->arch.arm_pmu; + + /* + * PMUv3 requires that all event counters are capable of counting any + * event, though the same may not be true of non-PMUv3 hardware. + */ + if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS)) + return 1; + + /* + * The arm_pmu->cntr_mask considers the fixed counter(s) as well. + * Ignore those and return only the general-purpose counters. + */ + return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS); +} + +static void kvm_arm_set_nr_counters(struct kvm *kvm, unsigned int nr) +{ + kvm->arch.nr_pmu_counters = nr; + + /* Reset MDCR_EL2.HPMN behind the vcpus' back... */ + if (test_bit(KVM_ARM_VCPU_HAS_EL2, kvm->arch.vcpu_features)) { + struct kvm_vcpu *vcpu; + unsigned long i; + + kvm_for_each_vcpu(i, vcpu, kvm) { + u64 val = __vcpu_sys_reg(vcpu, MDCR_EL2); + + val &= ~MDCR_EL2_HPMN; + val |= FIELD_PREP(MDCR_EL2_HPMN, kvm->arch.nr_pmu_counters); + __vcpu_assign_sys_reg(vcpu, MDCR_EL2, val); + } + } +} + +static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) +{ + lockdep_assert_held(&kvm->arch.config_lock); + + kvm->arch.arm_pmu = arm_pmu; + kvm_arm_set_nr_counters(kvm, kvm_arm_pmu_get_max_counters(kvm)); +} + +/** + * kvm_arm_set_default_pmu - No PMU set, get the default one. + * @kvm: The kvm pointer + * + * The observant among you will notice that the supported_cpus + * mask does not get updated for the default PMU even though it + * is quite possible the selected instance supports only a + * subset of cores in the system. This is intentional, and + * upholds the preexisting behavior on heterogeneous systems + * where vCPUs can be scheduled on any core but the guest + * counters could stop working. + */ +int kvm_arm_set_default_pmu(struct kvm *kvm) +{ + struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu(); + + if (!arm_pmu) + return -ENODEV; + + kvm_arm_set_pmu(kvm, arm_pmu); + return 0; +} + +static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id) +{ + struct kvm *kvm = vcpu->kvm; + struct arm_pmu_entry *entry; + struct arm_pmu *arm_pmu; + int ret = -ENXIO; + + lockdep_assert_held(&kvm->arch.config_lock); + mutex_lock(&arm_pmus_lock); + + list_for_each_entry(entry, &arm_pmus, entry) { + arm_pmu = entry->arm_pmu; + if (arm_pmu->pmu.type == pmu_id) { + if (kvm_vm_has_ran_once(kvm) || + (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) { + ret = -EBUSY; + break; + } + + kvm_arm_set_pmu(kvm, arm_pmu); + cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus); + ret = 0; + break; + } + } + + mutex_unlock(&arm_pmus_lock); + return ret; +} + +static int kvm_arm_pmu_v3_set_nr_counters(struct kvm_vcpu *vcpu, unsigned int n) +{ + struct kvm *kvm = vcpu->kvm; + + if (!kvm->arch.arm_pmu) + return -EINVAL; + + if (n > kvm_arm_pmu_get_max_counters(kvm)) + return -EINVAL; + + kvm_arm_set_nr_counters(kvm, n); + return 0; +} + +int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) +{ + struct kvm *kvm = vcpu->kvm; + + lockdep_assert_held(&kvm->arch.config_lock); + + if (!kvm_vcpu_has_pmu(vcpu)) + return -ENODEV; + + if (vcpu->arch.pmu.created) + return -EBUSY; + + switch (attr->attr) { + case KVM_ARM_VCPU_PMU_V3_IRQ: { + int __user *uaddr = (int __user *)(long)attr->addr; + int irq; + + if (!irqchip_in_kernel(kvm)) + return -EINVAL; + + if (get_user(irq, uaddr)) + return -EFAULT; + + /* The PMU overflow interrupt can be a PPI or a valid SPI. */ + if (!(irq_is_ppi(irq) || irq_is_spi(irq))) + return -EINVAL; + + if (!pmu_irq_is_valid(kvm, irq)) + return -EINVAL; + + if (kvm_arm_pmu_irq_initialized(vcpu)) + return -EBUSY; + + kvm_debug("Set kvm ARM PMU irq: %d\n", irq); + vcpu->arch.pmu.irq_num = irq; + return 0; + } + case KVM_ARM_VCPU_PMU_V3_FILTER: { + u8 pmuver = kvm_arm_pmu_get_pmuver_limit(); + struct kvm_pmu_event_filter __user *uaddr; + struct kvm_pmu_event_filter filter; + int nr_events; + + /* + * Allow userspace to specify an event filter for the entire + * event range supported by PMUVer of the hardware, rather + * than the guest's PMUVer for KVM backward compatibility. + */ + nr_events = __kvm_pmu_event_mask(pmuver) + 1; + + uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr; + + if (copy_from_user(&filter, uaddr, sizeof(filter))) + return -EFAULT; + + if (((u32)filter.base_event + filter.nevents) > nr_events || + (filter.action != KVM_PMU_EVENT_ALLOW && + filter.action != KVM_PMU_EVENT_DENY)) + return -EINVAL; + + if (kvm_vm_has_ran_once(kvm)) + return -EBUSY; + + if (!kvm->arch.pmu_filter) { + kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); + if (!kvm->arch.pmu_filter) + return -ENOMEM; + + /* + * The default depends on the first applied filter. + * If it allows events, the default is to deny. + * Conversely, if the first filter denies a set of + * events, the default is to allow. + */ + if (filter.action == KVM_PMU_EVENT_ALLOW) + bitmap_zero(kvm->arch.pmu_filter, nr_events); + else + bitmap_fill(kvm->arch.pmu_filter, nr_events); + } + + if (filter.action == KVM_PMU_EVENT_ALLOW) + bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents); + else + bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents); + + return 0; + } + case KVM_ARM_VCPU_PMU_V3_SET_PMU: { + int __user *uaddr = (int __user *)(long)attr->addr; + int pmu_id; + + if (get_user(pmu_id, uaddr)) + return -EFAULT; + + return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id); + } + case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: { + unsigned int __user *uaddr = (unsigned int __user *)(long)attr->addr; + unsigned int n; + + if (get_user(n, uaddr)) + return -EFAULT; + + return kvm_arm_pmu_v3_set_nr_counters(vcpu, n); + } + case KVM_ARM_VCPU_PMU_V3_INIT: + return kvm_arm_pmu_v3_init(vcpu); + } + + return -ENXIO; +} + +int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) +{ + switch (attr->attr) { + case KVM_ARM_VCPU_PMU_V3_IRQ: { + int __user *uaddr = (int __user *)(long)attr->addr; + int irq; + + if (!irqchip_in_kernel(vcpu->kvm)) + return -EINVAL; + + if (!kvm_vcpu_has_pmu(vcpu)) + return -ENODEV; + + if (!kvm_arm_pmu_irq_initialized(vcpu)) + return -ENXIO; + + irq = vcpu->arch.pmu.irq_num; + return put_user(irq, uaddr); + } + } + + return -ENXIO; +} + +int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) +{ + switch (attr->attr) { + case KVM_ARM_VCPU_PMU_V3_IRQ: + case KVM_ARM_VCPU_PMU_V3_INIT: + case KVM_ARM_VCPU_PMU_V3_FILTER: + case KVM_ARM_VCPU_PMU_V3_SET_PMU: + case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: + if (kvm_vcpu_has_pmu(vcpu)) + return 0; + } + + return -ENXIO; +} + +u8 kvm_arm_pmu_get_pmuver_limit(void) +{ + unsigned int pmuver; + + pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, + read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1)); + + /* + * Spoof a barebones PMUv3 implementation if the system supports IMPDEF + * traps of the PMUv3 sysregs + */ + if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS)) + return ID_AA64DFR0_EL1_PMUVer_IMP; + + /* + * Otherwise, treat IMPLEMENTATION DEFINED functionality as + * unimplemented + */ + if (pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF) + return 0; + + return min(pmuver, ID_AA64DFR0_EL1_PMUVer_V3P5); +} + +u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu) +{ + u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu)); + + if (val == 0) + return BIT(ARMV8_PMU_CYCLE_IDX); + else + return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX); +} + +u64 kvm_pmu_hyp_counter_mask(struct kvm_vcpu *vcpu) +{ + unsigned int hpmn, n; + + if (!vcpu_has_nv(vcpu)) + return 0; + + hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2)); + n = vcpu->kvm->arch.nr_pmu_counters; + + /* + * Programming HPMN to a value greater than PMCR_EL0.N is + * CONSTRAINED UNPREDICTABLE. Make the implementation choice that an + * UNKNOWN number of counters (in our case, zero) are reserved for EL2. + */ + if (hpmn >= n) + return 0; + + /* + * Programming HPMN=0 is CONSTRAINED UNPREDICTABLE if FEAT_HPMN0 isn't + * implemented. Since KVM's ability to emulate HPMN=0 does not directly + * depend on hardware (all PMU registers are trapped), make the + * implementation choice that all counters are included in the second + * range reserved for EL2/EL3. + */ + return GENMASK(n - 1, hpmn); +} + +bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx) +{ + return kvm_pmu_hyp_counter_mask(vcpu) & BIT(idx); +} + +u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu) +{ + u64 mask = kvm_pmu_implemented_counter_mask(vcpu); + + if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu)) + return mask; + + return mask & ~kvm_pmu_hyp_counter_mask(vcpu); +} + +/** + * kvm_vcpu_read_pmcr - Read PMCR_EL0 register for the vCPU + * @vcpu: The vcpu pointer + */ +u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) +{ + u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0); + u64 n = vcpu->kvm->arch.nr_pmu_counters; + + if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu)) + n = FIELD_GET(MDCR_EL2_HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2)); + + return u64_replace_bits(pmcr, n, ARMV8_PMU_PMCR_N); +}
When we re-enter the VM after handling a PMU interrupt, calculate whether it was any of the guest counters that overflowed and inject an interrupt into the guest if so.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/kvm_pmu.h | 2 ++ arch/arm64/kvm/pmu-direct.c | 20 ++++++++++++++++++++ arch/arm64/kvm/pmu-emul.c | 4 ++-- arch/arm64/kvm/pmu.c | 6 +++++- 4 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index e4cbab0fd09cf..47e6f2a14e381 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -92,6 +92,8 @@ bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx); void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); void kvm_vcpu_pmu_resync_el0(void); +bool kvm_pmu_emul_overflow_status(struct kvm_vcpu *vcpu); +bool kvm_pmu_part_overflow_status(struct kvm_vcpu *vcpu);
#define kvm_vcpu_has_pmu(vcpu) \ (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PMU_V3)) diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 76d8ed24c8646..2ee99d6d2b6c1 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -413,3 +413,23 @@ void kvm_pmu_handle_guest_irq(u64 govf)
__vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, govf); } + +/** + * kvm_pmu_part_overflow_status() - Determine if any guest counters have overflowed + * @vcpu: Ponter to struct kvm_vcpu + * + * Determine if any guest counters have overflowed and therefore an + * IRQ needs to be injected into the guest. + * + * Return: True if there was an overflow, false otherwise + */ +bool kvm_pmu_part_overflow_status(struct kvm_vcpu *vcpu) +{ + struct arm_pmu *pmu = vcpu->kvm->arch.arm_pmu; + u64 mask = kvm_pmu_guest_counter_mask(pmu); + u64 pmovs = __vcpu_sys_reg(vcpu, PMOVSSET_EL0); + u64 pmint = read_pmintenset(); + u64 pmcr = read_pmcr(); + + return (pmcr & ARMV8_PMU_PMCR_E) && (mask & pmovs & pmint); +} diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index bcaa9f7a8ca28..6f41fc3e3f74b 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -268,7 +268,7 @@ void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val) * counter where the values of the global enable control, PMOVSSET_EL0[n], and * PMINTENSET_EL1[n] are all 1. */ -bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu) +bool kvm_pmu_emul_overflow_status(struct kvm_vcpu *vcpu) { u64 reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
@@ -405,7 +405,7 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event, kvm_pmu_counter_increment(vcpu, BIT(idx + 1), ARMV8_PMUV3_PERFCTR_CHAIN);
- if (kvm_pmu_overflow_status(vcpu)) { + if (kvm_pmu_emul_overflow_status(vcpu)) { kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
if (!in_nmi()) diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index c9862e55a4049..e1332a158dfc8 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -407,7 +407,11 @@ static void kvm_pmu_update_state(struct kvm_vcpu *vcpu) struct kvm_pmu *pmu = &vcpu->arch.pmu; bool overflow;
- overflow = kvm_pmu_overflow_status(vcpu); + if (kvm_vcpu_pmu_is_partitioned(vcpu)) + overflow = kvm_pmu_part_overflow_status(vcpu); + else + overflow = kvm_pmu_emul_overflow_status(vcpu); + if (pmu->irq_level == overflow) return;
With FGT in place, the remaining trapped registers need to be written through to the underlying physical registers as well as the virtual ones. Failing to do this means guest writes will not take effect when expected.
For the PMEVTYPER register, take care to enforce KVM's PMU event filter. Do that by setting the bits to exclude EL1 and EL0 when an event is not present in the filter and clearing the bit to include EL2 always.
Note the virtual register is always assigned the value specified by the guest to hide the setting of those bits.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/kvm/sys_regs.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index c636840b1f6f9..0c9596325519b 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1166,6 +1166,36 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu, return true; }
+static bool writethrough_pmevtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, + u64 reg, u64 idx) +{ + u64 eventsel; + u64 val = p->regval; + u64 evtyper_set = ARMV8_PMU_EXCLUDE_EL0 | + ARMV8_PMU_EXCLUDE_EL1; + u64 evtyper_clr = ARMV8_PMU_INCLUDE_EL2; + + __vcpu_assign_sys_reg(vcpu, reg, val); + + if (idx == ARMV8_PMU_CYCLE_IDX) + eventsel = ARMV8_PMUV3_PERFCTR_CPU_CYCLES; + else + eventsel = val & kvm_pmu_event_mask(vcpu->kvm); + + if (vcpu->kvm->arch.pmu_filter && + !test_bit(eventsel, vcpu->kvm->arch.pmu_filter)) + val |= evtyper_set; + + val &= ~evtyper_clr; + + if (idx == ARMV8_PMU_CYCLE_IDX) + write_pmccfiltr(val); + else + write_pmevtypern(idx, val); + + return true; +} + static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *r) { @@ -1192,7 +1222,9 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, if (!pmu_counter_idx_valid(vcpu, idx)) return false;
- if (p->is_write) { + if (kvm_vcpu_pmu_is_partitioned(vcpu) && p->is_write) { + writethrough_pmevtyper(vcpu, p, reg, idx); + } else if (p->is_write) { kvm_pmu_set_counter_event_type(vcpu, p->regval, idx); kvm_vcpu_pmu_restore_guest(vcpu); } else {
Save and restore newly untrapped registers that can be directly accessed by the guest when the PMU is partitioned.
* PMEVCNTRn_EL0 * PMCCNTR_EL0 * PMICNTR_EL0 * PMUSERENR_EL0 * PMSELR_EL0 * PMCR_EL0 * PMCNTEN_EL0 * PMINTEN_EL1
If we know we are not using FGT (that is, trapping everything), then return immediately. Either the PMU is not partitioned, or it is but all register writes are being written through the VCPU fields to hardware, so all values are fresh.
Since we are taking over context switching, avoid the writes to PMSELR_EL0 and PMUSERENR_EL0 that would normally occur in __{,de}activate_traps_common()
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/kvm_pmu.h | 4 + arch/arm64/kvm/arm.c | 2 + arch/arm64/kvm/hyp/include/hyp/switch.h | 4 +- arch/arm64/kvm/pmu-direct.c | 112 ++++++++++++++++++++++++ 4 files changed, 120 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 8b634112eded2..25a5eb8c623da 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -103,6 +103,8 @@ void kvm_pmu_host_counters_disable(void);
u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu); u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu); +void kvm_pmu_load(struct kvm_vcpu *vcpu); +void kvm_pmu_put(struct kvm_vcpu *vcpu);
#if !defined(__KVM_NVHE_HYPERVISOR__) bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu); @@ -184,6 +186,8 @@ static inline u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) { return 0; } +static inline void kvm_pmu_load(struct kvm_vcpu *vcpu) {} +static inline void kvm_pmu_put(struct kvm_vcpu *vcpu) {} static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) {} static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 43e92f35f56ab..1750df5944f6d 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -629,6 +629,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) kvm_vcpu_load_vhe(vcpu); kvm_arch_vcpu_load_fp(vcpu); kvm_vcpu_pmu_restore_guest(vcpu); + kvm_pmu_load(vcpu); if (kvm_arm_is_pvtime_enabled(&vcpu->arch)) kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
@@ -671,6 +672,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_timer_vcpu_put(vcpu); kvm_vgic_put(vcpu); kvm_vcpu_pmu_restore_host(vcpu); + kvm_pmu_put(vcpu); if (vcpu_has_nv(vcpu)) kvm_vcpu_put_hw_mmu(vcpu); kvm_arm_vmid_clear_active(); diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 40bd00df6c58f..bde79ec1a1836 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -311,7 +311,7 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu) * counter, which could make a PMXEVCNTR_EL0 access UNDEF at * EL1 instead of being trapped to EL2. */ - if (system_supports_pmuv3()) { + if (system_supports_pmuv3() && !kvm_vcpu_pmu_is_partitioned(vcpu)) { write_sysreg(0, pmselr_el0);
ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0); @@ -340,7 +340,7 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu) struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
write_sysreg(0, hstr_el2); - if (system_supports_pmuv3()) { + if (system_supports_pmuv3() && !kvm_vcpu_pmu_is_partitioned(vcpu)) { write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0); vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU); } diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 7fb4fb5c22e2a..71977d24f489a 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -9,6 +9,7 @@ #include <linux/perf/arm_pmuv3.h>
#include <asm/arm_pmuv3.h> +#include <asm/kvm_emulate.h> #include <asm/kvm_pmu.h>
/** @@ -219,3 +220,114 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
return nr_host_cnt_max; } + +/** + * kvm_pmu_load() - Load untrapped PMU registers + * @vcpu: Pointer to struct kvm_vcpu + * + * Load all untrapped PMU registers from the VCPU into the PCPU. Mask + * to only bits belonging to guest-reserved counters and leave + * host-reserved counters alone in bitmask registers. + */ +void kvm_pmu_load(struct kvm_vcpu *vcpu) +{ + struct arm_pmu *pmu; + u64 mask; + u8 i; + u64 val; + + /* + * If we aren't using FGT then we are trapping everything + * anyway, so no need to bother with the swap. + */ + if (!kvm_vcpu_pmu_use_fgt(vcpu)) + return; + + pmu = vcpu->kvm->arch.arm_pmu; + + for (i = 0; i < pmu->hpmn_max; i++) { + val = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i); + write_pmevcntrn(i, val); + } + + val = __vcpu_sys_reg(vcpu, PMCCNTR_EL0); + write_pmccntr(val); + + val = __vcpu_sys_reg(vcpu, PMUSERENR_EL0); + write_pmuserenr(val); + + val = __vcpu_sys_reg(vcpu, PMSELR_EL0); + write_pmselr(val); + + /* Save only the stateful writable bits. */ + val = __vcpu_sys_reg(vcpu, PMCR_EL0); + mask = ARMV8_PMU_PMCR_MASK & + ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C); + write_pmcr(val & mask); + + /* + * When handling these: + * 1. Apply only the bits for guest counters (indicated by mask) + * 2. Use the different registers for set and clear + */ + mask = kvm_pmu_guest_counter_mask(pmu); + + val = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0); + write_pmcntenset(val & mask); + write_pmcntenclr(~val & mask); + + val = __vcpu_sys_reg(vcpu, PMINTENSET_EL1); + write_pmintenset(val & mask); + write_pmintenclr(~val & mask); +} + +/** + * kvm_pmu_put() - Put untrapped PMU registers + * @vcpu: Pointer to struct kvm_vcpu + * + * Put all untrapped PMU registers from the VCPU into the PCPU. Mask + * to only bits belonging to guest-reserved counters and leave + * host-reserved counters alone in bitmask registers. + */ +void kvm_pmu_put(struct kvm_vcpu *vcpu) +{ + struct arm_pmu *pmu; + u64 mask; + u8 i; + u64 val; + + /* + * If we aren't using FGT then we are trapping everything + * anyway, so no need to bother with the swap. + */ + if (!kvm_vcpu_pmu_use_fgt(vcpu)) + return; + + pmu = vcpu->kvm->arch.arm_pmu; + + for (i = 0; i < pmu->hpmn_max; i++) { + val = read_pmevcntrn(i); + __vcpu_assign_sys_reg(vcpu, PMEVCNTR0_EL0 + i, val); + } + + val = read_pmccntr(); + __vcpu_assign_sys_reg(vcpu, PMCCNTR_EL0, val); + + val = read_pmuserenr(); + __vcpu_assign_sys_reg(vcpu, PMUSERENR_EL0, val); + + val = read_pmselr(); + __vcpu_assign_sys_reg(vcpu, PMSELR_EL0, val); + + val = read_pmcr(); + __vcpu_assign_sys_reg(vcpu, PMCR_EL0, val); + + /* Mask these to only save the guest relevant bits. */ + mask = kvm_pmu_guest_counter_mask(pmu); + + val = read_pmcntenset(); + __vcpu_assign_sys_reg(vcpu, PMCNTENSET_EL0, val & mask); + + val = read_pmintenset(); + __vcpu_assign_sys_reg(vcpu, PMINTENSET_EL1, val & mask); +}
Rerun all tests for a partitioned PMU in vpmu_counter_access.
Create an enum specifying whether we are testing the emulated or partitioned PMU and all the test functions are modified to take the implementation as an argument and make the difference in setup appropriately.
Signed-off-by: Colton Lewis coltonlewis@google.com --- tools/include/uapi/linux/kvm.h | 1 + .../selftests/kvm/arm64/vpmu_counter_access.c | 77 ++++++++++++++----- 2 files changed, 57 insertions(+), 21 deletions(-)
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 52f6000ab0208..2bb2f234df0e6 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -963,6 +963,7 @@ struct kvm_enable_cap { #define KVM_CAP_RISCV_MP_STATE_RESET 242 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243 #define KVM_CAP_GUEST_MEMFD_FLAGS 244 +#define KVM_CAP_ARM_PARTITION_PMU 245
struct kvm_irq_routing_irqchip { __u32 irqchip; diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c index ae36325c022fb..e68072e3e1326 100644 --- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c +++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c @@ -25,9 +25,20 @@ /* The cycle counter bit position that's common among the PMU registers */ #define ARMV8_PMU_CYCLE_IDX 31
+enum pmu_impl { + EMULATED, + PARTITIONED +}; + +const char *pmu_impl_str[] = { + "Emulated", + "Partitioned" +}; + struct vpmu_vm { struct kvm_vm *vm; struct kvm_vcpu *vcpu; + bool pmu_partitioned; };
static struct vpmu_vm vpmu_vm; @@ -399,7 +410,7 @@ static void guest_code(uint64_t expected_pmcr_n) }
/* Create a VM that has one vCPU with PMUv3 configured. */ -static void create_vpmu_vm(void *guest_code) +static void create_vpmu_vm(void *guest_code, enum pmu_impl impl) { struct kvm_vcpu_init init; uint8_t pmuver, ec; @@ -409,6 +420,11 @@ static void create_vpmu_vm(void *guest_code) .attr = KVM_ARM_VCPU_PMU_V3_IRQ, .addr = (uint64_t)&irq, }; + bool partition = (impl == PARTITIONED); + struct kvm_enable_cap partition_cap = { + .cap = KVM_CAP_ARM_PARTITION_PMU, + .args[0] = partition, + };
/* The test creates the vpmu_vm multiple times. Ensure a clean state */ memset(&vpmu_vm, 0, sizeof(vpmu_vm)); @@ -420,6 +436,12 @@ static void create_vpmu_vm(void *guest_code) guest_sync_handler); }
+ if (kvm_has_cap(KVM_CAP_ARM_PARTITION_PMU)) { + vm_ioctl(vpmu_vm.vm, KVM_ENABLE_CAP, &partition_cap); + vpmu_vm.pmu_partitioned = partition; + pr_debug("Set PMU partitioning: %d\n", partition); + } + /* Create vCPU with PMUv3 */ kvm_get_default_vcpu_target(vpmu_vm.vm, &init); init.features[0] |= (1 << KVM_ARM_VCPU_PMU_V3); @@ -461,13 +483,14 @@ static void run_vcpu(struct kvm_vcpu *vcpu, uint64_t pmcr_n) } }
-static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool expect_fail) +static void test_create_vpmu_vm_with_nr_counters( + unsigned int nr_counters, enum pmu_impl impl, bool expect_fail) { struct kvm_vcpu *vcpu; unsigned int prev; int ret;
- create_vpmu_vm(guest_code); + create_vpmu_vm(guest_code, impl); vcpu = vpmu_vm.vcpu;
prev = get_pmcr_n(vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0))); @@ -489,7 +512,7 @@ static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool * Create a guest with one vCPU, set the PMCR_EL0.N for the vCPU to @pmcr_n, * and run the test. */ -static void run_access_test(uint64_t pmcr_n) +static void run_access_test(uint64_t pmcr_n, enum pmu_impl impl) { uint64_t sp; struct kvm_vcpu *vcpu; @@ -497,7 +520,7 @@ static void run_access_test(uint64_t pmcr_n)
pr_debug("Test with pmcr_n %lu\n", pmcr_n);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false); + test_create_vpmu_vm_with_nr_counters(pmcr_n, impl, false); vcpu = vpmu_vm.vcpu;
/* Save the initial sp to restore them later to run the guest again */ @@ -531,14 +554,14 @@ static struct pmreg_sets validity_check_reg_sets[] = { * Create a VM, and check if KVM handles the userspace accesses of * the PMU register sets in @validity_check_reg_sets[] correctly. */ -static void run_pmregs_validity_test(uint64_t pmcr_n) +static void run_pmregs_validity_test(uint64_t pmcr_n, enum pmu_impl impl) { int i; struct kvm_vcpu *vcpu; uint64_t set_reg_id, clr_reg_id, reg_val; uint64_t valid_counters_mask, max_counters_mask;
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false); + test_create_vpmu_vm_with_nr_counters(pmcr_n, impl, false); vcpu = vpmu_vm.vcpu;
valid_counters_mask = get_counters_mask(pmcr_n); @@ -588,11 +611,11 @@ static void run_pmregs_validity_test(uint64_t pmcr_n) * the vCPU to @pmcr_n, which is larger than the host value. * The attempt should fail as @pmcr_n is too big to set for the vCPU. */ -static void run_error_test(uint64_t pmcr_n) +static void run_error_test(uint64_t pmcr_n, enum pmu_impl impl) { - pr_debug("Error test with pmcr_n %lu (larger than the host)\n", pmcr_n); + pr_debug("Error test with pmcr_n %lu (larger than the host allows)\n", pmcr_n);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, true); + test_create_vpmu_vm_with_nr_counters(pmcr_n, impl, true); destroy_vpmu_vm(); }
@@ -600,11 +623,11 @@ static void run_error_test(uint64_t pmcr_n) * Return the default number of implemented PMU event counters excluding * the cycle counter (i.e. PMCR_EL0.N value) for the guest. */ -static uint64_t get_pmcr_n_limit(void) +static uint64_t get_pmcr_n_limit(enum pmu_impl impl) { uint64_t pmcr;
- create_vpmu_vm(guest_code); + create_vpmu_vm(guest_code, impl); pmcr = vcpu_get_reg(vpmu_vm.vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0)); destroy_vpmu_vm(); return get_pmcr_n(pmcr); @@ -614,7 +637,7 @@ static bool kvm_supports_nr_counters_attr(void) { bool supported;
- create_vpmu_vm(NULL); + create_vpmu_vm(NULL, EMULATED); supported = !__vcpu_has_device_attr(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL, KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS); destroy_vpmu_vm(); @@ -622,22 +645,34 @@ static bool kvm_supports_nr_counters_attr(void) return supported; }
-int main(void) +void test_pmu(enum pmu_impl impl) { uint64_t i, pmcr_n;
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3)); - TEST_REQUIRE(kvm_supports_vgic_v3()); - TEST_REQUIRE(kvm_supports_nr_counters_attr()); + pr_info("Testing PMU: Implementation = %s\n", pmu_impl_str[impl]); + + pmcr_n = get_pmcr_n_limit(impl); + pr_debug("PMCR_EL0.N: Limit = %lu\n", pmcr_n);
- pmcr_n = get_pmcr_n_limit(); for (i = 0; i <= pmcr_n; i++) { - run_access_test(i); - run_pmregs_validity_test(i); + run_access_test(i, impl); + run_pmregs_validity_test(i, impl); }
for (i = pmcr_n + 1; i < ARMV8_PMU_MAX_COUNTERS; i++) - run_error_test(i); + run_error_test(i, impl); +} + +int main(void) +{ + TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3)); + TEST_REQUIRE(kvm_supports_vgic_v3()); + TEST_REQUIRE(kvm_supports_nr_counters_attr()); + + test_pmu(EMULATED); + + if (kvm_has_cap(KVM_CAP_ARM_PARTITION_PMU)) + test_pmu(PARTITIONED);
return 0; }
If the PMU is partitioned, keep the driver out of the guest counter partition and only use the host counter partition.
Define some functions that determine whether the PMU is partitioned and construct mutually exclusive bitmaps for testing which partition a particular counter is in. Note that despite their separate position in the bitmap, the cycle and instruction counters are always in the guest partition.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm/include/asm/arm_pmuv3.h | 18 +++++++ arch/arm64/include/asm/kvm_pmu.h | 24 +++++++++ arch/arm64/kvm/pmu-direct.c | 86 ++++++++++++++++++++++++++++++++ drivers/perf/arm_pmuv3.c | 41 +++++++++++++-- 4 files changed, 165 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h index 636b1aab9e8d2..3ea5741d213d8 100644 --- a/arch/arm/include/asm/arm_pmuv3.h +++ b/arch/arm/include/asm/arm_pmuv3.h @@ -231,6 +231,24 @@ static inline bool kvm_set_pmuserenr(u64 val) }
static inline void kvm_vcpu_pmu_resync_el0(void) {} +static inline void kvm_pmu_host_counters_enable(void) {} +static inline void kvm_pmu_host_counters_disable(void) {} + +static inline bool kvm_pmu_is_partitioned(struct arm_pmu *pmu) +{ + return false; +} + +static inline u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu) +{ + return ~0; +} + +static inline u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu) +{ + return ~0; +} +
/* PMU Version in DFR Register */ #define ARMV8_PMU_DFR_VER_NI 0 diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 63bff75e4f8dd..8887f39c25e60 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -90,6 +90,12 @@ void kvm_vcpu_pmu_resync_el0(void); #define kvm_vcpu_has_pmu(vcpu) \ (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PMU_V3))
+bool kvm_pmu_is_partitioned(struct arm_pmu *pmu); +u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu); +u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu); +void kvm_pmu_host_counters_enable(void); +void kvm_pmu_host_counters_disable(void); + /* * Updates the vcpu's view of the pmu events for this cpu. * Must be called before every vcpu run after disabling interrupts, to ensure @@ -222,6 +228,24 @@ static inline bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int id
static inline void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_pmu_is_partitioned(void *pmu) +{ + return false; +} + +static inline u64 kvm_pmu_host_counter_mask(void *pmu) +{ + return ~0; +} + +static inline u64 kvm_pmu_guest_counter_mask(void *pmu) +{ + return ~0; +} + +static inline void kvm_pmu_host_counters_enable(void) {} +static inline void kvm_pmu_host_counters_disable(void) {} + #endif
#endif diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 0d38265b6f290..d5de7fdd059f4 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -5,7 +5,10 @@ */
#include <linux/kvm_host.h> +#include <linux/perf/arm_pmu.h> +#include <linux/perf/arm_pmuv3.h>
+#include <asm/arm_pmuv3.h> #include <asm/kvm_pmu.h>
/** @@ -20,3 +23,86 @@ bool kvm_pmu_partition_supported(void) return has_vhe() && system_supports_pmuv3(); } + +/** + * kvm_pmu_is_partitioned() - Determine if given PMU is partitioned + * @pmu: Pointer to arm_pmu struct + * + * Determine if given PMU is partitioned by looking at hpmn field. The + * PMU is partitioned if this field is less than the number of + * counters in the system. + * + * Return: True if the PMU is partitioned, false otherwise + */ +bool kvm_pmu_is_partitioned(struct arm_pmu *pmu) +{ + if (!pmu) + return false; + + return pmu->hpmn_max >= 0 && + pmu->hpmn_max <= *host_data_ptr(nr_event_counters); +} + +/** + * kvm_pmu_host_counter_mask() - Compute bitmask of host-reserved counters + * @pmu: Pointer to arm_pmu struct + * + * Compute the bitmask that selects the host-reserved counters in the + * {PMCNTEN,PMINTEN,PMOVS}{SET,CLR} registers. These are the counters + * in HPMN..N + * + * Return: Bitmask + */ +u64 kvm_pmu_host_counter_mask(struct arm_pmu *pmu) +{ + u8 nr_counters = *host_data_ptr(nr_event_counters); + + if (!kvm_pmu_is_partitioned(pmu)) + return ARMV8_PMU_CNT_MASK_ALL; + + return GENMASK(nr_counters - 1, pmu->hpmn_max); +} + +/** + * kvm_pmu_guest_counter_mask() - Compute bitmask of guest-reserved counters + * + * Compute the bitmask that selects the guest-reserved counters in the + * {PMCNTEN,PMINTEN,PMOVS}{SET,CLR} registers. These are the counters + * in 0..HPMN and the cycle and instruction counters. + * + * Return: Bitmask + */ +u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu) +{ + return ARMV8_PMU_CNT_MASK_ALL & ~kvm_pmu_host_counter_mask(pmu); +} + +/** + * kvm_pmu_host_counters_enable() - Enable host-reserved counters + * + * When partitioned the enable bit for host-reserved counters is + * MDCR_EL2.HPME instead of the typical PMCR_EL0.E, which now + * exclusively controls the guest-reserved counters. Enable that bit. + */ +void kvm_pmu_host_counters_enable(void) +{ + u64 mdcr = read_sysreg(mdcr_el2); + + mdcr |= MDCR_EL2_HPME; + write_sysreg(mdcr, mdcr_el2); +} + +/** + * kvm_pmu_host_counters_disable() - Disable host-reserved counters + * + * When partitioned the disable bit for host-reserved counters is + * MDCR_EL2.HPME instead of the typical PMCR_EL0.E, which now + * exclusively controls the guest-reserved counters. Disable that bit. + */ +void kvm_pmu_host_counters_disable(void) +{ + u64 mdcr = read_sysreg(mdcr_el2); + + mdcr &= ~MDCR_EL2_HPME; + write_sysreg(mdcr, mdcr_el2); +} diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 3e6eb4be4ac43..2bed99ba992d7 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -871,6 +871,9 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu) brbe_enable(cpu_pmu);
/* Enable all counters */ + if (kvm_pmu_is_partitioned(cpu_pmu)) + kvm_pmu_host_counters_enable(); + armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E); }
@@ -882,6 +885,9 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu) brbe_disable();
/* Disable all counters */ + if (kvm_pmu_is_partitioned(cpu_pmu)) + kvm_pmu_host_counters_disable(); + armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E); }
@@ -998,6 +1004,7 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc, static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc, struct perf_event *event) { + struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); struct hw_perf_event *hwc = &event->hw; unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
@@ -1018,6 +1025,12 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc, if (has_branch_stack(event)) return false;
+ /* + * If partitioned at all, pmccntr belongs to the guest. + */ + if (kvm_pmu_is_partitioned(cpu_pmu)) + return false; + return true; }
@@ -1044,6 +1057,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc, * may not know how to handle it. */ if ((evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) && + !kvm_pmu_is_partitioned(cpu_pmu) && !armv8pmu_event_get_threshold(&event->attr) && test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) && !armv8pmu_event_want_user_access(event)) { @@ -1055,7 +1069,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc, * Otherwise use events counters */ if (armv8pmu_event_is_chained(event)) - return armv8pmu_get_chain_idx(cpuc, cpu_pmu); + return armv8pmu_get_chain_idx(cpuc, cpu_pmu); else return armv8pmu_get_single_idx(cpuc, cpu_pmu); } @@ -1167,6 +1181,14 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, return 0; }
+static void armv8pmu_reset_host_counters(struct arm_pmu *cpu_pmu) +{ + int idx; + + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) + armv8pmu_write_evcntr(idx, 0); +} + static void armv8pmu_reset(void *info) { struct arm_pmu *cpu_pmu = (struct arm_pmu *)info; @@ -1174,6 +1196,9 @@ static void armv8pmu_reset(void *info)
bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS);
+ if (kvm_pmu_is_partitioned(cpu_pmu)) + mask &= kvm_pmu_host_counter_mask(cpu_pmu); + /* The counter and interrupt enable registers are unknown at reset. */ armv8pmu_disable_counter(mask); armv8pmu_disable_intens(mask); @@ -1186,11 +1211,19 @@ static void armv8pmu_reset(void *info) brbe_invalidate(); }
+ pmcr = ARMV8_PMU_PMCR_LC; + /* - * Initialize & Reset PMNC. Request overflow interrupt for - * 64 bit cycle counter but cheat in armv8pmu_write_counter(). + * Initialize & Reset PMNC. Request overflow interrupt for 64 + * bit cycle counter but cheat in armv8pmu_write_counter(). + * + * When partitioned, there is no single bit to reset only the + * host counters. so reset them individually. */ - pmcr = ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_LC; + if (kvm_pmu_is_partitioned(cpu_pmu)) + armv8pmu_reset_host_counters(cpu_pmu); + else + pmcr = ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C;
/* Enable long event counter support where available */ if (armv8pmu_has_long_event(cpu_pmu))
Since many guests will never touch the PMU, they need not pay the cost of context swapping those registers.
Use an enum to implement a simple state machine for PMU register access. The PMU either accesses registers virtually or physically.
Virtual access implies all PMU registers are trapped coarsely by MDCR_EL2.TPM and therefore do not need to be context swapped. Physical access implies some registers are untrapped through FGT and do need to be context swapped. All vCPUs do virtual access by default and transition to physical if the PMU is partitioned and the guest actually tries a PMU access.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/include/asm/kvm_pmu.h | 4 ++++ arch/arm64/include/asm/kvm_types.h | 7 ++++++- arch/arm64/kvm/debug.c | 2 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 2 ++ arch/arm64/kvm/pmu-direct.c | 21 +++++++++++++++++++++ arch/arm64/kvm/pmu.c | 7 +++++++ arch/arm64/kvm/sys_regs.c | 4 ++++ 8 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index c7e52aaf469dc..f92027d8fdfd0 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -1373,6 +1373,7 @@ static inline bool kvm_system_needs_idmapped_vectors(void) return cpus_have_final_cap(ARM64_SPECTRE_V3A); }
+void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu); void kvm_init_host_debug_data(void); void kvm_debug_init_vhe(void); void kvm_vcpu_load_debug(struct kvm_vcpu *vcpu); diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 25a5eb8c623da..43aa334dce517 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -38,6 +38,7 @@ struct kvm_pmu { int irq_num; bool created; bool irq_level; + enum vcpu_pmu_register_access access; };
struct arm_pmu_entry { @@ -106,6 +107,8 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu); void kvm_pmu_load(struct kvm_vcpu *vcpu); void kvm_pmu_put(struct kvm_vcpu *vcpu);
+void kvm_pmu_set_physical_access(struct kvm_vcpu *vcpu); + #if !defined(__KVM_NVHE_HYPERVISOR__) bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu); bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu); @@ -188,6 +191,7 @@ static inline u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) } static inline void kvm_pmu_load(struct kvm_vcpu *vcpu) {} static inline void kvm_pmu_put(struct kvm_vcpu *vcpu) {} +static inline void kvm_pmu_set_physical_access(struct kvm_vcpu *vcpu) {} static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) {} static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, diff --git a/arch/arm64/include/asm/kvm_types.h b/arch/arm64/include/asm/kvm_types.h index 9a126b9e2d7c9..9f67165359f5c 100644 --- a/arch/arm64/include/asm/kvm_types.h +++ b/arch/arm64/include/asm/kvm_types.h @@ -4,5 +4,10 @@
#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
-#endif /* _ASM_ARM64_KVM_TYPES_H */ +enum vcpu_pmu_register_access { + VCPU_PMU_ACCESS_UNSET, + VCPU_PMU_ACCESS_VIRTUAL, + VCPU_PMU_ACCESS_PHYSICAL, +};
+#endif /* _ASM_ARM64_KVM_TYPES_H */ diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 0ab89c91e19cb..c2cf6b308ec60 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -34,7 +34,7 @@ static int cpu_has_spe(u64 dfr0) * - Self-hosted Trace Filter controls (MDCR_EL2_TTRF) * - Self-hosted Trace (MDCR_EL2_TTRF/MDCR_EL2_E2TB) */ -static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) +void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) { int hpmn = kvm_pmu_hpmn(vcpu);
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index bde79ec1a1836..ea288a712bb5d 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -963,6 +963,8 @@ static bool kvm_hyp_handle_pmu_regs(struct kvm_vcpu *vcpu) if (ret) __kvm_skip_instr(vcpu);
+ kvm_pmu_set_physical_access(vcpu); + return ret; }
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 8d0d6d1a0d851..c5767e2ebc651 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -73,6 +73,7 @@ bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu) u8 hpmn = vcpu->kvm->arch.nr_pmu_counters;
return kvm_vcpu_pmu_is_partitioned(vcpu) && + vcpu->arch.pmu.access == VCPU_PMU_ACCESS_PHYSICAL && cpus_have_final_cap(ARM64_HAS_FGT) && (hpmn != 0 || cpus_have_final_cap(ARM64_HAS_HPMN0)); } @@ -92,6 +93,26 @@ u64 kvm_pmu_fgt2_bits(void) | HDFGRTR2_EL2_nPMICNTR_EL0; }
+/** + * kvm_pmu_set_physical_access() + * @vcpu: Pointer to vcpu struct + * + * Reconfigure the guest for physical access of PMU hardware if + * allowed. This means reconfiguring mdcr_el2 and loading the vCPU + * state onto hardware. + * + */ + +void kvm_pmu_set_physical_access(struct kvm_vcpu *vcpu) +{ + if (kvm_vcpu_pmu_is_partitioned(vcpu) + && vcpu->arch.pmu.access == VCPU_PMU_ACCESS_VIRTUAL) { + vcpu->arch.pmu.access = VCPU_PMU_ACCESS_PHYSICAL; + kvm_arm_setup_mdcr_el2(vcpu); + kvm_pmu_load(vcpu); + } +} + /** * kvm_pmu_host_counter_mask() - Compute bitmask of host-reserved counters * @pmu: Pointer to arm_pmu struct diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 48b39f096fa12..c9862e55a4049 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -471,6 +471,12 @@ int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) return 0; }
+static void kvm_pmu_register_init(struct kvm_vcpu *vcpu) +{ + if (vcpu->arch.pmu.access == VCPU_PMU_ACCESS_UNSET) + vcpu->arch.pmu.access = VCPU_PMU_ACCESS_VIRTUAL; +} + static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) { if (irqchip_in_kernel(vcpu->kvm)) { @@ -496,6 +502,7 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) init_irq_work(&vcpu->arch.pmu.overflow_work, kvm_pmu_perf_overflow_notify_vcpu);
+ kvm_pmu_register_init(vcpu); vcpu->arch.pmu.created = true; return 0; } diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index f2ae761625a66..d73218706b834 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1197,6 +1197,8 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, p->regval = __vcpu_sys_reg(vcpu, reg); }
+ kvm_pmu_set_physical_access(vcpu); + return true; }
@@ -1302,6 +1304,8 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, p->regval = __vcpu_sys_reg(vcpu, PMOVSSET_EL0); }
+ kvm_pmu_set_physical_access(vcpu); + return true; }
On Tue, Dec 09, 2025 at 08:51:09PM +0000, Colton Lewis wrote:
Because PMXEVTYPER is trapped and PMSELR is not, it is not appropriate to use the virtual PMSELR register when it could be outdated and lead to an invalid write. Use the physical register when partitioned.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/include/asm/arm_pmuv3.h | 7 ++++++- arch/arm64/kvm/sys_regs.c | 9 +++++++-- 2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 27c4d6d47da31..60600f04b5902 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -70,11 +70,16 @@ static inline u64 read_pmcr(void) return read_sysreg(pmcr_el0); } -static inline void write_pmselr(u32 val) +static inline void write_pmselr(u64 val) { write_sysreg(val, pmselr_el0); } +static inline u64 read_pmselr(void) +{
- return read_sysreg(pmselr_el0);
+}
static inline void write_pmccntr(u64 val) { write_sysreg(val, pmccntr_el0); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 0c9596325519b..2e6d907fa8af2 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1199,14 +1199,19 @@ static bool writethrough_pmevtyper(struct kvm_vcpu *vcpu, struct sys_reg_params static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *r) {
- u64 idx, reg;
- u64 idx, reg, pmselr;
if (pmu_access_el0_disabled(vcpu)) return false; if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) { /* PMXEVTYPER_EL0 */
idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0));
if (kvm_vcpu_pmu_is_partitioned(vcpu))pmselr = read_pmselr();elsepmselr = __vcpu_sys_reg(vcpu, PMSELR_EL0);
This isn't preemption safe. Nor should the "if (partitioned) do X else do Y" get open-coded throughout the shop.
I would rather this be handled with a prepatory patch that provides generic PMU register accessors to the rest of KVM (e.g. vcpu_read_pmu_reg() / vcpu_write_pmu_reg()). Internally those helpers can locate the vCPU's PMU registers (emulated, partitioned in-memory, partitioned in-CPU).
Thanks, Oliver
The KVM API for event filtering says that counters do not count when blocked by the event filter. To enforce that, the event filter must be rechecked on every load since it might have changed since the last time the guest wrote a value. If the event is filtered, exclude counting at all exception levels before writing the hardware.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/kvm/pmu-direct.c | 44 +++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+)
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 71977d24f489a..8d0d6d1a0d851 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -221,6 +221,49 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) return nr_host_cnt_max; }
+/** + * kvm_pmu_apply_event_filter() + * @vcpu: Pointer to vcpu struct + * + * To uphold the guarantee of the KVM PMU event filter, we must ensure + * no counter counts if the event is filtered. Accomplish this by + * filtering all exception levels if the event is filtered. + */ +static void kvm_pmu_apply_event_filter(struct kvm_vcpu *vcpu) +{ + struct arm_pmu *pmu = vcpu->kvm->arch.arm_pmu; + u64 evtyper_set = ARMV8_PMU_EXCLUDE_EL0 | + ARMV8_PMU_EXCLUDE_EL1; + u64 evtyper_clr = ARMV8_PMU_INCLUDE_EL2; + u8 i; + u64 val; + u64 evsel; + + if (!pmu) + return; + + for (i = 0; i < pmu->hpmn_max; i++) { + val = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i); + evsel = val & kvm_pmu_event_mask(vcpu->kvm); + + if (vcpu->kvm->arch.pmu_filter && + !test_bit(evsel, vcpu->kvm->arch.pmu_filter)) + val |= evtyper_set; + + val &= ~evtyper_clr; + write_pmevtypern(i, val); + } + + val = __vcpu_sys_reg(vcpu, PMCCFILTR_EL0); + + if (vcpu->kvm->arch.pmu_filter && + !test_bit(ARMV8_PMUV3_PERFCTR_CPU_CYCLES, vcpu->kvm->arch.pmu_filter)) + val |= evtyper_set; + + val &= ~evtyper_clr; + write_pmccfiltr(val); +} + /** * kvm_pmu_load() - Load untrapped PMU registers * @vcpu: Pointer to struct kvm_vcpu @@ -244,6 +287,7 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu) return;
pmu = vcpu->kvm->arch.arm_pmu; + kvm_pmu_apply_event_filter(vcpu);
for (i = 0; i < pmu->hpmn_max; i++) { val = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i);
Hi Colton,
On Tue, Dec 09, 2025 at 08:51:07PM +0000, Colton Lewis wrote:
In order to gain the best performance benefit from partitioning the PMU, utilize fine grain traps (FEAT_FGT and FEAT_FGT2) to avoid trapping common PMU register accesses by the guest to remove that overhead.
Untrapped:
- PMCR_EL0
- PMUSERENR_EL0
- PMSELR_EL0
- PMCCNTR_EL0
- PMCNTEN_EL0
- PMINTEN_EL1
- PMEVCNTRn_EL0
These are safe to untrap because writing MDCR_EL2.HPMN as this series will do limits the effect of writes to any of these registers to the partition of counters 0..HPMN-1. Reads from these registers will not leak information from between guests as all these registers are context swapped by a later patch in this series. Reads from these registers also do not leak any information about the host's hardware beyond what is promised by PMUv3.
Trapped:
- PMOVS_EL0
- PMEVTYPERn_EL0
- PMCCFILTR_EL0
- PMICNTR_EL0
- PMICFILTR_EL0
- PMCEIDn_EL0
- PMMIR_EL1
PMOVS remains trapped so KVM can track overflow IRQs that will need to be injected into the guest.
PMICNTR and PMIFILTR remain trapped because KVM is not handling them yet.
PMEVTYPERn remains trapped so KVM can limit which events guests can count, such as disallowing counting at EL2. PMCCFILTR and PMCIFILTR are special cases of the same.
PMCEIDn and PMMIR remain trapped because they can leak information specific to the host hardware implementation.
NOTE: This patch temporarily forces kvm_vcpu_pmu_is_partitioned() to be false to prevent partial feature activation for easier debugging.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/include/asm/kvm_pmu.h | 33 ++++++++++++++++++++++ arch/arm64/kvm/config.c | 34 ++++++++++++++++++++-- arch/arm64/kvm/pmu-direct.c | 48 ++++++++++++++++++++++++++++++++ 3 files changed, 112 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 8887f39c25e60..7297a697a4a62 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -96,6 +96,23 @@ u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu); void kvm_pmu_host_counters_enable(void); void kvm_pmu_host_counters_disable(void); +#if !defined(__KVM_NVHE_HYPERVISOR__) +bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu); +bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu); +#else +static inline bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu) +{
- return false;
+}
+static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu) +{
- return false;
+} +#endif +u64 kvm_pmu_fgt_bits(void); +u64 kvm_pmu_fgt2_bits(void);
/*
- Updates the vcpu's view of the pmu events for this cpu.
- Must be called before every vcpu run after disabling interrupts, to ensure
@@ -135,6 +152,22 @@ static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, { return 0; } +static inline bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu) +{
- return false;
+} +static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu) +{
- return false;
+} +static inline u64 kvm_pmu_fgt_bits(void) +{
- return 0;
+} +static inline u64 kvm_pmu_fgt2_bits(void) +{
- return 0;
+} static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) {} static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, diff --git a/arch/arm64/kvm/config.c b/arch/arm64/kvm/config.c index 24bb3f36e9d59..064dc6aa06f76 100644 --- a/arch/arm64/kvm/config.c +++ b/arch/arm64/kvm/config.c @@ -6,6 +6,7 @@ #include <linux/kvm_host.h> #include <asm/kvm_emulate.h> +#include <asm/kvm_pmu.h> #include <asm/kvm_nested.h> #include <asm/sysreg.h> @@ -1489,12 +1490,39 @@ static void __compute_hfgwtr(struct kvm_vcpu *vcpu) *vcpu_fgt(vcpu, HFGWTR_EL2) |= HFGWTR_EL2_TCR_EL1; } +static void __compute_hdfgrtr(struct kvm_vcpu *vcpu) +{
- __compute_fgt(vcpu, HDFGRTR_EL2);
- if (kvm_vcpu_pmu_use_fgt(vcpu))
*vcpu_fgt(vcpu, HDFGRTR_EL2) |= kvm_pmu_fgt_bits();
Couple of suggestions. I'd rather see this conditioned on kvm_vcpu_pmu_is_partitioned() and get rid of the FGT predicate. After all, kvm_vcpu_load_fgt() already checks for the presence of FEAT_FGT first.
Additionally, I'd prefer that the trap configuration is inline instead of done in a helper in some other file. Centralizing the FGT configuration here was very much intentional.
The other reason for doing this is kvm_pmu_fgt_bits() assumes a 'positive' trap polarity, even though there are several cases where FGTs have a 'negative' priority (i.e. 0 => trap).
Thanks, Oliver
Add KVM_CAP_ARM_PARTITION_PMU to enable the partitioned PMU for a given VM. This capability is allowed where PMUv3 and VHE are supported and the host driver was configured with arm_pmuv3.reserved_host_counters.
Ordering is enforced such that this must be called before the vCPUs are created to avoid the possibility the guest is already using one PMU configuration that gets switched out from under it.
The enabled capability is tracked by the new flag KVM_ARCH_FLAG_PARTITIONED_PMU_ENABLED.
Signed-off-by: Colton Lewis coltonlewis@google.com --- Documentation/virt/kvm/api.rst | 24 +++++++++++++++++++++ arch/arm64/include/asm/kvm_host.h | 2 ++ arch/arm64/include/asm/kvm_pmu.h | 9 ++++++++ arch/arm64/kvm/arm.c | 15 +++++++++++++ arch/arm64/kvm/pmu-direct.c | 35 ++++++++++++++++++++++++++++--- include/uapi/linux/kvm.h | 1 + 6 files changed, 83 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 57061fa29e6a0..ef1b22f20ee71 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8703,6 +8703,30 @@ This capability indicate to the userspace whether a PFNMAP memory region can be safely mapped as cacheable. This relies on the presence of force write back (FWB) feature support on the hardware.
+ +7.245 KVM_CAP_ARM_PARTITION_PMU +------------------------------------- + +:Architectures: arm64 +:Target: VM +:Parameters: arg[0] is a boolean that enables or disables the capability +:Returns: 0 on success + -EPERM if host doesn't support + -EBUSY if vPMU was already created + +This API controls the PMU implementation used for VMs. The capability +is only available if the host PMUv3 driver was configured for +partitioning via the module parameter +``arm-pmuv3.reserved_guest_counters=[0-$NR_COUNTERS]``. When enabled, +VMs are configured to have direct hardware access to the most +frequently used registers for the counters configured by the +aforementioned module parameters, bypassing the KVM traps in the +standard emulated PMU implementation and reducing overhead of any +guest software that uses PMU capabilities such as ``perf``. + +If the host driver was configured for partitioning but the partitioned +PMU is disabled through this interface, the VM will use the legacy PMU + 8. Other capabilities. ======================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f92027d8fdfd0..8431fdebcac43 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -349,6 +349,8 @@ struct kvm_arch { #define KVM_ARCH_FLAG_GUEST_HAS_SVE 9 /* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */ #define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10 + /* Partitioned PMU Enabled */ +#define KVM_ARCH_FLAG_PARTITION_PMU_ENABLED 11 unsigned long flags;
/* VM-wide vCPU feature set */ diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 47e6f2a14e381..6146120208e39 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -111,6 +111,8 @@ void kvm_pmu_load(struct kvm_vcpu *vcpu); void kvm_pmu_put(struct kvm_vcpu *vcpu);
void kvm_pmu_set_physical_access(struct kvm_vcpu *vcpu); +bool kvm_pmu_partition_ready(void); +void kvm_pmu_partition_enable(struct kvm *kvm, bool enable);
#if !defined(__KVM_NVHE_HYPERVISOR__) bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu); @@ -327,6 +329,13 @@ static inline void kvm_pmu_host_counters_enable(void) {} static inline void kvm_pmu_host_counters_disable(void) {} static inline void kvm_pmu_handle_guest_irq(u64 govf) {}
+static inline bool kvm_pmu_partition_ready(void) +{ + return false; +} + +static inline void kvm_pmu_partition_enable(struct kvm *kvm, bool enable) {} + #endif
#endif diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 1750df5944f6d..d09f272577277 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -20,6 +20,7 @@ #include <linux/irqbypass.h> #include <linux/sched/stat.h> #include <linux/psci.h> +#include <linux/perf/arm_pmu.h> #include <trace/events/kvm.h>
#define CREATE_TRACE_POINTS @@ -37,6 +38,7 @@ #include <asm/kvm_emulate.h> #include <asm/kvm_mmu.h> #include <asm/kvm_nested.h> +#include <asm/kvm_pmu.h> #include <asm/kvm_pkvm.h> #include <asm/kvm_pmu.h> #include <asm/kvm_ptrauth.h> @@ -132,6 +134,16 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, } mutex_unlock(&kvm->lock); break; + case KVM_CAP_ARM_PARTITION_PMU: + if (kvm->created_vcpus) { + r = -EBUSY; + } else if (!kvm_pmu_partition_ready()) { + r = -EPERM; + } else { + r = 0; + kvm_pmu_partition_enable(kvm, cap->args[0]); + } + break; default: break; } @@ -388,6 +400,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_PMU_V3: r = kvm_supports_guest_pmuv3(); break; + case KVM_CAP_ARM_PARTITION_PMU: + r = kvm_pmu_partition_ready(); + break; case KVM_CAP_ARM_INJECT_SERROR_ESR: r = cpus_have_final_cap(ARM64_HAS_RAS_EXTN); break; diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 2ee99d6d2b6c1..6cfba9caeea0e 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -45,8 +45,8 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu) }
/** - * kvm_vcpu_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU - * @vcpu: Pointer to kvm_vcpu struct + * kvm_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU + * @kvm: Pointer to kvm_vcpu struct * * Determine if given VCPU has a partitioned PMU by extracting that * field and passing it to :c:func:`kvm_pmu_is_partitioned` @@ -56,7 +56,36 @@ bool kvm_pmu_is_partitioned(struct arm_pmu *pmu) bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu) { return kvm_pmu_is_partitioned(vcpu->kvm->arch.arm_pmu) && - false; + test_bit(KVM_ARCH_FLAG_PARTITION_PMU_ENABLED, &vcpu->kvm->arch.flags); +} + +/** + * kvm_pmu_partition_ready() - If we can enable/disable partition + * + * Return: true if allowed, false otherwise. + */ +bool kvm_pmu_partition_ready(void) +{ + return kvm_pmu_partition_supported() && + kvm_supports_guest_pmuv3() && + armv8pmu_hpmn_max > -1; +} + +/** + * kvm_pmu_partition_enable() - Enable/disable partition flag + * @kvm: Pointer to vcpu + * @enable: Whether to enable or disable + * + * If we want to enable the partition, the guest is free to grab + * hardware by accessing PMU registers. Otherwise, the host maintains + * control. + */ +void kvm_pmu_partition_enable(struct kvm *kvm, bool enable) +{ + if (enable) + set_bit(KVM_ARCH_FLAG_PARTITION_PMU_ENABLED, &kvm->arch.flags); + else + clear_bit(KVM_ARCH_FLAG_PARTITION_PMU_ENABLED, &kvm->arch.flags); }
/** diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 52f6000ab0208..2bb2f234df0e6 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -963,6 +963,7 @@ struct kvm_enable_cap { #define KVM_CAP_RISCV_MP_STATE_RESET 242 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243 #define KVM_CAP_GUEST_MEMFD_FLAGS 244 +#define KVM_CAP_ARM_PARTITION_PMU 245
struct kvm_irq_routing_irqchip { __u32 irqchip;
Because PMOVS remains trapped, it needs to be written through when partitioned to affect PMU hardware when expected.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/include/asm/arm_pmuv3.h | 10 ++++++++++ arch/arm64/kvm/sys_regs.c | 17 ++++++++++++++++- 2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 60600f04b5902..3e25c0313263c 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -140,6 +140,16 @@ static inline u64 read_pmicfiltr(void) return read_sysreg_s(SYS_PMICFILTR_EL0); }
+static inline void write_pmovsset(u64 val) +{ + write_sysreg(val, pmovsset_el0); +} + +static inline u64 read_pmovsset(void) +{ + return read_sysreg(pmovsset_el0); +} + static inline void write_pmovsclr(u64 val) { write_sysreg(val, pmovsclr_el0); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 2e6d907fa8af2..bee892db9ca8b 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1307,6 +1307,19 @@ static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p, return true; }
+static void writethrough_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, bool set) +{ + u64 mask = kvm_pmu_accessible_counter_mask(vcpu); + + if (set) { + __vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, (p->regval & mask)); + write_pmovsset(p->regval & mask); + } else { + __vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, &=, ~(p->regval & mask)); + write_pmovsclr(p->regval & mask); + } +} + static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *r) { @@ -1315,7 +1328,9 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, if (pmu_access_el0_disabled(vcpu)) return false;
- if (p->is_write) { + if (kvm_vcpu_pmu_is_partitioned(vcpu) && p->is_write) { + writethrough_pmovs(vcpu, p, r->CRm & 0x2); + } else if (p->is_write) { if (r->CRm & 0x2) /* accessing PMOVSSET_EL0 */ __vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, (p->regval & mask));
On Tue, Dec 09, 2025 at 08:51:10PM +0000, Colton Lewis wrote:
Because PMOVS remains trapped, it needs to be written through when partitioned to affect PMU hardware when expected.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/include/asm/arm_pmuv3.h | 10 ++++++++++ arch/arm64/kvm/sys_regs.c | 17 ++++++++++++++++- 2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 60600f04b5902..3e25c0313263c 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -140,6 +140,16 @@ static inline u64 read_pmicfiltr(void) return read_sysreg_s(SYS_PMICFILTR_EL0); } +static inline void write_pmovsset(u64 val) +{
- write_sysreg(val, pmovsset_el0);
+}
+static inline u64 read_pmovsset(void) +{
- return read_sysreg(pmovsset_el0);
+}
static inline void write_pmovsclr(u64 val) { write_sysreg(val, pmovsclr_el0); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 2e6d907fa8af2..bee892db9ca8b 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1307,6 +1307,19 @@ static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p, return true; } +static void writethrough_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, bool set) +{
- u64 mask = kvm_pmu_accessible_counter_mask(vcpu);
- if (set) {
__vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, |=, (p->regval & mask));write_pmovsset(p->regval & mask);- } else {
__vcpu_rmw_sys_reg(vcpu, PMOVSSET_EL0, &=, ~(p->regval & mask));write_pmovsclr(p->regval & mask);- }
There's only ever a single canonical guest view of a register. Either it has been loaded onto the CPU or it is in memory, writing the value to two different locations is odd. What guarantees the guest context is on the CPU currently? And what about preemption?
Thanks, Oliver
On Tue, Dec 09, 2025 at 08:51:12PM +0000, Colton Lewis wrote:
Setup MDCR_EL2 to handle a partitioned PMU. That means calculate an appropriate value for HPMN instead of the default maximum setting the host allows (which implies no partition) so hardware enforces that a guest will only see the counters in the guest partition.
Setting HPMN to a non default value means the global enable bit for the host counters is now MDCR_EL2.HPME instead of the usual PMCR_EL0.E. Enable the HPME bit to allow the host to count guest events. Since HPME only has an effect when HPMN is set which we only do for the guest, it is correct to enable it unconditionally here.
Unset the TPM and TPMCR bits, which trap all PMU accesses, if FGT (fine grain trapping) is being used.
If available, set the filtering bits HPMD and HCCD to be extra sure nothing in the guest counts at EL2.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/include/asm/kvm_pmu.h | 11 ++++++ arch/arm64/kvm/debug.c | 29 ++++++++++++-- arch/arm64/kvm/pmu-direct.c | 65 ++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h index 60b8a48cad456..8b634112eded2 100644 --- a/arch/arm64/include/asm/kvm_pmu.h +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -101,6 +101,9 @@ u64 kvm_pmu_guest_counter_mask(struct arm_pmu *pmu); void kvm_pmu_host_counters_enable(void); void kvm_pmu_host_counters_disable(void); +u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu); +u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu);
#if !defined(__KVM_NVHE_HYPERVISOR__) bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu); bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu); @@ -173,6 +176,14 @@ static inline u64 kvm_pmu_fgt2_bits(void) { return 0; } +static inline u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu) +{
- return 0;
+} +static inline u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) +{
- return 0;
+} static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) {} static inline void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 3ad6b7c6e4ba7..0ab89c91e19cb 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -36,20 +36,43 @@ static int cpu_has_spe(u64 dfr0) */ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) {
- int hpmn = kvm_pmu_hpmn(vcpu);
- preempt_disable();
/* * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK * to disable guest access to the profiling and trace buffers */
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
*host_data_ptr(nr_event_counters));
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn); vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMS | MDCR_EL2_TTRF | MDCR_EL2_TPMCR | MDCR_EL2_TDRA |
MDCR_EL2_TDOSA);
MDCR_EL2_TDOSA |MDCR_EL2_HPME);- if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
/** Filtering these should be redundant because we trap* all the TYPER and FILTR registers anyway and ensure* they filter EL2, but set the bits if they are here.*/if (is_pmuv3p1(read_pmuver()))vcpu->arch.mdcr_el2 |= MDCR_EL2_HPMD;if (is_pmuv3p5(read_pmuver()))vcpu->arch.mdcr_el2 |= MDCR_EL2_HCCD;/** Take out the coarse grain traps if we are using* fine grain traps.*/if (kvm_vcpu_pmu_use_fgt(vcpu))vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_TPM | MDCR_EL2_TPMCR);- }
/* Is the VM being debugged by userspace? */ if (vcpu->guest_debug) diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 4dd160c878862..7fb4fb5c22e2a 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -154,3 +154,68 @@ void kvm_pmu_host_counters_disable(void) mdcr &= ~MDCR_EL2_HPME; write_sysreg(mdcr, mdcr_el2); }
<snip>
+/**
- kvm_pmu_guest_num_counters() - Number of counters to show to guest
- @vcpu: Pointer to struct kvm_vcpu
- Calculate the number of counters to show to the guest via
- PMCR_EL0.N, making sure to respect the maximum the host allows,
- which is hpmn_max if partitioned and host_max otherwise.
- Return: Valid value for PMCR_EL0.N
- */
+u8 kvm_pmu_guest_num_counters(struct kvm_vcpu *vcpu) +{
- u8 nr_cnt = vcpu->kvm->arch.nr_pmu_counters;
- int hpmn_max = armv8pmu_hpmn_max;
- u8 host_max = *host_data_ptr(nr_event_counters);
- if (vcpu->kvm->arch.arm_pmu)
hpmn_max = vcpu->kvm->arch.arm_pmu->hpmn_max;- if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
if (nr_cnt <= hpmn_max && nr_cnt <= host_max)return nr_cnt;if (hpmn_max <= host_max)return hpmn_max;- }
- if (nr_cnt <= host_max)
return nr_cnt;- return host_max;
+}
+/**
- kvm_pmu_hpmn() - Calculate HPMN field value
- @vcpu: Pointer to struct kvm_vcpu
- Calculate the appropriate value to set for MDCR_EL2.HPMN. If
- partitioned, this is the number of counters set for the guest if
- supported, falling back to hpmn_max if needed. If we are not
- partitioned or can't set the implied HPMN value, fall back to the
- host value.
- Return: A valid HPMN value
- */
+u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) +{
- u8 nr_guest_cnt = kvm_pmu_guest_num_counters(vcpu);
- int nr_guest_cnt_max = armv8pmu_hpmn_max;
- u8 nr_host_cnt_max = *host_data_ptr(nr_event_counters);
- if (vcpu->kvm->arch.arm_pmu)
nr_guest_cnt_max = vcpu->kvm->arch.arm_pmu->hpmn_max;- if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
if (cpus_have_final_cap(ARM64_HAS_HPMN0))return nr_guest_cnt;else if (nr_guest_cnt > 0)return nr_guest_cnt;else if (nr_guest_cnt_max > 0)return nr_guest_cnt_max;- }
- return nr_host_cnt_max;
+}
</snip>
I find all of this rather confusing. It seems like you're dealing with sanitizing kvm->arch.nr_pmu_counters vs. the underlying implementation. I'm not sure why you need to do that, I would expect that we reject unsupported values at the time of the ioctl.
The only thing you do need to handle is if the vCPU has migrated to an "unsupported" CPU, for which we already have supporting helpers. I'm too lazy to fetch the Arm ARM and cite architecture but I'm pretty sure an out-of-range HPMN has UNPREDICTABLE behavior.
I think you just need to move the vcpu_set_unsupported_cpu() call earlier in kvm_arch_vcpu_load().
Taking all that into consideration:
u8 kvm_mdcr_hpmn(struct kvm_vcpu *vcpu) { u8 nr_counters = *host_data_ptr(nr_event_counters);
if (!kvm_vcpu_pmu_is_partitioned(vcpu) || vcpu_on_unsupported_cpu(vcpu)) return nr_counters;
return vcpu->kvm->arch.nr_pmu_counters; }
Thanks, Oliver
On Tue, Dec 09, 2025 at 08:51:13PM +0000, Colton Lewis wrote:
Make sure reads and writes to PMCR_EL0 conform to additional constraints imposed when the PMU is partitioned.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/kvm/pmu.c | 2 +- arch/arm64/kvm/sys_regs.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 1fd012f8ff4a9..48b39f096fa12 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -877,7 +877,7 @@ u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu) u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) { u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0);
- u64 n = vcpu->kvm->arch.nr_pmu_counters;
- u64 n = kvm_pmu_guest_num_counters(vcpu);
Why can't the value of vcpu->kvm->arch.nr_pmu_counters be trusted?
@@ -1360,7 +1360,7 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, */ if (!kvm_vm_has_ran_once(kvm) && !vcpu_has_nv(vcpu) &&
new_n <= kvm_arm_pmu_get_max_counters(kvm))
kvm->arch.nr_pmu_counters = new_n;new_n <= kvm_pmu_hpmn(vcpu))
This is the legacy UAPI for setting the number of PMU counters by writing to PMCR_EL0.N.
The 'partitioned' implementation should take a dependency on the SET_NR_COUNTERS attribute and reject attempts to change the value of PMCR_EL0.N. Just like nested.
Thanks, Oliver
Some selftests have a dependency on find_bit and weren't compiling separately without it, so I've added it to the KVM library here using the same method as files like rbtree.c.
Signed-off-by: Colton Lewis coltonlewis@google.com --- tools/testing/selftests/kvm/Makefile.kvm | 1 + tools/testing/selftests/kvm/lib/find_bit.c | 1 + 2 files changed, 2 insertions(+) create mode 100644 tools/testing/selftests/kvm/lib/find_bit.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm index 148d427ff24be..f44822b7d0e20 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -5,6 +5,7 @@ all:
LIBKVM += lib/assert.c LIBKVM += lib/elf.c +LIBKVM += lib/find_bit.c LIBKVM += lib/guest_modes.c LIBKVM += lib/io.c LIBKVM += lib/kvm_util.c diff --git a/tools/testing/selftests/kvm/lib/find_bit.c b/tools/testing/selftests/kvm/lib/find_bit.c new file mode 100644 index 0000000000000..67d9d9cbca85c --- /dev/null +++ b/tools/testing/selftests/kvm/lib/find_bit.c @@ -0,0 +1 @@ +#include "../../../../lib/find_bit.c"
On Tue, Dec 09, 2025 at 08:51:14PM +0000, Colton Lewis wrote:
+/**
- kvm_pmu_load() - Load untrapped PMU registers
- @vcpu: Pointer to struct kvm_vcpu
- Load all untrapped PMU registers from the VCPU into the PCPU. Mask
- to only bits belonging to guest-reserved counters and leave
- host-reserved counters alone in bitmask registers.
- */
+void kvm_pmu_load(struct kvm_vcpu *vcpu) +{
- struct arm_pmu *pmu;
- u64 mask;
- u8 i;
- u64 val;
Assert that preemption is disabled.
- /*
* If we aren't using FGT then we are trapping everything* anyway, so no need to bother with the swap.*/- if (!kvm_vcpu_pmu_use_fgt(vcpu))
return;
Uhh... Then how do events count in this case?
The absence of FEAT_FGT shouldn't affect the residence of the guest PMU context. We just need to handle the extra traps, ideally by reading the PMU registers directly as a fast path exit handler.
- pmu = vcpu->kvm->arch.arm_pmu;
- for (i = 0; i < pmu->hpmn_max; i++) {
val = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i);write_pmevcntrn(i, val);- }
- val = __vcpu_sys_reg(vcpu, PMCCNTR_EL0);
- write_pmccntr(val);
- val = __vcpu_sys_reg(vcpu, PMUSERENR_EL0);
- write_pmuserenr(val);
What about the host's value for PMUSERENR?
- val = __vcpu_sys_reg(vcpu, PMSELR_EL0);
- write_pmselr(val);
PMSELR_EL0 needs to be switched late, e.g. at sysreg_restore_guest_state_vhe(). Even though the host doesn't currently use the selector-based accessor, I'd prefer we not load things that'd affect the host context until we're about to enter the guest.
- /* Save only the stateful writable bits. */
- val = __vcpu_sys_reg(vcpu, PMCR_EL0);
- mask = ARMV8_PMU_PMCR_MASK &
~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);- write_pmcr(val & mask);
- /*
* When handling these:* 1. Apply only the bits for guest counters (indicated by mask)* 2. Use the different registers for set and clear*/- mask = kvm_pmu_guest_counter_mask(pmu);
- val = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
- write_pmcntenset(val & mask);
- write_pmcntenclr(~val & mask);
- val = __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
- write_pmintenset(val & mask);
- write_pmintenclr(~val & mask);
Is this safe? What happens if we put the PMU into an overflow condition?
+}
+/**
- kvm_pmu_put() - Put untrapped PMU registers
- @vcpu: Pointer to struct kvm_vcpu
- Put all untrapped PMU registers from the VCPU into the PCPU. Mask
- to only bits belonging to guest-reserved counters and leave
- host-reserved counters alone in bitmask registers.
- */
+void kvm_pmu_put(struct kvm_vcpu *vcpu) +{
- struct arm_pmu *pmu;
- u64 mask;
- u8 i;
- u64 val;
- /*
* If we aren't using FGT then we are trapping everything* anyway, so no need to bother with the swap.*/- if (!kvm_vcpu_pmu_use_fgt(vcpu))
return;- pmu = vcpu->kvm->arch.arm_pmu;
- for (i = 0; i < pmu->hpmn_max; i++) {
val = read_pmevcntrn(i);__vcpu_assign_sys_reg(vcpu, PMEVCNTR0_EL0 + i, val);- }
- val = read_pmccntr();
- __vcpu_assign_sys_reg(vcpu, PMCCNTR_EL0, val);
- val = read_pmuserenr();
- __vcpu_assign_sys_reg(vcpu, PMUSERENR_EL0, val);
- val = read_pmselr();
- __vcpu_assign_sys_reg(vcpu, PMSELR_EL0, val);
- val = read_pmcr();
- __vcpu_assign_sys_reg(vcpu, PMCR_EL0, val);
- /* Mask these to only save the guest relevant bits. */
- mask = kvm_pmu_guest_counter_mask(pmu);
- val = read_pmcntenset();
- __vcpu_assign_sys_reg(vcpu, PMCNTENSET_EL0, val & mask);
- val = read_pmintenset();
- __vcpu_assign_sys_reg(vcpu, PMINTENSET_EL1, val & mask);
What if the PMU is in an overflow state at this point?
Thanks, Oliver
On Tue, Dec 09, 2025 at 08:51:15PM +0000, Colton Lewis wrote:
The KVM API for event filtering says that counters do not count when blocked by the event filter. To enforce that, the event filter must be rechecked on every load since it might have changed since the last time the guest wrote a value. If the event is filtered, exclude counting at all exception levels before writing the hardware.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/kvm/pmu-direct.c | 44 +++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+)
diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 71977d24f489a..8d0d6d1a0d851 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -221,6 +221,49 @@ u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu) return nr_host_cnt_max; } +/**
- kvm_pmu_apply_event_filter()
- @vcpu: Pointer to vcpu struct
- To uphold the guarantee of the KVM PMU event filter, we must ensure
- no counter counts if the event is filtered. Accomplish this by
- filtering all exception levels if the event is filtered.
- */
+static void kvm_pmu_apply_event_filter(struct kvm_vcpu *vcpu) +{
- struct arm_pmu *pmu = vcpu->kvm->arch.arm_pmu;
- u64 evtyper_set = ARMV8_PMU_EXCLUDE_EL0 |
ARMV8_PMU_EXCLUDE_EL1;- u64 evtyper_clr = ARMV8_PMU_INCLUDE_EL2;
- u8 i;
- u64 val;
- u64 evsel;
- if (!pmu)
return;- for (i = 0; i < pmu->hpmn_max; i++) {
val = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i);evsel = val & kvm_pmu_event_mask(vcpu->kvm);if (vcpu->kvm->arch.pmu_filter &&!test_bit(evsel, vcpu->kvm->arch.pmu_filter))val |= evtyper_set;val &= ~evtyper_clr;write_pmevtypern(i, val);- }
- val = __vcpu_sys_reg(vcpu, PMCCFILTR_EL0);
- if (vcpu->kvm->arch.pmu_filter &&
!test_bit(ARMV8_PMUV3_PERFCTR_CPU_CYCLES, vcpu->kvm->arch.pmu_filter))val |= evtyper_set;- val &= ~evtyper_clr;
- write_pmccfiltr(val);
+}
This doesn't work for nested. I agree that the hardware value of PMEVTYPERn_EL0 needs to be under KVM control, but depending on whether or not we're in a hyp context the meaning of the EL1 filtering bit changes. Have a look at kvm_pmu_create_perf_event().
Thanks, Oliver
On Tue, Dec 09, 2025 at 08:51:16PM +0000, Colton Lewis wrote:
+enum vcpu_pmu_register_access {
- VCPU_PMU_ACCESS_UNSET,
- VCPU_PMU_ACCESS_VIRTUAL,
- VCPU_PMU_ACCESS_PHYSICAL,
+};
This is confusing. Even when the guest is accessing registers directly on the CPU I'd still call that "hardware assisted virtualization" and not "physical".
+#endif /* _ASM_ARM64_KVM_TYPES_H */ diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 0ab89c91e19cb..c2cf6b308ec60 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -34,7 +34,7 @@ static int cpu_has_spe(u64 dfr0)
- Self-hosted Trace Filter controls (MDCR_EL2_TTRF)
- Self-hosted Trace (MDCR_EL2_TTRF/MDCR_EL2_E2TB)
*/ -static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) +void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) { int hpmn = kvm_pmu_hpmn(vcpu); diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index bde79ec1a1836..ea288a712bb5d 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -963,6 +963,8 @@ static bool kvm_hyp_handle_pmu_regs(struct kvm_vcpu *vcpu) if (ret) __kvm_skip_instr(vcpu);
- kvm_pmu_set_physical_access(vcpu);
- return ret;
} diff --git a/arch/arm64/kvm/pmu-direct.c b/arch/arm64/kvm/pmu-direct.c index 8d0d6d1a0d851..c5767e2ebc651 100644 --- a/arch/arm64/kvm/pmu-direct.c +++ b/arch/arm64/kvm/pmu-direct.c @@ -73,6 +73,7 @@ bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu) u8 hpmn = vcpu->kvm->arch.nr_pmu_counters; return kvm_vcpu_pmu_is_partitioned(vcpu) &&
cpus_have_final_cap(ARM64_HAS_FGT) && (hpmn != 0 || cpus_have_final_cap(ARM64_HAS_HPMN0));vcpu->arch.pmu.access == VCPU_PMU_ACCESS_PHYSICAL &&} @@ -92,6 +93,26 @@ u64 kvm_pmu_fgt2_bits(void) | HDFGRTR2_EL2_nPMICNTR_EL0; } +/**
- kvm_pmu_set_physical_access()
- @vcpu: Pointer to vcpu struct
- Reconfigure the guest for physical access of PMU hardware if
- allowed. This means reconfiguring mdcr_el2 and loading the vCPU
- state onto hardware.
- */
+void kvm_pmu_set_physical_access(struct kvm_vcpu *vcpu) +{
- if (kvm_vcpu_pmu_is_partitioned(vcpu)
&& vcpu->arch.pmu.access == VCPU_PMU_ACCESS_VIRTUAL) {vcpu->arch.pmu.access = VCPU_PMU_ACCESS_PHYSICAL;kvm_arm_setup_mdcr_el2(vcpu);kvm_pmu_load(vcpu);- }
It isn't immediately obvious how this guards against preemption.
Also, the general approach for these context-loading situations is to do a full load/put on the vCPU rather than a directed load.
+static void kvm_pmu_register_init(struct kvm_vcpu *vcpu) +{
- if (vcpu->arch.pmu.access == VCPU_PMU_ACCESS_UNSET)
vcpu->arch.pmu.access = VCPU_PMU_ACCESS_VIRTUAL;
This is confusing. The zero value of the enum should be consistent with the "unloaded" state.
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index f2ae761625a66..d73218706b834 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1197,6 +1197,8 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, p->regval = __vcpu_sys_reg(vcpu, reg); }
- kvm_pmu_set_physical_access(vcpu);
- return true;
} @@ -1302,6 +1304,8 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, p->regval = __vcpu_sys_reg(vcpu, PMOVSSET_EL0); }
- kvm_pmu_set_physical_access(vcpu);
- return true;
}
Aren't there a ton of other registers the guest may access before these two? Having generic PMU register accessors would allow you to manage residence of PMU registers from a single spot.
Thanks, Oliver
On Tue, Dec 09, 2025 at 08:51:17PM +0000, Colton Lewis wrote:
Because ARM hardware is not yet capable of direct interrupt injection
PPI injection, it can do LPIs just fine.
@@ -961,6 +964,12 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) */ perf_event_overflow(event, &data, regs); }
- govf = pmovsr & kvm_pmu_guest_counter_mask(cpu_pmu);
- if (kvm_pmu_is_partitioned(cpu_pmu) && govf)
kvm_pmu_handle_guest_irq(govf);
The state ownership of this whole interaction is very odd. I would much rather that KVM have full ownership of the range of counters while the guest is loaded. By that I mean the PMUv3 driver only clears overflows on PMCs that it owns and KVM will do the same on the back of the IRQ.
Similarly, KVM should be leaving the "guest" range of counters in a non-overflow condition at vcpu_put().
Thanks, Oliver
In no situation should KVM be injecting a "recorded" IRQ. The overflow condition of the PMU is well defined in the architecture and we should implement *exactly* that.
On Tue, Dec 09, 2025 at 08:51:18PM +0000, Colton Lewis wrote:
+/**
- kvm_pmu_part_overflow_status() - Determine if any guest counters have overflowed
- @vcpu: Ponter to struct kvm_vcpu
- Determine if any guest counters have overflowed and therefore an
- IRQ needs to be injected into the guest.
- Return: True if there was an overflow, false otherwise
- */
+bool kvm_pmu_part_overflow_status(struct kvm_vcpu *vcpu) +{
- struct arm_pmu *pmu = vcpu->kvm->arch.arm_pmu;
- u64 mask = kvm_pmu_guest_counter_mask(pmu);
- u64 pmovs = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
- u64 pmint = read_pmintenset();
- u64 pmcr = read_pmcr();
How do we know that the vPMU has been loaded on the CPU at this point?
- return (pmcr & ARMV8_PMU_PMCR_E) && (mask & pmovs & pmint);
+}
I'd rather reuse kvm_pmu_overflow_status(), relying on the accessors to abstract away the implementation / location of the guest PMU context.
Thanks, Oliver
On Tue, Dec 09, 2025 at 08:51:19PM +0000, Colton Lewis wrote:
+7.245 KVM_CAP_ARM_PARTITION_PMU +-------------------------------------
Why can't this be a vCPU attribute similar to the other vPMU controls? Making the UAPI consistent will make it easier for userspace to reason about it.
Better yet, we could make the UAPI such that userspace selects a PMU implementation and the partitioned-ness of the PMU at the same time.
@@ -132,6 +134,16 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, } mutex_unlock(&kvm->lock); break;
- case KVM_CAP_ARM_PARTITION_PMU:
if (kvm->created_vcpus) {r = -EBUSY;} else if (!kvm_pmu_partition_ready()) {r = -EPERM;} else {r = 0;kvm_pmu_partition_enable(kvm, cap->args[0]);} default: break; }break;@@ -388,6 +400,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_PMU_V3: r = kvm_supports_guest_pmuv3(); break;
- case KVM_CAP_ARM_PARTITION_PMU:
r = kvm_pmu_partition_ready();
"ready" is very confusing in this context, as KVM will never be ready to support the feature on a system w/o the prerequisites.
Thanks, Oliver
On Tue, Dec 09, 2025 at 08:50:57PM +0000, Colton Lewis wrote:
This series creates a new PMU scheme on ARM, a partitioned PMU that allows reserving a subset of counters for more direct guest access, significantly reducing overhead. More details, including performance benchmarks, can be read in the v1 cover letter linked below.
An overview of what this series accomplishes was presented at KVM Forum 2025. Slides [1] and video [2] are linked below.
The long duration between v4 and v5 is due to time spent on this project being monopolized preparing this feature for internal production. As a result, there are too many improvements to fully list here, but I will cover the notable ones.
Thanks for reposting. I think there's still quite a bit of ground to cover on the KVM side of this, but I would definitely appreciate it if someone with more context on the perf side of things could chime in.
Will, IIRC you had some thoughts around counter allocation, right?
Best, Oliver
On 09/12/2025 20:50, Colton Lewis wrote:
Add a capability for FEAT_HPMN0, whether MDCR_EL2.HPMN can specify 0 counters reserved for the guest.
This required changing HPMN0 to an UnsignedEnum in tools/sysreg because otherwise not all the appropriate macros are generated to add it to arm64_cpu_capabilities_arm64_features.
Acked-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Colton Lewis coltonlewis@google.com
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
Hi Colton,
kernel test robot noticed the following build errors:
[auto build test ERROR on ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d]
url: https://github.com/intel-lab-lkp/linux/commits/Colton-Lewis/arm64-cpufeature... base: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d patch link: https://lore.kernel.org/r/20251209205121.1871534-12-coltonlewis%40google.com patch subject: [PATCH v5 11/24] KVM: arm64: Writethrough trapped PMEVTYPER register config: arm64-randconfig-001-20251210 (https://download.01.org/0day-ci/archive/20251211/202512110209.GjVZa9ti-lkp@i...) compiler: aarch64-linux-gcc (GCC) 14.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251211/202512110209.GjVZa9ti-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202512110209.GjVZa9ti-lkp@intel.com/
All errors (new ones prefixed by >>):
arch/arm64/kvm/sys_regs.c: In function 'writethrough_pmevtyper':
arch/arm64/kvm/sys_regs.c:1183:34: error: implicit declaration of function 'kvm_pmu_event_mask'; did you mean 'kvm_pmu_evtyper_mask'? [-Wimplicit-function-declaration]
1183 | eventsel = val & kvm_pmu_event_mask(vcpu->kvm); | ^~~~~~~~~~~~~~~~~~ | kvm_pmu_evtyper_mask
vim +1183 arch/arm64/kvm/sys_regs.c
1168 1169 static bool writethrough_pmevtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 1170 u64 reg, u64 idx) 1171 { 1172 u64 eventsel; 1173 u64 val = p->regval; 1174 u64 evtyper_set = ARMV8_PMU_EXCLUDE_EL0 | 1175 ARMV8_PMU_EXCLUDE_EL1; 1176 u64 evtyper_clr = ARMV8_PMU_INCLUDE_EL2; 1177 1178 __vcpu_assign_sys_reg(vcpu, reg, val); 1179 1180 if (idx == ARMV8_PMU_CYCLE_IDX) 1181 eventsel = ARMV8_PMUV3_PERFCTR_CPU_CYCLES; 1182 else
1183 eventsel = val & kvm_pmu_event_mask(vcpu->kvm);
1184 1185 if (vcpu->kvm->arch.pmu_filter && 1186 !test_bit(eventsel, vcpu->kvm->arch.pmu_filter)) 1187 val |= evtyper_set; 1188 1189 val &= ~evtyper_clr; 1190 1191 if (idx == ARMV8_PMU_CYCLE_IDX) 1192 write_pmccfiltr(val); 1193 else 1194 write_pmevtypern(idx, val); 1195 1196 return true; 1197 } 1198
Hi Colton,
kernel test robot noticed the following build warnings:
[auto build test WARNING on ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d]
url: https://github.com/intel-lab-lkp/linux/commits/Colton-Lewis/arm64-cpufeature... base: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d patch link: https://lore.kernel.org/r/20251209205121.1871534-10-coltonlewis%40google.com patch subject: [PATCH v5 09/24] perf: arm_pmuv3: Keep out of guest counter partition config: arm64-defconfig (https://download.01.org/0day-ci/archive/20251211/202512110439.UXwb1Qh4-lkp@i...) compiler: aarch64-linux-gcc (GCC) 15.1.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251211/202512110439.UXwb1Qh4-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202512110439.UXwb1Qh4-lkp@intel.com/
All warnings (new ones prefixed by >>):
Warning: arch/arm64/kvm/pmu-direct.c:75 function parameter 'pmu' not described in 'kvm_pmu_guest_counter_mask'
linux-kselftest-mirror@lists.linaro.org