This series introduces support in the ARM PMUv3 driver for partitioning PMU counters into two separate ranges by taking advantage of the MDCR_EL2.HPMN register field.
The advantage of a partitioned PMU would be to allow KVM guests direct access to a subset of PMU functionality, greatly reducing the overhead of performance monitoring in guests.
While this feature could be accepted on its own merits, practically there is a lot more to be done before it will be fully useful, so I'm sending as an RFC for now.
v2: * Rebase to v6.14-rc1, and take into account Oliver's debug cleanup changing some KVM handling of MDCR_EL2.
* Reorder patches to put related things closer together for easier reading.
* Ensure driver compiles for 32-bit ARM by defining 32-bit access to MDCR_EL2 correctly, which is called HDCR there. These will not be called due to the below requirement for VHE mode but a functionally correct definition seemed preferable to making it a no op.
* Rename parameter to reserved_host_counters to better reflect underlying hardware semantics and ensure host counters are easier to preserve.
* Document parameter with MODULE_PARM_DESC
* Restrict partitioning to VHE mode only. Using HPMN restricts the host counters to EL2 access only. Trying to make this work for the driver running at EL1 would require hypercalls for every access of those registers and a lot of additional handling at the KVM level. This would defeat the whole purpose of having the feature at all. Define has_vhe() on 32-bit arm to always return false.
* While Rob asked about making the parameter configurable at run time rather than boot time through sysfs, that is potentially very tricky because it is only safe to reprogram HPMN when we are certain no one, host or guest, is using any counters that might change ownership. If this is determined to be worthwhile, it should come after getting the core infrastructure right.
* While Marc asked about reusing the sysreg masks instead of redefining them for MDCR_EL2, that doesn't work because sysreg masks are only defined for arm64. I'm not sure how to avoid the duplication.
* Wrap code for determining if a counter index belongs to the host or guest partition in functions
* Typo fixes and additional testing
v1: https://lore.kernel.org/kvm/20250127222031.3078945-1-coltonlewis@google.com/
Colton Lewis (4): perf: arm_pmuv3: Generalize counter bitmasks perf: arm_pmuv3: Introduce module param to partition the PMU perf: arm_pmuv3: Keep out of guest counter partition KVM: arm64: Make guests see only counters they can access
arch/arm/include/asm/arm_pmuv3.h | 13 ++ arch/arm64/include/asm/arm_pmuv3.h | 10 ++ arch/arm64/kvm/debug.c | 3 +- arch/arm64/kvm/pmu-emul.c | 8 +- drivers/perf/arm_pmuv3.c | 117 ++++++++++++++++-- include/linux/perf/arm_pmu.h | 2 + include/linux/perf/arm_pmuv3.h | 34 ++++- .../selftests/kvm/arm64/vpmu_counter_access.c | 2 +- 8 files changed, 168 insertions(+), 21 deletions(-)
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b -- 2.48.1.502.g6dc24dfdaf-goog
These bitmasks are valid for enable and interrupt registers as well as overflow registers. Generalize the names.
Signed-off-by: Colton Lewis coltonlewis@google.com --- include/linux/perf/arm_pmuv3.h | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index d698efba28a2..c2448477c37f 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -223,16 +223,23 @@ ARMV8_PMU_PMCR_X | ARMV8_PMU_PMCR_DP | \ ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
+/* + * Counter bitmask layouts for overflow, enable, and interrupts + */ +#define ARMV8_PMU_CNT_MASK_P GENMASK(30, 0) +#define ARMV8_PMU_CNT_MASK_C BIT(31) +#define ARMV8_PMU_CNT_MASK_F BIT_ULL(32) /* arm64 only */ +#define ARMV8_PMU_CNT_MASK_ALL (ARMV8_PMU_CNT_MASK_P | \ + ARMV8_PMU_CNT_MASK_C | \ + ARMV8_PMU_CNT_MASK_F) /* * PMOVSR: counters overflow flag status reg */ -#define ARMV8_PMU_OVSR_P GENMASK(30, 0) -#define ARMV8_PMU_OVSR_C BIT(31) -#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */ +#define ARMV8_PMU_OVSR_P ARMV8_PMU_CNT_MASK_P +#define ARMV8_PMU_OVSR_C ARMV8_PMU_CNT_MASK_C +#define ARMV8_PMU_OVSR_F ARMV8_PMU_CNT_MASK_F /* Mask for writable bits is both P and C fields */ -#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \ - ARMV8_PMU_OVSR_F) - +#define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_CNT_MASK_ALL /* * PMXEVTYPER: Event selection reg */
For PMUv3, the register MDCR_EL2.HPMN partitiones the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in the PMUv3 driver to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because KVM will need to know that to handle guests correctly.
While FEAT_HPMN0 does allow HPMN to be set to 0, this patch specifically disallows that case because it's not useful given the intention to allow guests access to their own counters.
Due to the difficulty this feature would create for the driver running at EL1 on the host, partitioning is only allowed in VHE mode. Working on nVHE mode would require a hypercall for every register access because the counters reserved for the host by HPMN are now only accessible to EL2.
The parameter is only configurable at boot time. Making the parameter configurable on a running system is dangerous due to the difficulty of knowing for sure no counters are in use anywhere so it is safe to reporgram HPMN.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm/include/asm/arm_pmuv3.h | 13 +++++++++ arch/arm64/include/asm/arm_pmuv3.h | 10 +++++++ drivers/perf/arm_pmuv3.c | 46 ++++++++++++++++++++++++++++-- include/linux/perf/arm_pmu.h | 2 ++ include/linux/perf/arm_pmuv3.h | 7 +++++ 5 files changed, 76 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h index 2ec0e5e83fc9..c5f496450e16 100644 --- a/arch/arm/include/asm/arm_pmuv3.h +++ b/arch/arm/include/asm/arm_pmuv3.h @@ -11,6 +11,8 @@
#define PMCCNTR __ACCESS_CP15_64(0, c9)
+#define HDCR __ACCESS_CP15(c1, 4, c1, 1) + #define PMCR __ACCESS_CP15(c9, 0, c12, 0) #define PMCNTENSET __ACCESS_CP15(c9, 0, c12, 1) #define PMCNTENCLR __ACCESS_CP15(c9, 0, c12, 2) @@ -214,6 +216,7 @@ static inline void write_pmuserenr(u32 val)
static inline void write_pmuacr(u64 val) {}
+static inline bool has_vhe(void) { return false; } static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {} static inline void kvm_clr_pmu_events(u32 clr) {} static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr) @@ -277,4 +280,14 @@ static inline u64 read_pmceid1(void) return val; }
+static inline u32 read_mdcr(void) +{ + return read_sysreg(HDCR); +} + +static inline void write_mdcr(u32 val) +{ + write_sysreg(val, HDCR); +} + #endif diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 8a777dec8d88..fc37e7e81e07 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -188,4 +188,14 @@ static inline bool is_pmuv3p9(int pmuver) return pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P9; }
+static inline u64 read_mdcr(void) +{ + return read_sysreg(mdcr_el2); +} + +static inline void write_mdcr(u64 val) +{ + write_sysreg(val, mdcr_el2); +} + #endif diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 0e360feb3432..39109260b161 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -325,6 +325,7 @@ GEN_PMU_FORMAT_ATTR(threshold_compare); GEN_PMU_FORMAT_ATTR(threshold);
static int sysctl_perf_user_access __read_mostly; +static u8 reserved_host_counters __read_mostly;
static bool armv8pmu_event_is_64bit(struct perf_event *event) { @@ -500,6 +501,29 @@ static void armv8pmu_pmcr_write(u64 val) write_pmcr(val); }
+static u64 armv8pmu_mdcr_read(void) +{ + return read_mdcr(); +} + +static void armv8pmu_mdcr_write(u64 val) +{ + write_mdcr(val); + isb(); +} + +static void armv8pmu_partition(u8 hpmn) +{ + u64 mdcr = armv8pmu_mdcr_read(); + + mdcr &= ~ARMV8_PMU_MDCR_HPMN; + mdcr |= FIELD_PREP(ARMV8_PMU_MDCR_HPMN, hpmn); + /* Prevent guest counters counting at EL2 */ + mdcr |= ARMV8_PMU_MDCR_HPMD; + + armv8pmu_mdcr_write(mdcr); +} + static int armv8pmu_has_overflowed(u64 pmovsr) { return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK); @@ -1069,6 +1093,9 @@ static void armv8pmu_reset(void *info)
bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS);
+ if (cpu_pmu->partitioned) + armv8pmu_partition(cpu_pmu->hpmn); + /* The counter and interrupt enable registers are unknown at reset. */ armv8pmu_disable_counter(mask); armv8pmu_disable_intens(mask); @@ -1205,6 +1232,7 @@ static void __armv8pmu_probe_pmu(void *info) { struct armv8pmu_probe_info *probe = info; struct arm_pmu *cpu_pmu = probe->pmu; + u8 pmcr_n; u64 pmceid_raw[2]; u32 pmceid[2]; int pmuver; @@ -1215,10 +1243,20 @@ static void __armv8pmu_probe_pmu(void *info)
cpu_pmu->pmuver = pmuver; probe->present = true; + pmcr_n = FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read());
/* Read the nb of CNTx counters supported from PMNC */ - bitmap_set(cpu_pmu->cntr_mask, - 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read())); + bitmap_set(cpu_pmu->cntr_mask, 0, pmcr_n); + + if (has_vhe() && + reserved_host_counters > 0 && + reserved_host_counters < pmcr_n) { + cpu_pmu->hpmn = pmcr_n - reserved_host_counters; + cpu_pmu->partitioned = true; + } else { + cpu_pmu->hpmn = pmcr_n; + cpu_pmu->partitioned = false; + }
/* Add the CPU cycles counter */ set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask); @@ -1516,3 +1554,7 @@ void arch_perf_update_userpage(struct perf_event *event, userpg->cap_user_time_zero = 1; userpg->cap_user_time_short = 1; } + +module_param(reserved_host_counters, byte, 0); +MODULE_PARM_DESC(reserved_host_counters, + "Partition the PMU into host and guest counters"); diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 4b5b83677e3f..ad97aabed25a 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -101,6 +101,8 @@ struct arm_pmu { void (*reset)(void *); int (*map_event)(struct perf_event *event); DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS); + u8 hpmn; /* MDCR_EL2.HPMN: counter partition pivot */ + bool partitioned; bool secure_access; /* 32-bit ARM only */ #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40 DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS); diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index c2448477c37f..115ee39f693a 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -223,6 +223,13 @@ ARMV8_PMU_PMCR_X | ARMV8_PMU_PMCR_DP | \ ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
+/* + * Per-CPU MDCR: config reg + */ +#define ARMV8_PMU_MDCR_HPMN GENMASK(4, 0) +#define ARMV8_PMU_MDCR_HPME BIT(7) +#define ARMV8_PMU_MDCR_HPMD BIT(17) + /* * Counter bitmask layouts for overflow, enable, and interrupts */
Hi Colton,
On Sat, Feb 08, 2025 at 02:01:09AM +0000, Colton Lewis wrote:
For PMUv3, the register MDCR_EL2.HPMN partitiones the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in the PMUv3 driver to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because KVM will need to know that to handle guests correctly.
While FEAT_HPMN0 does allow HPMN to be set to 0, this patch specifically disallows that case because it's not useful given the intention to allow guests access to their own counters.
Quite the contrary.
FEAT_HPMN0 is useful if userspace wants to provide a vPMU that has a fixed cycle counter w/o event counters. Certain OSes refuse to boot without it...
static inline u32 read_mdcr(void) { return read_sysreg(HDCR); }
static inline void write_mdcr(u32 val) { write_sysreg(val, HDCR); }
Hmm... While this fixes the 32bit compilation issues, it opens a new can of worms.
VHE is a 64bit only feature, so you're *guaranteed* that these accessors will undef (running at EL1).
+static void armv8pmu_partition(u8 hpmn) +{
- u64 mdcr = armv8pmu_mdcr_read();
- mdcr &= ~ARMV8_PMU_MDCR_HPMN;
- mdcr |= FIELD_PREP(ARMV8_PMU_MDCR_HPMN, hpmn);
- /* Prevent guest counters counting at EL2 */
- mdcr |= ARMV8_PMU_MDCR_HPMD;
- armv8pmu_mdcr_write(mdcr);
+}
After giving this a read, I don't think the host PMU driver should care about MDCR_EL2 at all. The only time that 'guest' events are loaded into the PMU is between vcpu_load() / vcpu_put(), at which point *KVM* has reconfigured MDCR_EL2.
I'm not sure if there's much involvement with the host PMU driver beyond it pinky-swearing to only use the specified counters. KVM is what actually will program HPMN.
That'd alleviate the PMU driver from having VHE awareness or caring about MDCR_EL2.
Hi Oliver, thanks for the review.
Oliver Upton oliver.upton@linux.dev writes:
Hi Colton,
On Sat, Feb 08, 2025 at 02:01:09AM +0000, Colton Lewis wrote:
For PMUv3, the register MDCR_EL2.HPMN partitiones the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in the PMUv3 driver to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because KVM will need to know that to handle guests correctly.
While FEAT_HPMN0 does allow HPMN to be set to 0, this patch specifically disallows that case because it's not useful given the intention to allow guests access to their own counters.
Quite the contrary.
FEAT_HPMN0 is useful if userspace wants to provide a vPMU that has a fixed cycle counter w/o event counters. Certain OSes refuse to boot without it...
Cool. I'll make sure writing 0 is allowed if we have FEAT_HPMN0.
static inline u32 read_mdcr(void) { return read_sysreg(HDCR); }
static inline void write_mdcr(u32 val) { write_sysreg(val, HDCR); }
Hmm... While this fixes the 32bit compilation issues, it opens a new can of worms.
VHE is a 64bit only feature, so you're *guaranteed* that these accessors will undef (running at EL1).
True. I mentioned in the cover letter they aren't intended to actually run in the PMU driver because I guard for VHE there.
+static void armv8pmu_partition(u8 hpmn) +{
- u64 mdcr = armv8pmu_mdcr_read();
- mdcr &= ~ARMV8_PMU_MDCR_HPMN;
- mdcr |= FIELD_PREP(ARMV8_PMU_MDCR_HPMN, hpmn);
- /* Prevent guest counters counting at EL2 */
- mdcr |= ARMV8_PMU_MDCR_HPMD;
- armv8pmu_mdcr_write(mdcr);
+}
After giving this a read, I don't think the host PMU driver should care about MDCR_EL2 at all. The only time that 'guest' events are loaded into the PMU is between vcpu_load() / vcpu_put(), at which point *KVM* has reconfigured MDCR_EL2.
I'm not sure if there's much involvement with the host PMU driver beyond it pinky-swearing to only use the specified counters. KVM is what actually will program HPMN.
That'd alleviate the PMU driver from having VHE awareness or caring about MDCR_EL2.
That does seem like a cleaner solution. I can lift that handling into KVM code, but I will still need the variables I introduced in struct arm_pmu to be populated so the driver knows what counters to use.
Without looking, I'm concerned there might be an initilization ordering issue with that where KVM will get the value for HPMN first because it's initialized first but then can't store them somewhere accessible to the driver because the driver hasn't been initialized yet. Then the driver is initialized and used without knowing HPMN and is now using counters we wanted the guest to have.
There is probably an easy way around this, but I'll need to do some digging.
If the PMU is partitioned, keep the driver out of the guest counter partition and only use the host counter partition. Partitioning is defined by the MDCR_EL2.HPMN register field and saved in cpu_pmu->hpmn. The range 0..HPMN-1 is accessible by EL1 and EL0 while HPMN..PMCR.N is reserved for EL2.
Define some macros that take HPMN as an argument and construct mutually exclusive bitmaps for testing which partition a particular counter is in. Note that despite their different position in the bitmap, the cycle and instruction counters are always in the guest partition.
Signed-off-by: Colton Lewis coltonlewis@google.com --- drivers/perf/arm_pmuv3.c | 73 +++++++++++++++++++++++++++++----- include/linux/perf/arm_pmuv3.h | 8 ++++ 2 files changed, 71 insertions(+), 10 deletions(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index 39109260b161..01468ec9fead 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -754,15 +754,19 @@ static void armv8pmu_disable_event_irq(struct perf_event *event) armv8pmu_disable_intens(BIT(event->hw.idx)); }
-static u64 armv8pmu_getreset_flags(void) +static u64 armv8pmu_getreset_flags(struct arm_pmu *cpu_pmu) { u64 value;
/* Read */ value = read_pmovsclr();
+ if (cpu_pmu->partitioned) + value &= ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn); + else + value &= ARMV8_PMU_OVERFLOWED_MASK; + /* Write to clear flags */ - value &= ARMV8_PMU_OVERFLOWED_MASK; write_pmovsclr(value);
return value; @@ -789,6 +793,18 @@ static void armv8pmu_disable_user_access(void) update_pmuserenr(0); }
+static bool armv8pmu_is_guest_part(struct arm_pmu *cpu_pmu, u8 idx) +{ + return cpu_pmu->partitioned && + (BIT(idx) & ARMV8_PMU_GUEST_CNT_PART(cpu_pmu->hpmn)); +} + +static bool armv8pmu_is_host_part(struct arm_pmu *cpu_pmu, u8 idx) +{ + return !cpu_pmu->partitioned || + (BIT(idx) & ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn)); +} + static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu) { int i; @@ -797,6 +813,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu) if (is_pmuv3p9(cpu_pmu->pmuver)) { u64 mask = 0; for_each_set_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) { + if (armv8pmu_is_guest_part(cpu_pmu, i)) + continue; if (armv8pmu_event_has_user_read(cpuc->events[i])) mask |= BIT(i); } @@ -805,6 +823,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu) /* Clear any unused counters to avoid leaking their contents */ for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) { + if (armv8pmu_is_guest_part(cpu_pmu, i)) + continue; if (i == ARMV8_PMU_CYCLE_IDX) write_pmccntr(0); else if (i == ARMV8_PMU_INSTR_IDX) @@ -850,7 +870,10 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu) armv8pmu_disable_user_access();
/* Enable all counters */ - armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E); + if (cpu_pmu->partitioned) + armv8pmu_mdcr_write(armv8pmu_mdcr_read() | ARMV8_PMU_MDCR_HPME); + else + armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
kvm_vcpu_pmu_resync_el0(); } @@ -858,7 +881,10 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu) static void armv8pmu_stop(struct arm_pmu *cpu_pmu) { /* Disable all counters */ - armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E); + if (cpu_pmu->partitioned) + armv8pmu_mdcr_write(armv8pmu_mdcr_read() & ~ARMV8_PMU_MDCR_HPME); + else + armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E); }
static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) @@ -872,7 +898,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) /* * Get and reset the IRQ flags */ - pmovsr = armv8pmu_getreset_flags(); + pmovsr = armv8pmu_getreset_flags(cpu_pmu);
/* * Did an overflow occur? @@ -930,6 +956,8 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc, int idx;
for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) { + if (armv8pmu_is_guest_part(cpu_pmu, idx)) + continue; if (!test_and_set_bit(idx, cpuc->used_mask)) return idx; } @@ -946,6 +974,8 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc, * the lower idx must be even. */ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) { + if (armv8pmu_is_guest_part(cpu_pmu, idx)) + continue; if (!(idx & 0x1)) continue; if (!test_and_set_bit(idx, cpuc->used_mask)) { @@ -968,6 +998,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
/* Always prefer to place a cycle counter into the cycle counter. */ if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) && + !cpu_pmu->partitioned && !armv8pmu_event_get_threshold(&event->attr)) { if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask)) return ARMV8_PMU_CYCLE_IDX; @@ -983,6 +1014,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc, * may not know how to handle it. */ if ((evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) && + !cpu_pmu->partitioned && !armv8pmu_event_get_threshold(&event->attr) && test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) && !armv8pmu_event_want_user_access(event)) { @@ -994,7 +1026,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc, * Otherwise use events counters */ if (armv8pmu_event_is_chained(event)) - return armv8pmu_get_chain_idx(cpuc, cpu_pmu); + return armv8pmu_get_chain_idx(cpuc, cpu_pmu); else return armv8pmu_get_single_idx(cpuc, cpu_pmu); } @@ -1086,6 +1118,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, return 0; }
+static void armv8pmu_reset_host_counters(struct arm_pmu *cpu_pmu) +{ + int idx; + + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) { + if (armv8pmu_is_host_part(cpu_pmu, idx)) + armv8pmu_write_evcntr(idx, 0); + } +} + static void armv8pmu_reset(void *info) { struct arm_pmu *cpu_pmu = (struct arm_pmu *)info; @@ -1093,8 +1135,10 @@ static void armv8pmu_reset(void *info)
bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS);
- if (cpu_pmu->partitioned) + if (cpu_pmu->partitioned) { armv8pmu_partition(cpu_pmu->hpmn); + mask &= ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn); + }
/* The counter and interrupt enable registers are unknown at reset. */ armv8pmu_disable_counter(mask); @@ -1103,11 +1147,20 @@ static void armv8pmu_reset(void *info) /* Clear the counters we flip at guest entry/exit */ kvm_clr_pmu_events(mask);
+ + pmcr = ARMV8_PMU_PMCR_LC; + /* - * Initialize & Reset PMNC. Request overflow interrupt for - * 64 bit cycle counter but cheat in armv8pmu_write_counter(). + * Initialize & Reset PMNC. Request overflow interrupt for 64 + * bit cycle counter but cheat in armv8pmu_write_counter(). + * + * When partitioned, there is no single bit to reset only the + * host counters. so reset them individually. */ - pmcr = ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_LC; + if (cpu_pmu->partitioned) + armv8pmu_reset_host_counters(cpu_pmu); + else + pmcr = ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C;
/* Enable long event counter support where available */ if (armv8pmu_has_long_event(cpu_pmu)) diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index 115ee39f693a..5f8b143794ce 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -247,6 +247,14 @@ #define ARMV8_PMU_OVSR_F ARMV8_PMU_CNT_MASK_F /* Mask for writable bits is both P and C fields */ #define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_CNT_MASK_ALL + +/* Masks for guest and host counter partitions */ +#define ARMV8_PMU_HPMN_CNT_MASK(N) GENMASK((N) - 1, 0) +#define ARMV8_PMU_GUEST_CNT_PART(N) (ARMV8_PMU_HPMN_CNT_MASK(N) | \ + ARMV8_PMU_CNT_MASK_C | \ + ARMV8_PMU_CNT_MASK_F) +#define ARMV8_PMU_HOST_CNT_PART(N) (ARMV8_PMU_CNT_MASK_ALL & \ + ~ARMV8_PMU_GUEST_CNT_PART(N)) /* * PMXEVTYPER: Event selection reg */
The ARM architecture specifies that when MDCR_EL2.HPMN is set, EL1 and EL0, which includes KVM guests, should read that value for PMCR.N.
Signed-off-by: Colton Lewis coltonlewis@google.com --- arch/arm64/kvm/debug.c | 3 +-- arch/arm64/kvm/pmu-emul.c | 8 +++++++- tools/testing/selftests/kvm/arm64/vpmu_counter_access.c | 2 +- 3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 0e4c805e7e89..7c04db00bf6c 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -36,8 +36,7 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK * to disable guest access to the profiling and trace buffers */ - vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, - *host_data_ptr(nr_event_counters)); + vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, read_mdcr()); vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMS | MDCR_EL2_TTRF | diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index 6c5950b9ceac..052ce8c721fe 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -993,12 +993,18 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq) u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) { struct arm_pmu *arm_pmu = kvm->arch.arm_pmu; + u8 limit; + + if (arm_pmu->partitioned) + limit = arm_pmu->hpmn - 1; + else + limit = ARMV8_PMU_MAX_GENERAL_COUNTERS;
/* * The arm_pmu->cntr_mask considers the fixed counter(s) as well. * Ignore those and return only the general-purpose counters. */ - return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS); + return bitmap_weight(arm_pmu->cntr_mask, limit); }
static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c index f16b3b27e32e..b5bc18b7528d 100644 --- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c +++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c @@ -609,7 +609,7 @@ static void run_pmregs_validity_test(uint64_t pmcr_n) */ static void run_error_test(uint64_t pmcr_n) { - pr_debug("Error test with pmcr_n %lu (larger than the host)\n", pmcr_n); + pr_debug("Error test with pmcr_n %lu (larger than the host allows)\n", pmcr_n);
test_create_vpmu_vm_with_pmcr_n(pmcr_n, true); destroy_vpmu_vm();
On Sat, Feb 08, 2025 at 02:01:11AM +0000, Colton Lewis wrote:
The ARM architecture specifies that when MDCR_EL2.HPMN is set, EL1 and EL0, which includes KVM guests, should read that value for PMCR.N.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/kvm/debug.c | 3 +-- arch/arm64/kvm/pmu-emul.c | 8 +++++++- tools/testing/selftests/kvm/arm64/vpmu_counter_access.c | 2 +- 3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 0e4c805e7e89..7c04db00bf6c 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -36,8 +36,7 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK * to disable guest access to the profiling and trace buffers */
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
*host_data_ptr(nr_event_counters));
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, read_mdcr());
Please avoid unnecessary accesses to MDCR_EL2 at all costs. This is a guaranteed trap to EL2 with nested virt.
vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMS | MDCR_EL2_TTRF | diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index 6c5950b9ceac..052ce8c721fe 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -993,12 +993,18 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq) u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) { struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
- u8 limit;
- if (arm_pmu->partitioned)
limit = arm_pmu->hpmn - 1;
- else
limit = ARMV8_PMU_MAX_GENERAL_COUNTERS;
/* * The arm_pmu->cntr_mask considers the fixed counter(s) as well. * Ignore those and return only the general-purpose counters. */
- return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS);
- return bitmap_weight(arm_pmu->cntr_mask, limit);
}
This isn't necessary and is likely to regress the existing behavior.
When the architecture says the lower ELs should see PMCR_EL0.N have the same value as MDCR_EL2.HPMN, what it really means is direct reads from hardware will return the value.
So my expectation would be that a VM using the partitioned PMU implementation would never reach any of the *emulated* PMU handlers, as we would've disabled the associated traps.
The partitioned PMU will not replace the emulated vPMU implementation in KVM, so it'd be good to refactor what we have today to make room for your work. I think that'd be along the lines of:
- Shared code for event filter enforcement and handling of the vPMU overflow IRQ.
- Emulated PMU implementation that provides trap handlers for all PMUv3 registers and backs into host perf
- Partitioned PMU implementation that doesn't rely on traps and instead saves / restores a portion of the PMU that contains the guest context.
These should be done in separate files, i.e. I do not want to see a bunch of inline handling to cope with emulated v. partitioned PMUs.
static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c index f16b3b27e32e..b5bc18b7528d 100644 --- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c +++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c @@ -609,7 +609,7 @@ static void run_pmregs_validity_test(uint64_t pmcr_n) */ static void run_error_test(uint64_t pmcr_n) {
- pr_debug("Error test with pmcr_n %lu (larger than the host)\n", pmcr_n);
- pr_debug("Error test with pmcr_n %lu (larger than the host allows)\n", pmcr_n);
NBD for an RFC, but in the future please do selftests changes in a separate patch.
Hi Oliver, thanks for the review.
Oliver Upton oliver.upton@linux.dev writes:
On Sat, Feb 08, 2025 at 02:01:11AM +0000, Colton Lewis wrote:
The ARM architecture specifies that when MDCR_EL2.HPMN is set, EL1 and EL0, which includes KVM guests, should read that value for PMCR.N.
Signed-off-by: Colton Lewis coltonlewis@google.com
arch/arm64/kvm/debug.c | 3 +-- arch/arm64/kvm/pmu-emul.c | 8 +++++++- tools/testing/selftests/kvm/arm64/vpmu_counter_access.c | 2 +- 3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 0e4c805e7e89..7c04db00bf6c 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -36,8 +36,7 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK * to disable guest access to the profiling and trace buffers */
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
*host_data_ptr(nr_event_counters));
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, read_mdcr());
Please avoid unnecessary accesses to MDCR_EL2 at all costs. This is a guaranteed trap to EL2 with nested virt.
I thought it was necessary when I wrote this, but I see it's not after thinking about it for a while.
The intended value is accessible from vcpu->kvm->arch.arm_pmu.hpmn as written, but could be somewhere else after addressing your suggestion to lift all MDCR handling into KVM.
vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMS | MDCR_EL2_TTRF | diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index 6c5950b9ceac..052ce8c721fe 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -993,12 +993,18 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq) u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) { struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
- u8 limit;
- if (arm_pmu->partitioned)
limit = arm_pmu->hpmn - 1;
- else
limit = ARMV8_PMU_MAX_GENERAL_COUNTERS;
/* * The arm_pmu->cntr_mask considers the fixed counter(s) as well. * Ignore those and return only the general-purpose counters. */
- return bitmap_weight(arm_pmu->cntr_mask,
ARMV8_PMU_MAX_GENERAL_COUNTERS);
- return bitmap_weight(arm_pmu->cntr_mask, limit); }
This isn't necessary and is likely to regress the existing behavior.
When the architecture says the lower ELs should see PMCR_EL0.N have the same value as MDCR_EL2.HPMN, what it really means is direct reads from hardware will return the value.
So my expectation would be that a VM using the partitioned PMU implementation would never reach any of the *emulated* PMU handlers, as we would've disabled the associated traps.
Understood. The change was here to see be able to read PMCR_EL0.N from inside a VM since I haven't disabled the associated traps yet.
It shouldn't be in this patch.
The partitioned PMU will not replace the emulated vPMU implementation in KVM, so it'd be good to refactor what we have today to make room for your work. I think that'd be along the lines of:
- Shared code for event filter enforcement and handling of the vPMU overflow IRQ.
- Emulated PMU implementation that provides trap handlers for all PMUv3 registers and backs into host perf
- Partitioned PMU implementation that doesn't rely on traps and instead saves / restores a portion of the PMU that contains the guest context.
These should be done in separate files, i.e. I do not want to see a bunch of inline handling to cope with emulated v. partitioned PMUs.
Agreed.
static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c index f16b3b27e32e..b5bc18b7528d 100644 --- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c +++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c @@ -609,7 +609,7 @@ static void run_pmregs_validity_test(uint64_t pmcr_n) */ static void run_error_test(uint64_t pmcr_n) {
- pr_debug("Error test with pmcr_n %lu (larger than the host)\n",
pmcr_n);
- pr_debug("Error test with pmcr_n %lu (larger than the host allows)\n",
pmcr_n);
NBD for an RFC, but in the future please do selftests changes in a separate patch.
Ok, I'll do that in the future.
linux-kselftest-mirror@lists.linaro.org