Hi Akihiko,
This is an unreasonably large patch that needs to be broken down into smaller patches, ideally one functional change per patch. We need this even for an RFC for the sake of reviews.
On Wed, Aug 06, 2025 at 06:09:54PM +0900, Akihiko Odaki wrote:
+static u64 kvm_pmu_get_pmc_value(struct kvm_vcpu *vcpu, u8 idx)
 {
-	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+	struct kvm_pmc *pmc = *kvm_vcpu_idx_to_pmc(vcpu, idx);
 	u64 counter, reg, enabled, running;
+	unsigned int i;
-	reg = counter_index_to_reg(pmc->idx);
+	reg = counter_index_to_reg(idx);
 	counter = __vcpu_sys_reg(vcpu, reg);
 	/*
 	 * The real counter value is equal to the value of counter register plus
 	 * the value perf event counts.
 	 */
-	if (pmc->perf_event)
-		counter += perf_event_read_value(pmc->perf_event, &enabled,
-						 &running);
+	if (pmc)
+		for (i = 0; i < pmc->nr_perf_events; i++)
+			counter += perf_event_read_value(pmc->perf_events[i],
+							 &enabled, &running);
I'm concerned that the array-of-events concept you're introducing is going to be error-prone. I'd prefer an approach that reallocates the PMU event when a vCPU migrates to a different PMU implementation.
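To make the reallocation idea concrete, here is a rough userspace mock-up, not KVM code. Every name in it (mock_event, mock_pmc, mock_pmc_load, ...) is hypothetical, and the integer pmu_id merely stands in for "which host PMU implementation the event was opened on". It keeps a single event per counter: when the vCPU lands on a different PMU, the live count is folded into the saved register value and a fresh event is allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a perf event bound to one host PMU. */
struct mock_event {
	int pmu_id;	/* host PMU implementation the event was opened on */
	uint64_t count;	/* events counted since the event was created */
};

/* Hypothetical stand-in for a guest counter: one event, never an array. */
struct mock_pmc {
	uint64_t reg;			/* saved guest counter register */
	struct mock_event *event;	/* NULL when no event is installed */
};

/* Guest-visible value: saved register plus whatever the live event counted. */
static uint64_t mock_pmc_value(const struct mock_pmc *pmc)
{
	assert(pmc);
	return pmc->reg + (pmc->event ? pmc->event->count : 0);
}

/* Fold the live count into the saved register and drop the event. */
static void mock_pmc_release_event(struct mock_pmc *pmc)
{
	assert(pmc);
	if (!pmc->event)
		return;
	pmc->reg += pmc->event->count;
	free(pmc->event);
	pmc->event = NULL;
}

/*
 * Called when the vCPU is loaded on a CPU whose PMU is pmu_id.
 * Reuses the event if it already targets that PMU, otherwise
 * reallocates it. Returns 0 on success, -1 on allocation failure.
 */
static int mock_pmc_load(struct mock_pmc *pmc, int pmu_id)
{
	struct mock_event *event;

	assert(pmc);
	if (pmc->event && pmc->event->pmu_id == pmu_id)
		return 0;

	mock_pmc_release_event(pmc);

	event = calloc(1, sizeof(*event));
	if (!event)
		return -1;
	event->pmu_id = pmu_id;
	pmc->event = event;
	return 0;
}
```

The point of the mock-up is only that the guest-visible value survives the migration without the counter ever holding more than one event.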
+static void reset_sample_period(struct perf_event *perf_event)
+{
+	struct kvm_pmc **pmc = perf_event->overflow_handler_context;
+	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+	struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
+	u64 period;
+	cpu_pmu->pmu.stop(perf_event, PERF_EF_UPDATE);
+	/*
+	 * Reset the sample period to the architectural limit,
+	 * i.e. the point where the counter overflows.
+	 */
+	period = compute_period(pmc, kvm_pmu_get_pmc_value(vcpu, (*pmc)->idx));
+	local64_set(&perf_event->hw.period_left, 0);
+	perf_event->attr.sample_period = period;
+	perf_event->hw.sample_period = period;
+	cpu_pmu->pmu.start(perf_event, PERF_EF_RELOAD);
+}
No, we can't start calling into the internal driver interfaces. The fact that we have a pointer to the PMU is an ugly hack and shouldn't be used like this.
@@ -725,8 +729,8 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
 	attr.type = arm_pmu->pmu.type;
 	attr.size = sizeof(attr);
 	attr.pinned = 1;
-	attr.disabled = !kvm_pmu_counter_is_enabled(pmc);
-	attr.exclude_user = !kvm_pmc_counts_at_el0(pmc);
+	attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, (*pmc)->idx);
+	attr.exclude_user = !kvm_pmc_counts_at_el0(vcpu, (*pmc)->idx);
 	attr.exclude_hv = 1; /* Don't count EL2 events */
 	attr.exclude_host = 1; /* Don't count host events */
 	attr.config = eventsel;
Can we just special-case the fixed CPU cycle counter to use PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES? That _should_ have the intended effect of opening an event on the PMU for this CPU.
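Something along these lines, as an untested userspace sketch rather than a patch. It uses the real struct perf_event_attr and constants from the UAPI header <linux/perf_event.h>, but the helper mock_fill_attr() and its parameters are hypothetical; pmu_type stands in for arm_pmu->pmu.type and is_cycle_counter for whatever check identifies the fixed cycle counter.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill in the perf attributes for a guest counter.
 * Only the type/config selection differs between the two cases.
 */
static void mock_fill_attr(struct perf_event_attr *attr, bool is_cycle_counter,
			   uint32_t pmu_type, uint64_t eventsel)
{
	assert(attr);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->pinned = 1;
	attr->exclude_hv = 1;	/* Don't count EL2 events */
	attr->exclude_host = 1;	/* Don't count host events */

	if (is_cycle_counter) {
		/*
		 * Generic hardware event: not tied to one PMU's dynamic
		 * type, so (per the suggestion above) it should be opened
		 * on whichever PMU serves the CPU in question.
		 */
		attr->type = PERF_TYPE_HARDWARE;
		attr->config = PERF_COUNT_HW_CPU_CYCLES;
	} else {
		/* Programmable counter: raw event on a specific PMU. */
		attr->type = pmu_type;
		attr->config = eventsel;
	}
}
```

Whether the generic event really resolves to the right PMU on every heterogeneous system is the assumption to verify before relying on this.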
+	/*
+	 * If we have a filter in place and that the event isn't allowed, do
+	 * not install a perf event either.
+	 */
+	if (vcpu->kvm->arch.pmu_filter &&
+	    !test_bit(eventsel, vcpu->kvm->arch.pmu_filter))
+		return;
+	if (arm_pmu) {
+		*pmc = kvm_pmu_alloc_pmc(idx, 1);
+		if (!*pmc)
+			goto err;
+		kvm_pmu_create_perf_event(pmc, arm_pmu, eventsel);
+	} else {
+		guard(mutex)(&arm_pmus_lock);
This is a system-wide lock. The need for it goes away if you go for the reallocation approach I mentioned above.
+static int kvm_arm_pmu_v3_set_pmu_composition(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct arm_pmu_entry *entry;
+	struct arm_pmu *arm_pmu;
+	lockdep_assert_held(&kvm->arch.config_lock);
+	if (kvm_vm_has_ran_once(kvm) ||
+	    (kvm->arch.pmu_filter && !kvm->arch.nr_composed_host_pmus))
+		return -EBUSY;
I'm not sure there's much value in preventing the user from configuring the PMU event filter. Even in the case of the fixed CPU cycle counter we allow userspace to filter the event.
It is much more important to have mutual exclusion between this UAPI and userspace explicitly selecting a PMU implementation.
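A rough illustration of that mutual exclusion, again as a userspace mock-up with hypothetical names throughout (mock_pmu_config, mock_set_pmu, mock_set_composition). Returning -EBUSY for the conflict is an assumption borrowed from the check quoted above, not a settled choice. Note that the event filter is deliberately not consulted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-VM PMU configuration state. */
struct mock_pmu_config {
	bool has_run;		/* VM has run at least once */
	bool pmu_selected;	/* userspace explicitly picked a PMU */
	bool composed;		/* the new attribute from this patch was set */
};

/* Explicit PMU selection: refused once the new attribute is in effect. */
static int mock_set_pmu(struct mock_pmu_config *cfg)
{
	assert(cfg);
	if (cfg->has_run || cfg->composed)
		return -EBUSY;
	cfg->pmu_selected = true;
	return 0;
}

/* The new attribute: refused once a PMU has been explicitly selected. */
static int mock_set_composition(struct mock_pmu_config *cfg)
{
	assert(cfg);
	if (cfg->has_run || cfg->pmu_selected)
		return -EBUSY;
	cfg->composed = true;
	return 0;
}
```

Whichever of the two is configured first wins, and the other fails until the VM is recreated.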
@@ -1223,6 +1328,8 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
 	}
+	case KVM_ARM_VCPU_PMU_V3_COMPOSITION:
+		return kvm_arm_pmu_v3_set_pmu_composition(vcpu);
I'd prefer naming this something like 'KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY'. We will eventually have the fixed instruction counter, which is another event we could potentially provide system-wide.
Thanks,
Oliver