 
            On Thu, May 15, 2025, Dapeng Mi wrote:
On 5/15/2025 8:09 AM, Sean Christopherson wrote:
On Mon, Mar 24, 2025, Mingwei Zhang wrote:
- return vcpu->kvm->arch.enable_pmu &&
This is superfluous, pmu->version should never be non-zero without the PMU being enabled at the VM level.
Strictly speaking, "arch.enable_pmu" and pmu->version doesn't indicates fully same thing. "arch.enable_pmu" indicates whether PMU function is enabled in KVM, but the "pmu->version" comes from user space configuration. In theory user space could configure a "0" PMU version just like pmu_counters_test does. Currently I'm not sure if the check for "pmu->version" can be removed, let me have a double check.
Gah, sorry, my comment was vague and confusing. What I was trying to say is that the vcpu->kvm->arch.enable_pmu check is superfluous and can be dropped.
- kvm->arch.enable_pmu = enable_pmu && !enable_mediated_pmu;
So I tried to run a QEMU with this and it failed, because QEMU expected the PMU to be enabled and tried to write to PMU MSRs. I haven't dug through the QEMU code, but I assume that QEMU rightly expects that passing in PMU in CPUID when KVM_GET_SUPPORTED_CPUID says its supported will result in the VM having a PMU.
As long as the module parameter "enable_mediated_pmu" is enabled, qemu needs below extra code to enable mediated vPMU, otherwise PMU is disabled in KVM.
https://lore.kernel.org/all/20250324123712.34096-1-dapeng1.mi@linux.intel.co...
I.e. by trying to get cute with backwards compatibility, I think we broke backwards compatiblity. At this point, I'm leaning toward making the module param off-by-default, but otherwise not messing with the behavior of kvm->arch.enable_pmu. Not sure if that has implications for KVM_PMU_CAP_DISABLE though.
I'm not sure if it's a kind of break for backwards compatibility. As long as "enable_mediated_pmu" is not enabled, the qemu doesn't need any changes, the legacy vPMU can still be enabled by old qemu version. But if user want to enable mediated vPMU, so they should use the new version qemu which has the capability to enable mediated vPMU, it sounds reasonable for me.
I agree it's reasonable to require a userspace update to take advantage of new features, what I don't like is what happens if userspace _hasn't_ been updated. I also don't love that forcing a userspace update in this case is more than a bit contrived. It's very doable to let existing userspace utilize the mediated PMU, forcing KVM_CAP_PMU_CAPABILITY is essentially KVM punting a problem to userspace.
And the complications with the mediated PMU don't really have anything to do with the VMM, they're more about all the other tasks and daemons running on the system, e.g. that might be using perf.
Thinking more about this, the problem isn't so much that enabling mediated PMUs by default is undesirable, it's that giving userspace a binary choise doesn't provide enough flexibility. E.g. for single-user QEMU-based use cases (including my use of QEMU), requiring a new QEMU is painful and annoying, and so having an on-by-default option would be nice.
But for use cases that already utilize KVM_CAP_PMU_CAPABILITY, e.g. to explicitly disable PMUs for a subset of VMs, on-by-default is very undesirable, e.g. would require KVM to support KVM_PMU_CAP_DISABLE, and would generate unnecessary noise and contention in perf.
So, what if we simply make enable_mediated_pmu a tri-state of sorts?
0 == disabled
0 == enabled for all VMs (no opt-in or opt-out supported)
< 0 == enabled, but off by default (requires opt-in)
Then use cases like my personal usage of QEMU can run with enable_mediated_pmu=1, while use cases like Google Cloud can run with enable_mediated_pmu=-1, and everyone is happy (hopefully), without too much added complexity in KVM.